
Veritas Cluster Server (VCS) HOWTO:
===================================

$Id: VCS-HOWTO,v 1.25 2002/09/30 20:05:38 pzi Exp $

Copyright (c) Peter Ziobrzynski

Contents:
---------

- Copyright
- Thanks
- Overview
- VCS installation
- Summary of cluster queries
- Summary of basic cluster operations
- Changing cluster configuration
- Configuration of a test group and test resource type
- Installation of a test agent for a test resource
- Home directories service group configuration
- NIS service groups configuration
- Time synchronization services
- ClearCase configuration

Copyright:
----------

This HOWTO document may be reproduced and distributed in whole or in part, in any medium physical or electronic, as long as this copyright notice is retained on all copies. Commercial redistribution is allowed and encouraged; however, the author would like to be notified of any such distributions.

All translations, derivative works, or aggregate works incorporating this HOWTO document must be covered under this copyright notice. That is, you may not produce a derivative work from this HOWTO and impose additional restrictions on its distribution. Exceptions to these rules may be granted under certain conditions.

In short, I wish to promote dissemination of this information through as many channels as possible. However, I do wish to retain copyright on this HOWTO document, and would like to be notified of any plans to redistribute the HOWTO.

If you have questions, please contact me: Peter Ziobrzynski

Thanks:
-------

- Veritas Software provided numerous consultations that led to the cluster configuration described in this document.

- Parts of this document are based on the work I have done for Kestrel Solutions, Inc.

- Basis Inc. for assisting in selecting hardware components and helping to resolve installation problems.

- comp.sys.sun.admin Usenet community.

Overview:
---------

This document describes the configuration of a Solaris cluster of two or more nodes using Veritas Cluster Server (VCS) 1.1.2 on Solaris 2.6. A number of standard UNIX services are configured as cluster service groups: user home directories, NIS naming services and time synchronization (NTP). In addition, ClearCase, a popular Software Configuration Management system from Rational, is configured as a set of cluster service groups.

Configuring the various software components as cluster service groups provides high availability of the applications as well as load balancing (by failing over or switching over groups between nodes). Besides that, the cluster configuration makes it possible to free a node in the network for upgrades, testing or reconfiguration and then bring it back into service very quickly with little or no additional work.

- Cluster topology.

The cluster topology used here is called clustered pairs. Two nodes share disks on a single shared SCSI bus: both computers and the disk are connected in a chain on that bus. Either differential or fast-wide SCSI buses can be used. The SCSI host adapter in each node is assigned a different SCSI id (called the initiator id) so that both computers can coexist on the same bus.

+ Two Node Cluster with single disk:

Node      Node
|         /
|       /
|     /
|   /
| /
Disk

A single shared disk can be replaced by two disks, each on its own private SCSI bus connecting both cluster nodes. This allows for disk mirroring across disks and SCSI buses.

Note: the disk here can be understood as a disk array or a disk pack.

+ Two Node Cluster with disk pair:

Node         Node
|\           /|
|   \     /   |
|      \      |
|   /     \   |
|/           \|
Disk         Disk

A single pair can be extended by chaining an additional node and connecting it to the pair with additional disks and SCSI buses. One or more nodes can be added, creating an N-node configuration. The perimeter nodes have two SCSI host adapters while the middle nodes have four.

+ Three Node Cluster:

Node         Node          Node
|\           /| |\           /|
|   \     /   | |   \     /   |
|      \      | |      \      |
|   /     \   | |   /     \   |
|/           \| |/           \|
Disk       Disk Disk       Disk

+ N Node Cluster:

Node         Node            Node          Node
|\           /| |\           /|\           /|
|   \     /   | |   \     /   |   \     /   |
|      \      | |      \      | ...  \      |
|   /     \   | |   /     \   |   /     \   |
|/           \| |/           \|/           \|
Disk       Disk Disk       Disk          Disk

- Disk configuration.

Management of the cluster's shared storage is performed with the Veritas Volume Manager (VM). The VM controls which disks on the shared SCSI bus are assigned to (owned by) which system. In Volume Manager, disks are grouped into disk groups, and a disk group as a whole can be assigned for access from one of the systems. The assignment can be changed quickly, allowing for cluster fail-over or switch-over. The disks that compose a disk group can be scattered across multiple disk enclosures (packs, arrays) and SCSI buses. We used this feature to create disk groups that contain VM volumes mirrored across devices. Below is a schematic of 3 cluster nodes connected by SCSI buses to 4 disk packs (we use Sun MultiPacks).

Node 0 is connected to Disk Pack 0 and Node 1 on one SCSI bus, and to Disk Pack 1 and Node 1 on a second SCSI bus. Disk 0 in Pack 0 and disk 0 in Pack 1 are put into Disk group 0, disk 1 in Pack 0 and disk 1 in Pack 1 into Disk group 1, and so on for all the disks in the Packs. We have four 9 GB disks in each Pack, so there are 4 disk groups between Node 0 and Node 1 that can be switched from one node to the other.

Node 1 interfaces with Node 2 in the same way as with Node 0. Two disk packs, Pack 2 and Pack 3, are configured with disk groups 4, 5, 6 and 7 as shared storage between those nodes. We have a total of 8 disk groups in the cluster. Groups 0-3 can be visible from Node 0 or 1, and groups 4-7 from Node 1 or 2. Node 1 is in a privileged position and can access all disk groups.

Node 0                  Node 1                    Node 2 ... Node N
------                ----------                  ------
|\                     /|    |\                     /|
|  \                 /  |    |  \                 /  |
|    \             /    |    |    \             /    |
|      \         /      |    |      \         /      |
|        \     /        |    |        \     /        |
|           \           |    |           \           |
|        /     \        |    |        /     \        |
|      /         \      |    |      /         \      |
|    /             \    |    |    /             \    |
|  /                 \  |    |  /                 \  |
|/                     \|    |/                     \|
Disk Pack 0:  Disk Pack 1:  Disk Pack 2:  Disk Pack 3:

Disk group 0:                    Disk group 4:
+----------------------+         +------------------------+
| Disk0          Disk0 |         | Disk0            Disk0 |
+----------------------+         +------------------------+
Disk group 1:                    Disk group 5:
+----------------------+         +------------------------+
| Disk1          Disk1 |         | Disk1            Disk1 |
+----------------------+         +------------------------+
Disk group 2:                    Disk group 6:
+----------------------+         +------------------------+
| Disk2          Disk2 |         | Disk2            Disk2 |
+----------------------+         +------------------------+
Disk group 3:                    Disk group 7:
+----------------------+         +------------------------+
| Disk3          Disk3 |         | Disk3            Disk3 |
+----------------------+         +------------------------+
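The numbering above is regular enough to compute. The following is a minimal POSIX sh sketch, assuming four disks per pack, packs paired as 0+1, 2+3 and so on, and pack pair N shared by Node N and Node N+1 as in the schematic. The function names dg_for_slot and nodes_for_dg are hypothetical helpers, not VCS or Volume Manager commands:

```shell
#!/bin/sh
# Sketch of the disk-group numbering used in this cluster.
# Assumptions: 4 disks per pack, packs paired as 0+1, 2+3, ...,
# and pack pair N is cabled to node N and node N+1.

DISKS_PER_PACK=4

# dg_for_slot PACK DISK -> number of the disk group that holds that disk
dg_for_slot() {
    pair=$(( $1 / 2 ))
    echo $(( pair * DISKS_PER_PACK + $2 ))
}

# nodes_for_dg GROUP -> the two nodes that can import that disk group
nodes_for_dg() {
    pair=$(( $1 / DISKS_PER_PACK ))
    echo "$pair $(( pair + 1 ))"
}
```

For example, dg_for_slot 2 1 prints 5 (Disk1 in Pack 2 belongs to Disk group 5) and nodes_for_dg 5 prints "1 2", i.e. only Node 1 and Node 2 can import that group.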

- Hardware details:

Below is a detailed listing of the hardware configuration of two nodes. Sun part numbers are included so you can order it directly from SunStore and put it on your Visa:

- E250:
  + Base: A26-AA
  + 2xCPU: X1194A
  + 2x256MB RAM: X7004A
  + 4xUltraSCSI 9.1GB hard drive: X5234A
  + 100BaseT Fast/Wide UltraSCSI PCI adapter: X1032A
  + Quad Fastethernet controller PCI adapter: X1034A

- MultiPack:
  + 4x9.1GB 10000RPM disk
  + StorEdge MultiPack: SG-XDSK040C-36G

- Connections:

+ SCSI:

E250:                            E250:
X1032A -------SCSI-----> Multipack
                         Multipack

HUB   HUB   HUB   HUB

/dev/null 2>&1 ;;

- Add entry to your c-shell environment:

set vcs = /opt/VRTSvcs
setenv MANPATH ${MANPATH}:$vcs/man
set path = ( $vcs/bin $path )
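For root accounts that use the Bourne or Korn shell instead of csh, roughly the same settings look like the sketch below. The /usr/share/man fallback used when MANPATH is unset is an assumption about the default man page location, added so that the system man pages stay visible:

```shell
# Bourne/Korn shell equivalent of the c-shell settings above,
# e.g. for root's /.profile.
VCS=/opt/VRTSvcs
# Assumption: /usr/share/man is the default man path when MANPATH is unset
MANPATH=${MANPATH:-/usr/share/man}:$VCS/man
# Put the VCS commands (hastatus, hagrp, hares, ...) first in the search path
PATH=$VCS/bin:$PATH
export MANPATH PATH
```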

- To remove the VCS software:

NOTE: required if the demo installation fails.

# sh /opt/VRTSvcs/wizards/config/quick_start -b
# rsh bar_c 'sh /opt/VRTSvcs/wizards/config/quick_start -b'
# pkgrm VRTScsga VRTSgab VRTSllt VRTSperl VRTSvcs VRTSvcswz clsp
# rm -rf /etc/VRTSvcs /var/VRTSvcs
# init 6

- Configure /.rhosts on both nodes to allow each node transparent rsh root access to the other:

/.rhosts:

foo_c
bar_c

- Run the quick start script from one of the nodes:

NOTE: must be run from /usr/openwin/bin/xterm; other xterms cause terminal emulation problems.

# /usr/openwin/bin/xterm -e sh /opt/VRTSvcs/wizards/config/quick_start

Select the hme0 and qfe0 network links for the GAB and LLT connections. The script will ask twice for the link interface names. Link 1 is hme0 and link 2 is qfe0 for both the foo_c and bar_c nodes.

You should see the heartbeat pings on the interconnection hubs.

The wizard creates the LLT and GAB configuration files /etc/llttab, /etc/gabtab and /etc/llthosts on each system:

On foo_c:

/etc/llttab:

set-node foo_c
link hme0 /dev/hme:0
link qfe1 /dev/qfe:1
start

On bar_c:

/etc/llttab:

set-node bar_c
link hme0 /dev/hme:0
link qfe1 /dev/qfe:1
start

/etc/gabtab:

/sbin/gabconfig -c -n2

/etc/llthosts:

0 foo_c
1 bar_c

The LLT and GAB communication is started by the rc scripts S70llt and S92gab installed in /etc/rc2.d.

- The private interconnect can also be configured by hand by creating the above files.
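As a sketch of doing this by hand, the POSIX sh function below writes the three files in the same format the wizard produced above. It writes into a staging directory rather than /etc so the output can be checked first. write_llt_gab and link_line are hypothetical names, and deriving the /dev/<driver>:<unit> path from the interface name (hme0 to /dev/hme:0) is an assumption based on the llttab entries shown above:

```shell
#!/bin/sh
# Sketch: generate llttab, llthosts and gabtab for one node.
# usage: write_llt_gab DIR THISNODE LINK1 LINK2 NODE0 NODE1 ...

# link_line IFNAME -> "link hme0 /dev/hme:0" style llttab entry
link_line() {
    dev=$(echo "$1" | sed 's/[0-9]*$//')      # hme0 -> hme
    unit=$(echo "$1" | sed 's/^.*[^0-9]//')   # hme0 -> 0
    echo "link $1 /dev/$dev:$unit"
}

write_llt_gab() {
    dir=$1; node=$2; link1=$3; link2=$4
    shift 4
    mkdir -p "$dir" || return 1

    # llttab: this node's name, the two private links, then "start"
    {
        echo "set-node $node"
        link_line "$link1"
        link_line "$link2"
        echo "start"
    } > "$dir/llttab"

    # llthosts: "<node id> <node name>" for every cluster member
    id=0
    : > "$dir/llthosts"
    for n in "$@"; do
        echo "$id $n" >> "$dir/llthosts"
        id=$((id + 1))
    done

    # gabtab: seed the cluster only when all $id nodes are present
    echo "/sbin/gabconfig -c -n$id" > "$dir/gabtab"
}
```

For foo_c in this cluster the call would be write_llt_gab /tmp/llt foo_c hme0 qfe1 foo_c bar_c; for bar_c only the set-node line differs. Copy the files into /etc only after reviewing them.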

- Check basic installation:

+ status of the gab:

# gabconfig -a

GAB Port Memberships
===============================================================
Port a gen 1e4c0001 membership 01
Port h gen dd080001 membership 01

+ status of the link:

# lltstat -n

LLT node information:
    Node        State    Links
  * 0 foo_c     OPEN     2
    1 bar_c     OPEN     2

+ node parameters:

# hasys -display
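The two interconnect checks above can be scripted. A minimal POSIX sh sketch, assuming the gabconfig -a and lltstat -n output formats shown above (gab_ok and llt_ok are hypothetical names); in the output above, membership 01 corresponds to nodes 0 and 1:

```shell
#!/bin/sh
# gab_ok EXPECTED < gabconfig-output
# Succeeds if ports a and h are both listed with the EXPECTED membership.
gab_ok() {
    awk -v want="$1" '
        $1 == "Port" && ($2 == "a" || $2 == "h") {
            seen[$2] = 1
            # force a string comparison so "01" and "1" are not equal
            if ($NF "" != want "") bad = 1
        }
        END { exit (bad || !seen["a"] || !seen["h"]) }'
}

# llt_ok < lltstat-output
# Succeeds if every node line (ending in a link count) is OPEN with 2+ links.
llt_ok() {
    awk '
        $NF ~ /^[0-9]+$/ {
            n++
            if ($(NF-1) != "OPEN" || $NF < 2) bad = 1
        }
        END { exit (bad || n == 0) }'
}
```

Usage: gabconfig -a | gab_ok 01 && lltstat -n | llt_ok && echo interconnect OK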

- Set/update VCS super user password:

+ add root user:

# haconf -makerw
# hauser -add root
password:...
# haconf -dump -makero

+ change root password:

# haconf -makerw
# hauser -update root
password:...
# haconf -dump -makero

- Configure demo NFS service groups:

NOTE: You have to fix the VCS wizards first. The wizard perl scripts have a bug that makes them dump core in the middle of filling out the configuration forms. The solution is to provide a shell wrapper for one binary and avoid running it with one specific set of parameters. Do the following in VCS-1.1.2:

# cd /opt/VRTSvcs/bin
# mkdir tmp
# mv iou tmp
# cat > iou << 'EOF'
#!/bin/sh
echo "[$*]" >> /tmp/,.iou.log
case "$*" in
'-c 20 9 -g 2 2 3 -l 0 3') echo "skip bug" >> /tmp/,.iou.log ;;
*) /opt/VRTSvcs/bin/tmp/iou "$@" ;;
esac
EOF
# chmod 755 iou

+ Create NFS mount point directories on both systems:

# mkdir /export1 /export2

+ Run the wizard on foo_c node:

NOTE: must run from /usr/openwin/bin/xterm - other xterms cause terminal emulation problems

# /usr/openwin/bin/xterm -e sh /opt/VRTSvcs/wizards/services/quick_nfs

Select for groupx:
- public network device: qfe2
- group name: groupx
- IP: 192.168.1.53
- VM disk group: cluster1
- volume: vol01
- mount point: /export1
- options: rw
- file system: vxfs

Select for groupy:
- public network device: qfe2
- group name: groupy
- IP: 192.168.1.54
- VM disk group: cluster2
- volume: vol01
- mount point: /export2
- options: rw
- file system: vxfs

You should see: Congratulations!...

The /etc/VRTSvcs/conf/config directory should have the main.cf and types.cf files configured.

+ Reboot both systems:

# init 6

Summary of cluster queries:
----------------------------

- Cluster queries:

+ list cluster status summary:

# hastatus -summary

-- SYSTEM STATE
-- System               State                Frozen

A  foo_c                RUNNING              0
A  bar_c                RUNNING              0

-- GROUP STATE
-- Group           System    Probed     AutoDisabled    State

B  groupx          foo_c     Y          N               ONLINE
B  groupx          bar_c     Y          N               OFFLINE
B  groupy          foo_c     Y          N               OFFLINE
B  groupy          bar_c     Y          N               ONLINE
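Before rebooting or evacuating a node it helps to record which groups it is running, so they can be switched back afterwards. A small POSIX sh filter over hastatus -summary output does this, assuming the "B group system probed autodisabled state" layout shown above (groups_online_on is a hypothetical name):

```shell
#!/bin/sh
# groups_online_on SYSTEM < hastatus-summary-output
# Prints the name of every group reported ONLINE on SYSTEM.
groups_online_on() {
    awk -v sys="$1" '$1 == "B" && $3 == sys && $NF == "ONLINE" { print $2 }'
}
```

For the state above, hastatus -summary | groups_online_on bar_c prints groupy.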

+ list cluster attributes:

# haclus -display
#Attribute          Value
ClusterName         my_vcs
CompareRSM          0
CounterInterval     5
DumpingMembership   0
Factor              runque 5 memory 1 disk 10 cpu 25 network 5
GlobalCounter       16862
GroupLimit          200
LinkMonitoring      0
LoadSampling        0
LogSize             33554432
MajorVersion        1
MaxFactor           runque 100 memory 10 disk 100 cpu 100 network 100
MinorVersion        10
PrintMsg            0
ReadOnly            1
ResourceLimit       5000
SourceFile          ./main.cf
TypeLimit           100
UserNames           root cDgqS68RlRP4k

- Resource queries:

+ list resources:

# hares -list
cluster1            foo_c
cluster1            bar_c
IP_192_168_1_53     foo_c
IP_192_168_1_53     bar_c
...
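The IP resource names follow the IP type's NameRule attribute, IP_ + resource.Address. Judging from the names in this cluster (address 192.168.1.53 becomes IP_192_168_1_53), the dots are replaced with underscores. The sketch below reproduces that observed pattern; it is inferred from the listing, not taken from VCS documentation, and ip_resource_name is a hypothetical name:

```shell
#!/bin/sh
# ip_resource_name ADDRESS -> resource name as seen in this cluster
ip_resource_name() {
    echo "IP_$(echo "$1" | tr . _)"
}
```

This is handy when scripting commands such as hares -online against a resource whose address you know.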

+ list resource dependencies:

# hares -dep
#Group   Parent            Child
groupx   IP_192_168_1_53   groupx_qfe1
groupx   IP_192_168_1_53   nfs_export1
groupx   export1           cluster1_vol01
groupx   nfs_export1       NFS_groupx_16
groupx   nfs_export1       export1
groupx   cluster1_vol01    cluster1
groupy   IP_192_168_1_54   groupy_qfe1
groupy   IP_192_168_1_54   nfs_export2
groupy   export2           cluster2_vol01
groupy   nfs_export2       NFS_groupy_16
groupy   nfs_export2       export2
groupy   cluster2_v        cluster2
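Because each line pairs a parent with the child it requires, the order in which a group's resources come online can be derived mechanically. A sketch, assuming the three-column layout above: each child/parent pair is fed to the standard tsort(1) utility, so children are listed before their parents (online_order is a hypothetical name). The offline order is the reverse.

```shell
#!/bin/sh
# online_order GROUP < hares-dep-output
# Prints GROUP's resources so that every child precedes its parent.
online_order() {
    # "#Group ..." header and other groups' lines are skipped by the match
    awk -v grp="$1" '$1 == grp { print $3, $2 }' | tsort
}
```

For groupx, hares -dep | online_order groupx lists cluster1 before cluster1_vol01, export1 and nfs_export1, and IP_192_168_1_53 last; the NIC and NFS resources have no ordering constraint relative to the storage chain, so their position may vary.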

+ list attributes of a resource:

# hares -display export1
#Resource   Attribute         System   Value
export1     ConfidenceLevel   foo_c    100
export1     ConfidenceLevel   bar_c    0
export1     Probed            foo_c    1
export1     Probed            bar_c    1
export1     State             foo_c    ONLINE
export1     State             bar_c    OFFLINE
export1     ArgListValues     foo_c    /export1 /dev/vx/dsk/cluster1/vol01 vxfs rw ""
...

- Groups queries:

+ list groups:

# hagrp -list
groupx    foo_c
groupx    bar_c
groupy    foo_c
groupy    bar_c

+ list group resources:

# hagrp -resources groupx
cluster1
IP_192_168_1_53
export1
NFS_groupx_16
groupx_qfe1
nfs_export1
cluster1_vol01

+ list group dependencies:

# hagrp -dep groupx

+ list of group attributes:

# hagrp -display groupx
#Group   Attribute             System   Value
groupx   AutoFailOver          global   1
groupx   AutoStart             global   1
groupx   AutoStartList         global   foo_c
groupx   FailOverPolicy        global   Priority
groupx   Frozen                global   0
groupx   IntentOnline          global   1
groupx   ManualOps             global   1
groupx   OnlineRetryInterval   global   0
groupx   OnlineRetryLimit      global   0
groupx   Parallel              global   0
groupx   PreOnline             global   0
groupx   PrintTree             global   1
groupx   SourceFile            global   ./main.cf
groupx   SystemList            global   foo_c 0 bar_c 1
groupx   SystemZones           global
groupx   TFrozen               global   0
groupx   TriggerEvent          global   1
groupx   UserIntGlobal         global   0
groupx   UserStrGlobal         global
groupx   AutoDisabled          foo_c    0
groupx   AutoDisabled          bar_c    0
groupx   Enabled               foo_c    1
groupx   Enabled               bar_c    1
groupx   ProbesPending         foo_c    0
groupx   ProbesPending         bar_c    0
groupx   State                 foo_c    |ONLINE|
groupx   State                 bar_c    |OFFLINE|
groupx   UserIntLocal          foo_c    0
groupx   UserIntLocal          bar_c    0
groupx   UserStrLocal          foo_c
groupx   UserStrLocal          bar_c

- Node queries:

+ list nodes in the cluster:

# hasys -list
foo_c
bar_c

+ list node attributes:

# hasys -display bar_c
#System   Attribute          Value
bar_c     AgentsStopped      1
bar_c     ConfigBlockCount   54
bar_c     ConfigCheckSum     48400
bar_c     ConfigDiskState    CURRENT
bar_c     ConfigFile         /etc/VRTSvcs/conf/config
bar_c     ConfigInfoCnt      0
bar_c     ConfigModDate      Wed Mar 29 13:46:19 2000
bar_c     DiskHbDown
bar_c     Frozen             0
bar_c     GUIIPAddr
bar_c     LinkHbDown
bar_c     Load               0
bar_c     LoadRaw            runque 0 memory 0 disk 0 cpu 0 network 0
bar_c     MajorVersion       1
bar_c     MinorVersion       10
bar_c     NodeId             1
bar_c     OnGrpCnt           1
bar_c     SourceFile         ./main.cf
bar_c     SysName            bar_c
bar_c     SysState           RUNNING
bar_c     TFrozen            0
bar_c     UserInt            0
bar_c     UserStr

- Resource types queries:

+ list resource types:

# hatype -list
CLARiiON
Disk
DiskGroup
ElifNone
FileNone
FileOnOff
FileOnOnly
IP
IPMultiNIC
Mount
MultiNICA
NFS
NIC
Phantom
Process
Proxy
ServiceGroupHB
Share
Volume

+ list all resources of a given type:

# hatype -resources DiskGroup
cluster1
cluster2

+ list attributes of the given type:

# hatype -display IP
#Type   Attribute            Value
IP      AgentFailedOn
IP      AgentReplyTimeout    130
IP      AgentStartTimeout    60
IP      ArgList              Device Address NetMask Options ArpDelay IfconfigTwice
IP      AttrChangedTimeout   60
IP      CleanTimeout         60
IP      CloseTimeout         60
IP      ConfInterval         600
IP      LogLevel             error
IP      MonitorIfOffline     1
IP      MonitorInterval      60
IP      MonitorTimeout       60
IP      NameRule             IP_ + resource.Address
IP      NumThreads           10
IP      OfflineTimeout       300
IP      OnlineRetryLimit     0
IP      OnlineTimeout        300
IP      OnlineWaitLimit      2
IP      OpenTimeout          60
IP      Operations           OnOff
IP      RestartLimit         0
IP      SourceFile           ./types.cf
IP      ToleranceLimit       0

- Agents queries:

+ list agents:

# haagent -list
CLARiiON
Disk
DiskGroup
ElifNone
FileNone
FileOnOff
FileOnOnly
IP
IPMultiNIC
Mount
MultiNICA
NFS
NIC
Phantom
Process
Proxy
ServiceGroupHB
Share
Volume

+ list status of an agent:

# haagent -display IP
#Agent   Attribute   Value
IP       AgentFile
IP       Faults      0
IP       Running     Yes
IP       Started     Yes

Summary of basic cluster operations:
------------------------------------

- Cluster Start/Stop:

+ stop VCS on all systems:

# hastop -all

+ stop VCS on bar_c and move all groups out:

# hastop -sys bar_c -evacuate

+ start VCS on the local system:

# hastart

- Users:

+ add gui root user:

# haconf -makerw
# hauser -add root
# haconf -dump -makero

- Group:

+ group start, stop:

# hagrp -offline groupx -sys foo_c
# hagrp -online groupx -sys foo_c

+ switch a group to another system:

# hagrp -switch groupx -to bar_c

+ freeze a group:

# hagrp -freeze groupx

+ unfreeze a group:

# hagrp -unfreeze groupx

+ enable a group:

# hagrp -enable groupx

+ disable a group:

# hagrp -disable groupx

+ enable the resources of a group:

# hagrp -enableresources groupx

+ disable the resources of a group:

# hagrp -disableresources groupx

+ flush a group:

# hagrp -flush groupx -sys bar_c

- Node:

+ freeze a node:

# hasys -freeze bar_c

+ thaw a node:

# hasys -unfreeze bar_c

- Resources:

+ online a resource:

# hares -online IP_192_168_1_54 -sys bar_c

+ offline a resource:

# hares -offline IP_192_168_1_54 -sys bar_c

+ offline a resource and propagate to its children:

# hares -offprop IP_192_168_1_54 -sys bar_c

+ probe a resource:

# hares -probe IP_192_168_1_54 -sys bar_c

+ clear a faulted resource:

# hares -clear IP_192_168_1_54 -sys bar_c

- Agents:

+ start an agent:

# haagent -start IP -sys bar_c

+ stop an agent:

# haagent -stop IP -sys bar_c

- Reboot a node with evacuation of all service groups (groupy is running on bar_c), then switch groupy back once bar_c has rejoined the cluster:

# hastop -sys bar_c -evacuate
# init 6
# hagrp -switch groupy -to bar_c

Changing cluster configuration:
--------------------------------

You cannot edit the configuration files directly while the cluster is running; this can be done only when the cluster is down. The configuration files are in /etc/VRTSvcs/conf/config.

To change the configuration you can:

+ use hagui
+ stop the cluster (hastop), edit main.cf and types.cf directly, regenerate main.cmd (hacf -generate .) and start the cluster (hastart)
+ use the following command-line procedure on the running cluster

To change the cluster while it is running do this:

- Dump current cluster configuration to files and generate main.cmd file:

# haconf -dump
# hacf -generate .
# hacf -verify .

- Create new configuration directory:

# mkdir -p ../new

- Copy existing *.cf files in there:

# cp main.cf types.cf ../new

- Add new stuff to it:

# vi main.cf types.cf

- Regenerate the main.cmd file with low level commands:

# cd ../new
# hacf -generate .
# hacf -verify .

- Catch the diffs:

# diff ../config/main.cmd main.cmd > ,.cmd

- Prepend this to the top of the file to make config rw:

# haconf -makerw

- Append the command to make configuration ro:

# haconf -dump -makero

- Apply the diffs you need:

# sh -x ,.cmd
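The diff, prepend and append steps can be combined. A POSIX sh sketch, assuming that only additions matter: it keeps the lines that exist only in the new main.cmd (the "> " lines of diff output) and wraps them in haconf -makerw and haconf -dump -makero. Lines present only in the old file are dropped, so deletions still have to be written by hand, and the result should be reviewed before it is fed to sh. make_cmd is a hypothetical name:

```shell
#!/bin/sh
# make_cmd OLD_main.cmd NEW_main.cmd > ,.cmd
make_cmd() {
    echo "haconf -makerw"
    # keep only the commands added in the new main.cmd, minus diff's "> "
    diff "$1" "$2" | sed -n 's/^> //p'
    echo "haconf -dump -makero"
}
```

From /etc/VRTSvcs/conf/config/new this would be run as make_cmd ../config/main.cmd main.cmd > ,.cmd, followed by a review of ,.cmd and then sh -x ,.cmd.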

Cluster logging:
-----------------------------------------------------

VCS logs all activities into the /var/VRTSvcs/log directory. The most important log is the engine log, engine.log_A. Each agent also has its own log file.

The logging parameters can be displayed with halog command:

# halog -info
Log on hades_c:
        path = /var/VRTSvcs/log/engine.log_A
        maxsize = 33554432 bytes
        tags = ABCDE

Configuration of a test group and test resource type:
=======================================================

To get comfortable with the cluster configuration it is useful to create your own group that uses your own resource. The example below demonstrates the configuration of a "do nothing" group with one resource of our own type.

- Add group test with one resource test. Add this to /etc/VRTSvcs/conf/config/new/types.cf:

type Test (
        str Tester
        NameRule = resource.Name
        int IntAttr
        str StringAttr
        str VectorAttr[]
        str AssocAttr{}
        static str ArgList[] = { IntAttr, StringAttr, VectorAttr, AssocAttr }
)

- Add this to /etc/VRTSvcs/conf/config/new/main.cf:

group test (
        SystemList = { foo_c, bar_c }
        AutoStartList = { foo_c }
        )

        Test test (
                IntAttr = 100
                StringAttr = "Testing 1 2 3"
                VectorAttr = { one, two, three }
                AssocAttr = { one = 1, two = 2 }
                )

- Run hacf -generate and diff as above, then edit the result to get the ,.cmd file:

haconf -makerw

hatype -add Test
hatype -modify Test SourceFile "./types.cf"
haattr -add Test Tester -string
hatype -modify Test NameRule "resource.Name"
haattr -add Test IntAttr -integer
haattr -add Test StringAttr -string
haattr -add Test VectorAttr -string -vector
haattr -add Test AssocAttr -string -assoc
hatype -modify Test ArgList IntAttr StringAttr VectorAttr AssocAttr
hatype -modify Test LogLevel error
hatype -modify Test MonitorIfOffline 1
hatype -modify Test AttrChangedTimeout 60
hatype -modify Test CloseTimeout 60
hatype -modify Test CleanTimeout 60
hatype -modify Test ConfInterval 600
hatype -modify Test MonitorInterval 60
hatype -modify Test MonitorTimeout 60
hatype -modify Test NumThreads 10
hatype -modify Test OfflineTimeout 300
hatype -modify Test OnlineRetryLimit 0
hatype -modify Test OnlineTimeout 300
hatype -modify Test OnlineWaitLimit 2
hatype -modify Test OpenTimeout 60
hatype -modify Test RestartLimit 0
hatype -modify Test ToleranceLimit 0
hatype -modify Test AgentStartTimeout 60
hatype -modify Test AgentReplyTimeout 130
hatype -modify Test Operations OnOff
haattr -default Test AutoStart 1
haattr -default Test Critical 1
haattr -default Test Enabled 1
haattr -default Test TriggerEvent 0
hagrp -add test
hagrp -modify test SystemList foo_c 0 bar_c 1
hagrp -modify test AutoStartList foo_c
hagrp -modify test SourceFile "./main.cf"
hares -add test Test test
hares -modify test Enabled 1
hares -modify test IntAttr 100
hares -modify test StringAttr "Testing 1 2 3"
hares -modify test VectorAttr one two three
hares -modify test AssocAttr one 1 two 2

haconf -dump -makero

- Feed it to sh:

# sh -x ,.cmd

- Both the group test and the resource test (of type Test) should now be added to the cluster.

Installation of a test agent for a test resource:
-------------------------------------------------

This agent does not start or monitor any specific resource. It just maintains its persistent state in the ,.on file. It can be used as a template for other agents that perform some real work.

- in /opt/VRTSvcs/bin create Test directory

# cd /opt/VRTSvcs/bin
# mkdir Test

- link in the precompiled agent binary for script implemented methods:

# cd Test
# ln -s ../ScriptAgent TestAgent

- Create dummy agent scripts in /opt/VRTSvcs/bin/Test (make them executable with chmod 755):

online:

#!/bin/sh
echo "`date` $0 $@" >> /opt/VRTSvcs/bin/Test/log
echo yes > /opt/VRTSvcs/bin/Test/,.on

offline:

#!/bin/sh
echo "`date` $0 $@" >> /opt/VRTSvcs/bin/Test/log
echo no > /opt/VRTSvcs/bin/Test/,.on

open:

#!/bin/sh
echo "`date` $0 $@" >> /opt/VRTSvcs/bin/Test/log

close:

#!/bin/sh
echo "`date` $0 $@" >> /opt/VRTSvcs/bin/Test/log

shutdown:

#!/bin/sh
echo "`date` $0 $@" >> /opt/VRTSvcs/bin/Test/log

clean:

#!/bin/sh
echo "`date` $0 $@" >> /opt/VRTSvcs/bin/Test/log

monitor:

#!/bin/sh
echo "`date` $0 $@" >> /opt/VRTSvcs/bin/Test/log
# exit 101 reports the resource online, exit 100 reports it offline;
# a missing ,.on file (resource never brought online) counts as offline
case "`cat /opt/VRTSvcs/bin/Test/,.on 2>/dev/null`" in
yes) exit 101 ;;
*) exit 100 ;;
esac
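To see how the monitor exit codes behave without involving VCS, the same online/offline/monitor logic can be tried as shell functions. This sketch keeps the ,.on file in the directory named by TEST_AGENT_DIR (a hypothetical variable, defaulting to /opt/VRTSvcs/bin/Test) and treats a missing ,.on file as offline; 100 means offline and 101 online, the values used by the monitor script:

```shell
#!/bin/sh
# Path of the state file, resolved at call time
state_file() {
    echo "${TEST_AGENT_DIR:-/opt/VRTSvcs/bin/Test}/,.on"
}

test_online()  { echo yes > "$(state_file)"; }
test_offline() { echo no > "$(state_file)"; }

# Returns 101 (online) only if the state file says "yes", else 100 (offline)
test_monitor() {
    case "$(cat "$(state_file)" 2>/dev/null)" in
    yes) return 101 ;;
    *)   return 100 ;;
    esac
}
```

For example, after TEST_AGENT_DIR=/tmp/t; mkdir -p /tmp/t; test_online, running test_monitor; echo $? prints 101.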

- start the agent:

# haagent -start Test -sys foo_c

- distribute the agent code to other nodes:

# cd /opt/VRTSvcs/bin/
# rsync -av --rsync-path=/opt/pub/bin/rsync Test bar_cs/bin

- start test group:

# hagrp -online test -sys foo_c

Note:

Distribution or synchronization of the agent code is very important for cluster integrity. If the agents differ on various cluster nodes, unpredictable things can happen. I maintain a shell script in the Veritas agent directory (/opt/VRTSvcs/bin) to distribute the code of all agents I work on:

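#!/bin/sh
set -x
mkdir -p /tmp/vcs
# copy the agent directories (minus logs and state files) to the other
# node and to a local staging copy
for dest in hades_c:/opt/VRTSvcs/bin /tmp/vcs; do
    rsync -av --rsync-path=/opt/pub/bin/rsync --exclude=log --exclude=,.on \
        ,.sync CCViews CCVOBReg CCVOBMount ClearCase Test CCRegistry \
        NISMaster NISClient "$dest"
done
# bundle the staging copy into /tmp/vcs.tar
cd /tmp
tar cvf vcs.tar vcs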
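A quick way to confirm that two copies of the agent code really match is to compare checksums. A POSIX sh sketch, assuming both trees are reachable locally (for example /opt/VRTSvcs/bin and a copy of another node's tree); it skips the log and ,.on files just as the rsync exclusions above do. agent_sums and agents_match are hypothetical names:

```shell
#!/bin/sh
# agent_sums DIR AGENT... -> sorted cksum lines for the agents' files
agent_sums() {
    dir=$1; shift
    ( cd "$dir" && find "$@" -type f ! -name log ! -name ',.on' \
        -exec cksum {} \; ) | sort
}

# agents_match DIR1 DIR2 AGENT... -> success if both trees are identical
agents_match() {
    d1=$1; d2=$2; shift 2
    [ "$(agent_sums "$d1" "$@")" = "$(agent_sums "$d2" "$@")" ]
}
```

For example: agents_match /opt/VRTSvcs/bin /tmp/vcs Test NISMaster NISClient && echo agents in sync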

Home directories service group configuration:
=============================================

We configure home directories as a service group consisting of an IP address and the directory containing all home directories. Users can consistently connect (telnet, rsh, etc.) to the logical IP and expect to find their home directories local on the system. The directory we use is the source directory for the automounter, which mounts the individual directories as needed under /home. We put the directories in /cluster3/homes and mount them with /etc/auto_home:

* localhost:/cluster3/homes/&

We assume that all required user accounts are configured on all cluster nodes. This can be done by hand, by rdisting the /etc/passwd and group files, or by using NIS. We used both methods; the NIS one is described below. All resources of the group are standard VCS-supplied ones, so we do not have to implement any agent code for additional resources.

Group 'homes' has the following resources (types in brackets):

homes:

          IP_homes (IP)
          |           |
          v           v
share_homes (Share)   qfe1_homes (NIC)
          |
          v
mount_homes (Mount)
          |
          v
volume_homes (Volume)
          |
          v
dgroup_homes (DiskGroup)

The service group definition for this group is as follows (main.cf):

group homes (
        SystemList = { bar_c, foo_c }
        AutoStartList = { bar_c }
        )

        DiskGroup dgroup_homes (
                DiskGroup = cluster3
                )

        IP IP_homes (
                Device = qfe2
                Address = "192.168.1.55"
                )

        Mount mount_homes (
                MountPoint = "/cluster3"
                BlockDevice = "/dev/vx/dsk/cluster3/vol01"
                FSType = vxfs
                MountOpt = rw
                )

        Share share_homes (
                PathName = "/cluster3"
                Options = "-o rw=localhost"
                OnlineNFSRestart = 0
                OfflineNFSRestart = 0
                )

        NIC qfe2_homes (
                Device = qfe2
                NetworkType = ether
                )

        Volume volume_homes (
                Volume = vol01
                DiskGroup = cluster3
                )

        IP_homes requires qfe2_homes
        IP_homes requires share_homes
        mount_homes requires volume_homes
        share_homes requires mount_homes
        volume_homes requires dgroup_homes

NIS service group configuration:
=================================

NIS is configured as two service groups: one for the NIS master server and the other for the NIS clients. The server is configured to store all NIS source data files on the shared storage in the /cluster1/yp directory. We copied the following files to /cluster1/yp:

auto_home     ethers     mail.aliases   netmasks   protocols   services
auto_master   group      netgroup       networks   publickey   timezone
bootparams    hosts      netid          passwd     rpc

The Makefile in /var/yp required some changes to reflect the non-default (other than /etc) location of the source files. Also, running sendmail to generate new aliases while the NIS service was still starting up caused a hang, so we had to remove it from the standard map generation. The limitation here is that new mail aliases can only be added when NIS is completely running. The following diffs have been applied to /var/yp/Makefile:

*** Makefile-   Sun May 14 23:33:33 2000
--- Makefile.var.yp     Fri May 5 07:38:02 2000
***************
*** 13,19 ****
  # resolver for hosts not in the current domain.
  #B=-b
  B=
! DIR =/etc
  #
  # If the passwd, shadow and/or adjunct files used by rpc.yppasswdd
  # live in directory other than /etc then you'll need to change the
--- 13,19 ----
  # resolver for hosts not in the current domain.
  #B=-b
  B=
! DIR =/cluster1/yp
  #
  # If the passwd, shadow and/or adjunct files used by rpc.yppasswdd
  # live in directory other than /etc then you'll need to change the
***************
*** 21,30 ****
  # DO NOT indent the line, however, since /etc/init.d/yp attempts
  # to find it with grep "^PWDIR" ...
  #
! PWDIR =/etc
  DOM = `domainname`
  NOPUSH = ""
! ALIASES = /etc/mail/aliases
  YPDIR=/usr/lib/netsvc/yp
  SBINDIR=/usr/sbin
  YPDBDIR=/var/yp
--- 21,30 ----
  # DO NOT indent the line, however, since /etc/init.d/yp attempts
  # to find it with grep "^PWDIR" ...
  #
! PWDIR =/cluster1/yp
  DOM = `domainname`
  NOPUSH = ""
! ALIASES = /cluster1/yp/mail.aliases
  YPDIR=/usr/lib/netsvc/yp
  SBINDIR=/usr/sbin
  YPDBDIR=/var/yp
***************
*** 45,51 ****
        else $(MAKE) $(MFLAGS) -k all NOPUSH=$(NOPUSH);fi

  all: passwd group hosts ethers networks rpc services protocols \
!       netgroup bootparams aliases publickey netid netmasks c2secure \
        timezone auto.master auto.home

  c2secure:
--- 45,51 ----
        else $(MAKE) $(MFLAGS) -k all NOPUSH=$(NOPUSH);fi

  all: passwd group hosts ethers networks rpc services protocols \
!       netgroup bootparams publickey netid netmasks \
        timezone auto.master auto.home

  c2secure:
***************
*** 187,193 ****
        @cp $(ALIASES) $(YPDBDIR)/$(DOM)/mail.aliases;
        @/usr/lib/sendmail -bi -oA$(YPDBDIR)/$(DOM)/mail.aliases;
        $(MKALIAS) $(YPDBDIR)/$(DOM)/mail.aliases $(YPDBDIR)/$(DOM)/mail.byaddr;
-       @rm $(YPDBDIR)/$(DOM)/mail.aliases;
        @touch aliases.time;
        @echo "updated aliases";
        @if [ ! $(NOPUSH) ]; then $(YPPUSH) -d $(DOM) mail.aliases; fi
--- 187,192 ----

We need only one master server, so only one instance of this service group is allowed on the cluster (the group is not parallel).

Group 'nis_master' has the following resources (types in brackets):

nis_master:

master_NIS (NISMaster)
        |
        v
mount_NIS (Mount)
        |
        v
volume_NIS (Volume)
        |
        v
dgroup_NIS (DiskGroup)

The client service group is designed to configure the domain name on the node and then start ypbind in broadcast mode. We need the NIS client to run on every node, so it is designed as a parallel group. Clients cannot function without the master server running somewhere on the cluster network, so we include an 'online global' dependency between the client and master service groups.

The client group unconfigures NIS completely from the node when it is shut down. This may seem radical, but it is required for consistency with the startup.

To allow the master group to come online, we also include automatic configuration of the domain name in this group.

The nis_master group is defined as follows (main.cf):

group nis_master (
        SystemList = { bar_c, foo_c }
        AutoStartList = { bar_c }
        )

        DiskGroup dgroup_NIS (
                DiskGroup = cluster1
                )

        Mount mount_NIS (
                MountPoint = "/cluster1"
                BlockDevice = "/dev/vx/dsk/cluster1/vol01"
                FSType = vxfs
                MountOpt = rw
                )

        NISMaster master_NIS (
                Source = "/cluster1/yp"
                Domain = mydomain
                )

        Volume volume_NIS (
                Volume = vol01
                DiskGroup = cluster1
                )

        master_NIS requires mount_NIS
        mount_NIS requires volume_NIS
        volume_NIS requires dgroup_NIS

Group 'nis_client' has the following resource (types in brackets):

nis_client:

client_NIS (NISClient)

The nis_client group is defined as follows (main.cf):

group nis_client (
        SystemList = { bar_c, foo_c }
        Parallel = 1
        AutoStartList = { bar_c, foo_c }
        )

        NISClient client_NIS (
                Domain = mydomain
                )

        requires group nis_master online global

Both the master and client service groups use custom-built resources and corresponding agent code. The resources are defined as follows (in types.cf):

type NISClient (
        static str ArgList[] = { Domain }
        NameRule = resource.Name
        str Domain
)

type NISMaster (
        static str ArgList[] = { Source, Domain }
        NameRule = resource.Name
        str Source
        str Domain
)

The agent code for NISMaster:

- online:

Time synchronization services (xntp):
======================================

,,,

ClearCase configuration:
=========================

ClearCase is a client-server system providing so-called multi-version file system (MVFS) functionality. MVFS file systems are used to track the contents of files, directories and symbolic links in versions of so-called elements. Elements are stored in VOBs (versioned object bases) and are accessed through Views. Information about objects, such as their location and permissions, is stored in a distributed database called the registry. For ClearCase to be configured on a system, the Registry, VOB and View server processes have to be started. VOBs and Views store their data in regular directory trees. The VOB and View storage directories can be located on the shared storage of the cluster, and cluster service groups can be configured to mount it and start the needed server processes.

We configured ClearCase as a set of four service groups: ccregistry, views, vobs_group_mnt and vobs_group_reg. Each node in the cluster must have a standard ClearCase installation configured into the same region. All views and VOBs need to be configured to use storage directories on the cluster shared storage. In our case we used /cluster2/viewstore as the view storage directory and /cluster4/vobstore as the VOB storage directory. All VOBs must be public.

Licensing of ClearCase in the cluster is resolved by configuring each node in the cluster as the license server for itself. This is done by transferring all your licenses from one node to the other while still keeping the original license server. Since this may be a stretch of the licensing agreement, you may want to use a separate license server outside of the cluster instead.

Groups and resources:
---------------------

All four service groups (ccregistry, views, vobs_group_mnt, vobs_group_reg) perform a specialized ClearCase function that can be isolated to a single node of the cluster. All nodes of the cluster run the basic ClearCase installation; this is handled by the resource type named ClearCase, and each of the service groups includes this resource.

The ccregistry service group transfers the ClearCase master registry server to a particular cluster node. This is performed by the specialized resource of type CCRegistry.

Views are handled by service groups that include a specialized resource of type CCViews. This resource registers and starts all views sharing the same storage directory.

The VOB functionality is handled by two separate service groups: one that registers a VOB on a cluster node and another that mounts it on the same or another cluster node. VOB registration is performed by the specialized resource of type CCVOBReg and VOB mounting by the resource of type CCVOBMount.

A detailed description of each service group and its resources follows:

ccregistry service group:
--------------------------

The ccregistry group is responsible for configuring a cluster node as the primary registry server and, if necessary, unconfiguring it on any other node of the cluster. All nodes in the cluster are configured as registry backup servers that store a copy of the primary registry data. The /var/adm/atria/rgy/rgy_hosts.conf file has to list all cluster nodes as backups:

/var/adm/atria/rgy/rgy_hosts.conf:

foo_c
foo_c bar_c
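The agents below read this file positionally: the first line names the primary registry server and the next line lists the backups. As a minimal sketch of that convention, the hypothetical `rgy_roles` function below (not part of the HOWTO's agents) parses a copy of the file; it uses `sed -n '2,$p'` as a portable stand-in for the `tail +2` used in the scripts.

```sh
#!/bin/sh
# Hypothetical helper: print the roles recorded in an rgy_hosts.conf-style
# file, where line 1 names the primary registry server and the following
# line(s) list the backup servers.
rgy_roles() {
    conf=$1
    primary=`sed -n '1p' "$conf"`
    # Lines 2..end, the same range the agents read with "tail +2":
    backups=`sed -n '2,$p' "$conf"`
    set -- $backups        # split on blanks/newlines, dropping empty lines
    echo "primary=$primary"
    echo "backups=$*"
}
```

Run against the two-line file above, it prints `primary=foo_c` followed by `backups=foo_c bar_c`.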

This group uses two custom resources: ccregistry_primary and ccase_ccregistry. The ccase_ccregistry resource is of type ClearCase and is responsible for starting the basic ClearCase services. No views or VOBs are configured at this point; other service groups do that later. The ccregistry_primary resource changes configuration files to make a host the primary registry server.

ccregistry:

ccregistry_primary (CCRegistry)
        |
        |
        v
ccase_ccregistry (ClearCase)

The ccregistry group is defined as follows (main.cf):

group ccregistry (
        SystemList = { bar_c, foo_c }
        AutoStartList = { bar_c }
        )

CCRegistry ccregistry_primary ( )

ClearCase ccase_ccregistry ( )

ccregistry_primary requires ccase_ccregistry

The custom resource types CCRegistry and ClearCase used by this group are defined as follows (in types.cf):

type CCRegistry (
        static str ArgList[] = { }
        NameRule = resource.Name
)

type ClearCase (
        static str ArgList[] = { }
        NameRule = resource.Name
        static str Operations = OnOnly
)

ClearCase resource implementation:

The ClearCase 'online' agent is responsible for preparing the registry configuration files and starting the ClearCase servers. The configuration is done in such a way that only one registry master server runs in the cluster. The /var/adm/atria/rgy/rgy_hosts.conf file is configured to use the current node as the master only if ClearCase is not running on any other cluster node. If the ccregistry service group is detected in the online state anywhere else in the cluster, the current node is started as a registry backup server, on the assumption that the other node has already claimed the master registry status. The master status file /var/adm/atria/rgy/rgy_svr.conf is updated to reflect the current node's status. After the registry configuration files are prepared, the standard ClearCase startup script /usr/atria/etc/atria_start is run.

ClearCase/online agent:

> #!/bin/sh
> # ClearCase online:
> if [ -r /view/.specdev ];then
>     # Running:
>     exit 0
> fi
>
> this=`hostname`
> primary=`head -1 /var/adm/atria/rgy/rgy_hosts.conf`
> backups=`tail +2 /var/adm/atria/rgy/rgy_hosts.conf`
> master=`cat /var/adm/atria/rgy/rgy_svr.conf`
>
> # Look for another node where the ccregistry group is already ONLINE:
> online=
> for host in $backups;do
>     if [ "$host" != "$this" ];then
>         stat=`hagrp -state ccregistry -sys $host | grep ONLINE | wc -l`
>         if [ $stat -gt 0 ];then
>             online=$host
>             break
>         fi
>     fi
> done
>
> if [ "$this" = "$primary" -a "$online" != "" ];then
>     # Erase master status:
>     cp /dev/null /var/adm/atria/rgy/rgy_svr.conf
>
>     # Create configuration file with the online host as the master:
>     cat > /var/adm/atria/rgy/rgy_hosts.conf <<EOF
> $online
> $backups
> EOF
> fi
>
> # Normal ClearCase startup:
> /bin/sh -x /usr/atria/etc/atria_start start

The ClearCase resource is configured to use only the 'shutdown' agent, not 'offline'. An 'offline' could be dangerous for ClearCase if VCS missed the monitor detection and decided to restart it. The 'shutdown' ClearCase agent stops all ClearCase servers using the standard ClearCase shutdown script (/usr/atria/etc/atria_start).

ClearCase/shutdown:

> #!/bin/sh
> # ClearCase shutdown:
> # Normal ClearCase shutdown:
> /bin/sh -x /usr/atria/etc/atria_start stop

ClearCase/monitor:

> #!/bin/sh
> # ClearCase monitor:
> if [ -r /view/.specdev ];then
>     # Running:
>     exit 110
> else
>     # Not running:
>     exit 100
> fi
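This monitor relies on the VCS script-agent convention of reporting state through the exit code: 100 for offline and 110 for online with full confidence (codes 101 to 110 express increasing confidence). A small sketch that isolates that mapping, assuming any command whose success means "running"; `monitor_status` is a hypothetical name, not a VCS function:

```sh
#!/bin/sh
# Hypothetical wrapper: run a check command and translate its success
# or failure into the VCS monitor exit codes used by the agents above.
VCS_OFFLINE=100
VCS_ONLINE=110

monitor_status() {
    if "$@" >/dev/null 2>&1; then
        return $VCS_ONLINE
    else
        return $VCS_OFFLINE
    fi
}

# Example: the ClearCase check is "is /view/.specdev readable?"
#   monitor_status test -r /view/.specdev; exit $?
```

Keeping the check and the exit-code mapping apart makes the same wrapper usable for the other custom monitors.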

CCRegistry resource implementation:

This resource verifies whether the current node is configured as the registry master server and, if not, performs a switch-over from the other node to this one. The complication here was the sequence of events: when switching a group over from one node to another, the VCS engine first takes it offline on the node where it is online and then brings it online on the node where it was offline. With a registry switch-over the sequence has to be reversed: the destination node must first run rgy_switchover and transfer the master status to itself while the old master is still up, and only then can the old master be shut down and configured as a backup.

To implement this sequence, the offline agent (which is called first, on the current primary) does not perform the switch-over but only marks the intent to transfer the master by creating a marker file ,.offline in the agent directory. The monitor script, called next on the current master, reports the primary as down if it finds the ,.offline marker.

CCRegistry/offline:

> #!/bin/sh
> # CCRegistry offline:
> if [ `ps -ef | grep albd_server | grep -v grep | wc -l` -eq 0 ];then
>     # No albd_server - no vobs:
>     exit 1
> fi
>
> this=`hostname`
> primary=`head -1 /var/adm/atria/rgy/rgy_hosts.conf`
> backups=`tail +2 /var/adm/atria/rgy/rgy_hosts.conf`
>
> if [ "$this" != "$primary" ];then
>     # This host is not configured as primary - do nothing:
>     exit 1
> fi
>
> touch /opt/VRTSvcs/bin/CCRegistry/,.offline
>
> exit 0

Next, the online agent on the target node performs the actual switch-over using rgy_switchover. In the following iteration, the monitor script on the old primary sees that the primary was transferred by looking at the rgy_hosts.conf file, and then removes the ,.offline marker.

CCRegistry/online:

> #!/bin/sh
> # CCRegistry online:
> if [ `ps -ef | grep albd_server | grep -v grep | wc -l` -eq 0 ];then
>     # No albd_server - no vobs:
>     exit 1
> fi
>
> this=`hostname`
> primary=`head -1 /var/adm/atria/rgy/rgy_hosts.conf`
> backups=`tail +2 /var/adm/atria/rgy/rgy_hosts.conf`
>
> if [ "$this" = "$primary" ];then
>     # This host is already configured as primary - do nothing:
>     exit 1
> fi
>
> # Check if this host is on the backup list - if not do nothing.
> # Only backups can become primary.
>
> continue=0
> for backup in $backups; do
>     if [ "$backup" = "$this" ];then
>         continue=1
>     fi
> done
> if [ $continue -eq 0 ];then
>     exit 1
> fi
>
> # Check if backup data exists. If not do nothing:
> if [ ! -d /var/adm/atria/rgy/backup ];then
>     exit 1
> fi
>
> # Check how old the backup data is. If it is too old do nothing:
> # ,,,
>
> # Put the backup online and switch hosts. Change from $primary to $this host.
> # Assign last $backup host in backup list as backup:
>
> /usr/atria/etc/rgy_switchover -backup "$backups" $primary $this
>
> touch /opt/VRTSvcs/bin/CCRegistry/,.online
>
> exit 0

Sometimes rgy_switchover running on the target node does not complete the registry transfer and the operation has to be retried. To handle this, the online agent leaves a ,.online marker in the agent directory right after rgy_switchover is run. The monitor agent then looks for the ,.online marker and, if it finds it, retries rgy_switchover. As soon as the monitor agent detects that the configuration files have been properly updated and the switch-over has completed, it removes the ,.online marker.
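The CCRegistry monitor itself is not listed in this document. The sketch below is a hypothetical reconstruction of the marker decisions described above, written as a shell function so the logic can be tried against a scratch directory: the agent directory, an rgy_hosts.conf-style file and the host name are passed in, and where the real agent would re-run rgy_switchover the sketch only prints `retry`. The checks in the original agent may well differ.

```sh
#!/bin/sh
# Hypothetical sketch of the CCRegistry monitor decisions.
#   $1 = agent directory holding the ,.offline / ,.online markers
#   $2 = rgy_hosts.conf-style file (line 1 = current primary)
#   $3 = local host name
# Returns 110 (online) or 100 (offline), like a VCS monitor.
ccregistry_monitor() {
    agent_dir=$1
    conf=$2
    this=$3
    primary=`sed -n '1p' "$conf"`

    if [ -f "$agent_dir/,.offline" ]; then
        # Offline was requested on this (old) primary.
        if [ "$this" != "$primary" ]; then
            # The switch-over completed elsewhere: drop the marker.
            rm -f "$agent_dir/,.offline"
        fi
        return 100          # report offline either way
    fi

    if [ -f "$agent_dir/,.online" ]; then
        if [ "$this" = "$primary" ]; then
            # Configuration files show the transfer completed.
            rm -f "$agent_dir/,.online"
            return 110
        fi
        echo retry          # the real agent would re-run rgy_switchover here
        return 100
    fi

    if [ "$this" = "$primary" ]; then
        return 110
    fi
    return 100
}
```

Because VCS calls the monitor periodically, each call only has to decide on the current state; the markers carry the intent from one call to the next.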

To maintain the integrity of the agent operation, the open and close agents remove both marker files (,.online and ,.offline), which may have been left behind by a previously malfunctioning or crashed system.

CCRegistry/open:

> cd /opt/VRTSvcs/bin/CCRegistry
> rm -f ,.offline ,.online

CCRegistry/close:

> cd /opt/VRTSvcs/bin/CCRegistry
> rm -f ,.offline ,.online

vobs__reg service group:
---------------------------------

The first step in configuring ClearCase is to create, register and mount VOBs. This service group is designed to register a set of VOBs that use a specific storage directory. All VOBs located in a given directory are registered on the current cluster node. The is the parameter that should be replaced with a unique name identifying a group of VOBs. We used this name to consistently name the Veritas Volume Manager disk group, the mount point directory and a collection of cluster resources designed to provide the VOB infrastructure. The vobs__reg service group is built from the following resources:

- ccase__reg resource of type ClearCase. This resource powers up ClearCase on the cluster node and makes it ready for use. See the detailed description of the ClearCase resource implementation above.

- ccvobs__reg resource of type CCVOBReg. This is a custom resource that registers a given set of VOBs, identified by their VOB tags and storage directory.

- mount__reg resource of type Mount. This resource mounts a given Veritas volume on a directory.

- volume__reg resource of type Volume. This resource starts the indicated Veritas volume in a given disk group.

- dgroup__reg resource of type DiskGroup. This resource brings the given Veritas disk group online.

Here is the dependency diagram of the resources of this group:

          ccvobs__reg (CCVOBReg)
            |               |
            v               v
ccase__reg (ClearCase)   mount__reg (Mount)
                            |
                            v
                         volume__reg (Volume)
                            |
                            v
                         dgroup__reg (DiskGroup)

There can be many instances of this service group, one for each collection of VOBs. Each set can be managed separately and brought online on different cluster nodes, which provides load balancing. One of our implementations used "cluster4" for the name of the . We named the Veritas disk group "cluster4" and the VOB storage directory /cluster4/vobstore. Here is the example definition of the vobs_cluster4_reg group (in main.cf):

group vobs_cluster4_reg (
        SystemList = { foo_c, bar_c }
        AutoStartList = { foo_c }
        )

        CCVOBReg ccvobs_cluster4_reg (
                Storage = "/cluster4/vobstore"
                CCPassword = foobar
                )

        ClearCase ccase_cluster4_reg (
                )

        DiskGroup dgroup_cluster4_reg (
                DiskGroup = cluster4
                )

        Mount mount_cluster4_reg (
                MountPoint = "/cluster4"
                BlockDevice = "/dev/vx/dsk/cluster4/vol01"
                FSType = vxfs
                MountOpt = rw
                )

        Volume volume_cluster4_reg (
                Volume = vol01
                DiskGroup = cluster4
                )

        requires group ccregistry online global

        ccvobs_cluster4_reg requires ccase_cluster4_reg
        ccvobs_cluster4_reg requires mount_cluster4_reg
        mount_cluster4_reg requires volume_cluster4_reg
        volume_cluster4_reg requires dgroup_cluster4_reg
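Because every name in the group derives from the single collection name, a new vobs_NAME_reg group can be produced mechanically. A minimal sketch, assuming the same layout as the cluster4 example (volume vol01, vxfs, mount point /NAME, storage /NAME/vobstore); the function `vobs_reg_group` and its argument order are hypothetical, and its output should be reviewed before it goes into main.cf:

```sh
#!/bin/sh
# Hypothetical generator for one vobs_NAME_reg group following the
# naming scheme above. Arguments: NAME PRIMARY_NODE OTHER_NODE PASSWORD
vobs_reg_group() {
    name=$1
    primary=$2
    secondary=$3
    pass=$4
    cat <<EOF
group vobs_${name}_reg (
        SystemList = { $primary, $secondary }
        AutoStartList = { $primary }
        )

        CCVOBReg ccvobs_${name}_reg (
                Storage = "/$name/vobstore"
                CCPassword = $pass
                )

        ClearCase ccase_${name}_reg (
                )

        DiskGroup dgroup_${name}_reg (
                DiskGroup = $name
                )

        Mount mount_${name}_reg (
                MountPoint = "/$name"
                BlockDevice = "/dev/vx/dsk/$name/vol01"
                FSType = vxfs
                MountOpt = rw
                )

        Volume volume_${name}_reg (
                Volume = vol01
                DiskGroup = $name
                )

        requires group ccregistry online global

        ccvobs_${name}_reg requires ccase_${name}_reg
        ccvobs_${name}_reg requires mount_${name}_reg
        mount_${name}_reg requires volume_${name}_reg
        volume_${name}_reg requires dgroup_${name}_reg
EOF
}
```

For example, `vobs_reg_group cluster4 foo_c bar_c foobar` reproduces the definition above.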

CCVOBReg resource implementation:

The resource type is defined as follows:

type CCVOBReg (
        static str ArgList[] = { CCPassword, Storage }
        NameRule = resource.Name
        str Storage
        str CCPassword
)

CCPassword is the ClearCase registry password. Storage is the directory where the VOB storage directories are located.

The online agent scans the storage directory and uses the basenames of all directory entries with the suffix .vbs as the tags of the VOBs to register. For each VOB it first tries to unmount it, remove its tags, unregister it and kill its servers. Tags are removed with the send-expect engine (expect) running the 'ct rmtag' command, so that the registry password can be supplied interactively. Once the previous instance of the VOB has been cleaned up, the VOB is registered and tagged.

> #!/bin/sh
> # CCVOBReg online:
>
> shift
> pass=$1
> shift
> vobstorage=$1
>
> if [ `ps -ef | grep albd_server | grep -v grep | wc -l` -eq 0 ];then
>     # No albd_server - no views:
>     exit 1
> fi
>
> # Handle all VOBs created in the VOB storage directory:
> if [ ! -d $vobstorage ];then
>     exit
> fi
>
> # Tags are the basenames of the *.vbs entries:
> for tag in `cd $vobstorage; ls | sed -n 's/\.vbs$//p'`;do
>     storage=$vobstorage/$tag.vbs
>
>     # Try to cleanup first:
>     cleartool lsvob /vobs/$tag
>     status=$?
>     if [ $status -eq 0 ];then
>         cleartool umount /vobs/$tag
>
>         # Remove the tag, answering the registry password prompt:
>         expect -f - <<EOF
> spawn cleartool rmtag -vob /vobs/$tag
> expect "Registry password:"
> send "$pass\n"
> expect eof
> EOF
>
>         cleartool unregister -vob $storage
>
>         pids=`ps -ef | grep vob | grep "$storage" | grep -v grep | awk '
>             { print $2 }'`
>
>         for pid in $pids;do
>             kill -9 $pid
>         done
>     fi
>
>     # Now register:
>     cleartool register -vob $storage
>     cleartool mktag -vob -pas $pass -public -tag /vobs/$tag $storage
> done
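A VOB tag is the basename of a storage entry ending in .vbs. A hypothetical helper that derives the tags that way, with the dot escaped and the suffix anchored so entries such as old.vbs.bak or lost+found are not taken for VOBs; `sed -n` with the `p` flag prints only names where the suffix was actually removed:

```sh
#!/bin/sh
# Hypothetical helper: list VOB tags for a storage directory, i.e. the
# basenames of entries whose names end in ".vbs".
vob_tags() {
    ls "$1" | sed -n 's/\.vbs$//p'
}
```

The same helper could serve the monitor, which counts the VOBs present in the storage directory.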

The monitor agent is implemented by checking 'ct lsvob' output and comparing the VOBs listed as registered on the current host with the VOBs found in the VOB storage directory:

> #!/bin/sh
> # CCVOBReg monitor:
>
> shift
> pass=$1
> shift
> vobstorage=$1
> host=`hostname`
>
> if [ `ps -ef | grep albd_server | grep -v grep | wc -l` -eq 0 ];then
>     # No albd_server:
>     exit 100
> fi
>
> # Handle all VOBs created in the VOB storage directory:
> if [ ! -d $vobstorage ];then
>     exit 100
> fi
>
> # Number of VOBs found in the storage:
> nvobs_made=`cd $vobstorage; ls | sed -n 's/\.vbs$//p' | wc -l`
>
> # Number of VOBs registered on this host:
> nvobs_reg=`cleartool lsvob | grep /net/$host$vobstorage | wc -l`
>
> #if [ $nvobs_reg -lt $nvobs_made ];then
> if [ $nvobs_reg -lt 1 ];then
>     # Not running:
>     exit 100
> else
>     # Running:
>     exit 110
> fi

The offline agent works in the same way as the online agent, except that it does not register and tag the VOBs.

vobs__mnt service group:
--------------------------------

After the VOBs are registered and tagged on a cluster node they need to be mounted. The mount can be done anywhere in the cluster, not necessarily on the node where they are registered. The vobs__mnt service group performs the mounting operation. It is designed to complement the vobs__reg service group and operates on a set of VOBs.

The following resources compose this service group:

- ccase__mnt resource of type ClearCase. This resource powers up ClearCase on the cluster node and makes it ready for use.

- ccvobs__mnt resource of type CCVOBMount. The work of mounting a set of VOBs is implemented in this resource. The VOBs are defined as a list of tags.

Here is the dependency diagram of the resources of this group:

ccvobs__mnt (CCVOBMount)
        |
        v
ccase__mnt (ClearCase)

There may be many instances of vobs__mnt; the is used as the name of the VOB group. We used "cluster4" to match the name of the vobs_cluster4_reg group. Here is how we defined it (in main.cf):

group vobs_cluster4_mnt (
        SystemList = { foo_c, bar_c }
        AutoStartList = { foo_c }
        Parallel = 1
        PreOnline = 1
        )

        CCVOBMount ccvobs_cluster4_mnt (
                CCPassword = foobar
                Tags = { cctest, admin }
                )

        ClearCase ccase_cluster4_mnt (
                )

        requires group vobs_cluster4_reg online global

        ccvobs_cluster4_mnt requires ccase_cluster4_mnt

CCVOBMount resource implementation:

The resource type is defined as follows:

type CCVOBMount (
        static str ArgList[] = { CCPassword, Tags }
        NameRule = resource.Name
        str CCPassword
        str Tags[]
        str Storage
)

CCPassword is the ClearCase registry password. Tags is the list of VOB tags to mount.

The online agent mounts and unlocks the listed VOBs. The NFS shares are also refreshed to allow remote use of the VOBs.

> #!/bin/sh
> # CCVOBMount online:
> shift
> pass=$1
> shift
> shift
> tags=$*
>
> if [ `ps -ef | grep albd_server | grep -v grep | wc -l` -eq 0 ];then
>     # No albd_server - no vobs:
>     exit 1
> fi
> for tag in $tags;do
>     cleartool mount /vobs/$tag
>     cleartool unlock vob:/vobs/$tag
> done
>
> # Refresh share table - otherwise remote nodes can't mount storage directory:
> shareall
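The three `shift`s at the top of the CCVOBMount scripts assume that VCS passes the resource name first, then the ArgList attributes in order, with a vector attribute such as Tags preceded by its element count. The hypothetical parser below makes that assumption explicit and uses the count to take exactly that many tags instead of relying on `$*`:

```sh
#!/bin/sh
# Hypothetical parser for the entry-point arguments assumed above:
#   $1 = resource name, $2 = CCPassword, $3 = number of Tags, $4... = tags
parse_ccvobmount_args() {
    res=$1
    pass=$2
    ntags=$3
    shift 3
    tags=
    i=0
    # Take at most $ntags tags, stopping early if arguments run out.
    while [ $i -lt "$ntags" ] && [ $# -gt 0 ]; do
        if [ -z "$tags" ]; then
            tags=$1
        else
            tags="$tags $1"
        fi
        shift
        i=`expr $i + 1`
    done
    echo "resource=$res password=$pass tags=$tags"
}
```

With the group defined above, the call would look like `parse_ccvobmount_args ccvobs_cluster4_mnt foobar 2 cctest admin`.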

The offline agent terminates all processes that use ClearCase file systems, unexports all views, and then unmounts all VOBs, locking them first.

> #!/bin/sh
> # CCVOBMount offline:
> shift
> pass=$1
> shift
> shift
> tags=$*
>
> if [ `ps -ef | grep albd_server | grep -v grep | wc -l` -eq 0 ];then
>     # No albd_server - no vobs:
>     exit 1
> fi
>
> # Kill users of mvfs:
> tokill=`/usr/atria/sun5/kvm/5.6/fuser_mvfs -n /dev/ksyms`
> while [ -n "$tokill" ];do
>     kill -HUP $tokill
>     tokill=`/usr/atria/sun5/kvm/5.6/fuser_mvfs -n /dev/ksyms`
> done
>
> # Unexport views:
> /usr/atria/etc/export_mvfs -au
>
> for tag in $tags;do
>     on=`cleartool lsvob /vobs/$tag | grep '^*' | wc -l`
>     if [ $on -ne 0 ];then
>         cleartool lock vob:/vobs/$tag
>         cleartool umount /vobs/$tag
>     fi
> done

views service group:
----------------------

The views service group manages a group of views configured to use a specific directory as the parent of their view storage directories. All views found in that directory are started, stopped and monitored. The group uses the following resources:

views:

          views_views (CCViews)
            |              |
            v              v
ccase_views (ClearCase)   mount_views (Mount)
                            |
                            v
                          volume_views (Volume)
                            |
                            v
                          dgroup_views (DiskGroup)

The views custom resource CCViews is defined as follows (in types.cf):

type CCViews (
        static str ArgList[] = { CCPassword, Storage }
        NameRule = resource.Name
        str CCPassword
        str Storage
)

ClearCase service groups and NIS:
---------------------------------------

,,,

Disk backup of the shared storage:
-------------------------------------

The backup node of the cluster should switch all of the shared storage to itself before doing the backup. This can be done with a simple switch-over of all storage-related service groups to the backup node, doing the backup, and switching the groups back to their intended locations. We do it with the following shell script, which does a full backup of the whole cluster to DAT tape every night.

> #!/bin/sh
> # $Id: VCS-HOWTO,v 1.25 2002/09/30 20:05:38 pzi Exp $
> # Full backup script. All filesystem from vfstab in cpio format to DLT.
> # Logs in /backup/log_.
>
> set -x
>
> SYSD=/opt/backup
> LOG=log_`date +%y%m%d_%H%M%S`
> ATRIAHOME=/usr/atria; export ATRIAHOME
> PATH=${PATH}:$ATRIAHOME/bin
> DEV=/dev/rmt/4ubn
>
> exec > $SYSD/$LOG 2>&1
>
> # Move all cluster shared storage to the backup node:
> groups="nis_master homes views vobs_cluster4_mnt vobs_cluster4_reg ccregistry"
> for group in $groups; do
>     /opt/VRTSvcs/bin/hagrp -switch $group -to zeus_c
> done
>
> # Take all file systems in /etc/vfstab that are of type ufs or vxfs and
> # are not /backup:
> FSYS=`awk '$1 !~ /^#/ { \
>     if ( ( $4 == "ufs" || $4 == "vxfs" ) \
>         && $3 != "/backup" && $3 != "/backup1" && $3 != "/spare" ) \
>         { print $3 } \
>     }' /etc/vfstab`
>
> # Start and stop jobs for each file system:
> vobs=`cleartool lsvob | grep \* | awk '{ printf "vob:%s ", $2 }'`
> cluster4_start="cleartool lock -c Disk-Backup-Running-Now $vobs"
> cluster4_stop="cleartool unlock $vobs"
>
> mt -f $DEV rewind
>
> cd /
>
> for f in $FSYS;do
>     f=`echo $f | sed 's/^\///'`
>     # Run the ${f}_start / ${f}_stop job, if one is defined:
>     eval "\$${f}_start"
>     echo $f
>     find ./$f -mount | cpio -ocvB > $DEV
>     eval "\$${f}_stop"
> done
>
> mt -f $DEV rewind
>
> # Move cluster to the split state - hades_c runs all users homes, etc.
> groups="homes views"
> for group in $groups; do
>     /opt/VRTSvcs/bin/hagrp -switch $group -to hades_c
> done
>
> ( head -40 $SYSD/$LOG; echo '...'; tail $SYSD/$LOG ) | \
>     mailx -s "backup on `hostname`" root
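The file-system selection in the backup script can be tried on its own. The sketch below wraps the same awk rule in a hypothetical `backup_fsys` function that takes the vfstab path as an argument (field 3 of /etc/vfstab is the mount point and field 4 the file-system type), so it can be run against a copy before the real backup:

```sh
#!/bin/sh
# Hypothetical helper: print the mount points the backup script would
# archive, i.e. ufs/vxfs entries other than /backup, /backup1, /spare.
backup_fsys() {
    awk '$1 !~ /^#/ {
        if (($4 == "ufs" || $4 == "vxfs") &&
            $3 != "/backup" && $3 != "/backup1" && $3 != "/spare")
            print $3
    }' "$1"
}
```

Running `backup_fsys /etc/vfstab` on the backup node shows exactly which file systems will be written to tape.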

How to configure VERITAS Cluster Server (VCS) to set up event notification to users

Details:

The following example makes use of the "resfault" event. This event type can be configured to mail predefined users about a resource that has failed. It can be set up as follows.

The following actions need to be performed on all nodes in a cluster:

1. Copy the trigger from /opt/VRTSvcs/bin/sample_triggers/resfault to /opt/VRTSvcs/bin/triggers/resfault.

2. To set up mail notification, uncomment the following section at the end of the resfault file /opt/VRTSvcs/bin/triggers/resfault:

# put your code here...
#
# Here is a sample code to notify a bunch of users.
# @recipients=("username@servername.com");
#
# $msgfile="/tmp/resfault";
# `echo system = $ARGV[0], resource = $ARGV[1] > $msgfile`;
# foreach $recipient (@recipients) {
#     # Must have elm setup to run this.
#     `elm -s resfault $recipient < $msgfile`;
# }
# `rm $msgfile`;
#

By default, the script, once uncommented, uses elm to notify users. Some systems do not have elm; in that case the standard mailx utility can be used instead, as shown below. Note the "\" before "@", which is needed so that Perl does not interpolate "@anotherserver" as an array:

# Here is a sample code to notify a bunch of users.
@recipients=("root\@anotherserver.com", "root");
$msgfile="/tmp/resfault";
`echo system = $ARGV[0], resource = $ARGV[1] > $msgfile`;
foreach $recipient (@recipients) {
    # Must have mailx to run this.
    `mailx -s resfault $recipient < $msgfile`;
}
`rm $msgfile`;

This is all that has to be done. When a resource next fails, a message similar to the following will appear in the relevant person's mailbox:

From: Super-User
Message-Id:
To: root
Subject: resfault
Content-Length: 42

system = sptsun****, resource = nfsdg_nfs

How to place a volume and disk group under the control of VCS (Symantec Cluster Server)

Details:

Following is the procedure to create a volume and a file system and put them under the control of VCS (Symantec Cluster Server):

1. Create a disk group. This can be done with vxdg.
2. Create a mount point and file system.

3. Deport the disk group. This can be done with vxdg.
4. Create a service group. This can be done with hagrp.
5. Add the cluster resources listed below to the service group. This can be done with hares.

   Resource     Attributes
   Disk group   Disk group name
   Mount        BlockDevice, FSType, MountPoint

Note: An example of a service group that contains a DiskGroup resource can be found in the "Symantec Cluster Server 6.1 Bundled Agents Guide":
https://sort.symantec.com/public/documents/vcs/6.1/linux/productguides/html/vcs_bundled_agents/ch02s02s02.htm
The complete "Symantec Cluster Server 6.1 Bundled Agents Guide" can be found here:
http://www.symantec.com/business/support/resources/sites/BUSINESS/content/live/DOCUMENTATION/6000/DOC6943/en_US/vcs_bundled_agents_61_lin.pdf

6. Create dependencies between the resources listed below. This can be done using hadep.

1. Mount and disk group.

7. Enable all resources. This can be done using hares.

The following examples show how to create a RAID-5 volume with a VxFS file system and put it under VCS control.

Method 1: Using the command line

1. Create a disk group using Volume Manager with a minimum of 4 disks:
# vxdg init datadg disk01=c1t1d0s2 disk02=c1t2d0s2 disk03=c1t3d0s2 disk04=c1t4d0s2
# vxassist -g datadg make vol01 2g layout=raid5

2. Create a mount point for this volume:
# mkdir /vol01

3. Create a file system on this volume:
# mkfs -F vxfs /dev/vx/rdsk/datadg/vol01

4. Deport this disk group:
# vxdg deport datadg

5. Create a service group:
# haconf -makerw
# hagrp -add newgroup
# hagrp -modify newgroup SystemList 0 1
# hagrp -modify newgroup AutoStartList

6. Create a disk group resource and modify its attributes:
# hares -add data_dg DiskGroup newgroup
# hares -modify data_dg DiskGroup datadg

7. Create a mount resource and modify its attributes:
# hares -add mnt Mount newgroup
# hares -modify mnt BlockDevice /dev/vx/dsk/datadg/vol01
# hares -modify mnt FSType vxfs
# hares -modify mnt MountPoint /vol01

8. Link the mount resource to the disk group resource:
# hares -link mnt data_dg

9. Enable the resources and close the configuration:
# hagrp -enableresources newgroup
# haconf -dump -makero

Method 2: Editing /etc/VRTSvcs/conf/config/main.cf

Save any pending changes and stop VCS on all nodes before editing the file. haconf cannot be used once HAD is stopped, and a haconf -dump issued while HAD is running would overwrite manual edits with the in-memory configuration:

# haconf -dump -makero
# hastop -all
# cd /etc/VRTSvcs/conf/config
# vi main.cf

Add the following lines to the end of this file, customizing the attributes as appropriate for your configuration:

group newgroup (
        SystemList = { sysA = 0, sysB = 1 }
        AutoStartList = { sysA }
        )

        DiskGroup data_dg (
                DiskGroup = datadg
                )

        Mount mnt (
                MountPoint = "/vol01"
                BlockDevice = "/dev/vx/dsk/datadg/vol01"
                FSType = vxfs
                )

        mnt requires data_dg

Verify the configuration and start VCS, first on the node where the file was edited and then on the other nodes:

# hacf -verify .
# hastart

Check the status of the new service group.

How to dynamically remove a node from a live cluster without interruptions

Details:

Before making changes to the VERITAS Cluster Server (VCS) configuration, the main.cf file, make a good copy of the current main.cf. In this example, csvcs6 is removed from a two node cluster. Execute these commands on csvcs5, the system not to be removed.

1. cp -p /etc/VRTSvcs/conf/config/main.cf /etc/VRTSvcs/conf/config/main.cf.last_known.good

2. Check the current systems, group(s), and resource(s) status

#hastatus -sum


A  csvcs5  RUNNING  0
A  csvcs6  RUNNING  0


B  test_A  csvcs5  Y  N  ONLINE
B  test_A  csvcs6  Y  N  OFFLINE
B  test_B  csvcs6  Y  N  ONLINE
B  wvcs    csvcs5  Y  N  OFFLINE
B  wvcs    csvcs6  Y  N  ONLINE

Based on the output, csvcs5 and csvcs6 form a two-node cluster. Service groups test_A and wvcs are configured to run on both nodes. Service group test_B is configured to run on csvcs6 only.

Service groups test_B and wvcs are both online on csvcs6. Service group wvcs can be switched over to csvcs5 if it is to stay online:

hagrp -switch -to

#hagrp -switch wvcs -to csvcs5

3. Check for service group dependency

#hagrp -dep
#Parent  Child   Relationship
test_B   test_A  online global

4. Make VCS configuration writable

#haconf -makerw

5. Unlink the group dependency if there is any. In this case, the service group test_B requires test_A.

hagrp -unlink

#hagrp -unlink test_B test_A

6. Stop VCS on csvcs6, the node to be removed.

hastop -sys

#hastop -sys csvcs6

7. Check the status again, making sure csvcs6 is EXITED and the failover service group is online on the running node.

#hastatus -sum


A  csvcs5  RUNNING  0
A  csvcs6  EXITED   0


B  test_A  csvcs5  Y  N  ONLINE
B  test_A  csvcs6  Y  N  OFFLINE
B  test_B  csvcs6  Y  N  OFFLINE
B  wvcs    csvcs5  Y  N  ONLINE
B  wvcs    csvcs6  Y  N  OFFLINE

8. Delete csvcs6 from wvcs and test_A SystemList.

hagrp -modify SystemList -delete

#hagrp -modify wvcs SystemList -delete csvcs6
#hagrp -modify test_A SystemList -delete csvcs6

9. Check all the resources belonging to the service group and delete all the resources from group test_B before removing the group.

hagrp -resources

#hagrp -resources test_B
jprocess
kprocess

hares -delete

#hares -delete jprocess
#hares -delete kprocess

hagrp -delete

#hagrp -delete test_B

10. Check the status again, making sure all the service groups are online on the other node. In this case csvcs5.

#hastatus -sum


A  csvcs5  RUNNING  0
A  csvcs6  EXITED   0


B  test_A  csvcs5  Y  N  ONLINE
B  wvcs    csvcs5  Y  N  ONLINE
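Checking that every group really came up on the remaining node can be scripted. A minimal sketch, assuming the group lines of `hastatus -sum` have the six columns shown above (B, group, system, probed, auto-disabled, state); `groups_not_online` is a hypothetical name, and `awk -v` needs a POSIX awk (on Solaris that may mean nawk rather than /usr/bin/awk):

```sh
#!/bin/sh
# Hypothetical check: read "hastatus -sum" output on stdin and print
# every group whose state on system $1 is not ONLINE.
groups_not_online() {
    awk -v sys="$1" '$1 == "B" && $3 == sys && $6 != "ONLINE" { print $2 }'
}
```

Usage: `hastatus -sum | groups_not_online csvcs5` should print nothing once step 10 has succeeded.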

11. Delete system (node) from cluster, save the configuration, and make it read only.

# hasys -delete csvcs6

#haconf -dump -makero

12. Depending on how the cluster is defined and on the number of nodes in it, it might be necessary to reduce the number # in "/sbin/gabconfig -c -n #" in the /etc/gabtab file on all running nodes of the cluster. If # is larger than the number of nodes in the cluster, GAB will not seed automatically.
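Lowering the seed count is a one-line sed edit. A minimal sketch, assuming the gabtab line has the form "/sbin/gabconfig -c -n N"; `set_gab_seed` is a hypothetical helper, and it writes the result back through a temporary copy so the original file keeps its inode and permissions:

```sh
#!/bin/sh
# Hypothetical helper: replace the number after "-n" on the gabconfig
# line of a gabtab-style file.  Arguments: FILE NEW_NODE_COUNT
set_gab_seed() {
    file=$1
    nodes=$2
    sed "s/\(gabconfig.*-n *\)[0-9][0-9]*/\1$nodes/" "$file" > "$file.new" &&
        cat "$file.new" > "$file" &&
        rm -f "$file.new"
}
```

For the two-node example reduced to one node: `set_gab_seed /etc/gabtab 1` on csvcs5.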

To prevent VCS from starting after rebooting, do the following on the removed node (csvcs6):

1. Unconfigure and unload GAB
/sbin/gabconfig -u

modunload -i `modinfo | grep gab | awk '{print $1}'`

2. Unconfigure and unload LLT

/sbin/lltconfig -U

modunload -i `modinfo | grep llt | awk '{print $1}'`
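`grep gab` also matches any other module whose modinfo line merely contains "gab". A stricter variant, assuming the module name is the sixth whitespace-separated column of `modinfo` output (Id, Loadaddr, Size, Info, Rev, Module Name); `module_id` is a hypothetical helper that reads that output on standard input and prints the Id of the exact match:

```sh
#!/bin/sh
# Hypothetical helper: print the module Id whose name (column 6 of
# modinfo output) is exactly $1.
module_id() {
    awk -v m="$1" '$6 == m { print $1 }'
}
```

Usage: modunload -i `modinfo | module_id gab`, and likewise for llt.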

3. Prevent LLT, GAB and VCS from starting up in the future

mv /etc/rc2.d/S70llt /etc/rc2.d/s70llt
mv /etc/rc2.d/S92gab /etc/rc2.d/s92gab
mv /etc/rc3.d/S99vcs /etc/rc3.d/s99vcs
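Renaming S to s works because the Solaris rc scripts start only files whose names begin with S. A small hypothetical helper doing the same renames, which skips names that do not start with S and files that are absent:

```sh
#!/bin/sh
# Hypothetical helper: disable rc start scripts by renaming a leading
# "S" to "s".  Arguments: DIRECTORY SCRIPT_NAME...
disable_rc() {
    dir=$1
    shift
    for name in "$@"; do
        case $name in
            S*) ;;
            *) echo "skipping $name: not an S script" >&2; continue ;;
        esac
        lower=`echo "$name" | sed 's/^S/s/'`
        if [ -f "$dir/$name" ]; then
            mv "$dir/$name" "$dir/$lower"
        fi
    done
}
```

Usage: `disable_rc /etc/rc2.d S70llt S92gab` followed by `disable_rc /etc/rc3.d S99vcs`.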

4. If VCS is not going to run on this node again, all VCS-related packages and files can now be removed.

pkgrm VRTSperl
pkgrm VRTSvcs
pkgrm VRTSgab
pkgrm VRTSllt
rm /etc/llttab
rm /etc/gabtab

NOTE:Due to the complexity and variation of VCS configuration, it is not possible to cover all the possible situations and conditions of a cluster configuration in one technote. The above steps are essential for common configuration in most VCS setups and provide some idea how to deal with complex setups.

How to offline a critical resource without affecting other resources and bringing a service group offline

Details:

If a resource in a service group needs to be taken offline without affecting the other running resources, a procedure similar to the following can be used. The aim is to take a critical resource offline to perform maintenance. Note that, ordinarily, taking a critical resource offline would cause the service group to fail over.

1. Freeze the service group in question, e.g.:

#haconf -makerw
#hagrp -freeze jbgroup -persistent
#haconf -dump -makero

#hagrp -display jbgroup

reveals that the group is now frozen:

Group     Attribute       System  Value
jbgroup   AutoFailOver    global  1
jbgroup   AutoStart       global  1
jbgroup   AutoStartList   global  sptsunvcs2
jbgroup   FailOverPolicy  global  Priority
jbgroup   Frozen          global  1
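The same check can be scripted. A minimal sketch, assuming the `hagrp -display` layout shown above (group, attribute, system, value); `frozen_value` is a hypothetical helper. It reads only the Frozen attribute, which records a persistent freeze like the one above; a temporary freeze is kept in TFrozen instead:

```sh
#!/bin/sh
# Hypothetical check: read "hagrp -display GROUP" output on stdin and
# print the value of the group's Frozen attribute (1 = frozen).
frozen_value() {
    awk -v g="$1" '$1 == g && $2 == "Frozen" { print $4 }'
}
```

Usage: `hagrp -display jbgroup | frozen_value jbgroup` should print 1 before any maintenance on the critical resource starts.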

Date post:	12-Oct-2015
Upload:	sumit0428