
VERITAS Cluster Server for UNIX Fundamentals

Date posted: 27-Apr-2015

VERITAS Cluster Server for UNIX, Fundamentals (Lessons)

HA-VCS-410-101A-2-10-SRT (100-002149-A)

COURSE DEVELOPERS

Bilge Gerrits Siobhan Seeger Dawn Walker

Disclaimer

The information contained in this publication is subject to change without notice. VERITAS Software Corporation makes no warranty of any kind with regard to this guide, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose. VERITAS Software Corporation shall not be liable for errors contained herein or for incidental or consequential damages in connection with the furnishing, performance, or use of this manual.

Copyright

Copyright 2005 VERITAS Software Corporation. All rights reserved. No part of the contents of this training material may be reproduced in any form or by any means or be used for the purposes of training or education without the written permission of VERITAS Software Corporation.

Trademark Notice

VERITAS, the VERITAS logo, and VERITAS FirstWatch, VERITAS Cluster Server, VERITAS File System, VERITAS Volume Manager, VERITAS NetBackup, and VERITAS HSM are registered trademarks of VERITAS Software Corporation. Other product names mentioned herein may be trademarks and/or registered trademarks of their respective companies.

VERITAS Cluster Server for UNIX, Fundamentals Participant Guide, April 2005 Release

LEAD SUBJECT MATTER EXPERTS

Geoff Bergren Connie Economou Paul Johnston Dave Rogers Jim Senicka Pete Toemmes

TECHNICAL CONTRIBUTORS AND REVIEWERS

Billie Bachra Barbara Ceran Bob Lucas Gene Henriksen Margy Cassidy

VERITAS Software Corporation, 350 Ellis Street, Mountain View, CA 94043. Phone 650-527-8000. www.veritas.com

Table of Contents

Course Introduction
  VERITAS Cluster Server Curriculum
  Course Prerequisites
  Course Objectives
  Certification Exam Objectives
  Cluster Design Input
  Sample Design Input
  Sample Design Worksheet
  Lab Design for the Course
  Lab Naming Conventions
  Classroom Values for Labs
  Course Overview
  Legend

Lesson 1: VCS Building Blocks
  Introduction
  Cluster Terminology
  A Nonclustered Computing Environment
  Definition of a Cluster
  Definition of VERITAS Cluster Server and Failover
  Definition of an Application Service
  Definition of Service Group
  Service Group Types
  Definition of a Resource
  Resource Dependencies
  Resource Attributes
  Resource Types and Type Attributes
  Agents: How VCS Controls Resources
  Using the VERITAS Cluster Server Bundled Agents Reference Guide
  Cluster Communication
  Low-Latency Transport
  Group Membership Services/Atomic Broadcast (GAB)
  The Fencing Driver
  The High Availability Daemon
  Comparing VCS Communication Protocols and TCP/IP
  Maintaining the Cluster Configuration
  VCS Architecture
  How does VCS know what to fail over?
  How does VCS know when to fail over?
  Supported Failover Configurations
  Active/Passive
  N-to-1
  N + 1
  Active/Active
  N-to-N

Lesson 2: Preparing a Site for VCS
  Planning for Implementation
  Implementation Needs
  The Implementation Plan
  Using the Design Worksheet
  Hardware Requirements and Recommendations
  SCSI Controller Configuration for Shared Storage
  Hardware Verification
  Software Requirements and Recommendations
  Software Verification
  Preparing Cluster Information
  VERITAS Security Services
  Lab 2: Validating Site Preparation

Lesson 3: Installing VERITAS Cluster Server
  Introduction
  Using the VERITAS Product Installer
  Viewing Installation Logs
  The installvcs Utility
  Automated VCS Installation Procedure
  Installing VCS Updates
  VCS Configuration Files
  VCS File Locations
  Communication Configuration Files
  Cluster Configuration Files
  Viewing the Default VCS Configuration
  Viewing Installation Results
  Viewing Status
  Other Installation Considerations
  Fencing Considerations
  Cluster Manager Java GUI
  Lab 3: Installing VCS

Lesson 4: VCS Operations
  Introduction
  Managing Applications in a Cluster Environment
  Key Considerations
  VCS Management Tools
  Service Group Operations
  Displaying Attributes and Status
  Bringing Service Groups Online
  Taking Service Groups Offline
  Switching Service Groups
  Freezing a Service Group
  Bringing Resources Online
  Taking Resources Offline
  Clearing Resource Faults
  Using the VCS Simulator
  The Simulator Java Console
  Creating a New Simulator Configuration
  Simulator Command-Line Interface
  Using the Java GUI with the Simulator
  Lab 4: Using the VCS Simulator

Lesson 5: Preparing Services for VCS
  Introduction
  Preparing Applications for VCS
  Application Service Component Review
  Configuration and Migration Procedure
  One-Time Configuration Tasks
  Identifying Components
  Configuring Shared Storage
  Configuring the Network
  Configuring the Application
  Testing the Application Service
  Bringing Up Resources
  Verifying Resources
  Testing the Integrated Components
  Stopping and Migrating an Application Service
  Stopping Application Components
  Manually Migrating an Application Service
  Validating the Design Worksheet
  Documenting Resource Attributes
  Checking Resource Attributes
  Documenting Resource Dependencies
  Validating Service Group Attributes
  Lab 5: Preparing Application Services

Lesson 6: VCS Configuration Methods
  Introduction
  Overview of Configuration Methods
  Effects on the Cluster
  Controlling Access to VCS
  Relating VCS and UNIX User Accounts
  Simplifying VCS Administrative Access
  User Accounts
  Changing Privileges
  VCS Access in Secure Mode
  Online Configuration
  How VCS Changes the Online Cluster Configuration
  Opening the Cluster Configuration
  Saving the Cluster Configuration
  Closing the Cluster Configuration
  How VCS Protects the Cluster Configuration
  Offline Configuration
  Offline Configuration Examples
  Starting and Stopping VCS
  How VCS Starts Up by Default
  VCS Startup with a .stale File
  Forcing VCS to Start from a Wait State
  Building the Configuration Using a Specific main.cf File
  Stopping VCS
  Lab 6: Starting and Stopping VCS

Lesson 7: Online Configuration of Service Groups
  Introduction
  Online Configuration Procedure
  Creating a Service Group
  Adding a Service Group
  Adding a Service Group Using the GUI
  Adding a Service Group Using the CLI
  Classroom Exercise: Creating a Service Group
  Design Worksheet Example
  Adding Resources
  Online Resource Configuration Procedure
  Adding Resources Using the GUI: NIC Example
  Adding an IP Resource
  Classroom Exercise: Creating Network Resources Using the GUI
  Adding a Resource Using the CLI: DiskGroup Example
  Classroom Exercise: Creating Storage Resources Using the CLI
  The Process Resource
  Classroom Exercise: Creating a Process Resource
  Solving Common Configuration Errors
  Flushing a Service Group
  Disabling a Resource
  Copying and Deleting a Resource
  Testing the Service Group
  Linking Resources
  Resource Dependencies
  Classroom Exercise: Linking Resources
  Design Worksheet Example
  Setting the Critical Attribute
  Classroom Exercise: Testing the Service Group
  A Completed Process Service Group
  Lab 7: Online Configuration of a Service Group

Lesson 8: Offline Configuration of Service Groups
  Introduction
  Offline Configuration Procedures
  New Cluster
  Example Configuration File
  Existing Cluster
  First System
  Using the Design Worksheet
  Resource Dependencies
  A Completed Configuration File
  Offline Configuration Tools
  Editing Configuration Files
  Using the VCS Simulator
  Solving Offline Configuration Problems
  Common Problems
  All Systems in a Wait State
  Propagating an Old Configuration
  Recovering from an Old Configuration
  Configuration File Backups
  Testing the Service Group
  Service Group Testing Procedure
  Lab 8: Offline Configuration of Service Groups

Lesson 9: Sharing Network Interfaces
  Introduction
  Sharing Network Interfaces
  Conceptual View
  Alternate Network Configurations
  Using Proxy Resources
  The Proxy Resource Type
  Using Parallel Service Groups
  Determining Service Group Status
  Phantom Resources
  The Phantom Resource Type
  Configuring a Parallel Service Group
  Properties of Parallel Service Groups
  Localizing Resource Attributes
  Localizing a NIC Resource Attribute
  Lab 9: Creating a Parallel Service Group

Lesson 10: Configuring Notification
  Introduction
  Notification Overview
  Message Queue
  Message Severity Levels
  Configuring Notification
  The NotifierMngr Resource Type
  Configuring the ResourceOwner Attribute
  Configuring the GroupOwner Attribute
  Configuring the SNMP Console
  Using Triggers for Notification
  Lab 10: Configuring Notification


Lesson 11: Configuring VCS Response to Resource Faults
  Introduction
  VCS Response to Resource Faults
  Failover Decisions and Critical Resources
  How VCS Responds to Resource Faults by Default
  The Impact of Service Group Attributes on Failover
  Practice: How VCS Responds to a Fault
  Determining Failover Duration
  Failover Duration on a Resource Fault
  Adjusting Monitoring
  Adjusting Timeout Values
  Controlling Fault Behavior
  Type Attributes Related to Resource Faults
  Modifying Resource Type Attributes
  Overriding Resource Type Attributes
  Recovering from Resource Faults
  Recovering a Resource from a FAULTED State
  Recovering a Resource from an ADMIN_WAIT State
  Fault Notification and Event Handling
  Fault Notification
  Extended Event Handling Using Triggers
  The Role of Triggers in Resource Faults
  Lab 11: Configuring Resource Fault Behavior

Lesson 12: Cluster Communications
  Introduction
  VCS Communications Review
  VCS On-Node Communications
  VCS Inter-Node Communications
  VCS Communications Stack Summary
  Cluster Interconnect Specifications
  Cluster Membership
  GAB Status and Membership Notation
  Viewing LLT Link Status
  The lltstat Command
  Cluster Interconnect Configuration
  Configuration Overview
  LLT Configuration Files
  The sysname File
  The GAB Configuration File
  Joining the Cluster Membership
  Seeding During Startup
  LLT, GAB, and VCS Startup Files
  Manual Seeding
  Probing Resources During Startup


Lesson 13: System and Communication Faults
  Introduction
  Ensuring Data Integrity
  VCS Response to System Failure
  Failover Duration on a System Fault
  Cluster Interconnect Failures
  Single LLT Link Failure
  Jeopardy Membership
  Recovery Behavior
  Modifying the Default Recovery Behavior
  Potential Split Brain Condition
  Interconnect Failures with a Low-Priority Public Link
  Interconnect Failures with Service Group Heartbeats
  Preexisting Network Partition
  Changing the Interconnect Configuration
  Modifying the Cluster Interconnect Configuration
  Adding LLT Links
  Lab 13: Testing Communication Failures
  Optional Lab: Configuring the InJeopardy Trigger

Lesson 14: I/O Fencing
  Introduction
  Data Protection Requirements
  Understanding the Data Protection Problem
  Split Brain Condition
  Data Protection Requirements
  I/O Fencing Concepts and Components
  I/O Fencing Components
  I/O Fencing Operations
  Registration with Coordinator Disks
  Service Group Startup
  System Failure
  Interconnect Failure
  I/O Fencing Behavior
  I/O Fencing with Multiple Nodes
  I/O Fencing Implementation
  Communication Stack
  Fencing Driver
  Fencing Implementation in Volume Manager
  Fencing Implementation in VCS
  Coordinator Disk Implementation
  Configuring I/O Fencing
  Fencing Effects on Disk Groups
  Stopping and Recovering Fenced Systems
  Stopping Systems Running I/O Fencing
  Recovery with Running Systems
  Recovering from a Partition-In-Time
  Lab 14: Configuring I/O Fencing

Lesson 15: Troubleshooting
  Introduction
  Monitoring VCS
  VCS Logs
  UMI-Based Support
  Using the VERITAS Support Web Site
  Troubleshooting Guide
  Procedure Overview
  Using the Troubleshooting Job Aid
  Cluster Communication Problems
  Checking GAB
  Checking LLT
  Duplicate Node IDs
  Problems with LLT
  VCS Engine Problems
  Startup Problems
  STALE_ADMIN_WAIT
  ADMIN_WAIT
  Service Group and Resource Problems
  Service Groups Problems
  Resource Problems
  Agent Problems and Resource Type Problems
  Archiving VCS-Related Files
  Making Backups
  The hasnap Utility
  Lab 15: Troubleshooting

Index

Course Introduction

VERITAS Cluster Server Curriculum
Learning Path
- VERITAS Cluster Server, Fundamentals
- VERITAS Cluster Server, Implementing Local Clusters
- High Availability Design Using VERITAS Cluster Server
- VERITAS Cluster Server Agent Development
- Disaster Recovery Using VVR 4.0 and Global Cluster Option

VERITAS Cluster Server Curriculum
The VERITAS Cluster Server curriculum is a series of courses designed to provide a full range of expertise with VERITAS Cluster Server (VCS) high availability solutions, from design through disaster recovery.
- VERITAS Cluster Server, Fundamentals: This course covers installation and configuration of common VCS configurations, focusing on two-node clusters running application and database services.
- VERITAS Cluster Server, Implementing Local Clusters: This course focuses on multinode VCS clusters and advanced topics related to more complex cluster configurations.
- VERITAS Cluster Server Agent Development: This course enables students to create and customize VCS agents.
- High Availability Design Using VERITAS Cluster Server: This course enables participants to translate high availability requirements into a VCS design that can be deployed using VERITAS Cluster Server.
- Disaster Recovery Using VVR and Global Cluster Option: This course covers cluster configurations across remote sites, including replicated data clusters (RDCs) and the Global Cluster Option for wide-area clusters.

Course Prerequisites
To successfully complete this course, you should have the following expertise:
- UNIX operating system and network administration
- System and network device configuration
- VERITAS Volume Manager configuration

Course Prerequisites
This course assumes that you have an administrator-level understanding of one or more UNIX platforms. You should understand how to configure systems, storage devices, and networking in multiserver environments.

Course Objectives
After completing the VERITAS Cluster Server for UNIX, Fundamentals course, you will be able to:
- Manage services in an existing VCS environment.
- Install and configure a cluster according to a specified sample design.
- Use a design worksheet to put applications under VCS control.
- Customize cluster behavior to implement specified requirements.
- Respond to resource, system, and communication failures.

Course Objectives
In the VERITAS Cluster Server for UNIX, Fundamentals course, you are given a high availability design to implement in the classroom environment using VERITAS Cluster Server. The course simulates the job tasks you perform to configure a cluster, starting with preparing the site and the application services that will be made highly available. Lessons build upon each other, demonstrating the processes and recommended best practices that you can apply when implementing any cluster design. The core material focuses on the most common cluster implementations. Other cluster designs emphasizing additional VCS capabilities are provided to illustrate the power and flexibility of VERITAS Cluster Server.

Certification Exam Objectives
The VERITAS Certified High Availability Implementation Exam objectives covered in this lesson are summarized as follows:
- Verify and adjust the preinstallation environment.
- Install VCS.
- Configure the high availability environment.
- Perform advanced cluster configuration.
- Validate the implementation and make adjustments for high availability.
- Document and maintain the high availability solution.
For the complete set of exam objectives, follow the Certification link from www.veritas.com/education.

Certification Exam Objectives
The high-level objectives for the Implementation of HA Solutions certification exam are shown in the slide.
Note: Not all objectives are covered by the VERITAS Cluster Server for UNIX, Fundamentals course. The VERITAS Cluster Server for UNIX, Implementing Local Clusters course is also required to provide complete training on all certification exam objectives.
Detailed objectives are provided on the VERITAS Web site, along with sample exams.

Cluster Design Input
A VCS cluster design includes:
- Cluster information, including cluster communications
- System information
- Application service information, including detailed information about required software and hardware resources
- User account information
- Notification requirements
- Customization requirements
This course provides cluster design information needed to prepare, install, and configure a cluster.

Cluster Design Input
The staff responsible for the deployment of a VCS cluster may not necessarily be the same people who developed the cluster design. To ensure a successful deployment process, define the information that needs to be passed to the deployment team from a VCS design.
A VCS design includes the following information:
- Cluster information, including cluster communications
  - The cluster name and ID number
  - Ethernet ports that will be used for the cluster interconnect
  - Any other VCS communication channels required
- Member system names
- High availability services information
  - The service name and type
  - Systems where the service can start up and run
  - Startup policies
  - Failover policies
  - Interactions with other services
  - Resources required by the services, and their relationships
- User information and privilege levels
- Notification requirements: SNMP/SMTP notification and triggers
- Customization requirements: Enterprise and custom agents; cluster, service group, system, resource, and agent attributes that are not VCS default values

Sample Design Input
Web Service
- Start up on system S1.
- Restart Web server process 3 times before faulting it.
- Fail over to S2 if any resource faults.
- Notify [email protected] if any resource faults.

Components required to provide the Web service:
- Web Server
- IP: IP address 192.168.3.132
- Mount: /web
- NIC: eri0
- Volume: WebVol
- Disk Group: WebDG

Sample Design Input
A VCS design may come in many different formats with varying levels of detail. In some cases, you may have only the information about the application services that need to be clustered and the desired operational behavior in the cluster. For example, you may be told that the application service uses multiple network ports and requires local failover capability among those ports before it fails over to another system. In other cases, you may have the information you need as a set of service dependency diagrams with notes on various aspects of the desired cluster operations.
If you receive design information that does not detail the resources, develop a detailed design worksheet before starting the deployment, as shown in the following Cluster Design Worksheet. Using a design worksheet to document all aspects of your high availability environment helps ensure that you are well-prepared to start implementing your cluster design.
You are provided with a design worksheet showing sample values to use throughout this course as a tool for implementing the cluster design in the lab exercises. You can use a similar format to collect all the information you need before starting deployment at your site.

Sample Design Worksheet

Service Group Definition    Sample Value
Group                       WebSG
Required Attributes
  FailOverPolicy            Priority
  SystemList                S1=0 S2=1
Optional Attributes
  AutoStartList             S1

Resource Definition         Sample Value
Service Group               WebSG
Resource Name               WebIP
Resource Type               IP
Required Attributes
  Device                    eri0
  Address                   192.168.3.132
Optional Attributes
  Netmask                   255.255.255.0
  Critical?                 Yes (1)
  Enabled?                  Yes (1)

Example: main.cf

    group WebSG (
        SystemList = { S1 = 0, S2 = 1 }
        AutoStartList = { S1 }
        )

    IP WebIP (
        Device = eri0
        Address = 192.168.3.132
        Netmask = 255.255.255.0
        )
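The mapping from worksheet values to the main.cf example can be sketched in a few lines of Python. This is an illustration only: the helper names (format_attr, render_block, render_worksheet) and the dictionary layout are assumptions made for this sketch, and the output mimics the layout of the slide example rather than claiming to be a complete or validated main.cf file.

```python
# Hypothetical helpers: render worksheet values as main.cf-style text.
# The layout follows the slide example; it is not a complete main.cf.


def format_attr(name, value):
    """Format one attribute line; dicts and lists become { ... } blocks."""
    if isinstance(value, dict):
        inner = ", ".join(f"{k} = {v}" for k, v in value.items())
        return f"    {name} = {{ {inner} }}"
    if isinstance(value, (list, tuple)):
        inner = ", ".join(str(v) for v in value)
        return f"    {name} = {{ {inner} }}"
    return f"    {name} = {value}"


def render_block(keyword, name, attrs):
    """Render 'keyword name ( attributes )' as a list of text lines."""
    lines = [f"{keyword} {name} ("]
    lines += [format_attr(k, v) for k, v in attrs.items()]
    lines.append("    )")
    return lines


def render_worksheet(group, resources):
    """Render one service group followed by its resources."""
    lines = render_block("group", group["name"], group["attrs"])
    for res in resources:
        lines.append("")
        lines += render_block(res["type"], res["name"], res["attrs"])
    return "\n".join(lines)


# Sample values taken from the design worksheet.
web_group = {
    "name": "WebSG",
    "attrs": {"SystemList": {"S1": 0, "S2": 1}, "AutoStartList": ["S1"]},
}
web_resources = [
    {
        "type": "IP",
        "name": "WebIP",
        "attrs": {
            "Device": "eri0",
            "Address": "192.168.3.132",
            "Netmask": "255.255.255.0",
        },
    }
]

text = render_worksheet(web_group, web_resources)
print(text)
```

Keeping the worksheet as plain data makes it easy to review the design values separately from the configuration text generated from them.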

Lab Design for the Course
[Diagram: cluster vcsx; systems trainxx and trainxx; service groups your_nameSG1, your_nameSG2, NetworkSG, their_nameSG1, their_nameSG2]

Lab Design for the Course
The diagram shows a conceptual view of the cluster design used as an example throughout this course and implemented in hands-on lab exercises. Each aspect of the cluster configuration is described in greater detail, where applicable, in course lessons.
The cluster consists of:
- Two nodes
- Five high availability services: four failover service groups and one parallel network service group
- Fibre connections to SAN shared storage from each node through a switch
- Two private Ethernet interfaces for the cluster interconnect network
- Ethernet connections to the public network
Additional complexity is added to the design to illustrate certain aspects of cluster configuration in later lessons. The design diagram shows a conceptual view of the cluster design described in the worksheet.

Lab Naming Conventions

Service Group Definition    Sample Value
Group                       nameSG
Required Attributes
  SGAttribute1              value
  SGAttribute2              value
Optional Attributes
  SGAttribute3              value

Resource Definition         Sample Value
Service Group Name          nameSG
Resource Name               nameIP
Resource Type               IP
Required Attributes
  ResAttribute1             value
  ResAttribute2             value
  ...

Substitute your name, or a nickname, wherever tables or instructions indicate name in labs. Following this convention simplifies labs and helps prevent naming conflicts with your lab partner.

Lab Naming Conventions
To simplify the labs, use your name or a nickname as a prefix for cluster objects created in the lab exercises. This includes Volume Manager objects, such as disk groups and volumes, as well as VCS service groups and resources. Following this convention helps distinguish your objects when multiple students are working on systems in the same cluster and helps ensure that each student uses unique names.
The lab exercises represent your name with the word name in italics. You substitute the name you select whenever you see the name placeholder in a lab step.

Classroom Values for Labs

Network Definition          Your Value
Subnet
DNS Address

Software Location           Your Value
VCS installation dir
Lab files directory
...

Use the classroom values provided by your instructor at the beginning of each lab exercise. Lab tables are provided at the beginning of the lab to record these values. Alternatively, your instructor may hand out printed tables. If sample values are provided as guidelines, substitute the classroom-specific values provided by your instructor.

Classroom Values for Labs
Your instructor will provide the classroom-specific information you need to perform the lab exercises. You can record these values in your lab books using the tables provided, or your instructor may provide separate handouts showing the classroom values for your location.
In some lab exercises, sample values may be shown in tables as a guide to the types of values you must specify. Substitute the values provided by your instructor to ensure that your configuration is appropriate for your classroom. If you are not sure of the configuration for your classroom, ask your instructor.

Course Overview
Lesson 1: VCS Building Blocks
Lesson 2: Preparing a Site for VCS
Lesson 3: Installing VCS
Lesson 4: VCS Operations
Lesson 5: Preparing Services for VCS
Lesson 6: VCS Configuration Methods
Lesson 7: Online Configuration of Service Groups
Lesson 8: Offline Configuration of Service Groups
Lesson 9: Sharing Network Interfaces
Lesson 10: Configuring Notification
Lesson 11: Configuring VCS Response to Faults
Lesson 12: Cluster Communications
Lesson 13: System and Communication Faults
Lesson 14: I/O Fencing
Lesson 15: Troubleshooting

Course Overview
This training provides comprehensive instruction on the installation and initial configuration of VERITAS Cluster Server (VCS). The course covers principles and methods that enable you to prepare, create, and test VCS service groups and resources using tools that best suit your needs and your high availability environment. You learn to configure and test failover and notification behavior, cluster additional applications, and further customize your cluster according to specified design criteria.

Course Resources
Participant Guide
- Lessons
- Appendix A: Lab Synopses
- Appendix B: Lab Details
- Appendix C: Lab Solutions
- Appendix D: Job Aids
- Appendix E: Design Worksheet Template

Supplements
- VCS Simulator: van.veritas.com
- Troubleshooting Job Aid
- VCS Command-Line Reference card
- Tips & Tricks: www.veritas.com/education

Course Resources
This course uses this participant guide containing lessons presented by your instructor and lab exercises to enable you to practice your new skills. Lab materials are provided in three forms, with increasing levels of detail to suit a range of student expertise levels.
- Appendix A: Lab Synopses has high-level task descriptions and design worksheets.
- Appendix B: Lab Details includes the lab procedures and detailed steps.
- Appendix C: Lab Solutions includes the lab procedures and steps with the corresponding command lines required to perform each step.
- Appendix D: Job Aids provides supplementary material that can be used as on-the-job guides for performing some common VCS operations.
- Appendix E: Design Worksheet Template provides a blank design worksheet.
Additional supplements may be used in the classroom or provided to you by your instructor.

Course Platforms
This course covers the following versions of VCS:
- VCS 4.1, 4.0, and 3.5 for Solaris
- VCS 4.0 for Linux
- VCS 4.0 for AIX
- VCS 3.5 for HP-UX

Course Platforms
This course material applies to the VCS platforms shown in the slide. Indicators are provided in slides and text where there are differences in platforms. Refer to the VERITAS Cluster Server user documentation for your platform and version to determine which features are supported in your environment.

Legend
These are common symbols used in this course.

Symbol descriptions:
- Server, node, or cluster system (terms used interchangeably)
- Server or cluster system that has faulted
- Storage
- Application service
- Cluster interconnect
- Wide area network (WAN) cloud
- Client systems on a network
- VCS service group
- Offline service group
- VCS resource

Lesson 1: VCS Building Blocks

Introduction
Overview
This lesson introduces basic VERITAS Cluster Server terminology and concepts, and provides an overview of the VCS architecture and supporting communication mechanisms.
Importance
The terms and concepts covered in this lesson provide a foundation for learning the tasks you need to perform to deploy the VERITAS Cluster Server product, both in the classroom and in real-world applications.

Lesson Topics and Objectives

Topic                                   After completing this lesson, you will be able to:
Cluster Terminology                     Define clustering terminology.
Cluster Communication                   Describe cluster communication mechanisms.
Maintaining the Cluster Configuration   Describe how the cluster configuration is maintained.
VCS Architecture                        Describe the VCS architecture.
Supported Failover Configurations       Describe the failover configurations supported by VCS.

Outline of Topics
- Cluster Terminology
- Cluster Communication
- Maintaining the Cluster Configuration
- VCS Architecture
- Supported Failover Configurations

Cluster Terminology
A Nonclustered Computing Environment
An example of a traditional, nonclustered computing environment is a single server that runs an application, provides public network links for client access, and stores data on local or SAN storage. If a single component fails, application processing and the business service that relies on the application are interrupted or degraded until the failed component is repaired or replaced.

Definition of a Cluster
A cluster is a collection of multiple independent systems working together under a management framework for increased service availability.
[Diagram labels: Application, Node, Storage, Cluster Interconnect]

Definition of a Cluster
A clustered environment includes multiple components configured such that if one component fails, its role can be taken over by another component to minimize or avoid service interruption. This gives clients highly available access to their data and processing, which is not possible in nonclustered environments.
The term cluster, simply defined, refers to multiple independent systems or domains connected into a management framework for increased availability.
Clusters have the following components:
- Up to 32 systems, sometimes referred to as nodes or servers. Each system runs its own operating system.
- A cluster interconnect, which allows for cluster communications
- A public network, connecting each system in the cluster to a LAN for client access
- Shared storage (optional), accessible by each system in the cluster that needs to run the application

Definition of VERITAS Cluster Server and Failover
VCS detects faults and performs automated failover.
[Diagram labels: Application, Node, Failed Node, Storage, Cluster Interconnect]

Definition of VERITAS Cluster Server and Failover
In a highly available environment, HA software must perform a series of tasks in order for clients to access a service on another server if a failure occurs. The software must:
- Ensure that data stored on the disk is available to the new server, if shared storage is configured (Storage).
- Move the IP address of the old server to the new server (Network).
- Start up the application on the new server (Application).
VERITAS Cluster Server (VCS) is a software solution for automating these tasks. VCS monitors and controls applications running in the cluster and, if a failure is detected, automates application restart. When another server is required to restart the application, VCS performs a failover, which is the process of stopping the application on one system and starting it on another system.
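The three tasks listed above imply an order: storage first, then network, then the application. The following Python sketch models a failover as an ordered list of steps. It is a conceptual illustration only; the names START_ORDER and failover_plan are hypothetical and are not VCS commands, and stopping components in the reverse of the start order is an assumption of this sketch. If the source system has failed outright, the stop steps are moot.

```python
# Illustrative model only; the names below are hypothetical, not VCS commands.

# Order in which components are brought up on the new server,
# following the three tasks listed in the text.
START_ORDER = ["storage", "network", "application"]


def failover_plan(source, target, start_order=START_ORDER):
    """Return the ordered (action, component, system) steps for a failover.

    Components are stopped on the source in reverse start order
    (an assumption of this sketch), then started on the target in
    start order.
    """
    steps = [("stop", c, source) for c in reversed(start_order)]
    steps += [("start", c, target) for c in start_order]
    return steps


plan = failover_plan("S1", "S2")
for action, component, system in plan:
    print(f"{action:5} {component:11} on {system}")
```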

Definition of an Application Service
An application service is a collection of all the hardware and software components required to provide a service. If the service must be migrated to another system, all components need to be moved in an orderly fashion. Examples include Web servers, databases, and applications.

Definition of an Application Service
An application service is a collection of hardware and software components required to provide a service, such as a Web site that an end user can access by connecting to a particular network IP address or host name. Each application service typically requires components of the following three types:
- Application binaries (executables)
- Network
- Storage
If an application service needs to be switched to another system, all of the components of the application service must migrate together to re-create the service on another system.
Note: These are the same components that the administrator must manually move from a failed server to a working server to keep the service available to clients in a nonclustered environment.
Application service examples include:
- A Web service consisting of a Web server program, IP addresses, associated network interfaces used to allow access into the Web site, a file system containing Web data files, and a volume and disk group containing the file system.
- A database service consisting of one or more IP addresses, relational database management system (RDBMS) software, a file system containing data files, a volume and disk group on which the file system resides, and a NIC for network access.

Definition of a Service Group
A service group is a virtual container that enables VCS to manage an application service as a unit. All components required to provide the service, and the relationships between these components, are defined within the service group. A service group has attributes that define its behavior, such as where it can start and run.

Definition of a Service Group
A service group is a virtual container that enables VCS to manage an application service as a unit. The service group contains all the hardware and software components required to run the service, which enables VCS to coordinate failover of the application service resources in the event of failure or at the administrator's request.
A service group is defined by these attributes:
- The cluster-wide unique name of the group
- The list of the resources in the service group, usually determined by which resources are needed to run a specific application service
- The dependency relationships between the resources
- The list of cluster systems on which the group is allowed to run
- The list of cluster systems on which you want the group to start automatically
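As a rough illustration of how these attributes fit together, the sketch below models a service group as a small Python data structure and derives an order for bringing resources online from the dependency relationships. The class and its methods are hypothetical and are not part of VCS. The resource names WebNIC, WebMount, and WebServer, and the specific dependencies shown, are assumptions added for the example; only WebSG, WebIP, WebVol, and WebDG appear in the sample design.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class ServiceGroup:
    """Hypothetical model of the attributes that define a service group."""
    name: str                   # cluster-wide unique name
    system_list: dict           # system name -> priority
    auto_start_list: list       # systems where the group starts automatically
    resources: list = field(default_factory=list)
    # resource -> list of resources it requires (its dependencies)
    requires: dict = field(default_factory=dict)

    def online_order(self):
        """Return resources ordered so each comes after what it requires."""
        graph = {r: set(self.requires.get(r, ())) for r in self.resources}
        return list(TopologicalSorter(graph).static_order())

    def offline_order(self):
        """Return the reverse of the online order."""
        return list(reversed(self.online_order()))


web_sg = ServiceGroup(
    name="WebSG",
    system_list={"S1": 0, "S2": 1},
    auto_start_list=["S1"],
    resources=["WebServer", "WebIP", "WebNIC", "WebMount", "WebVol", "WebDG"],
    requires={  # assumed dependencies for illustration
        "WebServer": ["WebIP", "WebMount"],
        "WebIP": ["WebNIC"],
        "WebMount": ["WebVol"],
        "WebVol": ["WebDG"],
    },
)

order = web_sg.online_order()
print(order)
```

A topological sort is used here because it guarantees that no resource is started before the resources it depends on; the relative order of unrelated resources is not significant.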

Service Group Types
- Failover: The service group can be online on only one cluster system at a time. VCS migrates the service group at the administrator's request and in response to faults.
- Parallel: The service group can be online on multiple cluster systems simultaneously. An example is Oracle Real Application Cluster (RAC).
- Hybrid: This is a special-purpose type of service group used to manage service groups in replicated data clusters (RDCs), which are based on VERITAS Volume Replicator.

Service Group Types

Service groups can be one of three types:
- Failover: This service group runs on one system at a time in the cluster. Most application services, such as database and NFS servers, use this type of group.
- Parallel: This service group runs simultaneously on more than one system in the cluster. This type of service group requires an application that can be started on more than one system at a time without threat of data corruption.
- Hybrid (4.x): A hybrid service group is a combination of a failover service group and a parallel service group used in VCS 4.x replicated data clusters (RDCs), which are based on VERITAS Volume Replicator. This service group behaves as a failover group within a defined set of systems, and a parallel service group within a different set of systems. RDC configurations are described in the VERITAS Disaster Recovery Using VVR and Global Cluster Option course.
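The difference between the failover and parallel types comes down to one rule about where a group may be online. The sketch below expresses that rule in Python under stated assumptions: the enum and function names are invented for illustration, and the hybrid type is left out because its behavior depends on system sets that this lesson does not detail.

```python
from enum import Enum


class GroupType(Enum):
    FAILOVER = "failover"
    PARALLEL = "parallel"


def may_bring_online(group_type, online_on, target):
    """Return True if a group of this type may be brought online on `target`.

    online_on is the set of systems where the group is currently online.
    """
    if target in online_on:
        return False                      # already online on that system
    if group_type is GroupType.FAILOVER:
        return len(online_on) == 0        # one system at a time
    return True                           # parallel: any additional system


# A failover group that is online on S1 cannot also start on S2,
# but a parallel group can.
failover_ok = may_bring_online(GroupType.FAILOVER, {"S1"}, "S2")
parallel_ok = may_bring_online(GroupType.PARALLEL, {"S1"}, "S2")
```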

Definition of a Resource

Resources are VCS objects that correspond to the hardware or software components of an application service.
- Each resource must have a unique name throughout the cluster. Choosing names that reflect the service group name makes it easy to identify all resources in that group, for example, WebIP in the WebSG group.
- Resources are always contained within service groups.
- Resource categories include:
  - Persistent
    - None (NIC)
    - On-only (NFS)
  - Nonpersistent
    - On-off (Mount)

Definition of a Resource

Resources are VCS objects that correspond to hardware or software components, such as the application, the networking components, and the storage components. VCS controls resources through these actions:
- Bringing a resource online (starting)
- Taking a resource offline (stopping)
- Monitoring a resource (probing)

Resource Categories

- Persistent
  - None: VCS can only monitor persistent resources; they cannot be brought online or taken offline. The most common example of a persistent resource is a network interface card (NIC), because it must be present but cannot be stopped. FileNone and ElifNone are other examples.
  - On-only: VCS brings the resource online if required, but does not stop it if the associated service group is taken offline. NFS daemons are examples of on-only resources. FileOnOnly is another on-only example.
- Nonpersistent, also known as on-off: Most resources fall into this category, meaning that VCS brings them online and takes them offline as required. Examples are Mount, IP, and Process. FileOnOff is an example of a test version of this resource.
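One way to read the categories is as a table of which of the three actions VCS may perform on a resource. The Python sketch below encodes that reading; the dictionary and function names are hypothetical, and the category assigned to each resource type is taken from the examples in this section.

```python
# Actions VCS may perform on a resource, by category (as described above).
ALLOWED_ACTIONS = {
    "None": {"monitor"},                         # persistent: monitor only
    "On-only": {"online", "monitor"},            # started if required, never stopped
    "On-off": {"online", "offline", "monitor"},  # started and stopped as required
}

# Category of each example resource type named in this section.
CATEGORY_OF_TYPE = {
    "NIC": "None",
    "NFS": "On-only",
    "Mount": "On-off",
    "IP": "On-off",
    "Process": "On-off",
}


def can_perform(resource_type, action):
    """Return True if VCS may perform `action` on a resource of this type."""
    return action in ALLOWED_ACTIONS[CATEGORY_OF_TYPE[resource_type]]
```

For example, can_perform("NIC", "offline") is False, while can_perform("Mount", "offline") is True.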

Resource Dependencies

Resources in a service group have a defined dependency relationship, which determines the online and offline order of the resources.
- A parent resource depends on a child resource.
- There is no limit to the number of parent and child resources.
- Persistent resources, such as NIC, cannot be parent resources.
- Dependencies cannot be cyclical.

[Diagram: a resource dependency tree with nodes labeled Parent, Parent/child, and Child]

Resource Dependencies

Resources depend on other resources because of application or operating system requirements. Dependencies are defined to configure VCS for these requirements.

Dependency Rules

These rules apply to resource dependencies:
- A parent resource depends on a child resource. In the diagram, the Mount resource (parent) depends on the Volume resource (child). This dependency illustrates the operating system requirement that a file system cannot be mounted without the Volume resource being available.
- Dependencies are homogeneous. Resources can only depend on other resources.
- No cyclical dependencies are allowed. There must be a clearly defined starting point.
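The dependency rules amount to a topological ordering: children come online before their parents, parents go offline before their children, and a cycle leaves no valid starting point. The following Python sketch illustrates the idea with the standard graphlib module. It is not how VCS itself is implemented, and the resource names WebDG, WebNIC, and the IP-to-NIC dependency are assumptions added for the example.

```python
from graphlib import CycleError, TopologicalSorter


def online_order(requires):
    """Return an order in which resources can be brought online.

    `requires` maps each parent resource to the set of children it depends on.
    Children always appear before their parents in the result.
    """
    return list(TopologicalSorter(requires).static_order())


def offline_order(requires):
    """Taking resources offline reverses the online order: parents first."""
    return list(reversed(online_order(requires)))


def has_cycle(requires):
    """Return True if the dependencies are cyclical and therefore invalid."""
    try:
        online_order(requires)
    except CycleError:
        return True
    return False


# Hypothetical dependency tree for a Web service group.
web_requires = {
    "WebMount": {"WebVol"},   # a file system needs its volume
    "WebVol": {"WebDG"},      # a volume needs its disk group
    "WebIP": {"WebNIC"},      # an IP address needs its network interface
}
```

With this input, online_order(web_requires) places WebDG before WebVol and WebVol before WebMount; the relative order of unrelated resources, such as WebNIC and WebDG, is not significant.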

Resource Attributes

Resource attributes define an individual resource. The attribute values are used by VCS to manage the resource. Resources can have required and optional attributes, as specified by the resource type definition.

WebMount resource (Solaris):
    mount -F vxfs /dev/vx/dsk/WebDG/WebVol /Web

Resource Attributes

Resource attributes define the specific characteristics of individual resources. As shown in the slide, the resource attribute values for the sample resource of type Mount correspond to the UNIX command line to mount a specific file system. VCS uses the attribute values to run the appropriate command or system call to perform an operation on the resource.

Each resource has a set of required attributes that must be defined in order to enable VCS to manage the resource. For example, the Mount resource on Solaris has four required attributes that must be defined for each resource of type Mount:
- The directory of the mount point (MountPoint)
- The device for the mount point (BlockDevice)
- The type of file system (FSType)
- The options for the fsck command (FsckOpt)

The first three attributes are the values used to build the UNIX mount command shown in the slide. The FsckOpt attribute is used if the mount command fails. In this case, VCS runs fsck with the specified options (-y) and attempts to mount the file system again.

Some resources also have additional optional attributes you can define to control how VCS manages a resource. In the Mount resource example, MountOpt is an optional attribute you can use to define options to the UNIX mount command. For example, if this is a read-only file system, you can specify -ro as the MountOpt value.
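To make the link between attribute values and the command line concrete, the sketch below assembles the Solaris mount command from the first three attributes. It is a Python illustration only: the function name is hypothetical, the real Mount agent is not written this way, and the command is built as a list of arguments without being run.

```python
import shlex


def mount_command(attrs):
    """Build the Solaris mount command from a Mount resource's attributes."""
    return ["mount", "-F", attrs["FSType"], attrs["BlockDevice"], attrs["MountPoint"]]


# Attribute values matching the WebMount example on the slide.
web_mount = {
    "MountPoint": "/Web",
    "BlockDevice": "/dev/vx/dsk/WebDG/WebVol",
    "FSType": "vxfs",
    "FsckOpt": "-y",   # not part of the mount command; used only if the mount fails
}

command_line = shlex.join(mount_command(web_mount))
```

Here command_line holds the same text as the command on the slide: mount -F vxfs /dev/vx/dsk/WebDG/WebVol /Web.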

Resource Types

Resources are classified by type. The resource type specifies the attributes needed to define a resource of that type. For example, a Mount resource has different properties than an IP resource.

Solaris:
    mount [-F FSType] [options] block_device mount_point

Resource Types and Type Attributes

Resources are classified by resource type. For example, disk groups, network interface cards (NICs), IP addresses, mount points, and databases are distinct types of resources. VCS provides a set of predefined resource types (some bundled, some add-ons) in addition to the ability to create new resource types.

Individual resources are instances of a resource type. For example, you may have several IP addresses under VCS control. Each of these IP addresses individually is a single resource of resource type IP.

A resource type can be thought of as a template that defines the characteristics or attributes needed to define an individual resource (instance) of that type. You can view the relationship between resources and resource types by comparing the mount command for a resource on the previous slide with the mount syntax on this slide. The resource type defines the syntax for the mount command. The resource attributes fill in the values to form an actual command line.
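The template idea can be sketched as a type object that lists attribute names and a check that compares an individual resource against it. This Python example is an assumption-laden illustration: the class and method names are invented, and only the Mount attributes named in this lesson are used (the real definitions live in types.cf).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceType:
    """Hypothetical template: the attribute names a resource of this type may set."""
    name: str
    required: frozenset
    optional: frozenset = frozenset()

    def check(self, attrs):
        """Return (missing required attributes, unknown attributes), both sorted."""
        given = set(attrs)
        missing = sorted(self.required - given)
        unknown = sorted(given - self.required - self.optional)
        return missing, unknown


# The Mount type as described in this lesson for Solaris.
MOUNT = ResourceType(
    name="Mount",
    required=frozenset({"MountPoint", "BlockDevice", "FSType", "FsckOpt"}),
    optional=frozenset({"MountOpt"}),
)

# One instance of the type: the WebMount resource.
web_mount_attrs = {
    "MountPoint": "/Web",
    "BlockDevice": "/dev/vx/dsk/WebDG/WebVol",
    "FSType": "vxfs",
    "FsckOpt": "-y",
}
```

MOUNT.check(web_mount_attrs) returns two empty lists, meaning the instance supplies everything the template requires and nothing the template does not define.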

Agents: How VCS Controls Resources

Each resource type has a corresponding agent process that manages all resources of that type.
- Agents have one or more entry points that perform a set of actions on resources.
- Each system runs one agent for each active resource type.

[Diagram: the IP, NIC, Mount, Disk Group, and Volume agents use the online, offline, monitor, and clean entry points to manage the resources 10.1.2.3, eri0, /web, /log, WebDG, WebVol, and logVol]

Agents: How VCS Controls Resources

Agents are processes that control resources. Each resource type has a corresponding agent that manages all resources of that resource type. Each cluster system runs only one agent process for each active resource type, no matter how many individual resources of that type are in use.

Agents control resources using a defined set of actions, also called entry points. The four entry points common to most agents are:
- Online: Resource startup
- Offline: Resource shutdown
- Monitor: Probing the resource to retrieve status
- Clean: Killing the resource or cleaning up as necessary when a resource fails to be taken offline gracefully

The difference between offline and clean is that offline is an orderly termination and clean is a forced termination. In UNIX, this can be thought of as the difference between exiting an application and sending the kill -9 command to the process.

Each resource type needs a different way to be controlled. To accomplish this, each agent has a set of predefined entry points that specify how to perform each of the four actions. For example, the startup entry point of the Mount agent mounts a block device on a directory, whereas the startup entry point of the IP agent uses the ifconfig command to set the IP address on a unique IP alias on the network interface.

VCS provides both predefined agents and the ability to create custom agents.
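The four entry points can be illustrated with a toy agent whose resource is simply a file: online creates it, offline removes it, monitor checks for it, and clean removes it without complaining if it is already gone. The Python sketch below is loosely modeled on the idea of the FileOnOff test resource mentioned earlier, but the class, method signatures, and status strings are assumptions for illustration and do not reflect the real agent framework.

```python
import os
import tempfile


class FileOnOffSketch:
    """Toy agent: one object manages every resource of its type."""

    def online(self, path):
        # Resource startup: create the file.
        with open(path, "w"):
            pass

    def offline(self, path):
        # Orderly shutdown: the file is expected to exist.
        os.remove(path)

    def monitor(self, path):
        # Probe the resource and report its status.
        return "ONLINE" if os.path.exists(path) else "OFFLINE"

    def clean(self, path):
        # Forced cleanup: succeed even if the resource is already gone.
        try:
            os.remove(path)
        except FileNotFoundError:
            pass


def demo():
    """Run two resources through the entry points using one agent."""
    agent = FileOnOffSketch()
    statuses = []
    with tempfile.TemporaryDirectory() as directory:
        paths = [os.path.join(directory, "res1"), os.path.join(directory, "res2")]
        for path in paths:
            agent.online(path)
        statuses.append([agent.monitor(path) for path in paths])
        agent.offline(paths[0])   # graceful stop
        agent.clean(paths[1])     # forced stop
        agent.clean(paths[1])     # a second clean is harmless
        statuses.append([agent.monitor(path) for path in paths])
    return statuses
```

The demo function uses a single agent object for both resources, mirroring the rule that each system runs one agent per active resource type.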

Using the VERITAS Cluster Server Bundled Agents Reference Guide

The VERITAS Cluster Server Bundled Agents Reference Guide defines all VCS resource types for all bundled agents. See http://support.veritas.com for product documentation.

Platforms: Solaris, AIX, HP-UX, Linux

Using the VERITAS Cluster Server Bundled Agents Reference Guide

The VERITAS Cluster Server Bundled Agents Reference Guide describes the agents that are provided with VCS and defines the required and optional attributes for each associated resource type. Excerpts of the definitions for the NIC, Mount, and Process resource types are included in the Job Aids appendix.

VERITAS also provides a set of agents that are purchased separately from VCS, known as enterprise agents. Some examples of enterprise agents are:
- Oracle
- NetBackup
- Informix
- iPlanet

Select the Agents and Options link on the VERITAS Cluster Server page at www.veritas.com for a complete list of agents available for VCS. To obtain PDF versions of product documentation for VCS and agents, see the Support Web site at http://support.veritas.com.

Cluster Communication

A cluster interconnect provides a communication channel between cluster nodes. The cluster interconnect serves to:
- Determine which systems are members of the cluster using a heartbeat mechanism.
- Maintain a single view of the status of the cluster configuration on all systems in the cluster membership.

Cluster Communication

VCS requires a cluster communication channel between systems in a cluster to serve as the cluster interconnect. This communication channel is also sometimes referred to as the private network because it is often implemented using a dedicated Ethernet network.

VERITAS recommends that you use a minimum of two dedicated communication channels with separate infrastructures (for example, multiple NICs and separate network hubs) to implement a highly available cluster interconnect. Although recommended, this configuration is not required.

The cluster interconnect has two primary purposes:
- Determine cluster membership: Membership in a cluster is determined by systems sending and receiving heartbeats (signals) on the cluster interconnect. This enables VCS to determine which systems are active members of the cluster and which systems are joining or leaving the cluster. In order to take corrective action on node failure, surviving members must agree when a node has departed. This membership needs to be accurate and coordinated among active members, because nodes can be rebooted, powered off, faulted, and added to the cluster at any time.
- Maintain a distributed configuration: Cluster configuration and status information for every resource and service group in the cluster is distributed dynamically to all systems in the cluster.

Cluster communication is handled by the Group Membership Services/Atomic Broadcast (GAB) mechanism and the Low Latency Transport (LLT) protocol, as described in the next sections.

Low-Latency Transport (LLT)

LLT:
- Is responsible for sending heartbeat messages
- Transports cluster communication traffic to every active system
- Balances traffic load across multiple network links
- Maintains the communication link state
- Is a nonroutable protocol
- Runs on an Ethernet network

Low-Latency Transport

VERITAS uses a high-performance, low-latency protocol for cluster communications. LLT is designed for the high-bandwidth and low-latency needs of not only VERITAS Cluster Server, but also VERITAS Cluster File System, in addition to Oracle Cache Fusion traffic in Oracle RAC configurations. LLT runs directly on top of the Data Link Provider Interface (DLPI) layer over Ethernet and has several major functions:
- Sending and receiving heartbeats over network links
- Monitoring and transporting network traffic over multiple network links to every active system
- Balancing cluster communication load over multiple links
- Maintaining the state of communication
- Providing a nonroutable transport mechanism for cluster communications

Group Membership Services/Atomic Broadcast (GAB)

GAB:
- Performs two functions:
  - Manages cluster membership; referred to as GAB membership
  - Sends and receives atomic broadcasts of configuration information
- Is a proprietary broadcast protocol
- Uses LLT as its transport mechanism

[Diagram: GAB running on top of LLT on each cluster system]

Group Membership Services/Atomic Broadcast (GAB)

GAB provides the following:
- Group Membership Services: GAB maintains the overall cluster membership by way of its Group Membership Services function. Cluster membership is determined by tracking the heartbeat messages sent and received by LLT on all systems in the cluster over the cluster interconnect. Heartbeats are the mechanism VCS uses to determine whether a system is an active member of the cluster, joining the cluster, or leaving the cluster. If a system stops sending heartbeats, GAB determines that the system has departed the cluster.
- Atomic Broadcast: Cluster configuration and status information are distributed dynamically to all systems in the cluster using GAB's Atomic Broadcast feature. Atomic Broadcast ensures all active systems receive all messages for every resource and service group in the cluster.
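Heartbeat-based membership can be pictured as a simple timeout rule: a system counts as a member while its most recent heartbeat is recent enough. The Python sketch below shows only that idea. The function name, the use of timestamps in seconds, and the timeout values are assumptions for illustration; the actual GAB and LLT timing rules are not described in this lesson.

```python
def current_members(last_heartbeat, now, timeout):
    """Return the systems whose last heartbeat arrived within `timeout` seconds.

    last_heartbeat maps each system name to the time its latest heartbeat
    was received; `now` is the current time in the same units.
    """
    return {system for system, seen in last_heartbeat.items() if now - seen <= timeout}


# S2 last sent a heartbeat 4 seconds ago, S1 sent one 1 second ago.
heartbeats = {"S1": 100.0, "S2": 97.0}
members = current_members(heartbeats, now=101.0, timeout=2.0)
```

With a hypothetical 2-second timeout, only S1 remains a member; S2 is treated as having departed the cluster because its heartbeats stopped arriving.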

The Fencing Driver

Fencing:
- Monitors GAB to detect cluster membership changes
- Ensures a single view of cluster membership
- Prevents multiple nodes from accessing the same Volume Manager 4.x shared storage devices

[Diagram: two systems, each running Fence on top of GAB and LLT; one system is labeled Reboot]

The Fencing Driver The fencing driver prevents multiple systems from accessing the same Volume Manager-controlled shared storage devices in the event that the cluster interconnect is severed. In the example of a two-node cluster displayed in the diagram, if the cluster interconnect fails, each system stops receiving heartbeats from the other system. GAB on each system determines that the other system has failed and passes the cluster membership change to the fencing module. The fencing modules on both systems contend for control of the disks according to an internal algorithm. The losing system is forced to panic and reboot. The winning system is now the only member of the cluster, and it fences off the shared data disks so that only systems that are still part of the cluster membership (only one system in this example) can access the shared storage. The winning system takes corrective action as specified within the cluster configuration, such as bringing service groups online that were previously running on the losing system.

The High Availability Daemon (HAD)

The VCS engine, the high availability daemon:
- Runs on each system in the cluster
- Maintains configuration and state information for all cluster resources
- Manages all agents

The hashadow daemon monitors HAD.

[Diagram: HAD and hashadow running above Fence, GAB, and LLT]

The High Availability Daemon

The VCS engine, also referred to as the high availability daemon (had), is the primary VCS process running on each cluster system. HAD tracks all changes in cluster configuration and resource status by communicating with GAB. HAD manages all application services (by way of agents) whether the cluster has one or many systems.

Building on the knowledge that the agents manage individual resources, you can think of HAD as the manager of the agents. HAD uses the agents to monitor the status of all resources on all nodes. This modularity between HAD and the agents allows for efficiency of roles:
- HAD does not need to know how to start up Oracle or any other applications that can come under VCS control.
- Similarly, the agents do not need to make cluster-wide decisions.

This modularity allows a new application to come under VCS control simply by adding a new agent; no changes to the VCS engine are required.

On each active cluster system, HAD informs all the other cluster systems of changes to the configuration or status. To ensure that the had daemon is highly available, a companion daemon, hashadow, monitors had. If had fails, hashadow attempts to restart it. Likewise, had restarts hashadow if hashadow stops.

Comparing VCS Communication Protocols and TCP/IP

[Diagram: HAD, hashadow, and iPlanet shown as user processes; GAB and LLT shown beside TCP and IP as kernel processes; NIC shown as hardware beneath both stacks]

Comparing VCS Communication Protocols and TCP/IP

To illustrate the suitability and use of GAB and LLT for VCS communications, compare GAB running over LLT with TCP/IP, the standard public network protocols.

GAB Versus TCP

GAB is a multipoint-to-multipoint broadcast protocol; all systems in the cluster send and receive messages simultaneously. TCP is a point-to-point protocol.

GAB Versus UDP

GAB also differs from UDP, another broadcast protocol. UDP is a fire-and-forget protocol: it merely sends the packet and assumes it is received. GAB, however, checks and guarantees delivery of transmitted packets, because it requires broadcasts to all nodes including the originator.

LLT Versus IP

LLT is driven by GAB, has specific targets in its domain, and assumes a constant connection between servers; in this sense it is a connection-oriented protocol. IP is a connectionless protocol; it assumes that packets can take different paths to reach the same destination.

Maintaining the Cluster Configuration

- HAD maintains a replica of the cluster configuration in memory on each system.
- Changes to the configuration are broadcast to HAD on all systems simultaneously by way of GAB using LLT.
- The configuration is preserved on disk in the main.cf file.

[Diagram: HAD and hashadow on each system, with the main.cf file on disk]

Maintaining the Cluster Configuration

HAD maintains configuration and state information for all cluster resources in memory on each cluster system. Cluster state refers to tracking the status of all resources and service groups in the cluster. When any change to the cluster configuration occurs, such as the addition of a resource to a service group, HAD on the initiating system sends a message to HAD on each member of the cluster by way of GAB atomic broadcast, to ensure that each system has an identical view of the cluster.

Atomic means that all systems receive updates, or all systems are rolled back to the previous state, much like a database atomic commit.

The cluster configuration in memory is created from the main.cf file on disk in the case where HAD is not currently running on any cluster systems, so there is no configuration in memory. When you start VCS on the first cluster system, HAD builds the configuration in memory on that system from the main.cf file. Changes to a running configuration (in memory) are saved to disk in main.cf when certain operations occur. These procedures are described in more detail later in the course.
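The all-or-nothing meaning of atomic can be shown with a small Python sketch in which each system's in-memory configuration is a dictionary. This illustrates the semantics only and is not GAB's protocol: the function name, the snapshot-and-restore approach, and the use of an exception to signal a failed delivery are all assumptions made for the example.

```python
import copy


def atomic_broadcast(replicas, update):
    """Apply `update` to every replica, or leave every replica unchanged.

    replicas maps a system name to its in-memory configuration (a dict).
    update is a function that modifies one configuration in place.
    Returns True if all replicas were updated, False if all were rolled back.
    """
    snapshots = {name: copy.deepcopy(config) for name, config in replicas.items()}
    try:
        for config in replicas.values():
            update(config)
    except Exception:
        # Roll every replica back to its previous state.
        for name, snapshot in snapshots.items():
            replicas[name].clear()
            replicas[name].update(snapshot)
        return False
    return True


def add_web_ip(config):
    """Example update: add the WebIP resource to the configuration."""
    config["resources"].append("WebIP")


cluster = {"S1": {"resources": []}, "S2": {"resources": []}}
delivered = atomic_broadcast(cluster, add_web_ip)
```

After the call, both S1 and S2 list WebIP, so the two systems hold an identical view. If the update had failed partway through, neither replica would show the change.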

VCS Configuration Files

main.cf

    include "types.cf"

    cluster vcs (
        UserNames = { admin = ElmElgLimHmmKumGlj }
        Administrators = { admin }
        CounterInterval = 5
        )

    system S1 (
        )

    system S2 (
        )

    group WebSG (
        SystemList = { S1 = 0, S2 = 1 }
        )

        Mount WebMount (
            MountPoint = "/web"
            BlockDevice = "/dev/vx/dsk/WebDG/WebVol"
            FSType = vxfs
            FsckOpt = "-y"
            )

A simple text file is used to store the cluster configuration on disk. The file contents are described in detail later in the course.

VCS Configuration Files

Configuring VCS means conveying to VCS the definitions of the cluster, service groups, resources, and resource dependencies. VCS uses two configuration files in a default configuration:
- The main.cf file defines the entire cluster, including cluster name, systems in the cluster, and definitions of service groups and resources, in addition to service group and resource dependencies.
- The types.cf file defines the resource types.

Additional files similar to types.cf may be present if agents have been added. For example, if the Oracle enterprise agent is added, a resource types file, such as OracleTypes.cf, is also present.

The cluster configuration is saved on disk in the /etc/VRTSvcs/conf/config directory, so the memory configuration can be re-created after systems are restarted.

Note: The VCS installation utility creates the $VCS_CONF environment variable containing the /etc/VRTSvcs path. The short path to the configuration directory is $VCS_CONF/conf/config.
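As a small illustration of the path convention in the note, the Python sketch below resolves the location of a configuration file from an environment mapping, falling back to /etc/VRTSvcs when VCS_CONF is not set. The function name and the fallback behavior are assumptions for this example; in a script you would typically pass os.environ as the mapping.

```python
import posixpath

# The path the note says the installation utility stores in $VCS_CONF.
DEFAULT_VCS_CONF = "/etc/VRTSvcs"


def config_path(filename, env):
    """Return the full path of a file in the VCS configuration directory.

    env is a mapping of environment variables, for example os.environ.
    """
    base = env.get("VCS_CONF", DEFAULT_VCS_CONF)
    return posixpath.join(base, "conf", "config", filename)


main_cf = config_path("main.cf", {"VCS_CONF": "/etc/VRTSvcs"})
```

Here main_cf is /etc/VRTSvcs/conf/config/main.cf, the same location the section gives for the saved cluster configuration.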

VCS Architecture

- Agents monitor resources on each system and provide status to HAD on the local system.
- HAD on each system sends status information to GAB.
- GAB broadcasts configuration information to all cluster members.
- LLT transports all cluster communications to all cluster nodes.
- HAD on each node takes corrective action, such as failover, when necessary.

VCS Architecture

The slide shows how the major components of the VCS architecture work together to manage application services.

How does VCS know what to fail over?

Each cluster system has its own copy of configuration files, libraries, scripts, daemons, and executable programs that are components of VCS. Cluster systems share a common view of the cluster configuration.

An application service consists of all the resources that the application requires in order to run, including the application itself, and networking and storage resources. This application service provides the structure for a service group, which is the unit of failover.

Dependencies define whether a resource or service group failure impacts other resources or service groups. Dependencies also define the order VCS brings service groups and resources online or takes them offline.

How does VCS know when to fail over?

Agents communicate the status of resources to HAD, the VCS engine. The agents alert the engine when a resource has faulted. The VCS engine determines what to do and initiates any necessary action.
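The last step, the engine deciding what to do after a fault, can be sketched as choosing a target system for the service group. The Python example below assumes a priority-based choice in which the lowest number in the SystemList wins among the remaining cluster members. That policy, the function name, and the arguments are assumptions for illustration; this lesson does not describe how the engine selects a target.

```python
def failover_target(system_list, members, faulted_on):
    """Pick the system on which to bring a faulted group back online.

    system_list maps system names to priorities, as in SystemList = { S1 = 0, S2 = 1 }.
    members is the set of systems currently in the cluster membership.
    Returns None if no other eligible system is available.
    """
    candidates = [
        system for system in system_list
        if system in members and system != faulted_on
    ]
    if not candidates:
        return None
    # Assumed policy for this sketch: the lowest priority number wins.
    return min(candidates, key=system_list.get)


web_sg_systems = {"S1": 0, "S2": 1}
target = failover_target(web_sg_systems, members={"S1", "S2"}, faulted_on="S1")
```

In this two-system example, a fault of WebSG on S1 selects S2 as the target; if S2 were not a cluster member, the function would return None and the group would have nowhere to go.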

Active/Passive

[Diagram: the configuration before failover and after failover]

Supported Failover Configurations

The following examples illustrate the wide variety of failover configurations supported by VCS.

Active/Passive

In this configuration, an application runs on a primary or master server. A dedicated redundant server is present to take over on any failover. The redundant server is not configured to perform any other functions. The redundant server is on standby with full performance capability. The next examples show types of active/passive configurations:

Active/Passive N-to-1

[Diagram: the configuration before failover and after failover]

N-to-1 This configuration reduces the cost of hardware redundancy while still providing a dedicated spare. One server protects multiple active servers, on the theory that simultaneo
