  • High Availability Design and Customization Using VCS for UNIX and Windows (Appendixes)

    Lab Guide

  • COURSE DEVELOPERS

    Bilge Gerrits
    Gail Adey
    Colin Jones
    Jade Arrington

    LEAD SUBJECT MATTER EXPERTS

    Dave Rogers

    TECHNICAL CONTRIBUTORS AND REVIEWERS

    Mike Carew
    Barb Ceran
    Jim Sennicka

    Disclaimer

    The information contained in this publication is subject to change without notice. VERITAS Software Corporation makes no warranty of any kind with regard to this guide, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose. VERITAS Software Corporation shall not be liable for errors contained herein or for incidental or consequential damages in connection with the furnishing, performance, or use of this manual.

    Copyright

    Copyright 2004 VERITAS Software Corporation. All rights reserved. No part of the contents of this training material may be reproduced in any form or by any means or be used for the purposes of training or education without the written permission of VERITAS Software Corporation.

    Trademark Notice

    VERITAS, the VERITAS logo, and VERITAS Storage Foundation, VERITAS Cluster Server, VERITAS File System, VERITAS Volume Manager, and VERITAS NetBackup are registered trademarks of VERITAS Software Corporation. Other product names mentioned herein may be trademarks and/or registered trademarks of their respective companies.

    High Availability Design and Customization Using VCS for UNIX and Windows

    SKU:

    VERITAS Software Corporation
    350 Ellis Street
    Mountain View, CA 94043
    Phone 650-527-8000
    www.veritas.com

  • Table of Contents

    Appendix A: Case Studies and Exercises
    Introduction
    Case Study Overview: Investment Bank
    Lesson 1 Exercises
    Lesson 2 Exercises
    Lesson 3 Exercises
    Lesson 4 Exercises
    Lesson 5 Exercises
    Lesson 6 Exercises
    Lesson 7 Exercises
    Case Study Overview: Retail Company
    Lesson 8 Exercises
    Lesson 9 Exercises
    Lesson 10 Exercises
    Lesson 11 Exercises

    Appendix B: Case Study and Exercise Solutions
    Introduction
    Case Study Overview: Investment Bank
    Lesson 1 Exercises and Solutions
    Lesson 2 Exercises and Solutions
    Lesson 3 Exercises and Solutions
    Lesson 4 Exercises and Solutions
    Lesson 5 Exercises and Solutions
    Lesson 6 Exercises and Solutions
    Lesson 7 Exercises and Solutions
    Case Study Overview: Retail Company
    Lesson 8 Exercises and Solutions
    Lesson 9 Exercises and Solutions
    Lesson 10 Exercises and Solutions
    Lesson 11 Exercises and Solutions

    Appendix C: HA/DR Design Process and Design Template

    Appendix D: Lesson Questions and Answers

  • Appendix A: Case Studies and Exercises


    Introduction

    Overview

    This appendix provides case study exercises based on two different fictional environments:
    - A global investment bank
    - A retail company

    The case study exercises make reference to a number of different contexts, within which there are several different HA/DR requirements. This section gives the background and contexts for the global investment bank case study in which the exercises should be conducted. The retail company case study will be introduced in later lessons.

    Case Study Overview: Investment Bank

    Background Information

    The investment bank has three main data centers, in New York, London, and Tokyo, and many buildings distributed across the world, each with its own IT systems and infrastructure. Each building hosts one or more business units. Each of the three main data centers consists of two buildings about 10 kilometers apart, connected by dark fibres. Redundant SAN connectivity exists between the two buildings at each main data center.

    Within the investment bank, there are a number of environments with varying HA requirements. These include:
    - An IT operations database at the three main data centers
    - A read-only NFS service at each building
    - A read-write NFS service within each business unit
    - Corporate e-mail services at several locations throughout the global network

    IT Operations Database

    The bank IT infrastructure comprises over 5,000 UNIX workstations and approximately 500 UNIX servers. Managing these systems on a piecemeal or local basis does not fit the global strategy of the bank, so the bank has devised a centralized system for controlling and managing their configuration data.

    Read-Only Network File Service

    The bank operates globally 24x7x365. Part of the bank's IT strategy is to minimize the need for local knowledge and custom solutions; its support staff may be working remotely from the other side of the globe. The approach taken by the bank is to deploy generic platforms managed with common administration and management tools. Central to the generic platform is a read-only network file service. This service provides each desktop UNIX system with generic root and /usr file systems. The read-only network file service is provided on a strategic building-wide basis. All business units in a particular building are provided with the file service as a centralized but highly available service.

    Read/Write Network File Service

    Each local business unit provides users with home directory storage for scratch files, mail folders, working files, and other non-critical data. Although the availability of the data is not critical to the bank's profits, it is important to smooth operations within the group.

    Corporate E-mail Services

    The investment bank provides e-mail services to its employees on a strategic level. The bank has several Internet gateways throughout the global network, and mailbox services come under the control of the Internet Communications group responsible for Internet gateway technologies. The services provided to the bank's employees include e-mail through conventional means (IMAP/POP) and also through a web interface.

    High Level Business Requirements

    IT Operations Database

    The system is a combination of applications for adding, deleting, and updating system configuration data, and a database that holds this information. Everything from platform type, hardware, installed OS, local system build type (server, desktop), hostname, IP, domain, and physical location through to system hardware inventory, internal disks, installed option cards, and disk VTOC information is stored in the database.

    [Figure: IT Operations Database. Clients in Building 1 and Building 2 connect to proxies. Periodic updates from the proxies go to the read/write (R/W) database server in New York; read requests are served by the database instances. Periodic replication using database replication software copies the data to the read-only (RO) database servers in Tokyo and London.]

    The contents of this database represent the complete configuration data of the whole global infrastructure, and as such its integrity and availability are of the utmost importance. The database is also widely used across the globe and is under constant revision, although at low levels of change. The bank has therefore devised a scheme to reduce the dependency on a single central database instance by providing database replication and read-only database instances at the two other data centers. This improves load balancing and reduces single points of failure (SPOFs).

    The existing configuration is shown on the slide and is as follows:

    The database is accessed by local proxies (servers with a specific application) within each building. Some of the data is collected automatically by software at specific times; other data is specified by system administrators during system provisioning. All the changes made to client configurations during the day are cached on redundant proxy systems at each building. On a nightly basis, these changes are uploaded to the only read/write database, in the main data center at New York. The changes are then replicated by the application itself to the read-only copies at the two other data centers. The database replication product guarantees database integrity at the read-only sites during and after periodic replication. Since there are buildings distributed across the world, each building is allocated a specific time for updates according to its time zone. This results in the main database being updated only three times a day.

    The integrity of the database must be maintained. If a new production server is deployed, its configuration data is initially stored locally at the proxies.
    If the data is lost during this stage, it is not critical and it can be recreated locally at a small manual cost. However, after the changes are committed to the main database, they should not be lost. Whenever a subsequent reinstall of the same production server is necessary, it takes place according to the data in the database, and if the data is not available, it may have a direct impact on the production environment.

    The database is used as a source for many aspects of the environment. For example, each night the network information services (NIS) maps are rebuilt from this database. Any lost information would result in production services not functioning properly.

    If the database is unavailable for a while, the impact is only upon IT operations. It has no direct impact upon banking or market dealing availability or functionality. The impact is upon system administration, server provisioning, and rebuild. However, the bank needs to keep global administration and operations functional, and this database is a key component of that service. The impact of downtime is measured in terms of backlog and disruption to work routines.

    To safeguard the environment, the database is currently replicated between the main data centers: New York, London, and Tokyo. The New York instance is the primary database instance and is read/write. The London and Tokyo instances are normally read-only and are used only for load balancing read requests. In the event of the read/write instance failing, one of the remaining instances is changed to read/write. Currently, this is carried out manually by a system administrator.

    The bank IT strategy includes the requirement that all key infrastructure and application services have a disaster recovery plan enabling continuity of business in the event of a building outage. Currently, the database instance at each data center uses one disk array with one server in one of the two buildings available to the data center.
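The manual promotion of a read-only instance described in this section can be pictured with a small sketch. It is only an illustration under stated assumptions: the `Instance` class, the `promote_replacement` function, and the preference order are hypothetical and are not part of the case study or of any VERITAS product.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """One database instance at a main data center (hypothetical model)."""
    site: str
    mode: str       # "rw" for the primary, "ro" for a read-only copy
    healthy: bool


def promote_replacement(instances, preference):
    """Return the site that should hold the read/write role.

    If the current read/write instance is healthy, nothing changes.
    Otherwise the first healthy read-only instance in `preference`
    order is switched to read/write, and the failed primary is marked
    read-only so that only one read/write instance exists.
    """
    by_site = {inst.site: inst for inst in instances}
    primary = next(inst for inst in instances if inst.mode == "rw")
    if primary.healthy:
        return primary.site
    for site in preference:
        candidate = by_site[site]
        if candidate.healthy and candidate.mode == "ro":
            primary.mode = "ro"
            candidate.mode = "rw"
            return candidate.site
    raise RuntimeError("no healthy read-only instance available to promote")


instances = [
    Instance("New York", "rw", healthy=False),  # simulated primary failure
    Instance("London", "ro", healthy=True),
    Instance("Tokyo", "ro", healthy=True),
]
new_primary = promote_replacement(instances, preference=["London", "Tokyo"])
print(new_primary)
```

The sketch deliberately ignores replication lag and data integrity checks, which a real design would have to address before any instance is switched to read/write.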

    Read-Only Network File System

    In the existing environment, the read-only NFS service is centralized within each building. Each building has a number of NFS servers: half of them on the A side, half of them on the B side. The number of NFS servers working in parallel depends on the total number of clients in the building and may be different in each building. The slide shows an environment with two active NFS servers on each side. All the clients in a business unit are manually separated into two groups: A clients and B clients. A clients connect to the NFS services on the A side and B clients connect to the NFS services on the B side. The clients on each side are also manually distributed across the NFS services on that side for load balancing. Each side also has one standby NFS server, which is used to manually fail over an NFS service that has failed.

    [Figure: Read-Only NFS Service. Building XX with an A side and a B side. Each side provides a read-only (RO) NFS service (A Service, B Service) with two active NFS servers and one standby NFS server. In Business Unit 1 and Business Unit 2, A clients connect to the A service and B clients connect to the B service.]

    Each read-only NFS server has its own local copy of the data. The data on these servers is updated very infrequently, only when there are version changes. Any updates are tested on separate systems at the main data centers and pushed out to individual buildings. Servers in individual buildings are updated one by one, using the spare capacity to keep the services running. Each building gets an image of the new read-only data with every update.

    The read-only NFS service is critical to the business and should be available 24x7x365. If an NFS service fails and cannot be recovered, some of the clients in a building cannot operate, which has an immediate impact on the business. The existing manual failover procedure is no longer satisfactory and the bank wants to improve it.

    It is a fundamental business requirement that banking and trading operations be maintained in a highly available operational state. As such, end user systems must be supported with a highly available network file service. In the event of a system failure, an immediate and automatic response must return the service for those affected within minutes. Server failure must not affect all users in the business unit. Processing must be balanced across multiple servers to reduce the likelihood that a sudden load on one system after a failure impacts the others. The bank has also made a strategic decision that any new technology deployed must have minimal or no single points of failure. The elimination of single points of failure from the existing environment will improve uptime and therefore profitability.
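The manual split of clients into A and B groups, and then across the active NFS servers on each side, can be illustrated with a deterministic assignment. This is a sketch only: the case study does not prescribe any method, and the `assign_client` function, the hashing approach, and the hostnames are assumptions made for illustration.

```python
import hashlib


def assign_client(hostname, servers_per_side):
    """Map a client to a side ("A" or "B") and an active NFS server index.

    A stable hash of the hostname is used so that the same client always
    lands on the same side and server. This is a hypothetical stand-in
    for the manual distribution described in the case study.
    """
    digest = int(hashlib.sha256(hostname.encode("utf-8")).hexdigest(), 16)
    side = "A" if digest % 2 == 0 else "B"
    server_index = (digest // 2) % servers_per_side
    return side, server_index


clients = [f"ws{n:04d}" for n in range(1000)]  # hypothetical hostnames
counts = {}
for host in clients:
    key = assign_client(host, servers_per_side=2)
    counts[key] = counts.get(key, 0) + 1
print(counts)
```

A hash-based split keeps the assignment repeatable without a lookup table, but it does not account for differences in client load, which the manual procedure may take into consideration.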

    Read/Write Network File Service

    The business unit is responsible for the provision of home directory resources to users in the local business unit. The resource is then controlled and managed according to the objectives of the business unit. Those objectives require that the file service be maintained in a state of high availability during business hours, with an automated response to system failure that brings the service back online within minutes. The read/write NFS service is shown in the slide for each business unit. All of the clients in a business unit connect to the same read/write NFS service.

    [Figure: Read/Write NFS Service. Building XX with Business Unit 1 and Business Unit 2. In each business unit, the A clients and B clients connect to a read/write (R/W) NFS service, labeled with two NFS servers. The read-only (RO) NFS servers of the building are also shown.]

    The business unit policy for backups is that recovery should be up to the state of the previous day. Backups are performed nightly. The business unit does not store mission-critical data in the home directories on the read/write NFS server.
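With nightly backups, the amount of work at risk depends on how long after the last backup a failure occurs. The arithmetic can be sketched as follows; the `data_at_risk` function and the example times are hypothetical and only illustrate the calculation.

```python
from datetime import datetime, timedelta


def data_at_risk(last_backup, failure_time):
    """Return how much work would be lost if recovery uses the last backup."""
    if failure_time < last_backup:
        raise ValueError("failure_time must not precede last_backup")
    return failure_time - last_backup


# Hypothetical times: the backup completes at 01:00, the failure occurs at 23:30.
last_backup = datetime(2004, 3, 1, 1, 0)
failure = datetime(2004, 3, 1, 23, 30)
loss = data_at_risk(last_backup, failure)
print(loss)  # 22:30:00
```

Under this model the worst case approaches the full interval between two backups, which is consistent with a policy of recovering to the state of the previous day.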


    Lesson 1 Exercises

    IT Operations Database

    1 Based on the information provided in the case study, identify the requirements that will have the most impact upon the design and the ultimate solution.

    2 The recovery objectives can be broken down into two elements: recovery time and recovery point. Suggest appropriate technologies for these elements to achieve the business objectives.

    3 What would be the impact of a logical error in the database? What measures can be taken to respond to this situation? What technologies can be used to effect the recovery?

    Read-Only NFS Service

    1 Based on the information provided in the case study, identify the requirements that will have the most impact upon the design and the ultimate solution.

    2 The recovery objectives can be broken down into two elements: recovery time and recovery point. Suggest appropriate technologies for these elements to achieve the business objectives.

    3 What would be the impact of a logical error in the file system of the read-only NFS service? What measures can be taken to respond to this situation? What technologies can be used to effect the recovery?

    Read/Write Network File Service

    1 Based on the information provided in the case study, identify the requirements that will have the most impact upon the design and the ultimate solution.

    2 The recovery objectives can be broken down into two elements: recovery time and recovery point. Suggest appropriate technologies for these aspects of the design to achieve the business objectives.

    3 What would be the impact of a logical error in the file system of the RW NFS service? What measures can be taken to respond to this situation? What technologies can be used to effect the recovery?


    Lesson 2 Exercises

    IT Operations Database

    1 List the major components of an SLA that reflect the business requirements in this case study.

    2 State the RPO and RTO for this service.

    3 Identify the VERITAS technologies that address the business requirements and satisfy the SLA. Explain the role of each technology.

    Read-Only NFS Service

    1 List the major components of an SLA that reflect the business requirements in this case study.

    2 State the RPO and RTO for this service.


    3 Identify the VERITAS technologies that address the business requirements and satisfy the SLA. Explain the role of each technology.

    Read/Write Network File Service

    1 List the major components of an SLA that reflect the business requirements in this case study.

    2 State the RPO and RTO for this service.

    3 Identify the VERITAS technologies that address the business requirements and satisfy the SLA. Explain the role of each technology.


    Lesson 3 Exercises

    IT Operations Database

    1 Identify the storage architecture, the cluster topology, and the failover configuration that could be deployed to meet the business objectives.

    2 Identify requirements that cannot be directly resolved through the deployment of standard HA/DR technology.

    3 Suggest an alternative topology, and discuss the issues with this alternative.

    Read-Only NFS Service

    1 Identify the storage architecture, the cluster topology, and the failover configuration that could be deployed to meet the business objectives.

    2 Identify any requirements that cannot be directly resolved through the deployment of standard HA technology.


    3 Suggest an alternative topology, and discuss the issues with this alternative.

    Read/Write Network File Service

    1 Identify the storage architecture, the cluster topology, and the failover configuration that could be deployed to meet the business objectives.

    2 Identify any requirements that cannot be directly resolved through the deployment of standard HA technology.

    3 Suggest an alternative topology, and discuss the issues with this alternative.


    Lesson 4 Exercises

    Additional Infrastructure Details of Investment Bank

    The legacy infrastructure in place within each building comprises SAN and existing servers. The hope of management is that much of the existing infrastructure can be redeployed within the new framework so as to minimize the capital expenditure. The details of the infrastructure are shown in the following subsections.

    SAN Infrastructure

    The bank has been deploying SAN infrastructure, driven by the rapid growth in banking data and the legal requirements to maintain accessibility to that data for prolonged periods. Each building has two fabrics, which act as backup for each other and are used for load balancing where possible.

    The diagram shows the general connectivity scheme.

    Legacy Storage

    The bank has also deployed hardware RAID technology to enhance flexibility in allocating storage on demand, with resilience and performance also being main drivers in the storage technology decision. Existing application environments are given storage space on designated hardware RAID systems. Access to the specific LUN/array is dual-ported via fabrics A and B.

    [Figure: Building SAN Infrastructure. Within a building, a host has multiple paths to dual-ported hardware RAID storage through two fabrics, Fabric A and Fabric B. Each fabric consists of two SAN switches joined by redundant inter-switch links.]

    Legacy Server Specification

    The legacy server specification includes a dual-channel fibre controller for accessing the hardware RAID system over either fabric. Each system has internal mirrored boot disks and a single network interface.

    [Figure: Legacy Server Specification. A server with an on-board Ethernet controller, boot mirrors, a dual-channel fibre controller, and three spare slots.]

    Investment Bank: Read-Only NFS Service

    1 Is it possible to implement a VCS cluster with the existing infrastructure alone? Does the legacy infrastructure satisfy the minimal VCS requirements to activate the cluster technology? If necessary, specify any hardware components that must be supplied in addition to the existing infrastructure.

    2 Does the minimal functioning solution satisfy the business requirements? Explain where there may be a gap between the minimal solution and the business requirement.

    Investment Bank: Read/Write Network File Service

    1 Is it possible to implement a VCS cluster with the existing infrastructure alone? Does the legacy infrastructure satisfy the minimal VCS requirements to activate the cluster technology? If necessary, specify any hardware components that must be supplied in addition to the existing infrastructure.

    2 Does the minimal functioning solution satisfy the business requirements? Explain where there may be a gap between the minimal solution and the business requirement.


    Lesson 5 Exercises

    Investment Bank: More Background Information

    The investment bank provides e-mail services to its employees on a strategic level. Because of the size of the organization, the deployment and use of standard mail access daemons is not appropriate. The number of users and the level of activity cause major performance issues with standard IMAP daemons, so the Cyrus IMAP daemon has been deployed on the mailbox servers. This daemon runs standalone, and so would normally be started at boot time by its own boot script. The message store is a database of messages. Users have rights to their own message areas. The IMAP daemon has a specific CLI for administering the message store, including startup, status, close, and crash recovery. The IMAP daemon binds to a local virtual IP address on the server on which it is running to provide access to clients.

    The Internet Communications group responsible for the mail services has a legacy environment that includes Volume Manager for managing JBOD disk arrays, and VxFS for file system recovery. The JBODs are SAN attached. Each mail service is identified by a different IP address and name in DNS.

    Questions

    1 From the information provided, identify the service group architecture for the simplest solution if the mailbox service is clustered, taking into account only one mailbox service with its storage and network components.

    2 Because of the load on a single mailbox service, the bank has implemented two mailbox services at each location. Each mailbox service runs on a different server. The load on each mailbox service is balanced manually through the assignment of users to an appropriate mailbox service when the user joins the organization.

    The availability of these mailbox services is deemed critical to the bank. However, the bank is also very cost aware and does not want to provide a separate standby server for each individual mailbox service. Therefore, it has decided to add only one standby server to the two mailbox servers at each location for clustering, as shown here:

    [Figure: Three systems, Mailbox 1, Mailbox 2, and Standby, connected through the SAN to Message Store 1 and Message Store 2.]

    The systems do not have enough capacity to run both mailbox services at the same time. However, all the systems are identical and they have access to all of the message stores through the SAN. Propose a failover configuration for the 3-node cluster that would address the given requirements. Suggest a method to prevent a system from running both mailbox services at the same time.

    3 Considering that each system comes with one public network port, can you use the service group diagram provided in the solution of question 1 for both service groups?

    4 Each server has been deployed with redundant NICs. Construct the service group diagrams for both service groups in the cluster. Take advantage of the redundant NICs to minimize service group failovers due to NIC, cable or switch failure.

    5 The bank has now decided to implement a web interface to the mailbox services in addition to the conventional methods like IMAP or POP. The web interface has two components:

    - Apache web server: The Apache web server provides the interface to the mailbox users. It has its own virtual IP address for the clients to connect. It uses local static web pages and therefore does not require any shared storage access.

    - A webmail daemon: The webmail daemon is responsible for managing the TCP connections to the IMAP daemon and runs as a multithreaded single process. It maintains no state and as such is very crash resilient. The webmail daemon is effectively no different from a standard IMAP client like Microsoft Outlook; however, the user interface to the customers runs on the Apache web server.

    The key points about the web interface are as follows:

    - The communication between the Apache web server and the webmail daemon is through UNIX sockets, so the Apache web server and the webmail daemon have to be running on the same machine. However, there is no specific dependency between them; they can be started or stopped in any order.

    - Users connect by browser to a login page presented by the Apache web server. When a user logs in, the user presents a username and server name in this form: user@server. The user and server names are parsed, and the server name is then used by the webmail daemon to direct the connection to the appropriate mail server. The webmail daemon can therefore connect to local or remote IMAP services.

    - The webmail daemon does not depend on any IP address. It can connect to an IMAP daemon running on any other system as long as the system it is running on has access to the public network.
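The login parsing described in the key points can be sketched in a few lines. This is an illustration only: the `parse_login` and `imap_target` functions are hypothetical and do not represent the actual webmail daemon; port 143 is the standard IMAP port and is assumed here.

```python
def parse_login(login):
    """Split a login of the form user@server into (user, server)."""
    user, sep, server = login.rpartition("@")
    if not sep or not user or not server:
        raise ValueError(f"expected user@server, got {login!r}")
    return user, server


def imap_target(login, port=143):
    """Return the (host, port) the connection would be directed to.

    The server part of the login selects the mail server, so the same
    code path works for local and remote IMAP services.
    """
    _, server = parse_login(login)
    return server, port


print(parse_login("alice@mailbox1"))
print(imap_target("alice@mailbox1"))
```

Because the target is derived entirely from the login string, the sketch keeps no per-user state, which matches the description of the webmail daemon as stateless.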

    The bank has decided to use the same 3-node cluster to provide redundancy for the web interface.


    Draw the service group diagram for the web interface. Considering the nature of the IMAP application, Apache web server and the webmail daemon, discuss the type of agents (bundled, Enterprise or custom) you can use for each application resource.


    Lesson 6 Exercises

    Questions

    1 State the prerequisites for a particular agent type to start on a cluster node.

    2 Within a running cluster, what circumstances would cause an agent that was not previously running, to start up?

    3 What exit code or codes does the monitor entry point return to signify online, offline and unknown?

    4 When does monitoring of a resource begin?

    5 Which attributes control the point in time when automated ongoing monitoring occurs?


    6 Identify eight circumstances when the monitor entry point is invoked.

    7 Which two circumstances will result in the close entry point being invoked?

    8 What is the minimum entry point requirement for an on only persistent resource type?

    9 Which resource type attribute identifies the resource type category as OnOff, OnOnly, or None?

    10 How should activity be logged to the cluster log files? What utility should be used?


    Lesson 7 Exercises

    Questions

    1 Describe the sequence of events that result in a trigger script being executed.

    2 What are the different ways that a trigger can be configured?

    3 Which triggers are configured by default at VCS installation?

    4 Which command must be last in a preonline event trigger? Where does this trigger execute?

    5 On which cluster node does a trigger run?


    6 You want to implement a postoffline event trigger for a specific service group in a cluster with multiple service groups. What would you do?

    7 Is it acceptable to use a trigger script to clear fault states of resources?

    8 Is it acceptable to use a trigger script to decide where to bring a service group online?

    9 On which cluster node does the injeopardy trigger execute?

    10 Does the preonline trigger apply to all service groups?


    Case Study Overview: Retail Company

    The retail company owns a chain of stores where its products are sold. Warehouse stock, ordering, sales, accounting, invoicing, and distribution are managed centrally at the group headquarters.

    The group is currently investing in a new IT infrastructure to improve efficiency and take account of the company's increasing dependency upon its information systems. The new infrastructure includes the implementation of custom applications within the Oracle application framework, which reads and writes data in an Oracle database.

    Retail Company: Background Information

    There are two environments: a production database and a test/development database. The testing and development databases must not interfere with, conflict with, or compromise the production database. In the event of a production failover, the development database must be taken offline automatically.

    The development team, however, has little or no VCS knowledge, and requires complete control over the test environment without impacting the production environment or the cluster.

    The development and test environment consists of a predefined list of running applications and a development Oracle database, all of which have defined startup and shutdown scripts.

    Retail Company: Questions

    1 What possibilities exist to cause the test environment to be taken offline before the production environment is brought online?


    Lesson 8 Exercises

    Controlling Failover Behavior

    Investment Bank: Special Actions for Specific Resources

    The Investment Bank has critical market trading applications accessing data through an RDBMS. Maintaining integrity in the database is of such high importance that, in the case of database failure, automatic failover should not take place; instead, one of the database specialists should bring the database online after performing some checks and safeguards.

    1 Describe the mechanism by which this can be achieved within the VCS configuration such that automatic failover will occur in all other circumstances.

    Investment Bank: Timeouts

    The databases are located on file systems of up to 200 GB in size. To keep failover time to a minimum in the event of a system crash, file system logging is used.

    2 Describe the impact of a file system logical error (resulting in a corrupt log) in such a circumstance, how it may affect the failover, and what measures could be taken to minimize any downtime.


    Lesson 9 Exercises

    Parallel Service Groups

    Retail Company: Shared Data

    The retail organization currently has a two-node asymmetric failover cluster topology. The application environment includes Oracle applications sitting on top of an Oracle9i database. The company is looking into alternatives to double capacity and increase throughput. One option under discussion is to bring the standby machine into use so that both systems are active, with two parallel instances accessing the same database. The two systems are identical V880 midrange servers, each with 4 CPUs (750 MHz), 8 GB of RAM, and gigabit private network interfaces.

    1 Discuss the merits of different approaches to increase processing capacity.

    2 What would be the criteria for moving towards a parallel shared data architecture in the previous case?

    Investment Bank: Nothing Shared

    The investment bank's read-only network file service is provided independently to each of the two halves of a building. For each half, a separate cluster provides service to that half of the building. The cluster topology is 2+1, running the service in parallel on two of the three nodes, with the third node functioning as a standby. The two active nodes run the NFS service from unique IP addresses. The client systems are load balanced across the servers by IP address; that is, clients with an odd-numbered IP address are configured as clients of the odd-service, and clients with an even-numbered IP address are configured as clients of the even-service. Both NFS services are active at the same time and are differentiated by their respective IP addresses. The architecture is shared nothing and all systems are symmetric: they have identical physical configurations, as the diagram shows, with multiple public networks for NIC failover.
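    The odd/even client split described above can be sketched in a few lines of Python. This is only an illustration of the rule, assuming IPv4 client addresses; the function name service_for_client and the example addresses are hypothetical, while the service names odd-service and even-service come from the case study.

```python
import ipaddress


def service_for_client(client_ip: str) -> str:
    """Return the NFS service a client is assigned to.

    Assumption: an address is "odd numbered" when its integer value
    is odd, which for IPv4 is the same as an odd last octet.
    """
    address = ipaddress.IPv4Address(client_ip)
    return "odd-service" if int(address) % 2 else "even-service"


if __name__ == "__main__":
    for ip in ("10.1.2.3", "10.1.2.4"):
        print(ip, "->", service_for_client(ip))
```

    The assignment is static: it balances the number of clients per service, not their actual load, so it works best when clients are roughly uniform.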

    [Diagram: SystemA, SystemB, and SystemC, each with a local disk c1t0d0, connected to multiple public networks.]

    3 Design the service group, taking into account all possible parallelism. Specify the values of any important service group attributes where necessary.

    Investment Bank: Web Interface to Email Services

    Note: This exercise depends on the environment described in lesson 5 exercises. Carry out lesson 5 exercises before you read the rest of this section and do step 4.

    Due to the load on the web interface, the bank has now decided to implement a web service on two systems in the cluster. Each web service has its own virtual IP address. The two virtual IP addresses used by the two Apache web servers are aliased to the same virtual hostname on the DNS server. Clients connect to the web server using this virtual hostname. The DNS server distributes users evenly between the two web servers using a round-robin DNS policy.

    The bank has the following requirements for the two web interfaces:
    - Under normal conditions, each web interface should be running on a separate system with its own virtual IP address.
    - If one of the web services fails, due to either a resource failure or a system failure, and cannot be failed over to a standby system, only the virtual IP address belonging to that service should fail over to a system that has another Apache web server and a webmail daemon running. In this case, one Apache web server would be providing web services to both virtual IP addresses.
    - Under no circumstances should there be two Apache web servers and two webmail daemons running on the same system.
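    The placement rules above can be expressed as a small consistency check. The Python sketch below is not VCS configuration and does not show how VCS would enforce the rules; the SystemState class, the placement_violations function, and the virtual IP labels are all hypothetical names used only to make the constraints concrete.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SystemState:
    # Number of Apache web server + webmail daemon pairs running here.
    web_instances: int = 0
    # Virtual IP addresses currently configured on this system.
    virtual_ips: list[str] = field(default_factory=list)


def placement_violations(cluster: dict[str, SystemState]) -> list[str]:
    """List every rule from the case study that a placement breaks."""
    problems = []
    for name, state in cluster.items():
        if state.web_instances > 1:
            problems.append(f"{name}: more than one Apache/webmail instance")
        if state.virtual_ips and state.web_instances == 0:
            problems.append(f"{name}: virtual IP without a running web server")
    return problems


if __name__ == "__main__":
    # Degraded case: one web server answers for both virtual IP addresses.
    degraded = {
        "SystemA": SystemState(1, ["vip1", "vip2"]),
        "SystemB": SystemState(),
        "SystemC": SystemState(),
    }
    print(placement_violations(degraded))
```

    A placement with both virtual IP addresses on one system passes the check as long as exactly one web server is running there, which matches the degraded case the bank describes.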

    4 Considering these requirements, suggest modifications to the service group diagram, illustrated in the solution to step 5 of lesson 5 exercises, that would support this environment. Discuss the customization requirements in this environment to make sure that each virtual IP address is configured on a separate system with an Apache web server available.


    Lesson 10 Exercises

    Investment Bank: Campus Infrastructure Details

    As part of the business continuity planning for the global investment bank, the bank has decided to extend the disaster recovery planning that is already in place in the main data centers to other buildings all around the world. Research into the various possibilities resulted in the pairing of existing buildings so that each building would act as the DR site for another building within 20 kilometers. The investigation also found that, although most of the paired buildings could be connected to one another with dark fibre, several pairs could not, for logistical reasons beyond the control of the bank. These building pairs still had good network connectivity, and it was possible to extend subnets using VLANs if desired.

    The existing storage technology within the bank is largely hardware RAID, and a significant proportion of it is EMC. Although from a commercial point of view this technology has to be reused, the bank does not want to be locked into a single vendor and wishes to move towards a more heterogeneous environment. This requirement should also be taken into account in designing the disaster recovery solution.

    Keep in mind that each building has the following services that will require a disaster recovery solution:
    - Two N+1 clusters for read-only NFS services (no shared storage)
    - A number of 2-node asymmetric read/write NFS services (one NFS service per business unit in the building)
    - Some buildings also have the mail services, which include a 2+1 cluster that has the mailboxes and the web interfaces as described in lesson 5 and lesson 8 exercises. Note that if there is a mail service in one building, the pair building would not have any mail services configured.

    Your Task: Devising a Strategy

    Given the requirement to implement a disaster recovery plan and accepting that some building pairs have dark fibre and some do not, devise a strategy for implementing the disaster recovery plan for the read-only NFS, read/write NFS and e-mail services. The solution should itemize server, storage, and network specification as well as changes to the cluster topologies at a strategic level.

    Complete the following table with the details of your strategy.


    Strategy Overview

    LAN Infrastructure

    SAN Infrastructure


    Storage Infrastructure

    Server Specification


    Additional Questions

    1 Identify the different approaches to the deployment of a single cluster for disaster recovery purposes.

    2 Identify any restrictions that must be placed upon site inter-connectivity in an environment where a single cluster extends over multiple sites.

    3 With particular regard to a complete loss of connectivity between sites (both network and SAN), what measures should be taken to ensure data integrity? Would the approach be any different with different types of cluster configurations (Campus, RDC, GCO)?

    4 Which two general types of data replication methods are supported by VCS?

    5 Identify one significant difference between a stretch cluster and an RDC.


    6 How many global clusters can any one cluster be a member of?

    7 What is the maximum number of clusters in any global cluster?

    8 List the new VCS objects used to enable global clusters and describe their main functionality.


    Lesson 11 Exercises

    Questions

    1 When backing up cluster nodes, how should the backup file list be specified? Is it acceptable to back up all local files? When backing up over IP networks using NetBackup, how should the backup of a service group be specified in the policy client list?

    2 Is it acceptable to back up a service group's data whilst the application is online? What issues arise in considering the answer to this?

    3 If the data must be backed up whilst the application is online and in use, what impact will the backup have on the system and network if the data is being backed up to tape devices on a separate system? How does this change if backup is performed on the cluster node itself? What technologies exist that can minimize the impact of performing backups over the network or to the current active node hosting the service group?

    4 Describe the outcome of an application that is managed under VCS control being shut down by its normal standalone administration start/stop scripts/methods. How would VCS respond?


    5 What features exist in VCS to allow control of the cluster and the service groups to be delegated selectively?

    6 Can a non-root user execute VCS commands at the command line to perform specific procedures without needing to be authenticated by VCS? If so, what configuration is required to enable this?

  • Appendix B Case Study and Exercise Solutions


    Introduction

    Overview

    This appendix provides case study exercises based on two different fictional environments:
    - A global investment bank
    - A retail company

    The case study exercises make reference to a number of different contexts, within which there are several different HA/DR requirements. This section gives the background and contexts for the global investment bank case study in which the exercises should be conducted. The retail company case study will be introduced in later lessons.

    Case Study Overview: Investment Bank

    Background Information

    The investment bank has three main data centers, in New York, London, and Tokyo, and many buildings, with their own IT systems and infrastructure, distributed across the world. Each building hosts one or more business units. Each of the three main data centers consists of two buildings about 10 kilometers apart connected by dark fibres. Redundant SAN connectivity exists between the two buildings at each main data center.

    Within the investment bank, there are a number of environments with varying HA requirements. These include:
    - An IT operations database at the three main data centers
    - A read-only NFS service at each building
    - A read-write NFS service within each business unit
    - Corporate e-mail services at several locations throughout the global network

    IT Operations Database

    The bank's IT infrastructure comprises over 5,000 UNIX workstations and approximately 500 UNIX servers. Managing these systems on a piecemeal or local basis does not fit the global strategy of the bank. The bank has devised a centralized system for controlling and managing the configuration data for these systems.

    Read-Only Network File Service

    The bank operates globally 24x7x365. Part of the bank's IT strategy is to minimize the need for local knowledge and custom solutions; its support staff may be working remotely from the other side of the globe. The approach taken by the bank is to deploy generic platforms managed with common administration and management tools. Central to the generic platform is a read-only network file service. This service provides each desktop UNIX system with generic root and /usr file systems. The read-only network file service is provided on a strategic building-wide basis. All business units in a particular building are provided with the file service as a centralized but highly available service.

    Read/Write Network File Service

    Each local business unit provides users with home directory storage for scratch files, mail folders, working files, and other non-critical data. Although the availability of the data is not critical to the bank's profits, it is important to the smooth operations within the group.

    Corporate Email Services

    The investment bank provides e-mail services to its employees on a strategic level. The bank has several Internet gateways throughout the global network, and mailbox services come under the control of the Internet communications group responsible for Internet gateway technologies. The services provided to the bank's employees include email through conventional means (IMAP/POP) and also through a web interface.

    High Level Business Requirements

    IT Operations Database

    The system is a combination of applications for adding, deleting, and updating system configuration data, and a database that holds this information. Everything from platform type, hardware, installed OS, and local system build type (server, desktop), through hostname, IP, domain, and physical location, to system hardware inventory, internal disks, installed option cards, and disk VTOC information is stored in the database.

    [Slide: IT Operations Database. Clients and proxies in each building (Building 1, Building 2, ...) send periodic updates to the read/write database server in New York; periodic replication using database replication software copies the data to the read-only database servers in Tokyo and London.]

    The contents of this database represent the complete configuration data of the whole global infrastructure, and as such its integrity and availability are of the utmost importance. The database is also widely used across the globe and is under constant revision, although at low levels of change. The bank has therefore devised a scheme to reduce the dependency on a single central database instance by providing database replication and read-only database instances at the two other data centers. This improves load balancing and reduces single points of failure.

    The existing configuration is shown on the slide and is as follows:

    The database is accessed by local proxies (servers with a specific application) within each building. Some of the data is collected automatically by software at specific times; other data is specified by system administrators during system provisioning. All the changes made to client configurations during the day are cached on redundant proxy systems at each building. On a nightly basis, these changes are uploaded to the only read/write database, in the main data center at New York. The changes are then replicated by the application itself to the two read-only copies at the other two data centers. The database replication product guarantees database integrity at read-only sites during and after periodic replication. Since there are buildings distributed across the world, each building is allocated a specific time for updates according to its time zone. This results in the main database being updated only three times during a day.

    The integrity of the database must be maintained. If a new production server is deployed, its configuration data is initially stored locally at the proxies.
    If the data is lost during this stage, it is not critical and it can be recreated locally at a small manual cost. However, after the changes are committed to the main database, they should not be lost. Whenever a subsequent re-install of the same production server is necessary, it will take place according to the data in the database; if the data is not available, there may be a direct impact on the production environment.

    The database is used as a source for many aspects of the environment. For example, each night the network information services (NIS) maps are rebuilt from this database. Any lost information would result in production services not functioning properly.

    If the database is unavailable for a while, the impact is only upon IT operations. It has no direct impact upon banking or market dealing availability or functionality. The impact is upon system administration, server provisioning, and rebuild. However, the bank needs to keep global administration and operations functional, and this is a key component of that service. The impact of downtime is measured in terms of backlog and disruption to work routines.

    To safeguard the environment, the database is currently replicated between the main data centers: New York, London, and Tokyo. The New York instance is the primary database instance and is read/write; the London and Tokyo instances are normally read only and are used only for load balancing read requests. In the event of the read/write instance failing, one of the remaining instances is changed to read/write. Currently, this is carried out manually by a system administrator.

    The bank IT strategy includes the requirement that all key infrastructure and application services have a disaster recovery plan enabling continuity of business in the event of a building outage. Currently, the database instance at each data center uses one disk array with one server in one of the two buildings available to the data center.
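    The manual promotion step described above (changing one of the remaining read-only instances to read/write when the primary fails) can be modeled with a short Python sketch. The case study does not say how the administrator chooses which instance to promote, so the preference list here is purely an assumption, as are the function name promote_after_failure and the role labels "rw", "ro", and "failed".

```python
from __future__ import annotations


def promote_after_failure(
    roles: dict[str, str], failed: str, preference: list[str]
) -> dict[str, str]:
    """Return the instance roles after one instance has failed.

    Assumption: if the read/write instance fails, the first read-only
    instance in `preference` becomes the new read/write instance.
    """
    new_roles = dict(roles)  # leave the caller's mapping untouched
    was_primary = new_roles.get(failed) == "rw"
    new_roles[failed] = "failed"
    if was_primary:
        for site in preference:
            if new_roles.get(site) == "ro":
                new_roles[site] = "rw"
                break
        else:
            raise RuntimeError("no read-only instance available to promote")
    return new_roles


if __name__ == "__main__":
    roles = {"New York": "rw", "London": "ro", "Tokyo": "ro"}
    print(promote_after_failure(roles, "New York", ["London", "Tokyo"]))
```

    Losing a read-only instance leaves the primary untouched in this model; only the loss of the read/write instance triggers a role change.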

    Read-Only Network File System

    In the existing environment, the read-only NFS service is centralized within each building. Each building has a number of NFS servers: half of them on the A side, half of them on the B side. The number of NFS servers working in parallel depends on the total number of clients in the building and may be different in each building. The slide shows an environment with two active NFS servers on each side. All the clients in a business unit are manually separated into two groups: A clients and B clients. A clients connect to the NFS services on the A side and B clients connect to the NFS services on the B side. The clients on each side are also manually distributed among the NFS services on that side for load balancing. Each side also has one standby NFS server, which is used to manually fail over an NFS service that has failed. Each read-only NFS server has its own local copy of the data. The data on these servers is updated very infrequently, only when there are version changes. Any updates are tested on separate systems at the main data centers and pushed out to individual buildings. Servers in individual buildings are updated one by one, using the spare capacity to keep the services running. Each building gets an image of the new read-only data with every update.

    [Slide: Read-Only NFS Service. Building XX has an A side and a B side; each side runs its A or B service on two active read-only NFS servers and one standby read-only NFS server. Business Unit 1 and Business Unit 2 each contain A clients and B clients.]

    The read-only NFS service is critical to the business and should be available 24x7x365. If an NFS service fails and cannot be recovered, some of the clients in a building cannot operate, with an immediate impact on the business. The existing manual failover procedure is no longer satisfactory and the bank wants to improve it.

    It is a fundamental business requirement that banking and trading operations be maintained in a highly available operational state. As such, end user systems must be supported with a highly available network file service. In the event of a system failure, an immediate and automatic response must restore the service for those affected within minutes. A server failure must not affect all users in the business unit. Processing must be balanced across multiple servers to reduce the likelihood that a sudden load on one system after a failure affects the others. The bank has also made a strategic decision that any new technology deployed must have minimal or no single points of failure. The elimination of single points of failure from the existing environment will improve uptime and therefore profitability.

    Read/Write Network File Service

    The business unit is responsible for the provision of home directory resources to users in the local business unit. The resource is then controlled and managed according to the objectives of the business unit. Those objectives require that the file service be maintained in a state of high availability during business hours, with automated response to system failure to bring the service back online within minutes. The read/write NFS service is shown in the slide for each business unit. All of the clients in a business unit connect to the same read/write NFS service.

    [Slide: Read/Write NFS Service. In Building XX, Business Unit 1 and Business Unit 2 each have A clients and B clients connected to a pair of NFS servers labeled R/W, alongside the read-only NFS servers.]

    The business unit policy for backups is that recovery should be up to the state of the previous day. Backups are performed nightly. The business unit does not store mission-critical data in the home directories on the read/write NFS server.
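    The backup policy above implies a simple piece of arithmetic: with nightly backups, the data at risk is whatever was written since the last completed backup. The Python sketch below works through one example; the function names and the timestamps are hypothetical, and the one-day limit is taken from the stated policy of recovering to the state of the previous day.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, failure: datetime) -> timedelta:
    """Time span of changes that a restore from backup cannot recover."""
    if failure < last_backup:
        raise ValueError("failure time precedes the last backup")
    return failure - last_backup


def meets_recovery_point(
    last_backup: datetime,
    failure: datetime,
    limit: timedelta = timedelta(days=1),
) -> bool:
    """True if the loss window stays within the recovery point limit."""
    return data_loss_window(last_backup, failure) <= limit


if __name__ == "__main__":
    # Hypothetical times: backup finished at 02:00, failure at 17:30.
    last_backup = datetime(2004, 3, 1, 2, 0)
    failure = datetime(2004, 3, 1, 17, 30)
    print(data_loss_window(last_backup, failure))
    print(meets_recovery_point(last_backup, failure))
```

    If a nightly backup is skipped or fails, the window grows past one day, so monitoring backup completion matters as much as scheduling it.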


    Lesson 1 Exercises and Solutions

    IT Operations Database

    1 Based on the information provided in the case study, identify the requirements that will have most impact upon the design and the ultimate solution.

    The most significant requirements influencing the design are:
    - The database integrity requirements (committed data guarantee)
    - Replication using the database replication tool
    - Primary business objective not affected by short-term loss of service
    - Secondary objective (maintaining a stable operations support environment) is important, so minimizing downtime is important.
    - Disaster recovery requirements for responding to building outages

    2 The recovery objectives can be broken down into two elements: recovery time and recovery point. Suggest appropriate technologies for these elements to achieve the business objectives.

    Recovery time:
    - Redundancy
    - Standby system at each data center
    - Additional disk arrays at each data center

    Recovery point:
    - Mirrors at each data center
    - Periodic database replication between data centers
    - Logical backup of database

    3 What would be the impact of a logical error in the database? What measures can be taken to respond to this situation? What technologies can be used to effect the recovery?

    Impact:
    - Database corruption
    - Corruption may be replicated to remote instances
    - Service unavailable

    Response:
    - Restore from full or logical backup
    - Recovery may require replaying the redo logs after restoring the full backup to find the point just before the corruption.

    Technologies:
    - Copy-based backups
    - Logical backups

    Read-Only NFS Service

    1 Based on the information provided in the case study, identify the requirements that will have most impact upon the design and the ultimate solution.

    The most significant requirements influencing the design are:
    - The read-only NFS service must be available 24x7x365.
    - Automated service monitoring and automatic response to failure
    - Rapid recovery of service (within minutes of the failure)
    - Any single failure must not impact all systems.
    - Performance must not be impaired following recovery of service.
    - Read-only data

    2 The recovery objectives can be broken down into two elements: recovery time and recovery point. Suggest appropriate technologies for these elements to achieve the business objectives.

    Recovery time:
    - Redundancy
      - Standby systems (parallel active systems + shared standby system)
      - Automated failover to standby system
      - Local disk/data mirroring
    - Fast file system recovery

    Recovery point:
    - Multiple copies of same static data
    - Recovery of any failed copy
      - Re-install from the most recent read-only image at the building

    3 What would be the impact of a logical error in the file system of the read-only NFS service? What measures can be taken to respond to this situation? What technologies can be used to effect the recovery?

    The impact depends upon the extent of the logical error:
    - If a file is corrupted, then the impact will be on clients. This should be unlikely to happen in a read-only file system share. This may not result in a failover.
    - If the file system becomes corrupted due to an OS error, then the server will likely crash and result in service outage/failover.

    The response would be to re-install the data from the original copy, in this case from an image of the read-only file system. This can be a simple software distribution, such as tar/cpio extraction.

    If SAN technology is available, then it is also possible to rebuild from a good reference copy using off-host processing techniques, such as 3rd mirror join/break-off.

    Read/Write Network File Service

    1 Based on the information provided in the case study, identify the requirements that will have most impact upon the design and the ultimate solution.

    The most significant requirements influencing the design are:
    - Service must be available during business hours.
    - Automated service monitoring and automatic response to failure
    - Rapid recovery of service (minutes)
    - Read-write data
    - Not mission-critical data (recovery point: up to a day)

    2 The recovery objectives can be broken down into two elements: recovery time and recovery point. Suggest appropriate technologies for these aspects of the design to achieve the business objectives.

    Recovery time:
    - Redundancy
      - Standby systems
      - Automated failover to standby system
      - Mirroring
    - Fast file system recovery

    Recovery point:
    - Copy-based backup
      - Re-install from nightly backup

    3 What would be the impact of a logical error in the file system of the RW NFS service? What measures can be taken to respond to this situation? What technologies can be used to effect the recovery?

    The impact depends upon the extent of the logical error, but in either case there is a high chance that data will be lost:
    - If the error is file corruption, then specific clients/users will be affected.
    - If the file system becomes corrupted, then the server will likely crash (the sooner the better).

    Because data is shared between multiple systems, failover to another standby system will not resolve the corruption problem. File system data will need restoring, either partially or completely, depending upon the extent of the corruption. Recovery from the most recent backup (last nightly backup) will be needed.


    Lesson 2 Exercises and Solutions

    IT Operations Database

    1 List the major components of an SLA that reflect the business requirements in this case study.
    - Database updates must not be lost; the recovery point objective is all transactions committed to the database.
    - Service must resume as quickly as possible, but short periods of downtime are acceptable.
    - If the primary instance is completely lost, one of the remaining database instances must become the primary instance by changing to a read/write configuration.

    2 State the RPO and RTO for this service.
    - RTO: Medium. No immediate impact upon the primary business objective.
    - RPO: High. Database updates must not be lost.

    3 Identify the VERITAS technologies that address the business requirements and satisfy the SLA. Explain the role of each technology.
    - VCS to monitor systems and provide the HA framework
    - Database Enterprise Agent to monitor the database instances
    - Volume Manager to manage storage availability (mirroring)
    - VxFS for fast file system recovery
    - Storage Foundation for the specific database in question to address other issues, such as database performance

    Read-only NFS Service

    1 List the major components of an SLA that reflect the business requirements in this case study.
    - Service must resume immediately (within minutes of the problem being detected).
    - A single problem must not affect all users within any business unit.
    - Resumption of service should occur without any degradation in performance.

    2 State the RPO and RTO for this service.
    - RTO: High. Service must be resumed very quickly.
    - RPO: Low. The data is static, except for infrequent version updates.

    3 Identify the VERITAS technologies that address the business requirements and satisfy the SLA. Explain the role of each technology.
    - VCS to monitor systems and provide the HA framework
    - NFS bundled agent to monitor and control the NFS application
    - Volume Manager to manage storage availability (software RAID mirroring)
    - VERITAS File System is not a requirement, due to the non-shared, read-only nature of the data. File system recovery is not an issue.
    - Other VERITAS products, such as NetBackup or OpForce (if applicable), can be used to provide the image kept for recovery purposes.

    Read/Write Network File Service

    1 List the major components of an SLA that reflect the business requirements in this case study.
    - During business hours, service must resume within minutes of the problem being detected.

    2 State the RPO and RTO for this service.
    - RTO: High. Service must be resumed very quickly during business hours.
    - RPO: Medium. The data is not business critical and, in the event of data loss, must be restored to the last nightly backup.

    3 Identify the VERITAS technologies that address the business requirements and satisfy the SLA. Explain the role of each technology.
    - VCS to monitor systems and provide the HA framework
    - NFS bundled agent to monitor and control the NFS application
    - Volume Manager to manage storage availability (software RAID mirroring)
    - VERITAS File System for fast file system recovery
    - NetBackup for regular nightly backups


    Lesson 3 Exercises and Solutions

    IT Operations Database

    1 Identify the storage architecture, the cluster topology, and the failover configuration that could be deployed to meet the business objectives.

    As part of the bank's disaster recovery strategy, the bank has installed dark fibre between the two buildings at each data center (NY, London, Tokyo) using multiple independent paths. This enables SAN connectivity between buildings. At each main data center, one standby system and one storage array can be added to the existing configuration to form a two-node campus cluster. The campus cluster addresses both the high availability and the disaster recovery needs.

    Moreover, the three campus clusters, one at each data center, can be linked to form a global cluster. Such a configuration would have a local database service group that is online in each cluster. A separate global service group, built around a custom agent, would be configured to manage the database replication component. This custom agent would be used to change the read-only copy at one of the secondary sites to a read-write copy in case of a complete cluster failure at the primary site.

    Note that neither VVR nor any array-based replication method can be used in such a configuration, because these replication methods modify the application data at the secondary site without the application itself knowing that the data has changed. This would cause major problems, because the application does not expect its cached data to differ from what is on disk.

    2 Identify requirements that cannot be directly resolved through the deployment of standard HA/DR technology.

    The management of the database replication component and the failover of the primary instance between data centers cannot be handled by existing technology. Custom agents and custom scripts need to be developed to address failover between data centers.

    3 Suggest an alternative topology, and discuss the issues with this alternative.

    To satisfy the disaster recovery requirement, the cluster in each main business centre must operate across different buildings. The alternative in this case is a replicated data cluster extending across two buildings. However, because SAN connectivity already exists between the buildings, this choice would be inappropriate.

    Read-only NFS Service

    1 Identify the storage architecture, the cluster topology, and the failover configuration that could be deployed to meet the business objectives.

    The existing solution deploys the service on multiple active systems with one standby for all. The data architecture is shared nothing; each server has a local replica of the original source. The failover time is only as long as it takes to fail over IP addresses. The service from any single server can be moved to the standby and remain there until rearrangement becomes necessary due to subsequent failures. The failover configuration is N+1.

    2 Identify any requirements that cannot be directly resolved through the deployment of standard HA technology.

    The balancing of load across the servers is a function of client and name service configuration, rather than of any cluster or server configuration.

    3 Suggest an alternative topology, and discuss the issues with this alternative.

    An alternative solution would be N+1 with shared storage. In this scenario, file systems are taken over by the standby node, checked, mounted, and shared as part of the failover sequence. This would reduce the storage requirement for the solution but increase the time to fail over. Moreover, in this case failover would not provide a solution to file system corruption. Unless the bank wants to consolidate existing storage arrays, this would not be a desirable solution.

    Read/Write Network File Service

    1 Identify the storage architecture, the cluster topology, and the failover configuration that could be deployed to meet the business objectives.

    The existing environment already includes a dedicated standby server with shared access to storage. The most straightforward solution is to incorporate the existing manual solution into a VCS framework. The cluster topology in this case is a two-node shared storage cluster, and the failover configuration is asymmetric failover. The storage is shared, so each server can take control of the disks/LUN devices. Under normal operations the standby is idle.

    2 Identify any requirements that cannot be directly resolved through the deployment of standard HA technology.

    No special requirements.

    3 Suggest an alternative topology, and discuss the issues with this alternative.

    The defining factor in this scenario is the need for each host to access dynamic data. This can be either the original data (shared configuration) or an up-to-date copy (synchronous replication). Depending on the availability of storage technologies and resources within the hardware RAID device, the copy could be maintained either through hardware-based volume replication (for example, EMC-SRDF) or host-based volume replication (VVR). However, these technologies are more appropriate for inter-building clusters with no SAN availability, whereas this scenario is contained within the infrastructure of a single building.

    The other alternative would be to aggregate the read/write file services from more than one business unit and deploy an N+1 topology with shared everything. This would yield greater efficiency (for example, 2+1: two business units sharing a single standby). However, this solution would require SAN-based storage infrastructure, and sharing resources between business units may be against the bank's policies.


    Lesson 4 Exercises and Solutions

    Additional Infrastructure Details of Investment Bank

    The legacy infrastructure in place within each building comprises the SAN and existing servers. Management hopes that much of the existing infrastructure can be redeployed within the new framework so as to minimize capital expenditure. The details of the infrastructure are shown in the following subsections.

    SAN Infrastructure

    The bank has been deploying SAN infrastructure, driven by the rapid growth in banking data and the legal requirements to maintain accessibility to that data for prolonged periods. Each building has two fabrics, which back each other up and balance load where possible. The diagram shows the general connectivity scheme.

    Legacy Storage

    The bank has also deployed hardware RAID technology to enhance flexibility in allocating storage on demand, with resilience and performance also being main drivers in the storage technology decision. Existing application environments are given storage space on designated hardware RAID systems. Access to the specific LUN/array is dual ported via fabrics A and B.

    [Figure B-5: Building SAN Infrastructure. Labels: Host, HW RAID, Multiple Paths to Storage, Dual-Ported Storage, SAN Switch (x4), Redundant Inter-Switch Links (x2), Fabric A, Fabric B, Building.]


    Legacy Server Specification

    The legacy server specification includes a dual-channel fibre controller for accessing the hardware RAID system over either fabric. Each system has internal mirrored boot disks and a single network interface.

    Investment Bank: Read-Only NFS Service

    1 Is it possible to implement a VCS cluster with the existing infrastructure alone? Does the legacy infrastructure satisfy the minimal VCS requirements to activate the cluster technology? If necessary, specify any hardware components that must be supplied in addition to the existing infrastructure.

    The legacy infrastructure on its own would only allow the deployment of multiple one-node clusters. This topology, however, does not match the optimal topology for the service identified in the Lesson 3 exercises. In essence, the existing infrastructure alone is not enough for VCS to implement a solution in accordance with the business requirements. Achieving the minimal solution for the Read-Only NFS service topology (shared nothing, N+1) requires the addition of extra network ports and private network switch technology, as follows:

    [Figure B-6: Legacy Server Specification. Labels: On-Board Ether Controller, Boot Mirrors, Dual-Channel Fibre Controller, Spare (x3).]


    The minimal solution to enable VCS to function in the 2+1 failover topology is as follows:

    [Figure B-7: Legacy Server Specification and Heartbeat NIC. Labels: On-Board Ether Controller, Dual-Channel Fibre Controller, Quad Ether Controller, Public NICs, Heartbeat NICs.]

    [Figure B-8: Labels: Fabric A, Fabric B, Heartbeat 1, Heartbeat 2.]


    2 Does the minimal functioning solution satisfy the business requirements? Explain where there may be a gap between the minimal solution and the business requirement.

    The minimal functioning VCS solution does not address the requirement that all single points of failure be removed. The specific issue is with the storage. Although the LUNs presented by the hardware RAID system are redundant, there is no protection against total array failure. In addition, the Fibre Channel controller (host bus adapter, HBA) is a single point of failure because it provides access to both fabrics.
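
The gap analysis above can be checked mechanically. The following is a minimal sketch, not part of the lab solution: it lists every host-to-storage path and reports any component that appears on all of them. The component names (`hba0`, `array0`, and so on) and the "hardened" layout with a second HBA and a second mirrored array are assumptions made for illustration only.

```python
from itertools import product


def access_paths(hbas, fabrics, arrays):
    """Enumerate every host-to-storage path as a tuple of component names."""
    return [("host",) + path for path in product(hbas, fabrics, arrays)]


def single_points_of_failure(paths):
    """Return the components that sit on every path (the host itself excluded)."""
    common = set(paths[0])
    for path in paths[1:]:
        common &= set(path)
    return sorted(common - {"host"})


# Minimal solution from the text: one dual-channel HBA, two fabrics, one array.
minimal = access_paths(["hba0"], ["fabricA", "fabricB"], ["array0"])
print(single_points_of_failure(minimal))

# Hypothetical hardened layout (assumption): a second HBA and a mirrored array.
hardened = access_paths(
    ["hba0", "hba1"], ["fabricA", "fabricB"], ["array0", "array1"]
)
print(single_points_of_failure(hardened))
```

In this model the two fabrics never show up as single points of failure, which matches the text: the remaining exposure is the HBA and the array.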

    Investment Bank: Read/Write Network File Service

    1 Is it possible to implement a VCS cluster with the existing infrastructure alone? Does the legacy infrastructure satisfy the minimal VCS requirements to activate the cluster technology? If necessary, specify any hardware components that must be supplied in addition to the existing infrastructure.

    The legacy infrastructure on its own would only allow the deployment of multiple one-node clusters; therefore, additional hardware is required. Achieving the minimal solution for the topology identified in the Lesson 3 exercises (shared everything, asymmetric failover) requires the addition of extra network ports. While the servers remain in the same building, nothing more is needed, because the private networks can run over crossover Ethernet cables.

    [Figure B-9: Legacy Server Specification and Heartbeat NIC. Labels: On-Board Ether Controller, Dual-Channel Fibre Controller, Quad Ether Controller, Public NICs, Heartbeat NICs.]


    The minimal solution to enable VCS to function in the 2-node shared cluster topology is as follows:

    2 Does the minimal functioning solution satisfy the business requirements? Explain where there may be a gap between the minimal solution and the business requirement.

    The minimal functioning VCS solution does not address the requirement that all single points of failure be removed. The specific issue is with the storage. Although the LUNs presented by the hardware RAID system are redundant, there is no protection against total array failure. In addition, the Fibre Channel controller (host bus adapter, HBA) is a single point of failure because it provides access to both fabrics.

    [Figure B-10: Labels: Fabric A, Fabric B, Crossover Ethernet, Private Network, Public Nets (x2).]


    Lesson 5 Exercises and Solutions

    Investment Bank: More Background Information

    The investment bank provides e-mail services to its employees on a strategic level. Because of the size of the organization, the deployment and use of standard mail access daemons is not appropriate. The number of users and the level of activity cause major performance problems with standard IMAP daemons, so the Cyrus IMAP daemon has been deployed on the mailbox servers. This daemon runs standalone, and so would normally be started at boot time by its own boot script. The message store is a database of messages. Users have rights to their own message areas. The IMAP daemon has a specific CLI for administering the message store, including startup, status, close, and crash recovery. The IMAP daemon binds to a local virtual IP address on the server on which it is running to provide access to clients.

    The Internet Communications group responsible for the mail services has a legacy environment that includes Volume Manager for managing JBOD disk arrays, and VxFS for file system recovery. The JBODs are SAN attached. Each mail service is identified by a different IP address and name in DNS.

    Questions

    1 From the information provided, identify the service group architecture for the simplest solution if the mailbox service is clustered, taking into account only one mailbox service with its storage and network components.

    The simplest solution comprises the storage resources DiskGroup, Volume, and Mount, the network resources NIC and IP, and the application resource for the IMAP application, as shown here:

    [Service group diagram. Resources: IMAP Application, Mount, Volume, DiskGroup, IP, NIC.]
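
The resources named in this solution imply a bring-online order. The sketch below is illustrative only: the links of the original diagram did not survive, so the dependencies used here (the application requires Mount and IP, Mount requires Volume, Volume requires DiskGroup, IP requires NIC) are an assumption based on the usual storage and network stacking, and `online_order` is a hypothetical helper, not a VCS function. It uses the standard-library `graphlib` module (Python 3.9 or later) to derive one valid order.

```python
from graphlib import TopologicalSorter

# Each resource maps to the resources it requires (assumed dependencies).
requires = {
    "IMAP Application": {"Mount", "IP"},
    "Mount": {"Volume"},
    "Volume": {"DiskGroup"},
    "IP": {"NIC"},
    "DiskGroup": set(),
    "NIC": set(),
}


def online_order(deps):
    """Return one valid bring-online order: requirements come first."""
    return list(TopologicalSorter(deps).static_order())


order = online_order(requires)
offline_order = list(reversed(order))  # take resources offline in reverse

print("online: ", order)
print("offline:", offline_order)
```

The storage chain and the network chain are independent of each other, so several orders are valid; the only fixed point is that the application comes online last and goes offline first.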


    2 Because of the load on a single mailbox service, the bank has implemented two mailbox services at each location. Each mailbox service runs on a different server. The load on each mailbox service is balanced manually by assigning each user to an appropriate mailbox service when the user joins the organization.

    The availability of these mailbox services is deemed critical to the bank. However, the bank is also very cost aware and does not want to provide a separate standby server for each individual mailbox service. Therefore, it has decided to add only one standby server to the two mailbox servers at each location for clustering, as shown here:

    The systems do not have enough capacity to run both mailbox services at the same time. However, all the systems are identical and they have access to all of the message stores through the SAN. Propose a failover configuration for the 3-node cluster that would address the given requirements. Suggest a method to prevent a system from running both mailbox services at the same time.

    With the given conditions you can have either a 2-to-1 or a 2+1 failover configuration. A 2+1 failover configuration is the better solution because there is SAN access and any system can act as the standby server. With the 2-to-1 failover configuration, one of the mailbox services would lose redundancy as soon as the other mailbox service fails over to the standby system, because two mailbox services cannot run on the same system. To limit the number of service groups coming online on a system, define the Prerequisites attribute of the two service groups and the Limits attribute of each system as follows:

    Prerequisites = { Weight=1 }
    Limits = { Weight=1 }
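
The effect of these two attributes can be illustrated with a small model. This is a simplified sketch of the rule, not VCS code: a group may come online on a system only if, for each prerequisite, the system's limit minus the amount already consumed by online groups covers the group's requirement. The function names and the system names `mailbox1`, `mailbox2`, and `standby` are hypothetical.

```python
def can_online(group_prereqs, system_limits, online_groups):
    """True if the system has enough unused capacity for every prerequisite."""
    for key, needed in group_prereqs.items():
        used = sum(group.get(key, 0) for group in online_groups)
        if system_limits.get(key, 0) - used < needed:
            return False
    return True


def failover_target(group_prereqs, candidates, limits, running):
    """Return the first candidate system that can host the group, else None."""
    for system in candidates:
        if can_online(group_prereqs, limits[system], running.get(system, [])):
            return system
    return None


prereqs = {"Weight": 1}  # same Prerequisites value for both mailbox groups
limits = {name: {"Weight": 1} for name in ("mailbox1", "mailbox2", "standby")}

# First failure: mailbox1's host is down, mailbox2's group runs on its own host.
running = {"mailbox2": [prereqs]}
first = failover_target(prereqs, ["mailbox2", "standby"], limits, running)
print(first)

# Second failure before repair: the standby is already occupied.
running = {"standby": [prereqs]}
second = failover_target(prereqs, ["standby"], limits, running)
print(second)
```

In this model the first failover lands on the standby because the surviving mailbox server has no spare Weight, and a second failure before repair finds no eligible system, which is the loss of redundancy the answer describes.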

    [Figure B-12: Mailbox 1, Mailbox 2, and Standby systems connected through the SAN to Message Store 1 and Message Store 2.]


    3 Considering that each system comes with one public network port, can you use the service group diagram provided in the solution of question 1 for both service groups?

    It is possible to use the service group diagram from the solution to question 1 for both service groups, defining different resource attributes for each. VCS will function properly in that environment, but the design would be inefficient. The same NIC resource is defined in both service groups, so network monitoring is duplicated unnecessarily even though only one hardware interface is being monitored on each system. A better design would use Proxy resources in the failover service groups for the mailbox services and a separate parallel service group for the NIC resource. The parallel service group would also include a Phantom resource, so that VCS can report the status of a group that otherwise contains only persistent resources.

    4 Each server has been deployed with redundant NICs. Construct the service group diagrams for both service groups in the cluster. Take advantage of the redundant NICs to minimize service group failovers due to NIC, cable, or switch failure.

    This solution requires additional resource types not used in the previous simple solution: Proxy, MultiNICB, Phantom, and IPMultiNIC. The service group diagram is as follows:

    [Figure B-13: Service group diagrams. Failover Group for Mailbox1 and Failover Group for Mailbox2, each with IMAP Application, Mount, Volume, DiskGroup, IPMultiNIC, and Proxy resources; Parallel NIC Group with MultiNICB and Phantom resources.]


    5 The bank has now decided to implement a web interface to the mailbox services in addition to conventional methods such as IMAP or POP. The web interface has two components:

    - Apache web server: The Apache web server provides the interface to the mailbox users. It has its own virtual IP address for clients to connect to. It uses local static web pages and therefore does not require any shared storage access.
    - Webmail daemon: The webmail daemon is responsible for managing the TCP connections to the IMAP daemon and runs as a multithreaded single process. It maintains no state and as such is very crash resilient. The webmail daemon is effectively no different from a standard IMAP client such as Microsoft Outlook; however, the user interface presented to the customers runs on the Apache web server.

    The key points about the web interface are as follows:
    - The communication between the Apache web server and the webmail daemon is through UNIX sockets, so the two have to run on the same machine. However, there is no specific dependency between them; they can be started or stopped in any order.
    - Users connect by browser to a login page presented by the Apache web server. When logging in, a user supplies a username and server name in the form user@server. The user and server names are parsed, and the webmail daemon uses the server name to direct the connection to the appropriate mail server. The webmail daemon can therefore connect to local or remote IMAP services.
    - The webmail daemon does not depend on any IP address; it can connect to an IMAP daemon running on any other system as long as the system it is running on has access to the public network.

    The bank has decided to use the same 3-node cluster to provide redundancy for the web interface. Draw the service group diagram for the web interface. Considering the nature of the IMAP application, the Apache web server, and the webmail daemon, discuss the type of agents (bundled, Enterprise, or custom) you can use for each application resource.

    The service group diagram for all service groups is as follows:


    Apache: Apache is a well-known and stable application. It can quite easily be managed through the bundled Application agent. However, because it is so widely used, there is also a VERITAS-supported enterprise agent for Solaris, and a bundled agent on Linux.

    Webmail daemon: This is third-party technology that is most easily integrated using the Process agent. Simple monitoring will suffice for this application. Developing a monitor script to test IMAP connectivity may also be considered.

    IMAP: The IMAP daemon is a complex daemon with a rich user and administration CLI. It is essentially an RDBMS for e-mail messages. For full confidence in the service's functionality and for safe startup, shutdown, and monitoring, this application is a reasonable candidate for custom agent development. Alternatively, for a minimal solution, integration with the cluster configuration through the Application agent will suffice, possibly with some specialized startup and monitor scripts.

    [Figure B-14: Service group diagrams. Failover Group for Mailbox1 and Failover Group for Mailbox2, each with IMAP Application, Mount, Volume, DiskGroup, IPMultiNIC, and Proxy resources; Parallel NIC Group with MultiNICB and Phantom resources; Failover Group for Web interface with Apache Web Server, Webmail daemon, IPMultiNIC, and Proxy resources.]
