SRM Beta Program Evaluator Guide

Date post: 09-Feb-2017
Page 1: SRM Beta Program Evaluator Guide

Evaluator Guide

Site Recovery Manager 1.0

Page 2: SRM Beta Program Evaluator Guide

Site Recovery Manager Evaluator Guide

2 VMware, Inc.

© 2006-2008 VMware, Inc. All rights reserved. Protected by one or more of U.S. Patent Nos. 6,397,242, 6,496,847, 6,704,925, 6,711,672, 6,725,289, 6,735,601, 6,785,886, 6,789,156, 6,795,966, 6,880,022, 6,944,699, 6,961,806, 6,961,941, 7,069,413, 7,082,598, 7,089,377, 7,111,086, 7,111,145, 7,117,481, 7,149,843, 7,155,558, 7,222,221, 7,260,815, 7,260,820, 7,269,683, 7,275,136, 7,277,998, 7,277,999, 7,278,030, 7,281,102, and 7,290,253; patents pending.

VMware, the VMware “boxes” logo and design, Virtual SMP and VMotion are registered trademarks or trademarks of VMware, Inc. in the United States and/or other jurisdictions. All other marks and names mentioned herein may be trademarks of their respective companies. VMware, Inc. 3401 Hillview Ave. Palo Alto, CA 94304 www.vmware.com

Page 3: SRM Beta Program Evaluator Guide

CONTENTS

About This SRM 1.0 Evaluator Guide
SRM Quick Start Checklist
SRM Evaluation Checklist
Chapter 1: Overview of VMware Site Recovery Manager (SRM)
Chapter 2: Planning for BC/DR when using VMware SRM
Chapter 3: SRM Workflow setup at the Protected and Recovery sites
Chapter 4: Using SRM to run a Test against a Recovery Plan
Chapter 5: Using SRM to failover the Protected Site to the Recovery Site
Chapter 6: Failback from the Recovery Site to the Protected Site
Chapter 7: SRM Alarms and Site Status Monitoring
Chapter 8: SRM Roles and Privileges
Conclusion

Page 4: SRM Beta Program Evaluator Guide

About This Evaluator Guide

Intended Audience

The Site Recovery Manager (SRM) Evaluator Guide walks SRM customers and evaluators through the SRM workflow that must be completed to allow successful, automated failover of services from the designated SRM protected site to the designated SRM recovery site. The guide also provides an overview of the considerations and guidance for failing services back from the recovery site to the site that was originally designated as the SRM protected site.

To use this SRM Evaluator Guide successfully, the following is assumed:

ESX Server 3.0.2 or ESX 3.5 has been installed on physical servers in the SRM protected and recovery sites.

An instance of VirtualCenter 2.5 exists in each of the SRM protected and recovery sites.

A multisite SAN infrastructure is in place and set up to replicate designated VMFS datastores between the SRM protected and recovery sites.

The virtual machines (VMs) that have been selected to be protected VMs for the SRM evaluation have been moved onto the designated replicated datastores. VMs that have not been selected to be protected VMs for the evaluation should be moved to non-replicated datastores. If you are running ESX 3.5, Storage VMotion can be used to complete the move with zero downtime.

The following tasks have been completed; refer to the SRM Installation and Administration Guide for details:

The basic installation of Site Recovery Manager on the SRM or VirtualCenter servers in the SRM protected and recovery sites has been completed.

An SRM license is installed on the VirtualCenter license server at the protected and recovery sites.

The installation of the SRM plug-in has been completed and the SRM plug-in has been enabled on the Virtual Infrastructure Client instances that will be used to access the SRM protected and recovery sites.

VMware Infrastructure Documentation

If you need additional information on VMware Virtual Infrastructure, consult the VMware Infrastructure documentation, which consists of the combined VMware VirtualCenter and ESX Server documentation set. Documentation is available from: http://www.vmware.com/support/pubs/

Page 5: SRM Beta Program Evaluator Guide

Disaster Recovery (DR), Virtual Infrastructure (VI) and SRM Abbreviations Used in this Guide

The following DR, VI and SRM abbreviations are used throughout this evaluator guide:

Abbreviation  Description
BC/DR         Business Continuity and Disaster Recovery
SRM           Site Recovery Manager
VC            VirtualCenter
VI Client     Virtual Infrastructure Client used to access VirtualCenter and SRM
VM            Virtual machine on a managed host
RP            Virtual Infrastructure Resource Pool
VMFS          Virtual Machine File System
SAN           Storage area network type datastore shared between managed hosts

Disaster Recovery (DR) and SRM Terminology Used in this Guide

The following DR and SRM terminology is used throughout this evaluator guide:

DR and SRM Terminology             Description
array-based replication            Replication of virtual machines that is managed and executed by the storage subsystem itself rather than from inside the virtual machines, the vmkernel or the Service Console.
logical unit number (LUN)          A single SCSI storage device on the SAN that can be mapped to one or more ESX Servers.
Failover                           Event that occurs when the recovery site takes over operation in place of the protected site after the declaration of a disaster.
Failback                           Reversal of failover, returning IT operations to the primary site.
datastore                          Storage for the managed host.
Host                               VirtualCenter managed hosts.
SRM Server                         Manages and monitors the SRM recovery plans.
protected VM                       A VM that is protected by SRM because it is located on a replicated datastore.
un-protected VM                    A VM that is not protected by SRM because it is located on a non-replicated datastore.
protected site                     The site that contains the protected VMs.
recovery site                      The site that contains the replicated protected VMs from the protected site.
datastore group                    Replicated datastores containing complete sets of protected VMs.
protection group                   A group of VMs that will be failed over together to the recovery site during test or recovery.
Storage Replication Adapter (SRA)  Enables SRM to interact with a storage array.
shadow VM                          An artifact in the recovery site VC inventory that represents a protected VM from the protected site VC.
inventory mappings                 Associations between protected resource pools, VM folders and networks and their destination counterparts at the recovery site.
recovery plan                      Contains the complete set of steps needed to recover (or test recovery of) the protected VMs in one or more protection groups.

Document Feedback

VMware welcomes your suggestions for improving our documentation. If you have comments, send your feedback to: [email protected].

Page 6: SRM Beta Program Evaluator Guide

Technical Support and Education Resources

The following sections describe the technical support resources available to you.

Online Support for the Site Recovery Manager

Technical support is available through the Support Request (SR) system. Go to http://www.vmware.com/support/ and click Create Support Request.

You need a valid VMware account to open an SR. If you do not already have an account, you will need to register for one here.

Support Offerings

To find out how VMware support offerings can help meet your business needs, go to http://www.vmware.com/support/services.

VMware Education Services

VMware courses offer extensive hands-on labs, case study examples, and course materials designed to be used as on-the-job reference tools. For more information about VMware Education Services, go to http://mylearn1.vmware.com/mgrreg/index.cfm.

Page 7: SRM Beta Program Evaluator Guide

SRM Quick Start Checklist

Before starting to work through the SRM installation and configuration workflows outlined in this SRM Evaluator Guide, we recommend that you refer to the SRM Storage Release Notes for the SRM-supported storage platform that you will be using in your protected and recovery sites, and work through the checklist there to ensure that your storage platforms are ready for integration with SRM.

Once you have worked through the appropriate storage checklist, work through the SRM Pre-Install Checklist below. When it is complete, you are ready to proceed with the setup of SRM.

SRM Pre-Install Checklist

Site       Description                                                   Yes / No
-          Using your VMware Store Account, access the URL below to download the SRM software, Storage Replication Adapter, and other relevant product and program information: http://www.vmware.com/download/srm/eval.html
Protected  SQL Enterprise 2005 or Oracle Database server set up and ready for use.
Protected  A database instance has been created for VirtualCenter.
Protected  A database user created with 'db owner' and 'create table' privileges.
Protected  A system DSN created for the VC database.
Protected  VirtualCenter 2.5 server installed and ready for use.
Protected  The ability to access the VirtualCenter 2.5 server via the VI Client.
Protected  At least one ESX Server 3.0.2 or ESX Server 3.5 installed and integrated into VirtualCenter, with access to a LUN on a SAN that has been configured as a VMFS datastore and set up for data replication to a corresponding SAN in the recovery site.
Protected  A database instance has been created for Site Recovery Manager (SRM).
Protected  A database user created with 'db owner' and 'create table' privileges.
Protected  A system DSN created for the SRM database.
Protected  Identify a system (physical or virtual) on which to install the SRM software and the Storage Replication Adapter (SRA) for your respective array.
Recovery   SQL Enterprise 2005 or Oracle Database server set up and ready for use.
Recovery   A database instance has been created for VirtualCenter.
Recovery   A database user created with 'db owner' and 'create table' privileges.
Recovery   A system DSN created for the VC database.
Recovery   VirtualCenter 2.5 server installed and ready for use.
Recovery   The ability to access the VirtualCenter 2.5 server via the VI Client.
Recovery   At least one ESX Server 3.0.2 or ESX Server 3.5 installed and integrated into VirtualCenter, with access to a LUN on a SAN that has been configured as a VMFS datastore and set up for data replication to a corresponding SAN in the recovery site.
Recovery   A database instance has been created for Site Recovery Manager (SRM).
Recovery   A database user created with 'db owner' and 'create table' privileges.
Recovery   A system DSN created for the SRM database.
Recovery   Identify a system (physical or virtual) on which to install the SRM software and the Storage Replication Adapter (SRA) for your respective array.
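
If you prefer to track the pre-install checklist in a script rather than on paper, something along the following lines may help. This is only a minimal sketch: the `ChecklistItem` class and the `outstanding` and `ready_to_install` helpers are hypothetical names introduced for illustration, they are not part of SRM, and the sample rows are a shortened subset of the checklist above.

```python
from dataclasses import dataclass


@dataclass
class ChecklistItem:
    """One row of the pre-install checklist (hypothetical helper type)."""
    site: str            # "Protected" or "Recovery"
    description: str
    done: bool = False   # the "Yes / No" column


def outstanding(items, site=None):
    """Return descriptions of unfinished items, optionally for one site."""
    return [
        item.description
        for item in items
        if not item.done and (site is None or item.site == site)
    ]


def ready_to_install(items):
    """The SRM setup should start only when no item is outstanding."""
    return not outstanding(items)


# Shortened sample data; a real list would hold every checklist row.
items = [
    ChecklistItem("Protected", "Database server set up", done=True),
    ChecklistItem("Protected", "System DSN created for the SRM database"),
    ChecklistItem("Recovery", "Database server set up", done=True),
    ChecklistItem("Recovery", "System DSN created for the VC database"),
]
```

Calling `outstanding(items, "Protected")` lists what is still open at the protected site, and `ready_to_install(items)` stays false until every row at both sites is marked done.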

Page 8: SRM Beta Program Evaluator Guide

SRM Evaluation Checklist

To aid you with your SRM evaluation, refer to the SRM Evaluation Checklist below, which provides a high-level summary of the SRM workflows and configuration tasks that should be completed during your SRM evaluation.

Site                 Description                                         Yes / No
Protected            Connection: This involves pairing the VirtualCenter servers at the protected and recovery sites. CH 3 - Pg 15
Protected            Array Managers: SRM leverages array-based replication between a protected site and a recovery site. When working through the Array Manager configuration wizard, SRM will identify which arrays are available, including the datastore groups that have been set up for replication between the protected and recovery sites. CH 3 - Pg 16
Protected            Inventory Preferences: Using the Inventory Mapper wizard, the protected VMs now need to be mapped to the Networks, Compute Resources and Virtual Machine Folders that are available at the recovery site. CH 3 - Pg 19
Protected            Protection Groups: A protection group is a group of VMs that will be failed over together to the recovery site. Work through the Protection Groups configuration wizard to complete the Protection Group setup. CH 3 - Pg 20
Recovery             Recovery Plan: A recovery plan describes the steps necessary to recover the protected VMs in one or more protection groups. These steps can be predefined (e.g. Power On VM) or user-defined callouts. When a recovery plan is defined, the basic steps necessary to recover the protection groups it contains are automatically generated. CH 3 - Pg 25
Protected            IP Address Network Customization to allow protected VMs to start with the correct IP addresses and network configuration in the recovery site. CH 3 - Pg 25. Note: This task may not be required for protected VMs that use DHCP to obtain an IP address or in environments that have a Stretched VLAN network topology.
Recovery             Test your Recovery Plan via the SRM TEST Recovery Plan option. CH 4
Recovery             Run your Recovery Plan via the SRM RUN Recovery Plan option. CH 5. Note: SRM does not support an automated 'push one button' failback via the SRM User Interface. A failback to the original protected site is possible and is documented in CH 6 should you want to resync the data in the recovery site back to the protected site.
Recovery, Protected  Failback (Optional). Refer to CH 6 for the failback procedure, which will involve working closely with your storage team to complete failback.
Protected, Recovery  SRM Alarms and Site Monitoring. Enable the appropriate notifications and alarms to stay in compliance with your documented monitoring policies. CH 7
Protected, Recovery  SRM Roles and Privileges. Assign the appropriate SRM Roles to stay in compliance with your documented security policies. CH 8

Page 9: SRM Beta Program Evaluator Guide

Chapter 1: Overview of VMware Site Recovery Manager (SRM)

VMware Site Recovery Manager (SRM) provides business continuity and disaster recovery protection for virtual environments. Protection can extend from individual replicated datastores to an entire virtual site. VMware's virtualization of the data center offers advantages that can be applied to business continuity and disaster recovery:

The entire state of a virtual machine (memory, disk images, I/O and device state) is encapsulated. Encapsulation enables the state of a virtual machine to be saved to a file. Saving the state of a virtual machine to a file allows the transfer of an entire virtual machine to another host.

Hardware independence eliminates the need for a complete replication of hardware at the recovery site. Hardware running ESX at one site can provide business continuity and disaster recovery protection for hardware running ESX at another site. This eliminates the cost of purchasing and maintaining a system that sits idle until disaster strikes.

Hardware independence allows an image of the system at the protected site to boot from disk at the recovery site in minutes or hours instead of days.

SRM leverages array-based replication between a protected site and a recovery site. The workflow that is built into SRM automatically discovers which datastores are set up for replication between the protected and recovery sites. SRM can be configured to support bi-directional protection between two sites.

SRM provides protection for the operating systems and applications encapsulated by the virtual machines running on ESX. An SRM server must be installed at the protected site and at the recovery site. The protected and recovery sites must each be managed by their own VirtualCenter Server. The SRM server uses the extensibility of the VirtualCenter Server to provide:

Access control

Authorization

Custom events

Event-triggered alarms

Site Recovery Manager Prerequisites

SRM has the following prerequisites:

A VirtualCenter server installed at the protected site.

A VirtualCenter server installed at the recovery site.

Pre-configured array-based replication between the protected site and the recovery site.

Network configuration that allows TCP connectivity between SRM servers and VC servers.

An Oracle or SQL Server database that uses ODBC for connectivity in the protected site and in the recovery site.

An SRM license installed on the VC license server at the protected site and the recovery site.
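
The TCP connectivity prerequisite can be spot-checked from the machine that will host the SRM server before installation begins. The sketch below uses only the Python standard library; `tcp_reachable` and `check_endpoints` are hypothetical helper names, and the port to test is an assumption you must supply from your own environment and the SRM Installation and Administration Guide, since this guide does not list port numbers.

```python
import socket


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_endpoints(endpoints, timeout=3.0):
    """Map each (host, port) pair to whether it accepted a connection."""
    return {
        (host, port): tcp_reachable(host, port, timeout)
        for host, port in endpoints
    }
```

For example, `check_endpoints([("dr-vc-vim23.eng.vmware.com", vc_port)])`, with `vc_port` set to whatever port your VC server listens on, reports whether the remote VC server used later in this guide can be reached. A successful connection only shows that the port is open; it does not validate certificates or credentials.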

Page 10: SRM Beta Program Evaluator Guide

Site Recovery Manager Configuration and Protection

Setup and configuration are accomplished by following workflows for the protected and recovery sites. SRM is installed as a plugin into a Virtual Infrastructure Client (VI Client) and uses the VI Client as its user interface (UI). The SRM UI is accessed by clicking the Site Recovery icon in the VI Client toolbar and is used for the setup of the SRM workflows, recovery plan testing, and failover of services from the protected site to the recovery site.

It is important to complete the workflows in the order they are presented in this guide.

The recovery site configuration workflow involves the following activities:

The user installs the SRM server.

The user installs the SRM plugin into the VI Client.

The protection site configuration workflow involves the following activities:

The user installs the SRM server.

If a different VI Client is used to access the protected and recovery sites, the user installs the SRM plugin into the VI Client, otherwise this activity can be skipped.

Security certificates are established between the SRM servers and the VC servers.

The user pairs the SRM servers at the protected and recovery sites.

SRM identifies available arrays and replicated datastores and determines the datastore groups.

The protection site protection workflow involves the following activities:

Using the Inventory Mapper, the user maps the networks, compute resources and virtual machine folders in the protected site to their counterparts in the recovery site.

The user creates protection groups from the datastores discovered by SRM.

For each protected VM, the user can override default values.

The recovery site protection workflow involves the following activities:

The user creates the recovery plan.

SRM creates the recovery plan steps.

Optionally, the user can customize the recovery plan.

Failover and Testing

SRM automates many of the tasks required at failover. With the push of one button, SRM:

powers down the protected VMs if there is connectivity between the sites and the VMs are online.

suspends data replication and read/write enables the replica datastores.

rescans the ESX servers at the recovery site.

registers the replicated protected VMs.

shuts down non-essential VMs at the recovery site if required to free up resources for the protected VMs being failed over.

completes power-up of replicated protected VMs in accordance with the recovery plan.
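
The ordering and the two conditional steps in the failover list above can be sketched as a small function. This is an illustration of the sequence only, not SRM's implementation: `failover_steps` and its three boolean parameters are hypothetical names, and the step strings simply paraphrase the list.

```python
def failover_steps(sites_connected, protected_vms_online, need_capacity):
    """Return the failover steps, in order, that apply to this situation."""
    steps = []
    # Power-down is attempted only when the protected site can be reached
    # and the protected VMs are still online.
    if sites_connected and protected_vms_online:
        steps.append("power down protected VMs at the protected site")
    steps.append("suspend replication and read/write enable replica datastores")
    steps.append("rescan ESX servers at the recovery site")
    steps.append("register replicated protected VMs")
    # Non-essential VMs are shut down only if resources must be freed.
    if need_capacity:
        steps.append("shut down non-essential VMs at the recovery site")
    steps.append("power up protected VMs per the recovery plan")
    return steps
```

When the protected site is unreachable, as in a real disaster, `failover_steps(False, False, False)` returns only the four steps that run entirely at the recovery site.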

Page 11: SRM Beta Program Evaluator Guide

SRM does not require production system downtime to run tests. This means you can test often to ensure that you are protected in case of a disaster. For testing, SRM:

creates a test environment that includes network and storage infrastructure that is isolated from the production environment.

rescans the ESX servers.

registers the replicated VMs.

completes power-up of protected VMs in the order specified during creation of the disaster recovery plan.

provides a report of test results.

resets everything in preparation for a disaster or next scheduled SRM Test.

Page 12: SRM Beta Program Evaluator Guide

Chapter 2: Planning for BC/DR when using VMware SRM

This chapter provides an overview of the site planning and preparation that should be completed to ensure the SRM protected and recovery sites are prepared for the SRM evaluation.

Figure 2.1

Figure 2.1 represents an SRM protected site which contains 'local services' and 'protected services'. The 'local services' are infrastructure-type services (Active Directory, Print services, Virus Management services and Security Camera services) and are generally bound to the data center. The 'protected services' are application-type services, and these are the services that need to be made available to the business at time of test or disaster. This will be accomplished using SRM.

Using the SRM protected site depicted in Figure 2.1, we will now review the planning and preparations that should be completed to ensure both the SRM protected and recovery sites are ready for a successful SRM deployment. Site planning and preparation at the protected site involves the following:

Identify which VMs will be designated as protected VMs.
o app_vm1 through app_vm12

Identify which VMs will be designated as un-protected VMs.
o ad_server, print_server, security_camera_server and virus_mgt_server

Determine the number of datastore groups that will be required to hold the protected VMs.
o Based on the 12 VMs we have designated as protected VMs, and for the purposes of the SRM configuration depicted in this evaluator guide, we will require 2 datastore groups, each containing six complete VMs.

Page 13: SRM Beta Program Evaluator Guide

If existing datastores will be used for the protected VMs, identify which datastores need to be configured as datastore groups; otherwise, provision the required number of new datastores to host the protected VMs. Working with your SAN team, ensure that all the datastores that will host protected VMs are configured as datastore groups, i.e. set up for replication between the protected and recovery sites.
o Referring to Figure 2.2, we will require 2 datastore groups, shared-san-1 and shared-san-2, which were previously configured to allow for the replication of data to the recovery site.
Note: The setup and configuration of SAN replication differs from array vendor to array vendor. If you are unsure how to complete the necessary replication setup and configuration, consult your array vendor, who should be able to provide the necessary information.

Move all the designated protected VMs onto the SRM datastore groups. Storage VMotion can be used to relocate the protected VMs with zero service downtime. If possible, ensure that only protected VMs reside on the datastores that are being replicated from the protected site to the recovery site. Figure 2.2, which is a VMware topology map view, shows that:
o app_vm1 through app_vm6 are hosted from datastore group shared-san-1
o app_vm7 through app_vm12 are hosted from datastore group shared-san-2
o The non-replicated infrastructure VMs are hosted from datastore vim22-storage1
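
The placement rules above (protected VMs only on replicated datastores, and ideally nothing else on them) lend themselves to a quick audit before SRM is configured. The following is a minimal sketch that uses the VM and datastore names from this chapter; the `audit` function and the hand-built `placement` dictionary are hypothetical, and in practice the placement data would have to come from your own VirtualCenter inventory export.

```python
# Datastores replicated to the recovery site, as named in this chapter.
REPLICATED = {"shared-san-1", "shared-san-2"}

# VM -> datastore placement, built by hand here for illustration.
placement = {f"app_vm{i}": "shared-san-1" for i in range(1, 7)}
placement.update({f"app_vm{i}": "shared-san-2" for i in range(7, 13)})
placement.update({
    name: "vim22-storage1"
    for name in ("ad_server", "print_server",
                 "security_camera_server", "virus_mgt_server")
})

# The VMs chosen for protection during planning.
intended_protected = {f"app_vm{i}" for i in range(1, 13)}


def audit(placement, replicated, intended_protected):
    """Return (misplaced, stray) VM name lists.

    misplaced: VMs meant to be protected that are not on a replicated datastore.
    stray: VMs on a replicated datastore that were not selected for protection.
    """
    misplaced = sorted(
        vm for vm in intended_protected if placement[vm] not in replicated
    )
    stray = sorted(
        vm for vm, ds in placement.items()
        if ds in replicated and vm not in intended_protected
    )
    return misplaced, stray
```

With the layout shown in Figure 2.2, `audit(placement, REPLICATED, intended_protected)` returns two empty lists, meaning every protected VM is on a replicated datastore and no infrastructure VM is consuming replicated storage.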

Figure 2.2

Page 14: SRM Beta Program Evaluator Guide

Figure 2.2 represents a different view of the same SRM protected site depicted in Figure 2.1, with the 'local services' hosted from datastore vim22-storage1 and the 'protected services' hosted from datastore groups shared-san-1 and shared-san-2. The 'protected services' are under the control of SRM and will be made available at time of test or disaster via SRM at the recovery site.

Figure 2.3

Figure 2.3 represents an SRM recovery site which contains 'local services' and will also service the failed-over 'protected services', which are all the protected VMs hosted from datastore groups shared-san-1 and shared-san-2 in the SRM protected site depicted in Figure 2.2. Once again, the 'local services' are infrastructure-type services (Active Directory, Print services, Virus Management services and Security Camera services) that are bound to the recovery site data center.

Page 15: SRM Beta Program Evaluator Guide

Chapter 3: SRM Workflow setup at the Protected and Recovery sites

This chapter provides an overview of the SRM workflows that have to be completed to ensure SRM is providing BC/DR services for the designated virtual machines at time of test or during an actual event that necessitates the declaration of a disaster.

Figure 3.1

The SRM workflows that will be outlined below will be associated with the virtual data centers vim22dc and vim23dc depicted in Figures 2.1, 2.2 and 2.3.

Figure 3.1 shows part of the VI Client window, with the Site Recovery icon and the Setup pane highlighted. The protected site SRM workflows are completed via the VI Client by selecting Configure and working through the configuration wizard for each of the steps identified below:

Connection: This involves pairing the VirtualCenter servers at the protected and recovery sites.

The VC server in the local data center vim22dc is dr-vc-vim22.eng.vmware.com. The VC server in the remote data center vim23dc is dr-vc-vim23.eng.vmware.com.

Page 16: SRM Beta Program Evaluator Guide

Figure 3.2

Once the remote VC server's information has been entered, you will be presented with the following Connect to Remote Site window.

Figure 3.3

Once reciprocity has been established, click Close to complete the setup. You are now ready to move on to the Array Manager configuration step.

Note: If you are using certificates that are not properly signed, the last two check marks in Figure 3.3 may appear as yellow warning triangles when 'Reciprocity is established'. The use of certificates that are not properly signed will not prevent you from moving on to the next SRM configuration step, which involves configuring the Array Managers.

Array Managers: SRM leverages array-based replication between a protected site and a recovery site. When working through the Array Manager configuration wizard, SRM will identify which arrays are available, including the datastore groups that have been set up for replication between the protected and recovery sites.

Page 17: SRM Beta Program Evaluator Guide

Virtual machines reside on VMFS datastores, which are created on LUNs that reside on the storage arrays. SRM uses the term "datastore group" to identify one or more replicated datastores that protect virtual machines. If SRM detects a virtual machine spanning more than one datastore (i.e. the VM has two virtual disks, one on each datastore), then for the whole VM to be failed over the SRM datastore group *must* contain both datastores, and SRM will enforce this; this is covered further below. In SRM, a datastore group is the basic unit of replication.

Figure 3.4: SRM Protected Site - Storage Array (Datastore Group 1, Datastore Group 2, and a non-replicated LUN)

In Figure 3.4, VM1 is contained in Replicated LUN A (Datastore Group 1), so VM1 is a protected VM. VM2 is contained on a non-replicated LUN and therefore is not protected by SRM. Datastore Group 2 consists of Datastore C and Datastore DE, which contain protected VMs 3, 4, and 5. Even though VM4 spans two datastores, it is completely contained within Datastore Group 2 and as a result is fully protected by SRM. In SRM, an entire VM needs to be located within a datastore group, which has a one-to-one mapping to a protection group. Datastore groups are automatically discovered by SRM; a datastore group is defined by the configuration of the virtual machines selected for protection by SRM. SRM protection groups are covered later in this guide.
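
The grouping rule described above, where a VM that spans two replicated datastores forces both into the same datastore group, amounts to finding connected components. The sketch below illustrates that idea with a small union-find; it is not SRM's actual discovery logic, `datastore_groups` and `protected_vms` are hypothetical names, and the per-VM placement of VM3 and VM5 is an assumption, since the text states only that Datastore C and Datastore DE together hold VMs 3, 4 and 5.

```python
def datastore_groups(vm_datastores, replicated):
    """Group replicated datastores that are linked by a VM spanning them."""
    parent = {ds: ds for ds in replicated}

    def find(ds):
        # Follow parent links to the group representative (path halving).
        while parent[ds] != ds:
            parent[ds] = parent[parent[ds]]
            ds = parent[ds]
        return ds

    for stores in vm_datastores.values():
        linked = [s for s in stores if s in replicated]
        for other in linked[1:]:
            root_a, root_b = find(linked[0]), find(other)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = {}
    for ds in replicated:
        groups.setdefault(find(ds), set()).add(ds)
    return sorted(sorted(group) for group in groups.values())


def protected_vms(vm_datastores, replicated):
    """A VM is protected only if every datastore it uses is replicated."""
    return sorted(
        vm for vm, stores in vm_datastores.items()
        if all(s in replicated for s in stores)
    )


# Layout loosely following Figure 3.4 (VM3 and VM5 placement assumed).
vm_datastores = {
    "VM1": ["A"],
    "VM2": ["B"],          # Datastore B sits on the non-replicated LUN
    "VM3": ["C"],
    "VM4": ["C", "DE"],    # spans two datastores
    "VM5": ["DE"],
}
replicated = {"A", "C", "DE"}
```

Here `datastore_groups(vm_datastores, replicated)` yields one group containing only Datastore A and a second containing Datastores C and DE, matching Datastore Group 1 and Datastore Group 2 in the figure, while `protected_vms` excludes VM2.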

VMware is actively working with its storage partners, who are responsible for developing their own Storage Replication Adapters (SRAs), which enable their storage arrays to integrate with SRM. For this reason, VMware anticipates that the list of storage Manager Types will become more extensive over time as the storage partners complete work on their respective SRAs.

Page 18: SRM Beta Program Evaluator Guide

If you do not see a Manager Type for a storage array in your environment that you wish to integrate with SRM, VMware strongly urges you to follow up directly with the storage vendor in question to enquire about the availability of their SRM storage replication adapter. VMware is not in a position to comment on the availability of products currently under development by its storage partners.

Working through the Array Manager configuration wizard takes you to the Add Array Manager window depicted in Figure 3.5. Select the correct Manager Type for the SAN in your environment.

Figure 3.5

Once you have selected the correct Manager Type from the drop down box, complete the entry of all the appropriate information within the Array Manager Information section and click Connect to start the SRM Discover Storage Array process which will run for several minutes. The Array Manager configuration wizard will walk you through the configuration for the Protection Side Array Managers and Recovery Side Array Managers.

Figure 3.6 is a consolidated view of the three Configure Array Managers windows, showing an example of the information that will be presented at the end of the SRM Discover Storage Array process for the protected and recovery sites. The information shown below is for the Virtual Data Centers vim22dc and vim23dc. At this stage you are ready to move on to the Inventory Preferences configuration step.

Page 19: SRM Beta Program Evaluator Guide

Figure 3.6

Inventory Preferences: Using the Inventory Mapper wizard, the protected VMs now need to be mapped to the Networks, Compute Resources and Virtual Machine Folders that are available at the recovery site. The mappings are completed via the Inventory Preferences pane below by selecting the respective Primary Site Resources category, clicking Edit, and working through the Inventory Mapper wizard for that resource category.

Page 20: SRM Beta Program Evaluator Guide

Figure 3.7

In Figure 3.7 the Inventory Preferences pane shows the configured Inventory Preferences for the protected VMs in the virtual data center vim22dc and how they are mapped to the appropriate resources in the virtual data center vim23dc which is the designated recovery site.
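
Conceptually, the inventory mappings behave like per-category lookup tables from protected-site resources to recovery-site resources. The sketch below illustrates that idea and how an unmapped resource could be spotted before protection groups are created. All names in it are hypothetical: the `resolve` function, the category keys and the network, resource pool and folder names are placeholders, not values taken from Figure 3.7 or from SRM itself.

```python
# Protected-site resource -> recovery-site resource, per category.
# Every name below is a placeholder chosen for illustration.
mappings = {
    "network": {"VM Network": "Recovery VM Network"},
    "resource_pool": {"prod-pool": "recovery"},
    "folder": {"Applications": "Recovered Applications"},
}


def resolve(vm_resources, mappings):
    """Translate a VM's protected-site resources to recovery-site ones.

    Returns (placement, unmapped): the resolved recovery-site resources,
    and a list of (category, resource) pairs that have no mapping yet.
    """
    placement, unmapped = {}, []
    for category, source in vm_resources.items():
        target = mappings.get(category, {}).get(source)
        if target is None:
            unmapped.append((category, source))
        else:
            placement[category] = target
    return placement, unmapped
```

For a VM described as `{"network": "VM Network", "resource_pool": "prod-pool", "folder": "Test"}`, `resolve` returns the mapped network and resource pool and reports the folder as unmapped, which is the kind of gap to close in the Inventory Preferences pane.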

Protection Groups: A protection group is a group of VMs that will be failed over together to the recovery site. A protection group is associated with a single datastore group. A datastore group can contain a single datastore or multiple datastores, as illustrated in Figure 3.4.

Working through the Protection Groups configuration wizard you will get to the window shown in Figure 3.8. During the creation of the Protection Groups, SRM requires a location to store some temporary VirtualCenter inventory files for the protected VMs. SRM will present the available datastores at the recovery site that could be selected for the storing of these temporary files. It is preferable and suggested that you select a non replicated datastore for these temporary files at the recovery site.

Figure 3.8

Page 21: SRM Beta Program Evaluator Guide

Select the datastore that will store the temporary virtual machine files and click Next which will take you to the next Create Protection Group window shown in Figure 3.9. You will now be presented with the list of all the protected VMs that will be assigned to the Protection Group currently being created.

Figure 3.9

Figure 3.9 shows the six protected VMs assigned to the first protection group, called 'Protection Group 1', which was created in the virtual data center vim22dc.


Figure 3.10

Figure 3.10 shows the six protected VMs assigned to the second protection group called ‘Protection Group 2’ which was created in the virtual data center vim22dc.

The creation of the protection groups completes the SRM workflow activities for the protected site. To recap we have worked through the following SRM workflow activities so far:

Connection: This involved pairing the VirtualCenter servers at the protected and recovery sites.

Array Managers: SRM leverages array based replication between a protected site and a recovery site. The integration of array based replication with SRM is achieved by selecting the correct Manager Type for the SAN in your protected and recovery sites.

Inventory Preferences: Using the Inventory Mapper, the protected VMs are mapped to the Networks, Compute Resources and Virtual Machine Folders that are available at the recovery site.

Protection Groups: A protection group is a group of VMs that will be failed over together to the recovery site. The creation of a protection group results in VC inventory updates in the recovery site.


Figure 3.11 shows the view of the recovery site virtual data center vim23dc. It is worth noting that once the protection groups were created, the virtual infrastructure inventory in the recovery site was automatically updated with new inventory objects. The first new inventory object is the nested resource pool called recovery under the top-level resource pool (RP) called shared. The remaining inventory objects that have been added are the protected VMs from the SRM protected site.

Figure 3.11

The remaining workflow activity, which is the creation of the SRM Recovery Plan and any subsequent customizations to the recovery plan, is completed via the VI Client connected to the VC server in the designated recovery site.

The recovery site protection workflow involves the following activities:

Building a Recovery Plan. A recovery plan describes the steps necessary to recover the protected VMs in one or more protection groups. These steps can be predefined (e.g. Power On VM) or user-defined callouts. When a Recovery Plan is created for one or more protection groups, it is automatically populated with the basic steps needed to fail over the protected VMs to the recovery site. You can then customize these steps by re-ordering existing steps or adding new steps and callouts.
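To make the idea of generated steps plus customization concrete, here is a small Python sketch. It is a conceptual model only, under the assumption that a plan can be treated as an ordered list; the `Step` class, the helper functions and the callout name are hypothetical and do not reflect how SRM stores recovery plans.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Step:
    name: str
    user_defined: bool = False  # True for callouts you add yourself

def generate_basic_steps(vm_names):
    """Build one predefined 'Power On VM' step per protected VM, in the given order."""
    return [Step(f"Power On VM: {name}") for name in vm_names]

def add_callout(steps, index, name):
    """Return a new step list with a user-defined callout inserted at index."""
    updated = list(steps)
    updated.insert(index, Step(name, user_defined=True))
    return updated

def move_step(steps, old_index, new_index):
    """Return a new step list with one step moved to a different position."""
    updated = list(steps)
    updated.insert(new_index, updated.pop(old_index))
    return updated

plan = generate_basic_steps(["app_vm7", "app_vm8", "app_vm9"])
plan = move_step(plan, 2, 0)                                    # start app_vm9 first
plan = add_callout(plan, 1, "Callout: verify database is up")   # hypothetical callout
for number, step in enumerate(plan, start=1):
    print(number, step.name)
```

Both helpers return a new list rather than changing the original, so the automatically generated plan stays available for comparison.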


Figure 3.12 shows the recovery steps from an SRM recovery plan called ‘Recovery Plan 2 – Protection Group 2’ required to complete a partial site failover for the local data center vim22dc, which is protected by SRM. The protected VMs that will be failed over are app_vm7 through app_vm12 from Protection Group 2, which is associated with the datastore group shared-san-2.

Figure 3.12


In Figure 3.13 the VI Client lists three Recovery Plans that were created by working through the Recovery Plan wizard. To create a new Recovery Plan, click on the Add button on the toolbar or Add Recovery Plan under the Commands section and work through the Recovery Plan wizard.

Figure 3.13

The following section illustrates one of the supported ways to customize settings associated with a protected VM. By working through the steps that follow you will be able to customize the network configuration settings for a protected VM, so that after an actual failover the protected VM starts up at the recovery site with an IP address that is correct for the network in the recovery site. Referring to Figure 3.14, and working from the VI Client that is connected to the recovery site, click Customization Specifications from the Edit menu and work through the wizard that follows.

Figure 3.14

A Customization Specification Manager window opens. Click on New and complete the information requested by the wizard. Ensure you select the correct Target Virtual Machine OS and provide a name for the virtual machine customization profile in the Customization Specification Information, as shown in Figure 3.15.


Figure 3.15

Click Next to continue to the window depicted in Figure 3.16.

Figure 3.16


At this point it is worth noting that the only information applied to the protected VM you wish to customize via SRM will be the Network information. For the SRM 1.0 release you may still need to provide information for all the virtual machine properties highlighted in Figure 3.16 by the two ‘red boxes’, because the Next button only becomes active once you enter text into the property field currently being displayed.

Figure 3.17

When you get to the Network properties section as depicted in Figure 3.17, please be sure to click on the Custom Settings radio button, and then click on the Next button to proceed to the Network Custom Settings window shown in Figure 3.18.

Figure 3.18


Referring to Figure 3.18, highlight NIC 1 and click on the Customize button, which will take you to the standard Network Properties window shown in Figure 3.19. Enter all the relevant network configuration information you wish to assign to the protected VM when it is started in the recovery site and click OK.

Figure 3.19

Complete working through the Customization Specifications wizard which will bring you to the Customization Specification Manager window depicted in Figure 3.20.

Figure 3.20

The steps that follow outline how to apply the network customization profile to a protected VM, in this case app_vm12. These steps are completed via the VI Client that is connected to the protected site. Select the protection group that contains the virtual machine you wish to customize. In this case we want to customize app_vm12, which is associated with Protection Group 2 as indicated by Figure 3.21. Click on Configure Protection, which will launch the Recovery Launch wizard.


Figure 3.21

Working through the Recovery Launch wizard, you will get to the Virtual Machine Customization window shown in Figure 3.22. From the drop-down box select the virtual machine customization profile you wish to assign to the protected VM, in this case customization_appvm12, and click Next to proceed to the last two remaining customizations, Before Power On and After Power On, which will conclude the Customization Specifications wizard.

Figure 3.22


Once the Customization Specifications wizard closes you will be taken back to the VI Client view depicted in Figure 3.23. Under the Recent Tasks pane you should see an entry confirming that the customization profile customization_appvm12 was successfully applied to app_vm12. The protected VM app_vm12 has now been customized and will start up with the network configuration information assigned to the customization profile customization_appvm12.

Figure 3.23
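Before relying on a customization profile, it can help to sanity-check the values you typed into it. The Python sketch below uses the standard `ipaddress` module to confirm that an IP address and gateway fall inside the recovery-site network. The addresses are hypothetical, and `check_customization` is an illustrative helper that runs outside SRM; it does not read the profile itself.

```python
import ipaddress

def check_customization(ip, gateway, recovery_network):
    """Return a list of problems found in a VM's recovery-site network settings."""
    network = ipaddress.ip_network(recovery_network)
    problems = []
    if ipaddress.ip_address(ip) not in network:
        problems.append(f"IP address {ip} is outside {network}")
    if ipaddress.ip_address(gateway) not in network:
        problems.append(f"gateway {gateway} is outside {network}")
    return problems

# Hypothetical values entered for the profile customization_appvm12.
print(check_customization("10.23.0.112", "10.23.0.1", "10.23.0.0/24"))
```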


Chapter 4: Using SRM to run a Test against a Recovery Plan

This chapter provides an overview of how SRM enables you to ‘Test’ a recovery plan by simulating a failover of virtual machines from the protected site to the recovery site. The benefit of running a failover simulation against a recovery plan is that it allows you to confirm that the recovery plan has been set up correctly for the protected VMs. You will be able to confirm that the protected VMs start up in the correct order, taking into account the various application service dependencies for the protected VMs in your environment. When you select the option to ‘Test’ a recovery plan via SRM, the simulated failover is executed in an environment at the recovery site, including network and storage infrastructure, that is isolated from the protected site (production environment). This ensures the protected VMs at the protected site are not subject to any kind of service interruption during the testing of the recovery plan. SRM will also create a test report that can be used to demonstrate your level of preparedness to the business or individual business units whose services are being protected by SRM, as well as to auditors and compliance officers if required. The simulated failover completes by resetting the environment to be ready for the next event, which could be another simulated failover, an actual failover for a scheduled BC/DR test, or an actual failover in response to an event which resulted in the business declaring a disaster.

Figure 4.1


We will now work through a simulated failover leveraging the SRM ‘Test’ a recovery plan option. In Figure 4.1 the VI Client lists the three Recovery Plans that were created by working through the Recovery Plan wizard. There are two ways to initiate the simulated failover: you can either click on the ‘Test’ button in the toolbar or click on the ‘Test Recovery Plan’ link under the Commands section. Both are highlighted in Figure 4.1. Before the simulated failover is started you will be presented with a dialog box (Figure 4.2) that informs you the performance of local virtual machines may be impacted if there are insufficient compute resources at the recovery site to support both the local virtual machines and the protected VMs. The dialog box also informs you that the replication of the datastore groups may be suspended during the simulated failover. Click ‘Yes’ to start the ‘Test’ of your recovery plan.

Figure 4.2

While the simulated failover test is running, the status of each step that makes up the recovery plan can be monitored by going to the Recovery Steps tab in the VI Client, which shows which steps are currently Running and which steps have completed with a Success status. Some steps in a recovery plan are only executed during a simulated test; these steps are identified by ‘Test Only’ under the Mode column. Other steps are only executed during an actual failover; these steps are identified by ‘Recovery only’ under the Mode column.

Figure 4.3
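The effect of the Mode column can be sketched as a simple filter. In the Python below the step names are illustrative placeholders rather than the exact steps SRM generates; only the labels ‘Test Only’ and ‘Recovery only’ come from this guide, and `steps_for` is a hypothetical helper.

```python
# Each step is (name, mode). An empty mode means the step runs in both cases.
STEPS = [
    ("Prepare storage", ""),                                          # illustrative name
    ("Create test network", "Test Only"),                             # illustrative name
    ("Shut down protected VMs at the protected site", "Recovery only"),
    ("Power on recovered VMs", ""),
]

def steps_for(run_type, steps):
    """Return names of the steps that apply to a 'test' or a 'recovery' run."""
    if run_type not in ("test", "recovery"):
        raise ValueError("run_type must be 'test' or 'recovery'")
    skipped_mode = "Recovery only" if run_type == "test" else "Test Only"
    return [name for name, mode in steps if mode != skipped_mode]

print(steps_for("test", STEPS))
print(steps_for("recovery", STEPS))
```

Reading a plan this way makes it easier to see why a successful Test does not exercise every step that an actual Recovery will run.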


Figure 4.4 shows a partial view of the recovery site’s VI Client window. SRM provides an audit trail via a report which is generated automatically at the end of each SRM Test or SRM Recovery. The reports are accessible via the History tab and can be viewed by clicking on the View link under the Actions column. This opens a browser window that contains a log of the steps executed during the test, the total time taken to execute the recovery plan, and the time taken to execute each step in the recovery plan.

Figure 4.4
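If you copy the per-step timings out of a report, a few lines of Python can total them and point to the slowest step, which is often the first place to look when tuning a recovery plan. The timings and step names below are hypothetical, and `summarize` is an illustrative helper; it does not parse the SRM report format.

```python
from datetime import timedelta

def summarize(step_durations):
    """Return (total time, name of the slowest step) for a non-empty list of (name, duration)."""
    total = sum((duration for _, duration in step_durations), timedelta())
    slowest = max(step_durations, key=lambda item: item[1])[0]
    return total, slowest

# Hypothetical timings; real values come from the report in the History tab.
report = [
    ("Prepare storage", timedelta(seconds=95)),
    ("Power on app_vm7", timedelta(seconds=140)),
    ("Power on app_vm8", timedelta(seconds=120)),
]
total, slowest = summarize(report)
print(f"total: {total}, slowest step: {slowest}")
```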

The following is a recap of the high-level tasks executed by SRM when performing a simulated failover via the ‘Test’ a recovery plan option that is available via the SRM-enabled VI Client. With the push of one button, SRM:

creates a test environment that includes network and storage infrastructure that is isolated from the production environment.

rescans the ESX servers.

registers the replicated VMs.

completes power-up of protected VMs in the order specified during creation of the disaster recovery plan.

provides a report of test results.

resets everything in preparation for a disaster or next scheduled SRM Test.


Chapter 5: Using SRM to failover the Protected Site to the Recovery Site

This chapter provides an overview of how SRM enables you to ‘Run’ a recovery plan, which results in the actual failover of virtual machines from the protected site. The failover process via SRM is rapid, repeatable, reliable, manageable and auditable.

Figure 5.1

We will now work through an actual failover leveraging the SRM ‘Run’ a recovery plan option. In Figure 5.1 the VI Client lists the three Recovery Plans that were created by working through the Recovery Plan wizard. There are two ways to initiate the actual failover: you can either click on the ‘Run’ button or click on the ‘Execute Recovery Plan’ link under the Commands section. Both are highlighted in Figure 5.1. The Run Recovery Plan dialog box shown in Figure 5.2 warns you that you are about to run a recovery plan which will result in changes to the protected virtual machines and the infrastructure of both the protected and recovery site datacenters. Click the radio button to confirm you understand the implications of running your recovery plan, and then click on the Run Recovery Plan button highlighted in Figure 5.2 to start the failover of protected VMs from the protected site to the recovery site. The Run Recovery Plan dialog box also provides a summary of the Recovery Plan Information, which includes the Recovery Plan that is going to be run, the names of the protected and recovery sites, the number of protected VMs that will be failed over, and the connectivity status from the recovery site back to the protected site.


Figure 5.2

While the failover is being executed, the status of each step that makes up the recovery plan can be monitored by going to the Recovery Steps tab (highlighted in Figure 5.1) of the recovery site’s VI Client, which shows which steps are currently Running and which steps have completed with a Success status. Once again, some steps in a recovery plan are only executed during a simulated test; these steps are identified by ‘Test Only’ under the Mode column. Other steps are only executed during an actual failover; these steps are identified by ‘Recovery only’ under the Mode column.

Once all the protected VMs have been failed over and are reported as powered on, which can be confirmed in several places within the VI Client, you are ready to start validating that all application services restarted cleanly at the recovery site. In this case we are referring to the protected VMs app_vm7 through app_vm12 from Protection Group 2, which is associated with the datastore group shared-san-2. Once you have completed the validation of the failed-over application services at the recovery site, you are in a position to report the successful failover to the business and allow the respective business users to access the application services, which are now being hosted out of the recovery site.

Figure 5.3 shows a VMware topology map view. It shows that app_vm7 through app_vm12 were successfully failed over to the recovery site and that they are being hosted from a datastore connected to a host in the recovery site. The non-replicated infrastructure VMs (ad_server, print_server, security_camera_server and virus_mgt_server) are hosted from datastore vim23-storage1 in the recovery site. Note: SRM will automatically perform a re-signature of the replicated datastores in the recovery site, which means LVM.EnableResignature will be set to 1 on the ESX hosts that have access to the replicated datastores in the recovery site. The re-signature that is initiated by SRM will result in the replicated datastores being presented with a prefix of snap-0000000X-, where X is a number. This is evident in Figure 5.3, which shows the replicated datastore presented as snap-000000020-shared-san-2.


Figure 5.3
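When you reconcile datastore names after a failover, for example in your own inventory notes, the re-signature prefix can be stripped with a regular expression. The Python sketch below assumes the prefix is always `snap-`, a run of digits, and a hyphen, which is inferred from the single example in this guide; `original_datastore_name` is a hypothetical helper.

```python
import re

# Assumed pattern: "snap-" + digits + "-" + original datastore name.
RESIGNATURED = re.compile(r"^snap-(\d+)-(.+)$")

def original_datastore_name(name):
    """Return the datastore name without a re-signature prefix, or the name unchanged."""
    match = RESIGNATURED.match(name)
    return match.group(2) if match else name

for name in ("snap-000000020-shared-san-2", "vim23-storage1"):
    print(name, "->", original_datastore_name(name))
```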

As pointed out in the previous chapter, SRM will automatically generate a report. In this instance the report is for an SRM ‘Run’ operation against the recovery plan we selected. The report is accessible via the History tab and can be viewed by clicking on the View link under the Actions column.

The steps to failback services from the recovery site back to the protected site once the disaster event is over will be outlined in the next chapter of this guide.

The following is a recap of the high-level tasks executed by SRM when performing a failover of virtual machines from the protected site to the recovery site via the ‘Run’ a recovery plan option that is available via the SRM-enabled VI Client. SRM automates many of the tasks required at time of failover. With the push of one button, SRM:

powers down the protected VMs if there is connectivity between sites and they are online.

suspends data replication and read/write enables the replica datastores.

rescans the ESX servers at the recovery site.

registers the replicated protected VMs.

shuts down non-essential VMs at the recovery site if required to free up resources for the protected VMs being failed over.

completes power-up of replicated protected VMs in accordance with the recovery plan.


Chapter 6: Failback from the Recovery Site to the Protected Site

Although failback is not included as an automated procedure in SRM 1.0, this chapter provides an overview, including considerations and guidance, for executing a failback of services from the recovery site back to the site that was originally designated as the protected site, or to a new site if a catastrophic disaster has destroyed the original site. For the purpose of this SRM evaluator guide we will consider the failback scenario outlined below:

Failback Scenario: Failback of application services from the recovery site back to the protected site after a scheduled BC/DR test which was conducted using SRM to perform a Recovery (an ‘actual failover’) and not a ‘test failover’ of services from the protected site to the recovery site.

With this scenario we will assume the infrastructure at the protected site has not changed during the BC/DR test and that SRM completed a successful recovery to the recovery site of the protected VMs required for the BC/DR test. The protected VMs from the protected site that were failed over for the BC/DR test were confirmed to have been shut down by SRM in the protected site. The business has a requirement to restore all data generated during the BC/DR test at the recovery site back to the protected site. To aid in the explanation of the failback steps that will follow we will use the abbreviations listed in the following table.

Note: Failback in SRM 1.0 is a multi-step procedure that is for the most part a manual process. However, SRM can be leveraged to provide some automation during the failback, as outlined in Steps 1 through 20 below, when you work in conjunction with your storage team to complete the necessary storage configuration work (storage personality swap) outlined in Step 7 and Step 17. Before starting with Step 1, you will need to set LVM.DisallowSnapshotLun = 0 on all the ESX hosts in the protected site that are zoned to the LUNs which have been assigned for SRM protected virtual machines and configured for replication between the protected and recovery site. This one-time operation is required to ensure the ESX hosts in the protected site are able to access the replicated datastores after the storage personality swap has occurred. Steps 1 through 20 outline the steps that need to be completed for a successful failback from Site B to Site A, as well as the steps to complete the re-protection of Site A after the failback from Site B.
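Because this setting has to be correct on every affected host, a simple tracking script can help when many hosts are involved. The Python sketch below assumes you have already recorded each host's current value by some means of your own; the host names, the input dictionary and `hosts_needing_change` are hypothetical, and the sketch does not query or change any ESX host.

```python
OPTION = "LVM.DisallowSnapshotLun"

def hosts_needing_change(host_settings, required_value=0):
    """Return, sorted by name, the hosts whose option is missing or not the required value."""
    return sorted(
        host for host, settings in host_settings.items()
        if settings.get(OPTION) != required_value
    )

# Hypothetical host names and recorded values for the protected site.
protected_site_hosts = {
    "esx-a1": {OPTION: 0},
    "esx-a2": {OPTION: 1},
    "esx-a3": {},          # value not yet recorded
}
print(hosts_needing_change(protected_site_hosts))
```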

Abbreviation   Description
Site A         original protected site
Site B         original recovery site
PG 1           original protection group that was defined at Site A
RP 1           original recovery plan that was defined at Site B
PG 2           new protection group that is defined at Site B to facilitate the failback from Site B via SRM back to Site A
RP 2           new recovery plan that is defined at Site A to facilitate the failback from Site B via SRM back to Site A
PG 3           new protection group that is defined at Site A to facilitate the failover to Site B via SRM. Note: this protection group is basically the same protection group that was defined for PG 1
Source LUN     VMFS Datastore being replicated to alternate data center
Target LUN     Replicated Datastore in the alternate data center
Clone LUN      a clone of the Target LUN, used during a ‘Test’ failover only


[Diagram: Storage configuration after running a Recovery in SRM (Actual Failover) from Site A to Site B for datastore ‘shared-san-2’. Site A - Protected Site: Source LUN (shared-san-2), Write Disabled (read only); Protected VMs (app_vm7 to app_vm12) all powered off by SRM at start of SRM Recovery. Site B - Recovery Site: Target LUN (shared-san-2), Read Write Enabled; Protected VMs (app_vm7 to app_vm12) all powered on by SRM during the SRM Recovery. Data Replication is suspended between the Source LUN in Site A and the Target LUN in Site B. A Clone LUN is not used during a Recovery in SRM. Note: Datastore ‘shared-san-1’ will be in the same configuration state as ‘shared-san-2’.]

Figure 6.1

1. Ensure all users that were involved with the BC/DR test have completed their test scripts and are no longer accessing any of the protected VMs that were recovered from Site A for the BC/DR test.

2. Shut down all of the protected VMs that were recovered to Site B for the BC/DR test.

3. Ensure you have created a list of all the Protected VMs that were recovered to Site B.

4. Perform a cleanup of the directory in Site B that contained the VM configuration files created during protection group creation in Site A (this is the location selected during the creation of the original protection groups in Site A, Protection Group 1 and Protection Group 2; refer to Figure 3.8). Refer to Figure 6.2 for an example of the placeholder VM configuration file information that was written to vim23-storage1 during the creation of the protection groups in Site A. You can use the list you created in Step 3 above as a reference during this clean-up step.

Figure 6.2


5. Connect to the VC instance in Site A and delete PG 1 (Protection Group 1 and Protection Group 2). As shown in Figure 6.3, a protection group can be removed by right-clicking on the protection group in the right-hand Site Recovery pane.

Figure 6.3

6. Connect to the VC instance in Site A and perform a remove from inventory operation on all the protected VMs in Site A that were recovered to Site B. In Figure 6.4 all the protected VMs in Site A have been selected. Right-click on the selected VMs to display the menu and click on Remove from Inventory, which will remove all the highlighted protected VMs from the VC inventory in Site A.

Figure 6.4


7. Work with your Storage team to complete a storage configuration change ‘personality swap’ whereby the Source LUN is now associated with Site B and the Target LUN is associated with Site A, as depicted in Figure 6.5. Rescan the ESX servers at the protected and recovery site to ensure they become aware of the underlying storage changes.

[Diagram: Storage configuration prior to running a Recovery in SRM from Site B to Site A (Failback) after a storage configuration change ‘personality swap’. Site A - Recovery Site: Target LUN (shared-san-2), Write Disabled (read only); Protected VMs (app_vm7 to app_vm12) offline in Site A. Site B - Protected Site: Source LUN (shared-san-2), Read Write Enabled; Protected VMs (app_vm7 to app_vm12) that will be recovered to Site A. Data Replication is now configured from Site B to Site A, with the Source LUN in Site B and the Target LUN in Site A. Note: A Clone LUN is not configured in Site A and will not be used during the Failback from Site B back to Site A.]

Figure 6.5
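The personality swap can be pictured as exchanging two roles on a replication pair. The Python sketch below is a conceptual model only; the actual change is made on the storage array by your storage team, and `ReplicationPair` and `personality_swap` are hypothetical names.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class ReplicationPair:
    datastore: str
    source_site: str   # site holding the Source LUN
    target_site: str   # site holding the Target LUN

def personality_swap(pair):
    """Return the pair with the Source and Target LUN roles exchanged between sites."""
    return replace(pair, source_site=pair.target_site, target_site=pair.source_site)

original = ReplicationPair("shared-san-2", source_site="Site A", target_site="Site B")
for_failback = personality_swap(original)   # Step 7: Source LUN is now in Site B
print(for_failback)
```

Applying the swap a second time returns the pair to its original direction, which corresponds to the second personality swap performed later in this procedure.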

8. Complete the Array Manager configuration wizard from Site B. After the storage configuration work is completed in Step 7, the Source LUN is now assigned to Site B and the Target LUN is assigned to Site A as depicted in Figure 6.6.

Figure 6.6


9. Configure the Inventory Preferences in Site B. These inventory preferences will be assigned to the protected VMs when they are restarted in Site A after the failback. Figure 6.7 shows the inventory preferences that will be mapped to the protected VMs in Site A after the failback from Site B.

Figure 6.7

10. Connect to the VC instance in Site B and configure PG 2 (Failback Protection Group 1 and Failback Protection Group 2) in Site B as depicted in Figure 6.8 for the protected VMs you wish to failback to Site A.

Figure 6.8

11. Connect to the VC instance in Site A and configure RP 2 in Site A. Note: You should not delete RP 1 (Recovery Plan 3 – Complete Site Failover, refer to Figure 6.8) that was created in Site B during the initial SRM workflows that were completed to protect the designated VMs in Site A. Refer to Figure 6.9 which shows RP 2 (Failback Recovery Plan 3) which was created in Site A.

Figure 6.9

12. Using SRM, complete the Failback of the original protected VMs back to Site A. This is accomplished by performing a Recovery against RP 2 (Failback Recovery Plan 3) shown in Figure 6.9. Figure 6.10 depicts the storage configuration once the SRM Recovery against RP 2 completes.

[Diagram: Storage configuration after running a Recovery in SRM (Actual Failover) from Site B to Site A for datastore ‘shared-san-2’. Site A - Recovery Site: Target LUN (shared-san-2), Read Write Enabled; Protected VMs (app_vm7 to app_vm12) all powered on by SRM during the SRM Recovery. Site B - Protected Site: Source LUN (shared-san-2), Write Disabled (read only); Protected VMs (app_vm7 to app_vm12) all powered off by SRM at start of SRM Recovery. Data Replication is suspended between the Source LUN in Site B and the Target LUN in Site A. Note: A Clone LUN is not used during a Recovery (Actual Failover) in SRM.]

Figure 6.10


13. Shut down all of the protected VMs in Site A that were failed back from Site B during the SRM Recovery operation performed in Step 12.

14. Perform a cleanup of the directory in Site A that contained the VM configuration files created during protection group creation in Site B (this is the location selected during the creation of PG 2 in Site B, refer to Figure 3.8). Refer to Step 4 above for guidance if required.

15. Connect to the VC instance in Site B and delete PG 2 (Failback Protection Group 1 and Failback Protection Group 2) that were created in Site B. Refer to Step 5 above for guidance if required.

16. Connect to the VC instance in Site B and perform a remove from inventory operation on all the protected VMs in Site B that were recovered to Site A, in this scenario this would be app_vm1 to app_vm12. Refer to Step 6 above for guidance if required.

17. Work with your Storage team to complete a second storage configuration change ‘personality swap’ whereby the Source LUN is re-associated with Site A and the Target LUN, along with the Clone LUN, is re-associated with Site B, as depicted in Figure 6.11. Rescan the ESX servers at the protected and recovery site to ensure they become aware of the underlying storage changes. Note: The storage configuration has now been reverted to the original configuration that was handed over to the Virtualization team prior to the setup of SRM, and for this reason we depict the storage configuration in Figure 6.11 with the Clone LUN in Site B. The data synchronization method (snapshot at intervals or continuous synchronization) of the Target LUN to the Clone LUN is determined by the Storage Array vendor. When a simulated failover is initiated via the ‘Test’ option in SRM, final data synchronization is performed from the Target LUN to the Clone LUN.

[Diagram: Storage configuration during an SRM Test failover from Site A to Site B for datastore ‘shared-san-2’. Site A - Protected Site: Source LUN (shared-san-2), Read Write Enabled; Protected VMs (app_vm7 to app_vm12) that will be recovered to Site B. Site B - Recovery Site: Target LUN (shared-san-2), Write Disabled (read only); Clone LUN (shared-san-2), Read Write Enabled; Protected VMs (app_vm7 to app_vm12) powered on in Site B during the SRM Test failover. Data Replication continues between the Source LUN and Target LUN. The data synchronization between the Target LUN and the Clone LUN is suspended. Note: Datastore ‘shared-san-1’ will be in the same configuration state as ‘shared-san-2’.]

Figure 6.11

18. Create PG 3 (Protection Group 1 and Protection Group 2) in Site A for the protected VMs. Note: The protection groups you create here should be identical to the protection groups that were originally associated with RP 1 (Recovery Plan 3 – Complete Site Failover), the recovery plan that was executed in Recovery mode and resulted in the startup of the protected VMs in Site B.


19. Re-associate the protection groups created in Step 18 in Site A with RP 1 (Recovery Plan 3 – Complete Site Failover) in Site B. Refer to Figure 6.12, which highlights the first part of the re-association, initiated by right-clicking on Recovery Plan 3 – Complete Site Failover. Working through the Edit Recovery Plan wizard you will get to a screen that requires you to select which protection groups you wish to re-associate with the recovery plan. Refer to Figure 6.13, which shows the two protection groups, Protection Group 1 and Protection Group 2, that should be selected so they can be re-associated with RP 1 (Recovery Plan 3 – Complete Site Failover) in Site B. Note: You do not need to delete the RP 2 (Failback Recovery Plan 3) that was created in Site A to facilitate the recovery back to Site A from Site

Figure 6.12

Figure 6.13

20. Once the protection groups have been re-associated with the original recovery plan as detailed in Step 19, you have completed the re-protection of the protected VMs in Site A with SRM. It is highly recommended that you now complete a final ‘Test’ (a simulated failover) against RP 1 (Recovery Plan 3 – Complete Site Failover) to ensure that Site A is protected and ready for any event that may necessitate a Recovery via SRM to Site B should the business deem it necessary to declare a disaster.


Note: The SRM Failback steps 1 through 20 outlined in this Chapter did not detail the following:

Site Pairing from Site B back to Site A via the SRM Connection wizard: This step is not required as SRM maintains a bi-directional relationship between the paired sites, and therefore the Connection workflow only needs to be completed once for Site A and Site B to ensure each site is aware of each VC and SRM instance in the respective sites.

SRM License transfer between Site A and Site B: The SRM failback steps did not discuss the transfer of SRM licenses between Site A and Site B that should be completed to ensure you are in compliance with the SRM EULA.

DNS Updates: SRM 1.0 does not provide a mechanism to update DNS. You will need to complete DNS updates yourself as the virtual machines are moved between Site A and Site B with SRM and undergo IP address changes to accommodate disparate networks in Site A and Site B, should the sites not be joined by stretched VLANs.
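Because SRM 1.0 leaves DNS updates to you, one option is to script them. The following is a minimal sketch and not part of SRM: it assumes your DNS server accepts dynamic updates through the BIND `nsupdate` utility, and the server name, host names, and IP addresses are hypothetical placeholders. The function only builds the `nsupdate` input text; review it before submitting it to your DNS server.

```python
import ipaddress


def build_nsupdate_script(dns_server, records, ttl=300):
    """Return nsupdate input that repoints each host's A record.

    records maps a fully qualified host name to its new IPv4 address.
    """
    lines = [f"server {dns_server}"]
    for fqdn, new_ip in sorted(records.items()):
        # Raises ValueError if the address is malformed.
        ipaddress.IPv4Address(new_ip)
        # Remove the old A record, then add one with the new address.
        lines.append(f"update delete {fqdn} A")
        lines.append(f"update add {fqdn} {ttl} A {new_ip}")
    lines.append("send")
    return "\n".join(lines) + "\n"


# Hypothetical VMs that received new addresses after being moved between sites.
moved_vms = {
    "app01.example.com": "10.1.20.11",
    "db01.example.com": "10.1.20.12",
}
print(build_nsupdate_script("dns1.example.com", moved_vms))
```

Keeping the address list in one place (here, the `moved_vms` dictionary) also gives you a record of which virtual machines changed networks during the failover or failback.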

Figure 6.14 shows a summary view of the SRM configuration of Site A which has now been reverted back to the original protected site. In addition to the two protection groups Protection Group 1 and Protection Group 2 which are associated with the recovery plan Recovery Plan 3 – Complete Site Failover in the designated recovery site – Site B, we also have the recovery plan Failback Recovery Plan 3 listed which was used to enable the failback procedure outlined in this chapter.

Figure 6.14

SRM Failback Checklist

To aid with failback after an SRM Recovery, refer to the SRM Failback Checklist below, which provides a high-level summary of Failback steps 1 through 20 detailed in this chapter.

| Step # | Site | Description | Yes / No |
| --- | --- | --- | --- |
| 1 | Site B | Protected VMs recovered to Site B are no longer being used and can be powered down | |
| 2 | Site B | Power down the Protected VMs in Site B | |
| 3 | Site B | Create a list of all the Protected VMs that were recovered to Site B | |
| 4 | Site B | Perform a cleanup of the directory in Site B that contained the VM configuration files created during protection group creation in Site A | |
| 5 | Site A | Connect to the VC instance in Site A and delete PG 1 | |
| 6 | Site A | Connect to the VC instance in Site A and perform a remove from inventory operation on all the protected VMs in Site A that were recovered to Site B | |
| 7 | Storage Work | Work with your Storage team to complete a storage configuration change ('personality swap') whereby the Source LUN is now associated with Site B and the Target LUN is associated with Site A. Refer to Figure 6.5. Rescan the ESX servers at the protected and recovery site to ensure they become aware of the underlying storage changes. | |
| 8 | Site B | Complete the Array Manager configuration wizard in Site B, which now has the Source LUN configured in Site B and the Target LUN configured in Site A | |
| 9 | Site B | Configure the Inventory Preferences in Site B; these inventory preferences will be assigned to the protected VMs when they are restarted in Site A after the failback | |
| 10 | Site B | Connect to the VC instance in Site B and configure PG 2 | |
| 11 | Site A | Connect to the VC instance in Site A and configure RP 2 in Site A | |
| 12 | Site A | Using SRM, complete the failback of the original protected VMs back to Site A. This is accomplished by performing a Recovery against RP 2. Figure 6.10 depicts the storage configuration after the Recovery completes | |
| 13 | Site A | Shut down all of the protected VMs in Site A that were failed back from Site B during the SRM Recovery operation performed in step 12 | |
| 14 | Site A | Perform a cleanup of the directory in Site A that contained the VM configuration files created during protection group creation in Site B | |
| 15 | Site B | Connect to the VC instance in Site B and delete PG 2 that was created in Site B in step 10 | |
| 16 | Site B | Connect to the VC instance in Site B and perform a remove from inventory operation on all the protected VMs in Site B that were recovered to Site A | |
| 17 | Storage Work | Work with your Storage team to complete a second storage configuration change ('personality swap') whereby the Source LUN is now re-associated with Site A and the Target LUN is re-associated with Site B along with the Clone LUN, as depicted in Figure 6.11. Rescan the ESX servers at the protected and recovery site to ensure they become aware of the underlying storage changes. | |
| 18 | Site A | Create PG 3 in Site A for the protected VMs | |
| 19 | Site B | Re-associate PG 3 from step 18 in Site A with RP 1 in Site B | |
| 20 | Site B | Complete a final 'Test' (a simulated failover) against RP 1 to ensure that Site A is protected and ready for any event that may necessitate a Recovery via SRM to Site B should a disaster be declared | |
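If you prefer to track the failback checklist in a script rather than on paper, the sketch below shows one way to do so. It is an illustration only and is not an SRM feature: the `Step` class, the function names, and the shortened step summaries are assumptions made for this example. The one rule it enforces is that a step cannot be marked complete until every earlier step is complete, which mirrors the sequential nature of steps 1 through 20.

```python
from dataclasses import dataclass


@dataclass
class Step:
    number: int
    site: str
    summary: str
    done: bool = False


def build_checklist():
    """Return the 20 failback steps with abbreviated summaries."""
    rows = [
        ("Site B", "Confirm recovered VMs are no longer in use"),
        ("Site B", "Power down the protected VMs"),
        ("Site B", "List all protected VMs recovered to Site B"),
        ("Site B", "Clean up the VM configuration file directory"),
        ("Site A", "Delete PG 1"),
        ("Site A", "Remove recovered VMs from inventory"),
        ("Storage Work", "First storage personality swap and ESX rescan"),
        ("Site B", "Complete the Array Manager configuration wizard"),
        ("Site B", "Configure the Inventory Preferences"),
        ("Site B", "Configure PG 2"),
        ("Site A", "Configure RP 2"),
        ("Site A", "Perform a Recovery against RP 2"),
        ("Site A", "Shut down the failed-back VMs"),
        ("Site A", "Clean up the VM configuration file directory"),
        ("Site B", "Delete PG 2"),
        ("Site B", "Remove recovered VMs from inventory"),
        ("Storage Work", "Second storage personality swap and ESX rescan"),
        ("Site A", "Create PG 3"),
        ("Site B", "Re-associate PG 3 with RP 1"),
        ("Site B", "Run a final Test against RP 1"),
    ]
    return [Step(i, site, text) for i, (site, text) in enumerate(rows, start=1)]


def next_step(checklist):
    """Return the first step not yet done, or None when all are complete."""
    for step in checklist:
        if not step.done:
            return step
    return None


def mark_done(checklist, number):
    """Mark a step complete, refusing to skip over unfinished steps."""
    for step in checklist:
        if step.number == number:
            step.done = True
            return
        if not step.done:
            raise ValueError(
                f"Step {step.number} must be completed before step {number}"
            )
    raise KeyError(number)


checklist = build_checklist()
mark_done(checklist, 1)
pending = next_step(checklist)
print(f"Next: step {pending.number} ({pending.site}): {pending.summary}")
```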

Chapter 7: SRM Alarms and Site Status Monitoring

This chapter will provide an overview of some of the SRM Alarms that will be generated due to certain types of failures or conditions that may occur at the protected or recovery site. Awareness of the SRM alarms is an important part of understanding how SRM works across the protected and recovery sites. During the SRM product evaluation it is recommended that, where possible and without impact to your production environment, failures or conditions be created in the protected and recovery site that will result in the generation of SRM alarms. The generation of these SRM alarms will serve as validation that SRM is monitoring both the protected and recovery site correctly.

Each SRM server monitors the CPU utilization, disk space, and memory consumption of the guest on which it is running, and also maintains a heartbeat with its peer SRM server. VC events are sent if any of these measures falls outside of configured bounds. SRM will support the configuration of event-triggered alarms so that you can associate a notification action with any given SRM Alarm Event. These alarms are configured via the SRM UI. SRM will support the following alarm notification actions:

Send e-mail to specified address

Send SNMP trap to VC trap receivers

Execute specified command on VC host

Please refer to Chapter 9 – Alerting and Monitoring in the Administrators Guide for Site Recovery Manager which details how to setup the alarm actions listed above.
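The monitoring behavior described above (resource measures compared against configured bounds, plus a heartbeat with the peer SRM server) can be pictured with the following sketch. It is a conceptual illustration only and is not SRM's implementation: the function name, the metric keys, and the threshold values are assumptions, while the event names are taken from the list of VC events given later in this chapter.

```python
def evaluate_health(metrics, limits, seconds_since_heartbeat, heartbeat_timeout):
    """Return the names of events whose condition is currently met.

    All metric keys and limits are hypothetical; SRM's real bounds are
    configured in the product, not through a function like this.
    """
    events = []
    if metrics["disk_free_mb"] < limits["disk_free_mb_min"]:
        events.append("Disk Space Low")
    if metrics["cpu_percent"] > limits["cpu_percent_max"]:
        events.append("CPU use exceeded limit")
    if metrics["memory_free_mb"] < limits["memory_free_mb_min"]:
        events.append("Memory low")
    if seconds_since_heartbeat > heartbeat_timeout:
        events.append("Remote Site heartbeat failed")
    return events


# Hypothetical readings from the guest running the SRM server.
limits = {"disk_free_mb_min": 500, "cpu_percent_max": 90, "memory_free_mb_min": 512}
metrics = {"disk_free_mb": 100, "cpu_percent": 95, "memory_free_mb": 4096}

for event in evaluate_health(metrics, limits, seconds_since_heartbeat=5,
                             heartbeat_timeout=30):
    print("Raise event:", event)
```

Note that the sketch only reports events. It never starts a recovery, which matches the SRM behavior that a recovery must always be initiated manually.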

Failure of either site generates events which can be associated with VC alarms.

Problems with the local site (e.g., resource constraints)

Problems with the remote site (e.g., unable to ping remote site, which may indicate a disaster)

Remote site failure is reflected in the SRM Alarm Events and will not automatically trigger a recovery. This must be initiated manually.

SRM will raise VC events for the following conditions:

Disk Space Low

CPU use exceeded limit.

Memory low.

Remote Site not responding.

Remote Site heartbeat failed.

Recovery Plan Test started, ended, succeeded, failed, or cancelled.

Virtual Machine Recovery started, ended, succeeded, failed, or reports a warning.

As a starting point during the SRM Evaluation, we recommend you complete the Action setup for the SRM Alarm Events listed below for the protected and recovery sites. You should be able to trigger these events without impacting your production environment, the goal being that you see firsthand how SRM responds and notifies you when subjected to one of the failure events listed below.

Remote Site Down

Remote Site Ping Failed

Replication Group Removed

Recovery Plan Destroyed

License Server Unreachable
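Before configuring the actions in the SRM UI, it can help to write down which notification action you intend to attach to each of the five events above. The sketch below is a planning aid only and does not configure SRM: the `plan` dictionary and the `unconfigured_events` helper are hypothetical, while the event names and the three action types come from this chapter.

```python
from enum import Enum


class Action(Enum):
    EMAIL = "Send e-mail to specified address"
    SNMP_TRAP = "Send SNMP trap to VC trap receivers"
    RUN_COMMAND = "Execute specified command on VC host"


RECOMMENDED_EVENTS = [
    "Remote Site Down",
    "Remote Site Ping Failed",
    "Replication Group Removed",
    "Recovery Plan Destroyed",
    "License Server Unreachable",
]


def unconfigured_events(plan, events=RECOMMENDED_EVENTS):
    """Return the events that have no notification action planned."""
    return [event for event in events if not plan.get(event)]


# Hypothetical plan for one site; repeat for the other site.
plan = {
    "Remote Site Down": [Action.EMAIL, Action.SNMP_TRAP],
    "Remote Site Ping Failed": [Action.EMAIL],
    "Replication Group Removed": [Action.EMAIL],
    "Recovery Plan Destroyed": [],
}

for event in unconfigured_events(plan):
    print("No action planned for:", event)
```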

Figure 7.1

As you become more familiar with SRM and its associated workflows, which allow you to Test your recovery plans and to Run a recovery plan (resulting in the failover of services from your protected site to your recovery site), we recommend that you work through the list of SRM Alarm Events, accessed via the Alarms tab as depicted in Figure 7.1, and enable the appropriate notification Actions for any additional SRM Alarm Events that you deem important for your environment.

Chapter 8: SRM Roles and Privileges

This chapter will provide an overview of the SRM roles and the types of SRM privileges that can be set. Authorization in SRM uses the same authorization model as VirtualCenter Server. Figure 8.1 shows the default SRM roles, which become available after the SRM plug-in has been installed and enabled. To access these roles, click the Administration icon in the toolbar and then click the Roles tab to see a list of all available roles. These default SRM roles provide the ability to delegate control at a very granular level.

Figure 8.1

There are two sets of roles. The first set contains the roles required for the primary site user to administer protection; these SRM roles are prefixed by Protection. The second set contains the roles required for the secondary site user to administer recovery; these SRM roles are prefixed by Recovery.

Protection Side SRM Roles

Protection Virtual Machine Administrator: This role should be assigned on the protected Virtual Machine object in the VC inventory. It grants the associated user the ability to setup and modify the protection characteristics of the protected virtual machine.

Protection SRM Administrator: This role should be assigned on the Service Instance object in the primary SRM inventory. It grants the associated user the ability to pair two sites and to configure inventory mappings and SAN arrays.

Protection Groups Administrator: This role should be assigned on the Primary Configuration/Protection Service object in the SRM inventory. It grants the associated user the ability to create and modify protection profiles/groups.

Recovery Side SRM Roles

Recovery Inventory Administrator: This role should be assigned on the root of the VC inventory. It grants the associated user the ability to view customization specifications existing on the secondary site.

Recovery Datacenter Administrator: This role should be assigned on the Datacenter object in the VC inventory where the VMs will be recovered. It grants the associated user the ability to view available datastores and perform recovery (shadow) VM customizations.

Recovery Host Administrator: This role should be assigned on the Host or DRS cluster object in the VC inventory where the VM will be recovered. It grants the associated user the ability to configure VM components during recovery.

Recovery Virtual Machine Administrator: This role should be assigned on the Folder and Resource Pool objects in the VC inventory where the recovery (shadow) VMs are to be placed. It grants the associated user the ability to create and add shadow VMs to the resource pool and the folder as well as the ability to reconfigure and customize the shadow VMs at runtime and during the process of recovery.

Recovery SRM Administrator: This role should be assigned on the Service Instance object in the secondary SRM inventory. It grants the associated user the ability to configure SAN arrays and create protection profiles.

Recovery Plans Administrator: This role should be assigned on the Secondary Configuration/Recovery Service object in the SRM inventory. It grants the associated user the ability to reconfigure protection and shadow VMs and setup and run recovery.

Note: VirtualCenter already defines a Read-Only system role, which can be used to grant users the ability to view the Site Recovery Manager service. In addition, the Administrator role can be used to grant a user complete control over both the protection and recovery SRM components.
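When planning delegation, it may help to keep the default roles and the inventory objects they should be assigned on in a simple lookup table. The sketch below restates the role descriptions above as data; it is a planning aid only and does not call any SRM or VirtualCenter API. The dictionary and function names are assumptions made for this example.

```python
# Default SRM role -> object on which the role should be assigned.
ROLE_TARGETS = {
    "Protection Virtual Machine Administrator":
        "Protected Virtual Machine object in the VC inventory",
    "Protection SRM Administrator":
        "Service Instance object in the primary SRM inventory",
    "Protection Groups Administrator":
        "Primary Configuration/Protection Service object in the SRM inventory",
    "Recovery Inventory Administrator":
        "Root of the VC inventory",
    "Recovery Datacenter Administrator":
        "Datacenter object in the VC inventory where the VMs will be recovered",
    "Recovery Host Administrator":
        "Host or DRS cluster object in the VC inventory where the VM will be recovered",
    "Recovery Virtual Machine Administrator":
        "Folder and Resource Pool objects in the VC inventory for the shadow VMs",
    "Recovery SRM Administrator":
        "Service Instance object in the secondary SRM inventory",
    "Recovery Plans Administrator":
        "Secondary Configuration/Recovery Service object in the SRM inventory",
}


def roles_for_side(side):
    """Return the roles whose name starts with 'Protection' or 'Recovery'."""
    if side not in ("Protection", "Recovery"):
        raise ValueError("side must be 'Protection' or 'Recovery'")
    return [role for role in ROLE_TARGETS if role.startswith(side + " ")]


for role in roles_for_side("Protection"):
    print(f"{role}: assign on {ROLE_TARGETS[role]}")
```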

SRM also allows you to create custom SRM roles: clone one of the default SRM roles, then edit the clone to select which privileges should be associated with the custom role. Figure 8.2 shows a custom SRM role and all the privileges that can be selected to complete its creation.

Figure 8.2

Conclusion

Site Recovery Manager will leverage your VMware Infrastructure to make disaster recovery:

Rapid - by automating the disaster recovery process for your virtual machines and eliminating the complexities of traditional physical disaster recovery.

Reliable - by ensuring proper execution of the recovery plan as well as the ability to enable easier, more frequent tests in an isolated environment without impacting services in the protected site.

Manageable - centrally manage recovery plans and make plans dynamic to match a dynamic virtualized environment.

Affordable - utilize recovery site infrastructure and reduce management costs.

Site Recovery Manager will enable you to:

Expand disaster recovery protection - now any workload in a virtual machine can be protected with minimal incremental effort and cost.

Reduce time to recovery - as soon as a disaster is declared, SRM allows for the recovery of protected virtual machines with a few mouse clicks to the designated recovery site.

Increase reliability of recovery - replication of system state ensures your protected virtual machines have all they need to startup in the protected site. Hardware independence which is realized through your VMware Infrastructure eliminates failures due to different hardware.

Easier and more frequent testing – SRM enables you to test your recovery plan in an isolated environment without impacting services in the protected site while using the actual failover sequence that will be executed during a real disaster.

