Disaster Recovery Concepts in BS2000/OSD
Issue: January 2006


Summary

Disaster recovery concepts are needed to restart applications and their data and to resume interrupted business transactions in the event of a disaster.

This paper provides an introduction to and an overview of the issue of disaster recovery specifically for BS2000 systems using data mirroring, including over long distances.

The paper reflects the experience of Fujitsu and its customers in the important and highly topical area of mission-critical computing and also illustrates the sophisticated disaster recovery features of BS2000/OSD and its partner products.

The objective of the paper is to introduce the disaster recovery configuration variants recommended by Fujitsu to those responsible for the data center and to describe the prerequisites for disaster tolerant architectures and the basic procedures to be adopted (failover and failback) in the event of a disaster. Technical disaster protection options are also explained for system administrators. Requisite techniques and utilities such as

• Symmetrix Remote Data Facility (SRDF™), Symmetrix Host Component for BS2000/OSD (SHC-OSD)
• WDM technology for real-time data mirroring over long distances
• Deployment of virtual hosts (or DNS interventions) for network switchover in the event of a disaster
• HIPLEX AF for automatic failure detection and automatic switchover in the event of a disaster
• Deployment of global storage in disaster recovery configurations
• Deployment of the Symmetrix Multi Mirror Facility (TimeFinder) function for asynchronous disaster recovery

are described intelligibly.


Contents

Summary
1 Introduction
1.1 What is meant by disaster recovery?
1.2 What do we mean by high availability (in contrast to disaster recovery)?
1.3 Interrelationship of high availability (HA) and disaster recovery
1.4 Purpose and structure of this document
2 Terms, definitions, introduction
2.1 Disaster tolerant architecture
2.2 Standby data center
2.3 Symmetric disaster recovery / asymmetric disaster recovery
2.4 Cold standby / hot standby / backup
2.5 Synchronous and asynchronous disaster recovery configurations
2.6 Failover / failback
2.7 Manual and automatic disaster recovery
2.8 Symmetrix SRDF function
2.9 Symmetrix TimeFinder function (SMMF)
2.10 Global storage (GS)
2.11 Consistency groups
2.12 HIPLEX MSCF
2.13 HIPLEX AF
2.14 Symmetrix Host Component in BS2000 (SHC-OSD)
2.15 Dual Recording by Volume (DRV)
3 Prerequisites for disaster recovery in BS2000
3.1 Organizational measures
3.2 Setup of a standby data center
3.3 Preparation of data network
3.4 Procedures relating to availability of online data in the standby DC
3.5 Distance-dependent restrictions
3.6 Administrative preparations for disaster recovery
3.7 Data backup concept
3.8 Step-by-step establishment of disaster recovery prerequisites
4 Disaster tolerant architectures
4.1 X-link disaster recovery configuration
4.2 X-link disaster recovery configuration with GS
4.3 U-link disaster recovery configuration
4.4 Asynchronous U-link disaster recovery configuration
4.5 Combined disaster recovery configurations
4.6 Configurations with more than one Symmetrix subsystem
4.7 Separation of applications by VM2000
5 Failover process
5.1 Automatic failover with HIPLEX AF
5.2 Manual failover
6 Failback process
6.1 Failback with HIPLEX AF
6.2 Manual failback
7 Related publications and online references


1 Introduction

1.1 What is meant by disaster recovery?

In information technology, disaster recovery refers to the precautionary measures needed to enable production operation and therefore mission-critical applications to be resumed after partial or total destruction of a production site (“data center” in an IT infrastructure). A disaster refers to the failure of a data center due to power outage or destruction (by fire, flooding, explosion, earthquake, storm, sabotage, etc.) or, more specifically, to the failure of a host and local storage peripherals or only parts of storage peripherals on which current production data resides. In essence, a disaster of this kind is an event which is caused by external factors and results in an interruption of production operation. Consequently, it is necessary to switch over to analogous resources in a standby data center in order to resume production operation.

In the following, the term standby data center refers either to a remote data center with equivalent hardware or to hardware equipment (usually host, disk storage, network components) which is capable of supporting the corresponding applications, which is separated from the production host and its associated disk storage subsystems by at least a firewall, and which has its own power supply.

The requirements to be satisfied by a disaster recovery concept relate to the provision of a configuration (hardware and software) which enables production operation to be continued in a standby data center (site failure recovery) in the event of total failure of the primary data center. The following requirements must also be fulfilled as far as possible.

• Availability of consistent and current (as current as possible) data of the production applications in the standby data center
• Unchanged user access to the data (in the standby data center) by means of “network switching” in the event of a disaster
• No long downtimes until applications are restarted
• Access to past, archived data and guaranteed continuation of the previous data backup concept, also in the standby data center
• Support for transferring applications back to the recovered data center (failback)

Optional disaster recovery requirements are

• Automatic failure detection and automatic restart
• Support for long distances between production and standby data centers (more than just a few kilometers)

If it is assumed that a standby data center (host, online peripherals, network components, etc.) is available, the disaster recovery concept must address the following important questions.

1) How is the application data transferred to the standby data center?

2) What network precautions are needed to ensure that users are able to access their procedures (now running on a different host) as normal within a reasonable period after restart in the standby data center (ideally without user intervention)?

The figure below summarizes these requirements as basic prerequisites for disaster recovery.


Fig. 1-1: Basic prerequisites for disaster recovery (work computer center, standby computer center, users; (1) current data, (2) network switchover, (3) application)

The terms

• disaster recovery
• disaster protection
• disaster tolerance

are equivalents in the literature and are used as synonyms in this paper.

1.2 What do we mean by high availability (in contrast to disaster recovery)?

The primary goal of high-availability configurations is to compensate for the failure of individual resources (both hardware and software), if possible without interrupting production operation (minimization of downtime, single failure recovery). Measures associated with high availability are the redundancy of all hardware resources needed to maintain production operation, software monitoring of resources, and automated responses to hardware faults and software errors. The major difference between disaster protection and high availability is that in disaster recovery the redundant resources and data needed to maintain production operation are geographically separate and are therefore protected against destructive influences at the production site.

1.3 Interrelationship of high availability (HA) and disaster recovery

Ideally, high availability and disaster recovery are two cornerstones of an IT landscape in which HA is a basic platform supplemented by disaster recovery. It is not true that disaster recovery requires or implies maximum HA. There are, for example, disaster recovery procedures that ensure 100% data integrity, data consistency and data currency (in the standby data center) with redundant hardware (such as two Symmetrix systems linked via SRDF) when a system (or links between systems) fails. These procedures achieve this not by allowing work to continue with the remaining system, but by terminating production operation with errors as a precautionary measure and, where appropriate, resuming operation with a single system after the failure scenario has been assessed by specially trained staff (see also “domino mode” in section 2.8.2).

Consequently, high availability and disaster recovery pursue two different goals, but the individual forms they take can and must be harmonized. The risks of downtime as opposed to data loss must be assessed separately for each customer so that the two approaches can be best coordinated. Chapter 4 of this document describes disaster tolerant architectures for BS2000 systems. Synchronous X-link configurations, for example, enable disaster recovery and high availability to be ideally combined and controlled by common software (HIPLEX AF). If configurations of this kind are not feasible due to long distances, performance requirements, or for reasons of cost, minor compromises will have to be made either in terms of disaster recovery or high availability.


1.4 Purpose and structure of this document

This paper provides an introduction to and an overview of disaster recovery specifically for BS2000 systems using the SRDF functionality of attached Symmetrix systems.

The objective is to introduce the disaster recovery configuration variants recommended by Fujitsu to those responsible for the data center and to describe the prerequisites for disaster tolerant architectures and the basic procedures to be adopted (failover and failback) in the event of a disaster. Technical disaster protection options are also explained for system administrators. Requisite techniques and utilities such as

• Symmetrix Remote Data Facility (SRDF), Symmetrix Host Component for BS2000/OSD (SHC-OSD)
• WDM technology for real-time data mirroring over long distances
• Deployment of virtual hosts (or DNS interventions) for network switchover in the event of a disaster
• HIPLEX AF for automatic failure detection and automatic switchover in the event of a disaster
• Deployment of global storage in disaster tolerant architectures
• Deployment of the Symmetrix Multi Mirror Facility (TimeFinder) function for asynchronous disaster recovery

are described clearly and simply.

Many factors must be taken into account in the detailed planning and implementation of a full disaster protection solution. Relevant issues are, for example, data currency requirements in the event of a disaster, choice of a cold standby or hot standby solution, pre-existing data center architecture (hosts, online peripherals, network, VMs, nearline peripherals), applications to be protected and their IO load, data center sites, distances between sites, tape library site and the data backup concepts deployed, as well as customer-specific organizational and personnel details, and finally the choice of manual or automatic failure detection and of manual, semi-automatic or automatic switchover in the event of a disaster.

Due to the multitude of influencing factors, it is clearly not possible to describe turnkey product solutions for disaster recovery. The purpose of this paper is to help devise custom disaster protection solutions jointly with our Competence Centers. These can then be implemented by Fujitsu specialists.

In the past, many of our customers have used the resources offered to prepare their data centers for disaster recovery and have implemented configurations, either independently or with the help of our Competence Centers, to protect mission-critical data in the event of a disaster and to quickly restart critical IT-based procedures. This paper provides an all-round view of disaster recovery in BS2000/OSD and points to current disaster protection utilities and procedures made available by Fujitsu.

Chapter 2 explains terms and describes products of importance for disaster recovery concepts in BS2000. Readers already familiar with the products described may skip the corresponding sections of the chapter.

Chapter 3 describes the above-mentioned prerequisites for disaster recovery concepts (data mirroring, network connection) and, where appropriate, additional organizational and administrative steps. Procedures to ensure the availability of online data in standby data centers and techniques for covering long distances for purposes of data mirroring are also discussed.

Chapter 4 describes the configurations which, in the view of the authors, should be recommended for disaster recovery concepts in BS2000/OSD. The configurations differ to cater for factors such as distance between data centers, data currency, performance and automation options.

Chapters 5 and 6 outline the steps needed to continue operation in the standby data center if a disaster occurs, and to fail back to the work data center.


2 Terms, definitions, introduction

A central topic in this and following chapters is the mirroring of data to the standby data center. For disaster recovery concepts in BS2000/OSD, we recommend high-quality Symmetrix subsystems by EMC2 and the associated SRDF remote copy function. We have therefore restricted ourselves to describing only these systems in this document. The alternative of host-driven mirroring via the DRV subsystem of BS2000 is also outlined in section 2.15. Currently, the use of FibreCat disk subsystems by EMC2 and the associated MirrorView remote copy function is limited to non-automated cold standby disaster recovery concepts in BS2000/OSD and is beyond the scope of this paper.

2.1 Disaster tolerant architecture

A disaster tolerant architecture is a cluster architecture of two geographically separate data centers for which specific administrative measures relating to disaster protection have been defined and implemented. The sum total of hardware, software and administration measures must be such that selected applications can be up and running in a second data center within an acceptable period of time if a disaster arises in the work data center (in which mission-critical applications run). Customers should specify what the acceptable period of time is and should set up their configuration accordingly.

2.2 Standby data center

If two geographically separate data centers form a disaster tolerant architecture and a disaster occurs in one of the data centers, the center that remains intact is known as the standby data center.

2.3 Symmetric disaster recovery / asymmetric disaster recovery

In a disaster tolerant architecture with two separate data centers DC1 and DC2, one or more servers in standby data center DC2 are configured to run the applications of the servers in the work data center DC1 if a disaster occurs. If production applications are also running in DC2, it is similarly possible to arrange for these to run in DC1. In the event of a disaster in DC2, DC1 would then be the standby data center. The roles of production DC and standby DC are then interchangeable, and this architecture is referred to as a symmetric disaster tolerant architecture.

If no relevant applications run in DC2, no disaster recovery concept for DC2 is required. DC2 is then always the standby DC. This architecture is then referred to as an asymmetric disaster tolerant architecture with work data center DC1 and standby data center DC2. It is also common practice to link two data centers of different enterprises and to provide failover resources for the “enterprise-external” applications if this is feasible in organizational terms.

2.4 Cold standby / hot standby / backup

These terms describe the role of a standby data center. Cold standby (or backup data center) refers to a standby data center that is not in production use (but may well be used for test applications) and is activated only if a disaster occurs. A cold standby data center cannot therefore be part of a symmetric disaster tolerant architecture. Hot standby refers to a system that is up and running and is always ready to run the applications of the work data center in the event of a disaster. Downtime is therefore shorter, and both symmetric and asymmetric disaster recovery are possible.

2.5 Synchronous and asynchronous disaster recovery configurations

In synchronous disaster recovery configurations the critical data in the standby data center must be absolutely up to date in the event of a disaster. If the standby data center always has the same online data as the original data center, work can continue in the standby data center without loss of data if a disaster occurs. Such a configuration of the two data centers, and in particular of their disk storage systems and remote copy function to support continued work, is referred to as a synchronous disaster recovery configuration.

If the standby data center contains only earlier data (e.g. of the previous day) of the original data center, work can continue in the standby data center in the event of a disaster but without the data records created since the last backup (e.g. data records created on the day of the disaster). Such a configuration of the two data centers, and in particular of their disk storage systems, is referred to as an asynchronous disaster recovery configuration.

In an asynchronous disaster recovery configuration, the data in the standby data center is less current after a disaster. In the standby data center, it is possible at any time to restore data that corresponds to the last synchronization point of the applications. Synchronization points are generated at regular intervals in the standby data center by actions on the production host (e.g. during backup mode of an Oracle database) and are frozen until the next synchronization point is generated. It is necessary, even with an asynchronous disaster recovery configuration, to transfer data generated by the production host to the standby data center; asynchronous data mirroring, periodic file transfer, or physical transport of data media are possible ways of doing this.
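
The loss characteristics of an asynchronous configuration can be illustrated with a small model. The following Python sketch is purely conceptual: the class and method names (AsyncStandbyVolume, create_sync_point, and so on) are invented for illustration and do not correspond to any BS2000, SHC-OSD or Symmetrix interface. It merely shows that, after a disaster, only the writes made since the last synchronization point are lost.

```python
# Conceptual model only: after a disaster, restart is possible with the data of
# the last synchronization point; later writes are lost. All names are invented.

class AsyncStandbyVolume:
    def __init__(self):
        self.production = []       # committed writes in the work data center
        self.frozen_copy = []      # last consistent state held in the standby DC
        self.last_sync_point = 0   # number of writes covered by that state

    def write(self, record):
        """A write completes locally; the standby copy is not updated yet."""
        self.production.append(record)

    def create_sync_point(self):
        """E.g. triggered while the database is in backup mode: freeze a
        consistent image of the production data in the standby data center."""
        self.frozen_copy = list(self.production)
        self.last_sync_point = len(self.production)

    def failover_state(self):
        """Data available for restart after a disaster in the work DC."""
        lost = len(self.production) - self.last_sync_point
        return self.frozen_copy, lost


vol = AsyncStandbyVolume()
for i in range(5):
    vol.write(f"record-{i}")
vol.create_sync_point()          # consistent state with 5 records frozen
vol.write("record-5")            # written after the sync point
vol.write("record-6")

data, lost_writes = vol.failover_state()
print(f"restart with {len(data)} records, {lost_writes} writes lost")
# -> restart with 5 records, 2 writes lost
```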


2.6 Failover / failback

If a disaster occurs, the production applications of the failed data center must be started in the standby data center. This operation and the sum total of all associated actions are known as failover. Failover can be carried out either manually or automatically. After failover and recovery of the failed work data center, the disaster recovery concept should permit the applications to be transferred back to the original resources. This operation and the sum total of all associated actions are known as failback.

2.7 Manual and automatic disaster recovery

What is meant by automatic disaster recovery is that the failure of several components or even of an entire data center is detected by monitoring software (HIPLEX AF for BS2000 systems) and appropriate measures are triggered automatically. If possible, all software resources of an application are shut down and then restarted on the standby host with mirrored disks.

If only the hardware and system prerequisites which enable an application to be run and controlled in the standby data center after a failure are satisfied, but the application itself needs to be restarted by a member of staff (including setup of the required runtime environment), this is known as manual disaster recovery.

A halfway-house solution is semi-automatic disaster recovery, in which the monitoring software detects and reports a disaster in line with predefined criteria (and therefore simplifies any decisions needed) but does not trigger failover. This means that staff must be available on site or must be informed automatically so that a decision on failover can be made on the basis of a number of tests. However, this solution has the advantage that failover is not triggered unnecessarily in situations where the failure may be short-term or easily remedied. For failover itself, the same automated process is available as for automatic disaster recovery in the form of HIPLEX AF; in other words, “failover at the touch of a button” (the three modes are contrasted in the sketch at the end of this section). Advantages of automated disaster recovery:

• No personnel required in the event of a disaster (ideal situation)
• Prevention of human errors in critical situations
• Disaster recovery processes are coordinated by the software.

Nevertheless, multiple faults (e.g. rolling disasters) are conceivable in disaster scenarios. These require system administrators to make certain decisions during the automated recovery process, i.e. manual intervention is needed. In manual and automatic disaster recovery, it is equally important to document disaster recovery processes in detail and to test these at regular intervals (e.g. half-yearly). Furthermore, know-how relating to the individual processes should not be concentrated in the hands of just a few employees.
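
The following sketch contrasts the three operating modes in simplified form. It is illustrative only: the decision criteria and the notification mechanism are placeholders, and real configurations express this behaviour in HIPLEX AF definitions, not in code.

```python
# Conceptual contrast of manual, semi-automatic and automatic disaster recovery.
# The mode names follow the text above; everything else is a placeholder.

def handle_failure(mode, confirmed_by_staff=False):
    """Return the action taken when the monitoring software detects a failure."""
    if mode == "manual":
        return "report only: staff set up the runtime environment and restart the application"
    if mode == "automatic":
        return "failover triggered automatically (possible in unmanned operation)"
    if mode == "semi-automatic":
        # monitoring detects and reports the disaster; staff decide on failover
        if confirmed_by_staff:
            return "failover started 'at the touch of a button'"
        return "alert sent, waiting for a decision (avoids unnecessary failover)"
    raise ValueError(mode)


for mode in ("manual", "semi-automatic", "automatic"):
    print(mode, "->", handle_failure(mode))
print("semi-automatic ->", handle_failure("semi-automatic", confirmed_by_staff=True))
```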

2.8 Symmetrix SRDF function

2.8.1 SRDF, SRDF modes

The SRDF (Symmetrix Remote Data Facility) function of EMC supports data mirroring of a local Symmetrix system to a remote Symmetrix system. The two Symmetrix systems are linked to each other by at least two remote link directors. Each Symmetrix can be linked to up to four others by means of remote links. Regardless of the distances involved, SRDF can be used in unidirectional or bidirectional configurations. If one Symmetrix fails, the current data is still present in the other Symmetrix. Consequently, the data need not be reloaded after a failure, nor is it necessary to check whether backups are usable or consistent, and there is no need to roll back to an earlier state.

A volume mirrored using SRDF consists of the source unit (original data) and the target unit (copy of the data) connected via a remote link. The source unit is on the Symmetrix that receives the write jobs in normal SRDF operation. It sends the updated data to the remote Symmetrix, which stores the data initially in the cache and then (asynchronously) in the target unit. The service engineer is responsible for setting up SRDF pairs. There are three SRDF modes:

• synchronous mode
• semi-synchronous mode
• adaptive copy mode (asynchronous mode)

Synchronous mode means that write IO operations to an SRDF disk are not reported to the operating system as having been completed successfully until the data has been written to the local cache and to the cache of the remote Symmetrix via the SRDF link. Only then is the next write job to the same disk accepted. This ensures that the data of all executed write jobs is still available if a Symmetrix or a data center fails. By contrast, in semi-synchronous mode each write IO operation is reported as having been completed as soon as the data is written to the cache of the local Symmetrix. Alignment of the cache in the remote Symmetrix is performed only at specific intervals but at the latest when a new write job arrives for the same disk. In other words, if the local Symmetrix fails, one write job per disk may be lost. On the other hand, performance is better with this mode, particularly over long SRDF distances.
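
The difference between the two acknowledgement rules can be modelled as follows. This is an illustrative Python sketch, not EMC microcode; the class SrdfVolume and its methods are invented names. It shows why, after a failure of the local Symmetrix, synchronous mode loses nothing while semi-synchronous mode can lose at most one acknowledged write per volume.

```python
# Illustrative sketch of the acknowledgement rules described above.
# Synchronous: acknowledge only after the remote cache holds the data.
# Semi-synchronous: acknowledge after the local cache write; at most one
# write per volume may still be in transit to the remote side.

class SrdfVolume:
    def __init__(self, mode):
        self.mode = mode                 # "synchronous" or "semi-synchronous"
        self.local_cache = []
        self.remote_cache = []
        self.pending = None              # write not yet transferred (semi-sync)

    def write(self, data):
        # a new write to the same volume first forces the pending one across
        if self.pending is not None:
            self._transfer(self.pending)
        self.local_cache.append(data)
        if self.mode == "synchronous":
            self._transfer(data)         # ack only after remote cache update
        else:
            self.pending = data          # ack now, transfer lazily
        return "IO complete"

    def _transfer(self, data):
        self.remote_cache.append(data)
        if self.pending is data:
            self.pending = None

    def lost_on_local_failure(self):
        """Writes acknowledged to the host but missing on the remote side."""
        return [d for d in self.local_cache if d not in self.remote_cache]


for mode in ("synchronous", "semi-synchronous"):
    vol = SrdfVolume(mode)
    for d in ("w1", "w2", "w3"):
        vol.write(d)
    print(mode, "-> lost if the local Symmetrix fails now:", vol.lost_on_local_failure())
# synchronous      -> []
# semi-synchronous -> ['w3']   (at most one write per volume)
```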


In adaptive copy mode, data alignment between Symmetrix systems has low priority; it takes place with a delay after write IO and also in an order that differs from the original IO operations. This means that if a Symmetrix fails, the mirrored data may be inconsistent. This mode should therefore be used for disaster recovery only if consistent datasets are regularly created and backed up on, for example, BCVs as described in section 4.4, and only if data loss between backups is tolerable if a disaster occurs.

The following implications result from the mode selected.

Synchronous mode always guarantees the matching status of all mirrored data provided logically dependent data is not spread over SRDF units on different Symmetrix systems. This mode is therefore ideally suited to disaster recovery. The only consideration is whether distance-dependent (and possibly network topology-dependent) performance losses are acceptable.

Semi-synchronous mode guarantees that the status of the data of a Symmetrix matches the data of the remote Symmetrix with the exception of, at most, one write IO operation per volume. Data consistency is ensured in the sense that write IO operations on different volumes of the same Symmetrix are written to the target units in the same order as they are written by the application. If the loss of a maximum of one write IO operation per volume is tolerable for users in the event of a disaster, this mode is also suitable for disaster recovery, providing it is ensured that logically dependent data is not spread over different Symmetrix systems when several Symmetrix systems are mirrored. Generally, applications whose data is mirrored in this mode do not suffer performance loss as compared to unmirrored mode. However, in write-intensive applications, performance loss is comparable to that of synchronous mode.

Adaptive copy mode is of only limited suitability for disaster recovery of critical applications due to the inability to guarantee data consistency as described above and to ensure the matching status of all mirrored data at all times. The data on the target units in the standby data center may be inconsistent after a disaster. However, by virtue of regularly generated consistency statuses that are frozen on additional mirrors (e.g. BCVs) in the standby data center, this mode is an alternative worth discussing if data currency requirements are low, because it has only a minor impact on overall performance.

In summary:

SRDF mode           Data loss in the event of a disaster                          IO performance impact
Synchronous         No loss                                                       Each write IO operation is longer
Semi-synchronous    Maximum of one write IO operation per volume                  Only write-intensive phases are longer
Asynchronous        All write IO operations since the last consistency status     Only minimal impact

2.8.2 Domino mode

Domino mode (or the domino effect) is a Symmetrix attribute that can be set for an SRDF pair. It ensures that the data on the source unit and target unit are always synchronous. If this attribute is activated for an SRDF pair, the Symmetrix system containing the source unit will always set the source unit to “disabled” (if still possible) and will reject each IO operation for the source unit with “intervention required / unit not ready” if one of the two mirrored units is not available or if there is a link failure between the two Symmetrix systems linked via SRDF. The application will then terminate itself with an error or will halt until the source unit is reactivated by means of a command. Domino mode must be deactivated before the source unit can be reactivated. Consequently, even if only the target unit or SRDF links fail, the system administrator must intervene when they are available again to make the source units available once more and, if necessary, to reactivate domino mode. However, failure of all links is unlikely if the SRDF links are redundant and geographically separated.

In “normal” operation of an SRDF pair with deactivated domino mode, IO operations would continue to be executed on the SRDF pair if one of the two mirrored units were not available or if there were a link failure between the two Symmetrix systems linked via SRDF. As a result, write IO operations would be flagged (as invalid tracks, see below) in the Symmetrix for later transfer to the non-available unit. When a failed SRDF link is recovered or when a failed unit is available again, automatic resynchronization between source and target units begins unless prevented by a corresponding Symmetrix attribute (“prevent automatic link recovery after all link failures = no”). When synchronization takes place, the order of the original write jobs is no longer preserved; i.e. if the source units were to fail during resynchronization (in the event of a disaster), the data in the target units may be inconsistent. This (albeit highly unlikely) case of a double failure with critical consequences is also known as a “rolling disaster with link flapping” (Fig. 2-1). In such cases, the only remedy is to resort to the last backup available in an external tape library or to a duplicated backup in the standby data center. Domino mode provides protection against this scenario because no further IO operations are permitted after failure of the SRDF links.

Domino mode is equally interesting for critical applications in which the loss of just a few completed transactions (reservations, orders, etc.) can have expensive consequences.

For example, a rolling disaster scenario is conceivable in which the SRDF links fail, the applications continue to execute IO operations on the source units, and then the production host fails, so that after failover to the standby data center and the associated target units the data is not fully up to date (this corresponds to Fig. 2-1, but without step 2). This scenario is easier to imagine than the link-flapping scenario described above, but it does not result in highly undesirable data inconsistency; it “only” entails the possible loss of the data written last.

Fig. 2-1: Rolling disaster with “link flapping” (production host and standby host, each with an EMC Symmetrix; SRDF with automatic resynchronization from source units to target units; sequence of events: 1. link failure, 2. links ready again, 3. failure of the production site)

However, in domino mode failure of the SRDF links or of a target unit results in failure of the source unit and consequently leads to application interruption (refer to section 4.2). This, in turn, reduces high availability. When weighing up the pros and cons of domino mode, the following should therefore be considered.

• If maximum availability of applications is the highest priority, and disaster recovery risks must therefore be minimized, it is best to select a disaster tolerant architecture without domino mode.

• If the risk of data loss in (highly unlikely) specific disaster scenarios must be avoided at all costs, application interruption in the event of individual failures in the work data center, standby data center, or on the link between the two sites must be accepted, and a disaster tolerant architecture with domino mode must be selected.

Domino mode can be enabled in BS2000 using SHC-OSD subsystem commands for individual SRDF pairs. Domino mode can be generalized to refer to consistency groups of data volumes that go beyond pure SRDF configurations (refer to section 2.11). Section 4.2 deals particularly with domino mode for DAB caches in dual GS configurations.
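
The effect of the attribute can be summarized in a small model. The sketch below is conceptual: the class and attribute names are invented and do not correspond to SHC-OSD commands or Symmetrix settings; it only contrasts the behaviour of an SRDF pair with and without domino mode after a link failure.

```python
# Minimal model of the behaviour described above; all names are invented.

class SrdfPair:
    def __init__(self, domino_mode):
        self.domino_mode = domino_mode
        self.links_up = True
        self.source_disabled = False

    def link_failure(self):
        self.links_up = False
        if self.domino_mode:
            # source is set "not ready" so that no write can diverge from the target
            self.source_disabled = True

    def write(self, data):
        if self.source_disabled:
            # the application receives an error / halts until the administrator
            # re-enables the source unit (domino mode must be deactivated first)
            raise IOError("intervention required / unit not ready")
        if self.links_up:
            return "written to source and target"
        return "written to source only (flagged for later resynchronization)"


for pair in (SrdfPair(domino_mode=True), SrdfPair(domino_mode=False)):
    pair.link_failure()
    try:
        print(pair.write("transaction"))
    except IOError as err:
        print("write rejected:", err)
# with domino mode:    write rejected: intervention required / unit not ready
# without domino mode: written to source only (flagged for later resynchronization)
```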

2.8.3 Invalid Tracks

If a target unit fails (and domino mode is disabled) or if all remote links fail, the local Symmetrix flags all new write data (and, if necessary, all data that has not been written through to the target unit) as invalid tracks. When the defect has been eliminated and depending on a special Symmetrix setting (prevent automatic link recovery after all link failures = no), the Symmetrix performs automatic resynchronization of all tracks flagged as invalid. If a link exists to the remote Symmetrix, it is also informed of the number of invalid tracks.

An internal volume-specific setting of the Symmetrix subsystem, known as the “invalid tracks attribute”, prevents work continuing with a target unit if this volume is not synchronized. This setting can be disabled by means of an SHC parameter, but it prevents invalid tracks going unnoticed in the event of failover. If, as recommended in chapter 4, there are multiple, geographically separate SRDF links, it should be possible to exclude the failure of all links so that no invalid tracks occur on the target units. However, in asynchronous configurations invalid tracks must be expected if a disaster occurs (refer to section 4.4).
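
A simplified model of this bookkeeping is sketched below. It is illustrative only (the data structures are invented, not Symmetrix internals) and also shows why a disaster during resynchronization can leave the target units inconsistent: the flagged tracks are copied in an arbitrary order rather than in the order of the original write IO operations.

```python
# Sketch of invalid-track bookkeeping and of the ordering problem during
# resynchronization (see the rolling-disaster discussion in section 2.8.2).

import random

class MirroredVolume:
    def __init__(self):
        self.source = {}              # track number -> data
        self.target = {}
        self.invalid_tracks = set()   # tracks not yet written through
        self.links_up = True

    def write(self, track, data):
        self.source[track] = data
        if self.links_up:
            self.target[track] = data
        else:
            self.invalid_tracks.add(track)    # remember for later transfer

    def resynchronize(self, steps):
        """Copy a few invalid tracks; the order is not the original write order."""
        self.links_up = True
        for track in random.sample(sorted(self.invalid_tracks), steps):
            self.target[track] = self.source[track]
            self.invalid_tracks.remove(track)


vol = MirroredVolume()
vol.links_up = False
vol.write(1, "debit account A")      # logically, write 1 must precede write 2
vol.write(2, "credit account B")
vol.resynchronize(steps=1)           # disaster strikes mid-resynchronization
print("target after partial resync:", vol.target)
# the target may now contain write 2 without write 1 (or vice versa)
```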


2.8.4 SRDF settings for synchronous disaster recovery

RAID1 is a requirement for all disks (with the exception of BCVs, see 2.9) in disaster tolerant architectures. As a result, the scenario of a write error on a single disk (e.g. a disk with logging data) cannot occur. Only the remote links or entire Symmetrix subsystems must be regarded as failure units of a Symmetrix configuration. All disks involved must be RAID1-mirrored in order to provide (local) high availability.

Logically dependent data should not be spread over different Symmetrix subsystems. Otherwise, rolling disaster situations are conceivable in which, for example, a transaction is logged in the standby data center but the transaction itself is not executed (refer to section 4.6). Provided the loss of data in the scenarios described in section 2.8.2 is tolerable in the event of a disaster, domino mode should not be enabled, so as not to impair high availability of the configuration.

When configuring Symmetrix systems, the EMC service engineer should ensure that

• automatic resynchronization has been enabled for the systems (setting: “prevent automatic link recovery after all link failures = no”)

• the invalid tracks attribute has been enabled for all SRDF disks.

2.9 Symmetrix TimeFinder function (SMMF)

The TimeFinder function (SMMF, Symmetrix Multi Mirroring Facility) allows a Symmetrix disk to be mirrored by a disk of the same type and capacity that can be attached and detached for test or backup purposes during production operation without shutting down the applications (point-in-time copy). The original disk is referred to as the normal unit and the mirror disk as an additional mirror unit or Business Continuance Volume (BCV). This function is described in detail in the SHC User Guide [1].

As of Version 3.0 of SHC-OSD, it is additionally possible to use more than one BCV per disk in alternation (“multi BCVs”); these are aligned with the original on the basis of an incremental backup. One of the BCVs can always be in operation as a mirror whilst the other may contain, for example, consistent, “frozen” data. If the current mirror is detached and the second BCV is attached, only data blocks that have been changed in the meantime are copied onto the second BCV. Because SRDF source units and SRDF target units can be equipped with a BCV mirror, this function can also be used for (asynchronous) disaster recovery. Reference is made to this option in sections 4.4 and 4.5.
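
The alternation between two BCVs can be illustrated with the following toy model. It is not SHC-OSD syntax and the class and method names are invented. The model only tracks which blocks have changed since a BCV was last attached, so that re-attaching a BCV copies just those blocks (the incremental alignment described above), while the detached BCV keeps a frozen point-in-time copy.

```python
# Toy model of "multi BCVs": one BCV mirrors the normal unit while the other
# holds a frozen, consistent copy; re-attaching copies only changed blocks.

class NormalUnitWithBcvs:
    def __init__(self, blocks):
        self.data = dict(blocks)                  # block -> content
        self.bcv = {"BCV1": dict(blocks), "BCV2": dict(blocks)}
        self.changed_since = {"BCV1": set(), "BCV2": set()}
        self.attached = None

    def write(self, block, content):
        self.data[block] = content
        for name in self.changed_since:           # remember deltas per BCV
            self.changed_since[name].add(block)
        if self.attached:                         # the attached BCV mirrors writes
            self.bcv[self.attached][block] = content
            self.changed_since[self.attached].discard(block)

    def attach(self, name):
        """Re-attach a BCV: copy only the blocks changed since it was detached."""
        for block in self.changed_since[name]:
            self.bcv[name][block] = self.data[block]
        self.changed_since[name].clear()
        self.attached = name

    def detach(self):
        """The detached BCV keeps a frozen, consistent point-in-time copy."""
        self.attached = None


unit = NormalUnitWithBcvs({0: "a", 1: "b"})
unit.attach("BCV1")
unit.write(0, "a'")
unit.detach()                     # BCV1 now holds the frozen state {0: "a'", 1: "b"}
unit.write(1, "b'")
unit.attach("BCV2")               # only the changed blocks are copied onto BCV2
print(unit.bcv["BCV1"], unit.bcv["BCV2"])
# {0: "a'", 1: 'b'} {0: "a'", 1: "b'"}
```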

2.10 Global storage (GS)

Global storage, or GS for short, is a fast, semiconductor-based storage expansion that can be connected to S series BS2000 systems. GS consists of one or two mutually independent hardware units (GS units), each with its own storage controller and power supply. If there are two GS units, they can be operated in parallel (dual mode) so that the same data is written to both GS units in one write operation. This can compensate for the failure of a complete GS unit. Normally, a GS unit is also fitted with a battery so that no data is lost even if the power supply fails.

GS can be used to emulate hard disks (called GS volumes, a type of RAM disk) that are several times faster than conventional disks, or as a cache medium for conventional BS2000 disks. If the system fails, the cache data is retained in GS and is restored when the system is restarted, so that dual GS (with battery) can be used without restriction to ensure reliable write caching. In our investigation, the use of GS as a cache medium is important because it is essential to include the cache areas of critical files in disaster recovery (section 4.2). Detailed information on the use of GS as a cache is provided in the DAB User Guide [2].

2.11 Consistency groups

The term “consistency groups” is derived from the special EMC “consistency groups” feature and also from a function of the same name associated with the IBM “XRC” feature (XRC is a type of asynchronous remote mirroring); it is used more generally in this paper, in particular to enable files buffered by the DAB in global storage to be included in disaster recovery concepts. Consistency groups are always defined in environments in which data is mirrored in a standby data center to ensure data availability in disaster recovery.

The term consistency group is used when the abstract grouping of all physical and logical data volumes, on which the logically dependent data of an application (standard example: database application, transaction and logging) is distributed, can be monitored and controlled on the host side to ensure data integrity and data consistency of the application on mirrored data volumes in the standby data center in any failure scenario.

Examples of consistency groups

• The data of an application can be spread over several Symmetrix source units in a Symmetrix system. If each source unit is RAID1-protected, the term “consistency group” can be used.

• The data of an application can be spread over more than one Symmetrix system. To obtain a consistency group, either the EMC concept of “consistency groups” must be used (see below) or the data must be spread over the Symmetrix systems in such a way that logically dependent IO operations always address the same Symmetrix.


• Databases use their own write caches (in RAM and therefore “volatile” caches); these are synchronized with the disks by means of database-specific algorithms. If after a host failure the database system is able to reconstruct the database using before-images, the disks and cache of the database form a consistency group.

Example of the lack of a consistency group:

• The data of an application can be distributed in a Symmetrix system and in a non-volatile write cache of system software (DAB write cache in GS, see 4.2). This type of distribution does not generally represent a consistency group because the two subsystems (Symmetrix and caching software) do not maintain the requisite “intimate” communications.

• Similarly, the data of an application can be distributed in a Symmetrix system and in a GS volume (see section 2.10). This type of distribution does not generally represent a consistency group because again the two subsystems (Symmetrix and the GSVOL software) do not communicate with each other.

At EMC (Symmetrix) and IBM (ESS), “consistency group” also refers to environments in which data is mirrored (SRDF, XRC) in a standby data center to ensure data availability, even if a disaster occurs.

EMC defines consistency groups as groups of SRDF devices which are grouped to support controlled data integrity and data consistency of a database distributed over several SRDF units. The SRDF source units of a consistency group may be spread over several Symmetrix systems. Consistency groups are implemented by means of a consistency protocol which ensures that SRDF mirroring is disabled for all subsequent IO operations on devices of a consistency group when a fatal error occurs for IO operations for a source/target pair of the consistency group (e.g. communication not possible with the target unit). This prevents the remote data on the target units of the consistency group from becoming inconsistent.

IBM defines consistency groups as groups of data records of XRC files whose transfer to the target units of the remote ESS systems is guaranteed to take place in the same order as in the applications. Preservation of the order of the write IO operations is of fundamental importance for dependent IO operations on various devices (standard example as above: database application, transaction and logging).
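
The effect of the EMC-style consistency protocol can be illustrated as follows. The sketch is a conceptual model with invented names, not the actual EMC implementation: as soon as mirroring fails for one source/target pair of the group, remote copying is suspended for all devices of the group, so the remote image remains consistent (if somewhat older) instead of being partially updated.

```python
# Illustrative model: a fatal SRDF error for one pair suspends remote copying
# for the whole consistency group; local writes continue unaffected.

class ConsistencyGroup:
    def __init__(self, devices):
        self.remote = {d: [] for d in devices}   # mirrored state per device
        self.failed = set()                      # pairs with a fatal SRDF error
        self.suspended = False

    def replicate(self, device, data):
        if device in self.failed and not self.suspended:
            # fatal error for one pair: stop mirroring for the whole group
            self.suspended = True
        if not self.suspended:
            self.remote[device].append(data)
        # the local write itself still succeeds; only the remote copy is suspended


group = ConsistencyGroup(["db_data", "db_log"])
group.replicate("db_log", "begin txn 4711")
group.failed.add("db_data")                            # e.g. target unreachable
group.replicate("db_data", "update row (txn 4711)")    # suspends the group
group.replicate("db_log", "commit txn 4711")           # no longer mirrored
print(group.remote)
# {'db_data': [], 'db_log': ['begin txn 4711']}
# The remote data stays consistent: the commit record never appears on the
# target units without the corresponding update.
```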

2.12 HIPLEX MSCF

HIPLEX (Highly Integrated System Complex, described in [4]) is the Fujitsu concept for implementing capacity and availability clusters with several BS2000/OSD business servers. The HIPLEX MSCF (MSCF = Multiple System Control Facility) software product provides the necessary infrastructure for capacity and availability sharing, as well as basic functions for distributed applications.

Fundamental to MSCF clusters is communication between the participating processors on the basis of BCAM transport links. Jobs involving function execution and check messages for monitoring the cluster are exchanged between the processors.

In MSCF clusters, the shared pubset cluster is the most important cluster type. In addition to communication links, all participating processors in this type of cluster have access to shared disks – the shared pubset. The processors in the cluster monitor each other by means of two independent data routes, i.e. with the aid of the shared disks on the one hand and communication paths on the other, so that a processor crash can be reliably detected by the redundancy of the monitoring paths. If a processor crashes or a communication path fails, appropriate reconfiguration measures guarantee the continued operational readiness of the shared pubset cluster.

XCS (Cross Coupled System) clusters are an extension of the shared pubset cluster. Synchronization mechanisms across all processors permit management of globally available resources and operation of distributed applications with data access to the shared data volumes.

The mechanisms for detecting processor failure are an integral part of the shared pubset cluster and the XCS cluster, and also of the CCS (Closely Coupled System) cluster without shared pubset. They allow implementation of standby configurations to minimize application downtime. Implementation of this kind of configuration is supported by another member of the HIPLEX product family, the HIPLEX Availability Facility (HIPLEX AF).

In addition to monitoring via BCAM links, the participants in a shared pubset cluster monitor each other via shared pubsets. For this purpose, a watchdog file is automatically set up by DMS when a pubset is imported to the master processor in shared mode. Each sharer (both master and slave processor) periodically writes an incremental counter to the watchdog file and reads the vital-sign messages of the other sharers (disk log). Potential failure of a sharer is recognized by the fact that it ceases to write vital-sign messages, i.e. its counter stops making increments.
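
The disk-log part of this monitoring can be sketched as follows. The file layout, cycle counter and failure threshold are invented for illustration and do not reflect the actual DMS watchdog format; monitoring via the BCAM communication paths is omitted here.

```python
# Simplified sketch of vital-sign monitoring via a shared watchdog file:
# a sharer whose counter stops incrementing for several cycles is suspected
# to have failed. Layout and thresholds are illustrative only.

watchdog_file = {"HOST1": 0, "HOST2": 0}        # counter per sharer on the shared pubset
last_seen = {h: (c, 0) for h, c in watchdog_file.items()}
FAIL_AFTER = 3                                   # tolerated cycles without a new counter

def write_vital_sign(host):
    watchdog_file[host] += 1

def check_sharers(cycle):
    """Executed by each sharer: read the other counters and flag stalled ones."""
    suspects = []
    for host, counter in watchdog_file.items():
        prev_counter, prev_cycle = last_seen[host]
        if counter > prev_counter:
            last_seen[host] = (counter, cycle)
        elif cycle - prev_cycle >= FAIL_AFTER:
            suspects.append(host)
    return suspects

for cycle in range(1, 7):
    write_vital_sign("HOST1")
    if cycle <= 2:                # HOST2 stops writing vital signs after cycle 2
        write_vital_sign("HOST2")
    failed = check_sharers(cycle)
    if failed:
        print(f"cycle {cycle}: suspected crash of {failed}")
        break
# -> cycle 5: suspected crash of ['HOST2']
```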

2.13 HIPLEX AF

The HIPLEX AF (Availability Facility) software product increases the availability of applications in the event of an application or system failure in a cluster of several BS2000/OSD business servers.

HIPLEX AF operation is based on the principle of redundancy. Instead of one system handling the load requirements of all applications, several systems are installed to share the load during normal operation. If one of these systems fails completely or partially, intact systems take over the most important applications, although this may result in reduced performance. To allow switchover to take place, the applications and their resources must be appropriately configured.


During normal operation, users of the monitored systems are unaware of the presence of HIPLEX AF. If a system failure occurs, HIPLEX AF switches applications that are running on the work system to a standby system. This automatic switching is done very quickly and also in unmanned operation. It is also possible to use a HIPLEX AF switch command to switch applications from the work system to a standby system at any time, for example before servicing the work system or installing a new software version. Downtime of applications is minimized in various ways through use of HIPLEX AF:

• Time is saved by automatic failure detection. In unmanned operation, this time saving is crucial for resumption of production operation.

• Time is saved when restarting the work system.

• Time is saved if it is necessary to repair hardware.

• Switchover is reliable (guaranteed by proven procedures).

HIPLEX AF uses PROP-XT administration procedures that are started on the work and standby systems of the shared pubset cluster. The HIPLEX AF administration procedures communicate with one another by means of job variables on the shared pubset. A precondition for optimal use of HIPLEX AF in disaster recovery concepts is a shared pubset cluster and the deployment of HIPLEX MSCF, which results in the type of configuration shown in Fig. 2-2.

Fig. 2-2: Schematic diagram of an optimal, “automatable HIPLEX AF-capable” disaster recovery configuration (Host 1 and Host 2 coupled via MSCF; EMC 1 and EMC 2 linked via SRDF, each holding sources/targets and cross-cabled to both hosts)

Due to the “cross-cabling” between hosts and disk storage systems, configurations of this kind are also called X-link configurations. HIPLEX AF establishes a high-availability cluster between the two hosts by enabling the systems to monitor each other and also their applications. One or more applications and their hardware and software resources are defined as a “switch unit”. Supervision of the applications and their resources by HIPLEX AF is referred to as “monitoring” of the switch unit. In the event of system failure or failure of important resources, the switch unit is switched to the standby host in line with customer-defined criteria (this is known as “switchover”). In HIPLEX AF it is additionally possible to detect failure of a Symmetrix system, to automatically activate the target units, and to restart the application with these units (failover). If HIPLEX AF is deployed for automatic disaster recovery, the decision as to whether failover is performed is made by the monitoring software. Failover can also be performed in unmanned operation.


It is also possible to define several applications running on the work and standby host as separate switch units; these can then be switched over independently of each other. Lists of individual target units can be supplied in the switch unit definitions; these are then activated either semi-automatically or fully automatically when switchover is performed.
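
The notion of independently switchable switch units can be modelled conceptually as follows. The structure of real HIPLEX AF switch-unit definitions differs from this Python sketch, and the “activation of target units” shown here merely stands in for the SHC-OSD actions that HIPLEX AF performs during failover; all names are invented.

```python
# Conceptual model of switch units: each unit bundles an application, its
# monitored resources and the SRDF target units to activate on failover.

class SwitchUnit:
    def __init__(self, name, resources, target_units):
        self.name = name
        self.resources = resources           # monitored resources of the application
        self.target_units = target_units     # SRDF target units to activate on failover
        self.running_on = "work_host"

    def monitor(self, failed_resources):
        """Customer-defined criterion: any failed resource triggers switchover."""
        return any(r in failed_resources for r in self.resources)

    def switchover(self, activate_targets=False):
        if activate_targets:
            print(f"{self.name}: activating target units {self.target_units}")
        self.running_on = "standby_host"
        print(f"{self.name}: restarted on {self.running_on}")


# two independent switch units can be switched over separately
units = [
    SwitchUnit("ORDER_DB", ["pubset_A", "bcam_app1"], ["T01", "T02"]),
    SwitchUnit("BATCH",    ["pubset_B"],              ["T10"]),
]
failed = {"pubset_A", "symmetrix_1"}         # detected failures in the work DC
for unit in units:
    if unit.monitor(failed):
        unit.switchover(activate_targets="symmetrix_1" in failed)
# only ORDER_DB is switched over; BATCH keeps running on the work host
```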

2.14 Symmetrix Host Component in BS2000 (SHC-OSD)

SHC-OSD is the control and management software for Symmetrix systems in BS2000. In conjunction with the associated Symmetrix firmware (microcode), it controls the Symmetrix SRDF and TimeFinder functions integrated into the BS2000 operating system or allows configuration data of the Symmetrix to be determined. This product is described in detail in the User Guide [1].

The EMC service engineer is responsible for setting up SRDF units and BCV additional mirror units.

SHC-OSD (from V4.0 onwards) is a tool with which the Symmetrix actions needed to perform failover and failback can be implemented either manually by means of commands (or procedures) or automatically under the control of HIPLEX AF. Domino mode for an SRDF disk pair can be enabled and disabled by means of an SHC-OSD command. SHC-OSD also provides monitoring functions to display status changes to the configuration of Symmetrix systems, the status of devices, and the status of remote copy operation. If status changes are detected, SHC-OSD displays a conspicuous message on the console which requires a response.

2.15 Dual Recording by Volume (DRV)

DRV (Dual Recording by Volume) is a software-driven recording procedure which allows data to be duplicated on two disks. Its counterpart is SRV (Single Recording by Volume).

DRV is implemented in the I/O system of BS2000 and is transparent to the Data Management System and the application programs. DRV operation is initiated, controlled, monitored and terminated by a series of commands issued by the operator or system administrator.

The DRV mode in which data is duplicated is called dual mode. It increases the availability of the data stored on the disks. Each write job from the Data Management System is performed on both disks, and each read job is handled by the disk with the shorter IO queue or alternatively by a selected disk.

If a disk fails, the system can switch to mono mode. Operator intervention is not needed. The operator or system administrator can replace the defective drive with a drive of the same type without interrupting the applications. Mono mode differs from SRV in that dual mode can be resumed during operation by attaching a disk with an identical volume serial number (VSN). The switch from mono mode to dual mode is known as reconstruction. Data is copied to the added disk and user input/output can be processed at the same time.
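
The interplay of dual mode, mono mode and reconstruction can be illustrated with the following toy model. It is not the BS2000 I/O system; the class and method names are invented, the reconstruction is simplified to a one-shot copy, and the command-level interface (e.g. UNLOCK-DISK on the standby side) is not represented.

```python
# Toy model of DRV: dual mode writes to both disks, mono mode to one disk only,
# reconstruction brings a replacement disk back in line during operation.

class DrvVolume:
    def __init__(self):
        self.disks = {"primary": {}, "mirror": {}}   # block -> data
        self.mode = "dual"

    def write(self, block, data):
        # in dual mode every write goes to both disks; in mono mode to one only
        targets = self.disks if self.mode == "dual" else {"primary": self.disks["primary"]}
        for disk in targets.values():
            disk[block] = data

    def disk_failure(self):
        self.mode = "mono"                            # automatic, no operator action

    def reconstruct(self):
        """Attach a disk with the same VSN and copy the current data onto it
        while user IO continues (simplified here to a one-shot copy)."""
        self.disks["mirror"] = dict(self.disks["primary"])
        self.mode = "dual"


vol = DrvVolume()
vol.write(1, "A")
vol.disk_failure()        # mirror lost, operation continues in mono mode
vol.write(2, "B")
vol.reconstruct()
print(vol.mode, vol.disks["mirror"])   # dual {1: 'A', 2: 'B'}
```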

DRV supports pubsets and private disks of all device types but shared pubsets cannot be used for DRV.

Due to the geographic separation of original disk and mirror disk (the disks can also, theoretically, be a long distance from the CPU), interrupt-free data access is possible despite the non-availability of a disk storage system if, for example, a disaster occurs. The DRV subsystem of BS2000 is therefore a software tool that offers functions similar to those of SRDF. It can be used as an alternative to SRDF in disaster recovery scenarios.

Whereas, in the event of a disaster, the SRDF targets must be activated by means of SHC-OSD commands, the DRV mirror disks can be used directly by the standby system after an UNLOCK-DISK.

The SRDF variant offers more powerful functionality and also accords with the strategic alignment of Fujitsu.

Nevertheless, DRV is worth considering in disaster recovery concepts, particularly with ESCON links and where distances between work and standby data center are short. To reduce the bulk of this paper, no disaster tolerant architectures with DRV rather than SRDF mirroring have been included in chapter 4. If concrete projects arise, this option can be described and discussed at any time in the future.


Copy modes
  SRDF: synchronous, semi-synchronous, asynchronous
  DRV: synchronous

Extension of a write IO operation
  SRDF: in synchronous mode, plus signal traveling time via the SRDF link and data storage in the cache of the remote Symmetrix
  DRV: two IO operations are carried out in parallel; the second interrupt determines total IO operation duration. Performance better than SRDF over short distances.

Cabling
  SRDF: copy is made via the SRDF link
  DRV: copy is made via a second ESCON (or FC) connection; cross-cabling is mandatory

Distance between production DC and standby DC
  SRDF: possible even over long distances
  DRV: distance should not be more than 9 km for ESCON links; long distances possible with an FC link

Shared pubset mirroring
  SRDF: possible
  DRV: not possible

Home pubset mirroring
  SRDF: possible
  DRV: possible

Consistency groups
  SRDF: RAID1 disks; logically dependent data should not be distributed over more than one Symmetrix system
  DRV: RAID1 disks; logically dependent data should not be distributed over more than one Symmetrix system

Resources required
  SRDF: SRDF link
  DRV: CPU power, main memory, channel

Costs
  SRDF: SRDF license
  DRV: leasing costs for the DRV subsystem

Table 2-1: Comparison of SRDF and DRV


3 Prerequisites for disaster recovery in BS2000

This chapter describes the basic requirements for disaster tolerant architectures, such as data mirroring and network connection to work and standby hosts. Due to the large number of potential influencing factors as mentioned in the introduction, only an overview of requirements is provided. Procedures to ensure the availability of online data in the standby data center and techniques for bridging long distances for purposes of data mirroring are discussed and assessed. The organizational and administrative measures needed to help implement a disaster recovery concept are also addressed.

3.1 Organizational measures

We would first like to identify organizational measures that must be put in place in any disaster recovery concept. These depend largely on the data center involved, on the data center staff, and on the existing infrastructure.

3.1.1 Emergency precautions

In the view of the Bundesamt für Sicherheit in der Informationstechnik (German Information Security Agency), the following issues must be addressed by IT operators with regard to emergency precautions (refer also to the IT Baseline Protection Manual [6]):

• Development of a survey of availability requirements
• Definition of "emergency", person-in-charge of emergencies
• Development of an Emergency Procedure Manual
• Documentation on the capacity requirements of IT applications
• Definition of "restricted IT operation"
• Study of internally and externally available alternatives
• Responsibilities in an emergency
• Alert plan
• Development of a restart plan
• Development of a data backup plan
• Replacement procurement plan
• Emergency preparedness exercises

Issues relating to the "restart plan" have already been covered by automatic failure detection and automatic or semi-automatic restart of production applications in the standby data center, as provided for BS2000 systems by the HIPLEX AF product. However, an emergency procedure manual should also be developed which documents not only individual responsibilities but also all the steps automated by HIPLEX AF to restart applications.

3.1.2 Emergency procedure manual

An emergency procedure manual describes the manual procedures adopted in both participating data centers in the event of a disaster. Such a manual is, in practically all respects, customer-specific and application-specific. To ensure a clear, unambiguous application restart process in the event of a disaster, roles must be defined for participating staff. Each role determines the individual tasks and responsibilities involved in restart.

Such roles could be:

• Person-in-charge of emergencies
• Emergency manager
• Emergency network administrator
• Emergency system administrator
• Emergency application support

For example, one or more persons are nominated as persons-in-charge of emergencies. They are authorized and are sufficiently competent to decide whether the situation in the work data center requires a switchover; in other words, to "declare" that a disaster has occurred. Emergency managers can also be nominated. These coordinate further procedures after a disaster has been declared and instruct system administrators, for example, to take certain necessary actions.

The emergency procedure manual could then specify the actions and reporting channels for each role in the form of checklists. On request, Fujitsu offers support in preparing an emergency guide. This type of support has already been provided by our Competence Centers.

3.1.3 Regular emergency exercises

Extract from the IT Baseline Protection Manual of the Bundesamt für Sicherheit in der Informationstechnik (see [6]):


Emergency preparedness exercises serve to check the effectiveness of measures in the field of contingency planning. On the one hand, the effective and smooth execution of a contingency plan will be tested in an emergency preparedness exercise, and on the other hand, previously undiscovered shortcomings will be detected. Typical exercises are:

• alerting exercise
• conducting fire drills
• functional testing of generators
• restart after failure of a selected IT component
• restoring of data backups

The results of emergency preparedness exercises have to be documented. Emergency preparedness exercises are to be held at regular intervals. Since such exercises can have a disruptive effect on normal operations, their frequency should be geared to the threat scenario; however, the pertinent exercises should, as a minimum, be held once a year. Staff training activities (first-aid, fire-fighting, etc.) must be carried out to a necessary extent. Additional controls:

• Are emergency preparedness exercises held at regular intervals?
• Do detected shortcomings give rise to a revision of contingency plans and/or the emergency procedure manual?

3.2 Setup of a standby data center

A standby data center must be equipped in such a way that it can run the sum total of relevant production applications (of the work data center). Similarly, in a symmetric disaster tolerant architecture both data centers must be able to run the sum total of all relevant production applications of both data centers. Here particular reference is made to the RPF performance of the host, memory configuration, channel throughput and disk capacity. The RPF performance can be provided in a cost-effective way as "capacity on demand" with "hot-extra CPUs". For this purpose, the hardware resource requirements of all applications involved must be determined.

If several applications are to be switchable, sufficient LAN connections or HNCs (High-speed Net Connect, network connection for BS2000/OSD systems) must be available so that the required number of IP addresses can be provided. Likewise, the configuration of the networks at both sites must also be extended to ensure trouble-free operation with additional users in the event of a disaster. It may also be necessary to provide additional tape devices and printers so that the required tape device and printer capacities are also available in the standby data center if a disaster occurs. In summary, two almost identically equipped (mirrored) data centers must be available.
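As a simple aid to this sizing exercise, the per-application requirements can be aggregated and compared with the planned standby capacity. The following Python sketch is purely illustrative; the application names, figures and the choice of resource fields are hypothetical placeholders and would have to be replaced by the values actually determined for the applications and data centers involved.

# Illustrative sketch only: check whether a standby data center can host the sum
# of all production applications. All figures are hypothetical placeholder values.
from dataclasses import dataclass

@dataclass
class Requirements:
    rpf: float          # CPU performance in RPF
    memory_gb: float    # main memory
    disk_gb: float      # net disk capacity
    ip_addresses: int   # addresses to be provided via LAN/HNC connections

applications = {
    "order_entry": Requirements(rpf=120, memory_gb=4, disk_gb=800, ip_addresses=2),
    "billing":     Requirements(rpf=80,  memory_gb=2, disk_gb=500, ip_addresses=1),
    "warehouse":   Requirements(rpf=60,  memory_gb=2, disk_gb=300, ip_addresses=1),
}

standby_dc = Requirements(rpf=300, memory_gb=10, disk_gb=2000, ip_addresses=8)

total = Requirements(
    rpf=sum(a.rpf for a in applications.values()),
    memory_gb=sum(a.memory_gb for a in applications.values()),
    disk_gb=sum(a.disk_gb for a in applications.values()),
    ip_addresses=sum(a.ip_addresses for a in applications.values()),
)

for field in ("rpf", "memory_gb", "disk_gb", "ip_addresses"):
    needed, available = getattr(total, field), getattr(standby_dc, field)
    status = "OK" if needed <= available else "SHORTFALL"
    print(f"{field:13s} needed {needed:8.1f}  available {available:8.1f}  {status}")

A symmetric disaster tolerant architecture would apply the same check in both directions, i.e. each data center must be able to absorb the applications of the other.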

3.3 Preparation of data network

Users must have a network connection to the particular standby host. It would, of course, be desirable to have a configuration which would redirect all users affected by a disaster to the standby host with little or no intervention on network components (switches, routers, etc.). In many cases it would be equally desirable if redirection were possible without intervention on the clients because this could, in certain circumstances, entail substantial effort or even reinstallation work. The following two sections outline possible options and also address the subject of firewalls. The diverse network topology options and the associated hardware are not dealt with in this document. Generally, the existing equipment can be used as a basis for devising an appropriate disaster recovery solution.

3.3.1 Redirection of the network connection

Ideally, the network connection of both systems (hosts or set of VMs) should be designed so that the users, who are generally located at a third site or at different sites, have access to the application regardless of which of the two hosts is currently running the application. It must also be ensured that the network components such as routers, switches, hubs and HNCs in both data centers are dimensioned in such a way that they are able to handle the access operations of all users of both hosts in one of the data centers (in the event of a disaster). There are different procedures for redirecting the network connection. Which procedure is best for which customer should be decided on the basis of the (existing) network topology type. Two basic options are described in simplified form below.

Use of virtual hosts and dynamic routing

If an application is switched to a different system, the "connector" via which the application communicates with its environment in the network changes in physical terms. You can hide this change from users. You do this (in BS2000 environments) by defining virtual hosts in addition to the real host and by linking the applications with the virtual hosts. To enable applications to be addressed with the same network address even after a restart on a standby system, the host name must remain unchanged in the network. This can be achieved by opening the application whose network address is to remain unchanged (transparent to users) on a virtual host. The virtual host must also be generated as a virtual host on the standby system of the disaster tolerant architecture. The name of the virtual host must also be included in the UTM generation. In the network, only one single system with the host name of this virtual host may be visible; in other words, the virtual host may be active on one real host only. Use of virtual hosts is the most transparent solution for the users affected. A prerequisite is that the IP addresses of the work and standby host are either in the same (physical) LAN segment or that corresponding routing entries are made for the virtual host address(es) in the routers of the standby data center. Further information on virtual hosts is provided in the Product Manual for HIPLEX AF [3].

Fig. 3-1 shows a simplified example of a configuration as could be used on a campus. The two hosts are in the same logical subnet 139.20.10.x and the LANs of both data centers are interconnected by, for example, an ATM backbone. The two hosts could also be in different subnets and be reachable via different routers. If a disaster occurs, virtual host 139.20.10.5 is started on host B and BCAM sends corresponding ARP ("Address Resolution Protocol") packets so that the virtual host is then known again in this LAN segment, albeit at a network port of real host B. In this type of configuration, network switching would be possible in a very short time (a matter of milliseconds) without the need to make any changes in routers, partner servers or at the users (clients). For purposes of a disaster recovery concept we recommend locating the affected hosts in the same subnet. If, as recommended in section 3.4, a WDM link is used for data mirroring, it can be used simultaneously for the (broadband) interconnection of the two physical LANs.
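The ARP announcement mentioned above is handled by BCAM itself when the virtual host is activated on the standby system; no manual intervention is required. Purely to illustrate the underlying mechanism (a gratuitous ARP for the virtual address), the following sketch uses the Scapy packet library; the interface name and addresses are hypothetical example values taken from the figure below, and the script is not part of the BS2000 switchover procedure.

# Illustrative sketch of a gratuitous ARP announcement (the mechanism BCAM uses
# when a virtual host is activated on the standby system). Requires Scapy and
# root privileges; addresses and interface name are hypothetical example values.
from scapy.all import ARP, Ether, sendp, get_if_hwaddr

VIRTUAL_IP = "139.20.10.5"     # address of the virtual host
IFACE = "eth0"                 # interface of the standby host (host B)

mac = get_if_hwaddr(IFACE)

# Gratuitous ARP: source and destination IP are both the virtual address, so all
# stations in the LAN segment update their ARP caches to point at host B's port.
packet = Ether(dst="ff:ff:ff:ff:ff:ff", src=mac) / ARP(
    op=2,                      # ARP reply
    hwsrc=mac, psrc=VIRTUAL_IP,
    hwdst="ff:ff:ff:ff:ff:ff", pdst=VIRTUAL_IP,
)
sendp(packet, iface=IFACE, verbose=False)
print(f"Announced {VIRTUAL_IP} on {IFACE} ({mac})")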

[Figure: schematic campus configuration. Host A (real 139.20.10.1, virtual 139.20.10.5) in Building 1 and Host B (real 139.20.10.21, virtual 139.20.10.5) in Building 2 are each attached via HNC and hub to a router/switch; Building 3 houses a further router/switch and the DNS; the sites are interconnected by a backbone (e.g. ATM or WDM) with a connection to the public network.]

Fig. 3-1: Network connection with virtual host

If both buildings are in different logical subnets, the virtual hosts must, as a preparatory step, be added to the tables of all participating routers so that access operations can also be routed in Building 2. Here, however, the two (logical) subnets are "mixed". If more than the three routers shown in the above figure are involved, they should work with a dynamic routing protocol such as OSPF ("Open Shortest Path First", RFC 2328 etc.) or IGRP ("Interior Gateway Routing Protocol"), which produce rapid routing convergence. Reconfiguration of the routing in the event of a disaster is certainly not an acceptable option. One way of establishing transparency of network addresses for users and of logically separating the possibly different LAN segments is to deploy virtual LANs (VLANs in accordance with IEEE 802.1Q). Special layer-3 switches are needed for this purpose; for example, CISCO 4908G-L3.

Use of DNS

Another option is to work with DNS (Domain Name System). If users address their application (or host) using DNS names, the entries in the DNS server of the user network and the IP addresses of the target systems must be changed in the event of a disaster and, if necessary, a zone transfer may have to be performed. The addresses or subnets needed in the event of a disaster must be defined beforehand. Consequently, nothing changes on the network addressing level. However, if a large number of DNS zones are involved, this approach may lack clarity. It is not a viable solution unless all the applications to be switched use DNS. Switchover by changing the IP address entries of the target systems in the DNS server of the user network is not offered as an automatic switching option by HIPLEX AF.
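Where the DNS approach is chosen, the switchover essentially consists of replacing the A records of the affected names with the addresses of the standby systems on the authoritative server (followed, if necessary, by a zone transfer to secondary servers). The following sketch shows what such a change could look like as a dynamic update (RFC 2136) using the dnspython library; zone, record name, addresses and server are hypothetical example values, and TSIG authentication is omitted for brevity.

# Illustrative sketch: redirect a DNS name to the standby host by replacing its
# A record via a dynamic update (RFC 2136). Zone, names, addresses and server
# are hypothetical example values; authentication (TSIG) is omitted for brevity.
import dns.update
import dns.query
import dns.rcode

ZONE = "example.com."
DNS_SERVER = "139.20.30.53"        # authoritative server of the user network
RECORD = "order-entry"             # name used by the clients
STANDBY_IP = "139.20.10.21"        # address of the standby host
TTL = 60                           # short TTL so clients pick up the change quickly

update = dns.update.Update(ZONE)
update.replace(RECORD, TTL, "A", STANDBY_IP)

response = dns.query.tcp(update, DNS_SERVER, timeout=10)
print("Update result:", dns.rcode.to_text(response.rcode()))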

3.3.2 Firewalls

If two data centers are included in a disaster tolerant architecture, the hosts will also be behind different firewall systems. Security policies of both data centers must then be aligned. If, for example, certain services needed by production system users on the work host are locked in the LAN segment of the standby host, it will be necessary to remove or redesign such restrictions. If "Access Control Lists" (ACLs) are used, they must be extended. As there are countless firewall configuration options and parameters, this topic is not discussed in detail here. It is sufficient to point out that the firewall issue must be addressed at an early stage in disaster recovery planning.

3.4 Procedures relating to availability of online data in the standby DC

A fundamental aspect of disaster recovery is provision of the data of the applications to be switched in the standby data center. If a disaster occurs, it must be assumed that current data of all applications on online data volumes (disk storage systems) is no longer available. In a worst case scenario, the disk storage systems are destroyed and it is impossible to restore their data. In many documents of diverse vendors (e.g. in the IBM Redbook entitled "SAN Distance Solutions" [8]) disaster recovery safeguards are listed in seven different categories (tiers) which are arranged as follows in ascending order of convenience and cost.

• Tier 0: No disaster recovery protection
• Tier 1: Regular external storage of backups, PTAM ("Pickup Truck Access Method"); the backup platform is established only in the event of a disaster
• Tier 2: Regular external storage of backups; the backup platform already exists
• Tier 3: Each day selected files are transferred electronically to the backup platform
• Tier 4: Bidirectional asynchronous disaster recovery, high-bandwidth links between the two platforms
• Tier 5: Synchronous update in the standby data center, tape transport no longer needed
• Tier 6: Automatic switchover in the event of a disaster

At this point we would like to make explicit reference to three typical procedures relating to the availability of online data in the standby data center in BS2000 environments.

In synchronous disaster recovery configurations

Mirroring of data with synchronous SRDF (online mirroring)
The online data of the production applications resides on one or more Symmetrix systems and is mirrored to Symmetrix systems in the standby DC using synchronous SRDF via remote links. The current data (locally protected) always resides on RAID1 source units in the production DC and also (again preferably locally protected) on RAID1 target units in the standby DC. The fundamental statement in this configuration is that no data is lost in the event of a disaster.
This corresponds to tiers 5-6 in IBM terminology.

In asynchronous disaster recovery configurations

1. Mirroring of data with asynchronous SRDF (online mirroring)
The online data of the production applications resides on one or more Symmetrix systems and is mirrored to Symmetrix systems in the standby DC using asynchronous SRDF via remote links. The current data (locally protected) always resides on RAID1 source units in the production DC. BCV units on which consistent data states can be frozen are assigned to the target units in the standby DC. Because data alignment is not continuous in asynchronous SRDF operation (the order of application write IO operations is not preserved in asynchronous SRDF), the application must be stopped periodically in the production DC and the SRDF volumes must be synchronized in order to keep data consistent. The application can be restarted after synchronization and detachment of the assigned BCV mirror volumes from the target volumes in the standby DC.
This corresponds to tier 4 in IBM terminology.

2. Regular transport of backup data volumes (offline mirroring)
In this variant, data is not transferred electronically from the production DC to the standby DC but physically in the form of tape cartridges containing data backups. There are, necessarily, similarities with the procedure used to transfer data backups electronically from one DC to another. Exchanging of physical tapes in general data center operation has been purposely reduced over the last 20 years because of its disadvantages (need for tape devices, tape wastage, operator costs, and due to the fact that the data at the target site is not current). The move to data transfer via data links went hand in hand with progress from open tapes to tape cartridges and robotic tape mounting, and ultimately to virtual devices and volumes. In addition to backup runs, operators must carry out daily duplication runs in the production DC and restore runs in the standby DC using directly attached tape cartridge systems. This and the necessary tape transport logistics produce a very unsatisfactory result in terms of data currency at the target site.
This corresponds to tiers 2-3 in IBM terminology.

A different variant, which corresponds to tier 1, is to engage a service provider (e.g. Restart) who is capable of delivering suitable equipment within a few days after a disaster. The sole precaution taken is that backup tapes are stored externally. Such service providers are also able to provide a Symmetrix system with target units for data mirroring as well as VM guest systems and extra CPUs as a cold standby. CPU performance is not enabled until a disaster occurs. This variant is of interest as a possible cooperation model between the data centers of different companies.


3.4.1 Assessment aspects for the configuration variants

A. Currency of the data in the standby data center
B. Time between declaration of a disaster and restart of the applications
C. Costs of additional staff and resources required
D. Impact on production performance
E. Protection of data against unauthorized reading
F. Robustness of the procedure

An assessment matrix of the configuration variants for the above six criteria is shown below. The scores are for broad comparison purposes only: 1 = best, 3 = worst. The individual criteria are weighted differently in terms of importance.

                      Synch. SRDF   Asynch. SRDF   Tape transport
A. Data currency           1              2               3
B. Downtime                2              1               1
C. Costs                   1              2               3
D. Performance             2              1               3
E. Data protection         1              1               3
F. Robustness              1              2               3

Table 3-1: Assessment matrix for the configuration variants

Variant 1 is obviously functionally outstanding (= hot standby) and clearly scores best, because data is always equally current at both sites, which is certainly not the case with the other two variants.

Explanation of assessment criteria

1. Data currency

In synchronous SRDF, the data in the standby data center has the same status as in the work data center at all times. There are two levels in the event of a disaster: with or without domino mode (2.8.2). With domino mode, the data is current in any conceivable disaster scenario; in return, a lesser degree of high availability must be accepted. Without domino mode, rolling disaster scenarios are conceivable in which the data in the standby data center is indeed always consistent but possibly not quite up-to-date (e.g. if the SRDF links fail first and a subsequent disaster causes the host and/or Symmetrix system to fail). Refer to 2.8.2. In asynchronous SRDF, data currency depends on how often consistency states are "frozen" (see 4.4). In the case of tape transport, data currency depends on the frequency of tape transport; at best, tapes will be transported once per day.

2. Downtime

Downtime with the asynchronous variants is slightly less than for the synchronous variant because target activation can be dispensed with if work continues with the BCVs in the standby DC. Refer to section 4.4.2 for more details. We regard this criterion as being far less important than data currency.

3. Costs

In the first two variants (synchronous SRDF mirroring and asynchronous SRDF mirroring) the functionally weaker second variant is probably slightly more expensive because the additional BCVs required for splitting (for each logical disk pair) cost more than a perhaps slightly improved network configuration for peak loads with the first variant. A mix of the two SRDF variants (synchronous SRDF mirroring for some applications and their disks, asynchronous SRDF mirroring for others) does not seem to be particularly practicable because network configuration planning is made difficult for continuous synchronous operation in conjunction with periodic asynchronous disk alignment. The tape transport variant is also unacceptable. It requires additional resources for tape transport, operating, tape material and tape devices. In terms of absolute costs, this variant also suffers greatly by comparison because it gives rise to by far the highest personnel costs.

4. Performance

Performance during current operation is least compromised by asynchronous SRDF. In synchronous SRDF the IO times for write IO operations are longer (see 3.4.3). In the tape transport variant, the IO times are unchanged but this must be set against the lengthy periods spent restoring data each day in the standby data center.

5. Data protection

With the exception of the tape transport variant, all specified variants feature a high level of data security through the use of fibre-optic links on which data can be intercepted only with great technical effort. If WDM technology is used (see section 3.4.2), decryption of time division multiplexing (TDM) would also be necessary and a specific wavelength (λ) would have to be filtered out. Potential "spies" would therefore need their own WDM equipment as well as knowledge of the company's equipment. On the ESCON or FC protocol level, data is also fragmented into blocks with a maximum size of 2 KB and these blocks can be spread over different disks. In asynchronous links, data interception is made even more difficult by the unordered sequence of data over the SRDF link. If WAN technology is used (see section 3.4.2), it is simpler to "intercept" data on the links; it might be necessary to use channel extenders with encryption functionality.

6. Robustness

How robust or error-prone the procedures are is determined by the additional activities that are necessary to provide the online data at the standby DC. The first variant is particularly robust because no regular intervention is needed after commencement of synchronous SRDF mirroring. The second variant is just as robust because data alignment can be automated. However, this requires certain maintenance activities where errors can, of course, occur. The third variant cannot be automated, so there is a daily risk of errors.

Above, it has been our intention to clearly indicate that the synchronous procedures are, in almost every aspect, superior to the asynchronous procedures. The next chapter is devoted to a discussion of disaster tolerant architectures, in which we have therefore restricted our discussion almost exclusively to synchronous disaster tolerant architectures. As concerns asynchronous disaster recovery configurations, only data mirroring using asynchronous SRDF is dealt with in any detail.

3.4.2 Remote link variants for long distances

The following are needed to establish SRDF links over long distances (information on distances is provided in section 3.5, "Distance-dependent restrictions"):

• Either a WAN connection between the two Symmetrix sites and a channel extender at each site. The channel extender converts the SRDF protocol into a network protocol (T3, ATM, etc.) and thus establishes the link between the Symmetrix and the network port. An alternative is a TCP/IP connection and a "storage to IP gateway" instead of a channel extender.
• Or a fibre-optic connection (point-to-point) over the entire route between the two Symmetrix systems and WDM technology; a basic introduction is provided in [7].

WAN network connection

A network connection between the two sites and one "channel extender" (i.e. a protocol converter for conversion from ESCON or FC to the network protocol (T3, ATM, etc.)) per site is required for WAN connection of the Symmetrix subsystems to be linked via SRDF. Vendors of such channel extenders are, for example, Inrange Technologies with their "9801 SNS" Storage Network System and Computer Network Technology (CNT) with products of the "UltraNet Product" family. The data to be transmitted is compressed in the channel extenders and sent using the appropriate network protocol. The network must have broadband capacity to match the data stream and should be available exclusively for SRDF data transmission. The duration of a single write IO operation is prolonged by the conversion times in the channel extenders, by the line delays which increase with distance, and in particular due to the use of several routers or ATM switches on the network path. Data compression and data encryption procedures in the channel extenders can also add to IO time.

Costs incurred relate to procurement of channel extenders and costs for the permanent, exclusive and broadband network connection (at least T3 34 Mbit/s, preferably ATM 155 Mbit/s or higher).

You can choose from amongst various technologies, hardware vendors, network operators, telecoms providers and service providers with different business and service models when deciding on the form of your WAN connection. Fujitsu offers consulting and support services to help you make the right choice.

WDM technology

Multiplexing allows existing communication lines to be shared for the transmission of data and is therefore more cost effective. Time division multiplexing (TDM) has long established itself in telephone networks. In TDM, individual connections are divided into time slots and a fixed clock pulse is allotted to each transmission channel.

However, the best method currently available for SRDF links over long distances is wavelength division multiplexing (WDM) technology or a combination of WDM and TDM.


[Figure: TDM (time division multiplexing) and WDM (wavelength division multiplexing) multiplexers transmit combined signals over the same physical connection; TDM and WDM can be combined. Source: IBM Redbook "Introduction to SAN Distance Solutions" [8].]

Fig. 3-2: TDM and WDM multiplexers

Wavelength division multiplexing is an optical method that works exclusively with optical components. Fibre-optic cable that transmits laser light is used. The fact that laser light is made up of light of different wavelengths is exploited. At the sender site the optical signals (FC, ESCON or others) are converted into electrical signals, modulated onto a laser of a specific wavelength, combined (multiplexed) with other lasers, and then transmitted in a single fiber, as shown in Fig. 3-2. At the receiving end, the reverse operation takes place; the individual signals are separated (demultiplexed) and then directed to their corresponding receiver diodes. In this way, 64 (and in the near future 128) bidirectional channels can be implemented on a single fiber. This number rises to a current 4000 bidirectional channels on a fibre-optic cable. Furthermore, due to their different wavelengths, there is no interference between the channels. The individual channels distinguished by their different wavelength are also known as λs (lambdas). This optical method therefore permits transmission of a very high volume of data over a very limited number of cables. WDM has the potential to optimally exploit existing fibre-optic network infrastructures.

An added advantage is that the laser signals can be amplified on an exclusively optical basis if long distances need to be covered. Erbium doped fiber amplifiers (EDFAs) are available for this purpose. They should be used at intervals of between 60 and 150 km (depending on the WDM equipment used and on the fibre-optic cable). This eliminates the time delays experienced when converting optical signals into electrical signals and vice-versa and any associated performance loss. If attenuation (dB/km) of the fibre-optic cable is low, several hundred kilometers can be bridged without electrical regeneration of the light signal.

In addition to using a λ as an ESCON channel with an ESCON channel board in the WDM device, it is also possible, with other boards, to deploy FC, ATM, Fast Ethernet, Voice, etc. using other λs. For example, an entire corporate network could be set up using WDM technology (refer to section 3.3) because optical routing is also possible (the optical add-drop multiplexer enables optical ring topologies to be built).

The costs incurred relate to renting, purchasing or leasing individual WDM-compatible fibre-optic cables or individual λs and the necessary WDM devices. Some providers offer ESCON or FC channels and "managed service" as plug-in solutions. Costs associated with WDM are more favorable as compared with WAN connections, and functionality and performance are much better. For all these reasons (better performance, future-oriented, more affordable) we give definite preference to WDM technology over WAN connections.

Brief overview of the most important technical characteristics (WDM = Wavelength Division Multiplexing, DWDM = Dense Wavelength Division Multiplexing):


• Monomode laser technology, 9 micrometer diameter
• Up to 64 (128) applications per link (DWDM)
• 64 (128) x 2.5 (10) Gbit/s bandwidth per fiber
• Up to 150 km without amplification
• Optical amplification only

WDM technology can be used for:
• EMC² SRDF (ESCON with FarPoint)
• Fibre Channel
• Shared DASD
• Remote peripherals
• CTC connections
• Fast Ethernet, Gigabit Ethernet, ATM

The performance of synchronous SRDF-ESCON links is improved by EMC "FarPoint" firmware which permits parallel processing of several IO operations for different volumes via the remote links. FarPoint can however be used only for unidirectional SRDF links. With bidirectional SRDF links, it is necessary to combine the remote links into two unidirectional "RA groups" (remote adapter groups) on installation in order to profit from the performance gains delivered by FarPoint. Multiple remote links and therefore multiple ESCON boards are needed to implement high-performance remote link connections via ESCON.

The performance of synchronous SRDF-FC links depends on the distance involved, the switches used and their maximum "buffer-to-buffer credit rate". Performance exceeds ESCON performance over "shorter" distances. The "break-even point" in terms of distance can be calculated for defined configurations. If, for example, 1 Gbit/s switches are used (the length of a data frame in the fibre-optic route is therefore 4 km; more information is provided in, for example, the Fujitsu White Paper [10]) and the buffer-to-buffer credit rate is 60 (refer also to section 3.5), FC links over distances of up to 4 km x 60 = 240 km (or a single route of 120 km) demonstrate better performance than ESCON links.

The required WDM equipment is available from many vendors. A current sample selection is given below (similar products are listed on the same line):

• (Waveline), CNT/Inrange (Spectrum), ADVA (FSP), Alcatel (Optinex)
• Sorrento Networks (GigaMux), Inrange (GigaMux)
• Nortel Network (OPTera Metro)
• Cisco (Metro, ONS)
• CNT (UltraNet Wave Multiplexer)
• ONI (Online)

You can choose from amongst various technologies, hardware vendors, fibre-optic network operators, telecoms providers and service providers with different business and service models when deciding on the form of your WDM connection. Fujitsu offers consulting and support services to help you make the right choice.

3.4.3 IO operation distribution

It is evident that if synchronous SRDF is used over long distances each write IO operation will incur a certain delay. This is explained briefly by reference to the following example. Further information is provided in [9]. In synchronous SRDF, a write IO operation is terminated as soon as the data is written to the cache of the remote Symmetrix and an acknowledgment signal has been returned. The time needed by a write IO operation over the SRDF link and remote Symmetrix is made up as shown below. In our example, we assume an ESCON-SRDF link using a WDM multiplexer over a distance of 250 km.

tlocal    Time required by the local Symmetrix to write the job to the cache; we assume a time of 0.5 ms.

tprot     Time required to convert the optical ESCON signal into a WDM laser signal.

tdata     Time required by a 4-KB data block over the link; in our example, this time is determined by the transmission speed of ESCON (SRDF) and is 0.3 ms (hypothetical value; the minimum value at 200 Mbit/s is 0.2 ms).

tline     Signal propagation delay caused by 250 km of fibre-optic cable; given a propagation speed of 200,000 km/s for light in fibre-optic cable, this delay is 1.25 ms over 250 km.

tremote   Time required by the remote Symmetrix to write the job to the cache and to return an acknowledgment; we assume a time of 0.5 ms.


Because four conversions are involved (once each in each direction and at each site) and the route over the WDM fibre-optic cable must be travelled twice (by the data block and by the acknowledgment signal), the time delay when using SRDF and WDM can be calculated as follows.

[Figure: a write job travels from the cache of the local Symmetrix (tlocal) through the local WDM multiplexer (tprot), over the fibre-optic link (tline, tdata) and through the remote WDM multiplexer (tprot) to the cache of the remote Symmetrix (tremote); the acknowledgment travels back the same way.]

Fig. 3-3: Time needed for a write IO operation with synchronous SRDF

tsum = tlocal + tdelay + tdata + tremote (total time for a 4-KB write IO operation)

tdelay = 4 * tprot + 2 * tline (delay due to distance of the SRDF link)

Even if the conversion time of the WDM multiplexer is assumed to be minimal (tprot = 0), the total delay due to the long distance is

tdelay = 2 * 1.25 ms = 2.5 ms

This results in total times of

tsum = 0.5 ms + 0.3 ms + 0.5 ms = 1.3 ms for a distance of 0 km, and

tsum = 0.5 ms + 2.5 ms + 0.3 ms + 0.5 ms = 3.8 ms for a distance of 250 km

If a single disk and synchronous write IO operations are assumed, the following IO rates are possible:

for a distance of 0 km: up to 1000/1.3 ≈ 769 write IO operations/s
for a distance of 250 km: only 1000/3.8 ≈ 263 write IO operations/s

Comparison: if only one automobile is used, only a limited number of persons can be transported over a distance of 250 km in a limited period, even if the highway is 10 lanes wide and high speeds are allowed. If several disks are used in parallel, it is possible to achieve the maximum data rate over the SRDF link. For this reason, before deploying SRDF links over long distances, it is necessary to investigate whether there are "hot-spot" applications that make sporadic high throughput demands with regard to write IO operations to disks.

EMC offers an "SRDF Distance Kit" for customers planning to deploy a disaster tolerant architecture with SRDF over long distances. This kit includes a loading analysis of existing disks, identification of hot spots, and calculation of the required bandwidth over (D)WDM. Test equipment is also available. Essentially, this equipment comprises WDM Mux/Demux devices from Sorrento Networks or Inrange Technologies and six fibre-optic cable rolls with additional hardware. It enables a "long-distance" SRDF link based on ESCON or FC to be simulated. The fibre-optic cables allow distances of up to 100 km (for 4 links) or 200 km (for 2 links) to be bridged in steps of 10 km. Consequently, it is possible to verify whether the required overall IO rate is achieved over the full appropriate distance.
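The example above can be generalized to arbitrary distances. The following sketch merely reproduces the calculation under the same assumptions (tlocal = tremote = 0.5 ms, tdata = 0.3 ms, negligible WDM conversion time) and prints the resulting maximum synchronous write rate for a single volume; the values are illustrative, not measured results.

# Illustrative sketch: synchronous SRDF write IO time and resulting maximum
# write rate per volume as a function of distance. Assumptions as in the text:
# t_local = t_remote = 0.5 ms, t_data = 0.3 ms, WDM conversion time ~ 0 ms.
T_LOCAL_MS = 0.5      # write to cache of the local Symmetrix
T_REMOTE_MS = 0.5     # write to cache of the remote Symmetrix + acknowledgment
T_DATA_MS = 0.3       # transfer of a 4-KB block over the ESCON SRDF link
LIGHT_KM_PER_MS = 200.0   # propagation speed of light in fibre: 200,000 km/s

def write_io_time_ms(distance_km: float) -> float:
    """Total time for one synchronous 4-KB write IO over the given distance."""
    t_line = distance_km / LIGHT_KM_PER_MS    # one-way propagation delay
    t_delay = 2 * t_line                      # data block out, acknowledgment back
    return T_LOCAL_MS + t_delay + T_DATA_MS + T_REMOTE_MS

for distance in (0, 50, 100, 250, 500):
    t = write_io_time_ms(distance)
    print(f"{distance:4d} km: {t:5.2f} ms per write IO -> "
          f"max. {1000 / t:5.0f} synchronous write IOs/s per volume")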


3.5 Distance-dependent restrictions

In many documents dealing with disaster recovery, configurations or clusters are allotted to one of three classes, each with a different geographic reach:

• Campus/local area: covers a few kilometers at most
• Metropolitan area: covers an area of approx. 100 km
• Continental area: covers several hundred kilometers

Using the ESCON or FC protocols currently relevant for BS2000 systems, it is possible to implement campus solutions without additional measures providing fibre-optic cable can be freely routed between the data centers (in particular in genuine corporate campuses where there is no need to route cables across public ground). Greater reach in metropolitan areas will generally require a network carrier to be called in to provide the required bandwidth (with acceptable runtime) on one or more fibers ("dark fiber"). WDM technology is best suited for such areas. The same applies for data centers that are only a few kilometers apart but cannot be linked over corporate ground. For connections in the continental area it will be necessary to use a WAN connection as discussed in section 3.4.2. In this case, latency times of individual IO operations and the costs incurred must be critically examined.

On the physical level, the first choice is the fibre-optic cable type. Further information can be found in, for example, the Fujitsu White Paper entitled "Storage connections over long distances" [10] and in the relevant literature.

Today's FC links bridge a maximum of 10 km between two ports. ESCON links may be up to 3 km and links between two ESCON directors (SCDs) may be 20 km. As a result of the links between SCD and host and between SCD and device controller, an ESCON route can be "stretched" to 26 km. These distance limitations can be overcome by using WDM routes. Currently, routes of up to 100 km are possible without amplification of the optical signal and up to about 200 km if optical amplification is used. For long distances, it is worth considering whether an ATM or TCP/IP link is an available option. However, issues of this kind will need to be clarified with a network carrier or with a vendor of full solutions.

Performance-related limits can arise on the ESCON or FC protocol level. Although these function over any distance, the maximum transmission rate between two components is no longer achieved for certain distances. What is known as a performance "droop" occurs as soon as the propagation time of an optical signal in the fibre-optic cable reaches the transmission duration of a data frame due to the distance between sender and receiver. The sender waits a relatively long time for the acknowledgment signal and this decreases the effective data rate on the link. With ESCON, for example, the transmission rate is 200 Mbit/s: as a result, a 2-KB frame (i.e. 20 kbit) is transmitted in 100 μs. In fibre optics the propagation speed of light is 200,000 km/s, i.e. 10 km is covered in 100 μs. Consequently, for return distances of 20 km (10 km one-way), the sender waits a further 100 μs for the acknowledgment signal and is therefore sending at only half speed (this is, however, a simplification because a 2-KB block on the ESCON channel is sent in smaller portions, depending on the disk controller architecture, each with its own acknowledgment signal; a detailed analysis of this problem is provided in [12]). However, wait times can be reduced by sending several frames without waiting for an acknowledgement signal. With FC, "buffer-to-buffer credits" are available for this purpose. As indicated in [8], section 2.4.5, ESCON droop begins after 9 km and, at 23 km, only half of the throughput is achieved. With FC, this limit is reached at 4 km and 2 km due to the high data rate of 1 Gbit/s and 2 Gbit/s respectively. However, this limit can currently be extended to approx. 120 km in each case by means of the buffer credits of the deployed Fibre Channel switches or FC/IP gateways. How large the additional, distance-dependent propagation times may be will depend on the requirements of the applications.
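The droop effect can be estimated with a very simple model: the effective utilization of a link is the time spent transmitting frames divided by the time until the acknowledgment of the first outstanding frame arrives, and buffer-to-buffer credits increase the number of frames that may be in flight. The following sketch is only such a simplified illustration; it ignores protocol overhead and the fragmentation behavior of real disk controllers mentioned above, so the numbers are indicative rather than exact.

# Simplified droop model: effective link utilization as a function of distance.
# The sender may have `credits` frames outstanding before it must wait for an
# acknowledgment. Protocol overhead and controller-specific fragmentation of
# blocks (see text) are ignored, so the results are indicative only.
LIGHT_KM_PER_MS = 200.0   # propagation speed of light in fibre (200,000 km/s)

def utilization(distance_km: float, frame_kbit: float,
                rate_mbit_s: float, credits: int = 1) -> float:
    """Fraction of the nominal link rate that is actually achieved."""
    t_frame_ms = frame_kbit / rate_mbit_s            # kbit / (Mbit/s) = ms
    t_rtt_ms = 2.0 * distance_km / LIGHT_KM_PER_MS   # signal there and back
    cycle_ms = max(credits * t_frame_ms, t_frame_ms + t_rtt_ms)
    return credits * t_frame_ms / cycle_ms

# ESCON-like link: 200 Mbit/s, ~20 kbit frames, effectively one frame in flight
for d in (0, 10, 23, 50):
    print(f"ESCON {d:3d} km: {utilization(d, 20, 200, credits=1):5.0%}")

# FC-like link: 1 Gbit/s with 60 buffer-to-buffer credits
for d in (0, 120, 240, 500):
    print(f"FC    {d:3d} km: {utilization(d, 20, 1000, credits=60):5.0%}")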

If ESCON and FarPoint are used for the SRDF links, droop starts much later. In measurements with simulated delay times of several ms (1 ms delay corresponds to 200 km) we were able to identify only a slight, linear reduction of the transfer rates. The host-to-host links used primarily by BCAM and MSCF in our considerations are least time-critical because no high load is generated.

Unless otherwise stated, in the descriptions below we have limited the "campus area" to 10 km and the "metropolitan area" to approx. 100 km. Anything beyond these distances is a "continental area".

Summary of maximum distances:
• High-perf. host-peripherals connection for ESCON link: up to 9 km
• High-perf. host-peripherals connection for FC link: up to 120 km
• High-perf. SRDF link using ESCON and FarPoint: up to several hundred km
• High-perf. SRDF link using FC and FC switches: up to 120 km
• High-perf. SRDF link using FC and FCIP gateways¹: up to 200 km

SRDF links with ESCON directors and FarPoint are superior to SRDF links with FC directors as of 120 km (the current "breakdown point" of FC).

All statements are based on the current state-of-the-art; values are not therefore absolute.

¹ Refer in this context to [8], Appendix A; as compared with FC switches, these devices achieve a greater maximum buffer-to-buffer credit rate.


3.6 Administrative preparations for disaster recovery

In addition to the measures specified in the above sections, various administrative preparations must also be made to ensure trouble-free disaster recovery. A sample list of such preparations is given below but does not claim to be complete.

• Entry of user IDs on the standby host; these IDs are switched if a disaster occurs
• Entry of host names / IP addresses of users of the switch unit (also if VM2000 is used)
• Creation of MRSCAT entries on the standby host for all data pubsets to be switched
• If OMNIS is used, the OMNIS IDs must also be set up on the standby system
• Creation of extended network configuration files (SOF, RDF) for the various sites and, if necessary, setup of virtual hosts

If VM2000 is used to separate applications (see also section 4.7), the following should be noted:

• The mirror guest systems must be set up in the standby DC; the production DC values determined for CPU and main memory requirements are assigned to these systems.

• The network addresses of the guest systems for the switch units must be entered in the monitor system; virtual consoles or SKP consoles must be set up on the monitor systems for the additional guest systems.

• The disks for data and home pubsets set up in the standby DC are assigned to the VM; the same applies for the network access points (HNC) and the virtual consoles.

If HIPLEX AF is used, the requisite preparations must be made; these include:

• Establishment of an MSCF cluster
• Additional setup of a shared pubset cluster for X-link configurations
• Definition of the switch units for the applications
• Creation/adaptation of failover/failback phases

These preparations are described in [3] and [4] and are therefore not dealt with here.

3.6.1 Handling of system and paging volumes

System volumes

In a disaster tolerant architecture, it is possible to mirror system volumes using SRDF in exactly the same way as application volumes and to start the systems in the standby DC only in the event of a disaster, or to make systems ready at the standby site in a VM2000 guest system framework that takes over applications if a disaster occurs. Alternatively, all applications run in a native system, even in the event of a disaster. The "mirrored home pubsets" variant has the following disadvantages:

• If a disaster occurs, the time to start up the system is added to the time to start the application
• In normal operation a standby system cannot be used until after separation of the SRDF pairs
• Files (such as console logging files) associated with the session are also copied using SRDF although they are not needed in the home pubset of the standby system

• All special files for the standby system (such as network configuration files) must already be incorporated in the work system
• A specific selection logic (that makes use of different parameter files) is needed to ensure that the correct special file is automatically selected from those available.

This model has the following advantages:
• Updates of general files (rep loader, for example) are automatically effective on both systems
• Changes to system settings maintained in system files (e.g. new ID with default Cat ID) are automatically effective for the standby system.

An alternative model would be a pair of identical but separate home pubsets (without SRDF mirroring). The list of advantages and disadvantages is practically the reverse of that for mirror mode. Both models cannot be meaningfully combined. In the case of independent home pubsets, an update of general files would have to be performed, by means of file transfer for example, on both home pubsets (initial testing of, for example, new rep loaders on the standby system would also be an advantage) and some system commands would have to be issued additionally for the standby VM. In the case of independent home pubsets, it is mandatory that in normal circumstances the standby system is already running (possibly with reduced main memory) and is supported by the same system administrator as for the original system.

Paging volumes

Mirroring of paging data has no advantages. Consequently, paging files should be created, if possible, on volumes that are not mirrored and do not contain mission-critical data. Alternatively, they should be created on the home pubset if it is not mirrored. In addition to disk storage capacity savings and the prevention of unnecessary loads on the SRDF link, there is the added advantage that startup of the systems is not also dependent on the SRDF status of the corresponding source or target unit.


3.7 Data backup concept

Application data includes not only online data but also archived data. Archived data is data associated with completed business transactions that must be retained for a longer period for legal reasons, e.g. so that complaints can be dealt with or past business transactions can be reconstructed if irregularities are suspected. The archived data must also be available in the standby data center so that normal business operations can be resumed in the event of a disaster.

A basic requirement of any backup concept which is part of any disaster recovery concept is the separation of the location where backup volumes are kept. This location may be in a fire-protection zone or may simply be a location which is as remote as possible from the location of the server and storage systems and which is specially protected against various potential threats such as fire, flooding, explosion, theft, sabotage, etc. Generally speaking, a remote backup concept is always recommended. After all, access to backup volumes is the 'ultima ratio' in the event of "logical errors"; in other words, when human error (and human error can never be ruled out) renders current production data (and, in mirroring concepts, also the mirrored data in a standby DC) unusable.

In the configurations we discuss, tape backups are not needed for application restart (except in the case of a specific rolling disaster, see section 2.8.2). Consequently, planning of backup peripherals and mirrored data to be kept in the standby location depends on the necessity to access long-term backups in the standby data center at any time. Basically, several data backup concepts are conceivable in order to keep redundant tape archives:

1. Backups are kept in a second archive (which would not be affected by a disaster) by means of tape transfer and tape copying. This archive can, of course, be located at the standby site. The cartridge systems used in the work data center must also be available in the standby data center. It should be noted that tape transport requires personnel resources. HSMS archives must be set up and maintained at the standby site so that long-term backup data can be imported at the standby site in the event of a disaster.

2. Tape backups are also made at the standby location. When backups are performed in the work data center and when the applications are not running, SRDF mirroring for the data pubsets is terminated by SHC-OSD command, the target units in the standby Symmetrix are activated, and the pubsets – on the target units – are also imported there. The same backup as in the work data center is then performed.

If the backup windows are not sufficiently long, backup using additional mirror units (BCVs) is an available option (refer to section 2.9 and [1]). Backup can be carried out simultaneously in the work and standby data centers. At defined synchronization points the BCVs are split off the source and target units and backed up on tape. This requires additional disk capacity. The backup procedure calls for additional routine work which can, however, be automated.

File migration

We do not recommend file migration to tape (S2) for mission-critical data. However, if this type of migration is necessary, it should be ensured that the files are also first placed in a protected backup archive.

3.8 Step-by-step establishment of disaster recovery prerequisites

Because the prerequisites and preparations discussed in this chapter may entail great effort and substantial costs where high processing, disk storage and network capacities are involved, it may be appropriate to deploy the chosen disaster recovery concept step-by-step. For example, in a first step a standby data center could be equipped with the required host and disk capacity to permit rapid restart using backup cartridges in the event of a disaster. In a second step, data mirroring on disk level (SRDF) could be implemented, and in a later step the data network could be extended, if this were necessary.

Alternatively, it would be possible to start with local data mirroring in which the SRDF target units are accommodated in a Symmetrix subsystem which is physically separated from the original Symmetrix subsystem with the source units. In a second step, a WDM infrastructure could be set up, and even later the Symmetrix mirror systems could be relocated. Obviously, different sequences of events are possible but exact planning is required.


4 Disaster tolerant architectures

This chapter discusses various disaster tolerant architectures that can be recommended in terms of availability and currency of online data. We have restricted our discussion almost exclusively to synchronous disaster tolerant architectures. Of the asynchronous disaster recovery configurations only data mirroring using asynchronous SRDF is dealt with in any detail. Online concepts with real-time data mirroring allow operation to be resumed quickly with current data in the event of a disaster. In contrast, restart using backup tapes generally means losing one day's data and involves time-consuming data recovery.

All variants make use of data mirroring with SRDF. There are differences with regard to the SRDF mode (synchronous or asynchronous), Symmetrix host connections (cross-cabling = X-link configuration, no cross-cabling = U-link configuration), the distance between the two sites, and the incorporation of global storage. The necessary preparations, as described in sections 3.1 to 3.7, must be made for all configurations. Instead of synchronous SRDF, semi-synchronous SRDF (see 2.8.1) is always an available alternative if the possible loss of one write IO operation per (SRDF-)mirrored volume can be tolerated in the event of a disaster. When planning a synchronous disaster tolerant architecture, careful consideration must be given to the need to use domino mode. Decision criteria are described in section 2.8.2. The configuration described in section 4.2, where global storage is used as a write cache, even mandates the use of domino mode.

4.1 X-link disaster recovery configuration

The figure below shows a schematic diagram of an X-link disaster recovery configuration comprising two hosts (generally production and standby host). The hosts are geographically separate and each has a Symmetrix system. The hosts are linked via a network connection and each has access to both Symmetrix systems. All components are connected to all other components. This is an example of cross-cabling, which facilitates shared pubset clusters.

Basically, all hardware resources should be redundant in order to prevent a single point of failure, i.e. all disks should be RAID1 protected, all connections from the hosts to the disk controllers as well as the SRDF links should be at least two-path and follow different routes, and network connections to the users should also be redundant. Similarly, each SRDF source unit should be connected to its target unit via at least two remote link directors. This ensures a great degree of high availability. This redundancy is an essential precondition for the deployment of HIPLEX AF and, above all, for automatic disaster recovery. The configuration described is referred to as the "base configuration" in the HIPLEX AF Product Manual [3].

[Figure: schematic diagram of an X-link configuration. Production host and standby host each have an EMC Symmetrix system (RAID1 source units in the production DC, RAID1 target units in the standby DC); the two Symmetrix systems are linked by SRDF (ESCON/FC), both hosts are connected to both Symmetrix systems via ESCON/FC (cross-cabling), and the hosts are interconnected via DC/LAN.]

Fig. 4-1: X-link disaster recovery configuration

In an asymmetric disaster tolerant architecture (2.3) one of the two hosts acts as the production host and the other as the standby host; in a symmetric disaster tolerant architecture, both hosts play the role of production and standby host.


This configuration is suitable for data centers with two rooms separated at least by a fire-resistant wall, each with its own power supply. Better still are separate buildings for the two data centers, both of which can act as the production and standby data center. Usually the buildings are not too far apart and, if there is no intervening public ground, the term "campus solution" is normally used. If, however, local circumstances between the two sites prevent the direct routing of (fibre-optic) cables, a network provider must be commissioned and "U-link" or even asynchronous configurations (see 4.4) may have to be considered.

The use of cross-connections can restrict the geographical distance between the data centers. If applications are to run on the standby host with access to the disks of the work Symmetrix in the event of a system crash, provision must be made for the fact that performance of the ESCON protocol decreases noticeably over distances of more than 9 km. In principle, a configuration of this kind can also be operated over longer distances via a leased network connection. If FC technology is used (possible on S1/SX systems with BS2000 OSD V5) which, in turn, deploys special flow control mechanisms (buffer credits), the distance that can be bridged without loss of performance can be increased to approx. 100 km (refer also to section 3.5).

X-link disaster recovery configurations are characterized by the following:

• Synchronous SRDF mirroring
• Current data in the standby data center in the event of a disaster
• Cross-cabling of mirrored EMC systems, shared pubset cluster (MSCF)
• Most suitable for distances of up to approx. 10 km between production and standby DC
• Downtime in the event of a disaster of approx. 30 min
• Symmetric disaster tolerant architecture is possible
• Optional deployment of HIPLEX AF with automatic failure detection and automatic restart

Due to the cross-cabling, X-link configurations have the advantage that both disk controllers can be used from both hosts (if distance permits). In other words, in the event of a system failure on the work host, a switchover to the standby host can be made at any time (without failover to the standby disk controllers and time-consuming failback). This option can also be used for test and maintenance purposes. It is also possible, for maintenance work on the work Symmetrix, to perform failover to the standby Symmetrix and to continue work on the work host – via the cross-cabling. In terms of high availability, this configuration is superior to the U-link configurations described below. If global storage is used, additional issues must be considered when developing a disaster recovery concept. These are discussed in the next section.

4.1.1 Deployment of HIPLEX AF

HIPLEX AF can be used to deliver high availability for applications by switching quickly to the second host if a system fails or important application requirements are not fulfilled. It can also be deployed for automatic disaster recovery, in which case high availability is not compromised by disaster recovery. In this configuration, HIPLEX AF ideally makes use of two shared pubsets which act as monitoring data volumes for watchdog functionality (see section 2.12) and on which the data of the switch units is stored (see section 2.13). This configuration provides the best prerequisites for automatic disaster recovery with HIPLEX AF. A sample generation procedure for a switch unit with failover functionality is shipped with HIPLEX AF. This procedure also assumes this configuration and is geared to the simplest case of such a configuration with asymmetric disaster recovery and use of just one switch unit (“base configuration”). The course of automatic failover using HIPLEX AF is described in section 5.1.
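To illustrate what such a switch unit bundles together, the following Python sketch models – purely conceptually – the resources of one application switch unit and the order of its failover actions in the base configuration. It is an assumption-laden illustration, not HIPLEX AF code: the real switch unit is an SDF procedure generated from the sample shipped with HIPLEX AF [3], and all command strings (the bs2000() helper, ACTIVATE-TARGET-UNITS, ACTIVATE-VIRTUAL-HOST, the ENTER-PROCEDURE operands, and the example names) are placeholders whose exact syntax must be taken from the SHC-OSD [1] and HIPLEX AF [3] manuals.

from dataclasses import dataclass
from typing import List

def bs2000(command: str) -> None:
    """Placeholder for submitting a BS2000/SDF command (hypothetical helper)."""
    print(f"submit: {command}")

@dataclass
class SwitchUnit:
    """Models the resources bundled in one application switch unit (illustration only)."""
    name: str
    data_pubsets: List[str]    # catalog IDs of the SRDF-mirrored data pubsets
    virtual_host: str          # virtual host to be activated for network switchover
    start_procedure: str       # procedure that starts the application and its resources

    def failover(self, activate_targets: bool) -> None:
        # 1. Activate the SRDF target units only if the work Symmetrix has failed;
        #    a pure switchover (Fig. 5-2) keeps working on the source units.
        if activate_targets:
            bs2000(f"CALL-PROCEDURE ACTIVATE-TARGET-UNITS-{self.name}")  # placeholder
        # 2. Import the mirrored data pubsets on the standby host.
        for pubset in self.data_pubsets:
            bs2000(f"IMPORT-PUBSET PUBSET={pubset}")
        # 3. Activate the virtual host so that users reach the standby system.
        bs2000(f"ACTIVATE-VIRTUAL-HOST {self.virtual_host}")             # placeholder
        # 4. Restart the application and its associated resources.
        bs2000(f"ENTER-PROCEDURE {self.start_procedure}")                # placeholder operands

# Asymmetric base configuration with a single switch unit (hypothetical names):
unit = SwitchUnit("FINANCES", data_pubsets=["DATA1", "DATA2"],
                  virtual_host="VHOST01", start_procedure="$TSOS.START.FINANCES")
unit.failover(activate_targets=True)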


4.2 X-link disaster recovery configuration with GS

A dual GS (two GS units with hardware duplication feature and each with its own battery unit) is needed in order to be able to use global storage in a disaster tolerant architecture to store the write data of critical applications (in DAB GS caches or on GS volumes, see 2.10). In terms of a high-availability configuration, this corresponds to the use of RAID1 disks. The two failsafe GS units (GSUs) must also be geographically separated. Both production and standby host must have access to each of the two GSUs and both BS2000 systems must form an XCS cluster (and, as a result, a “parallel HIPLEX”). GS connection via fibre-optic cable allows distances of 70 m (2) to be covered, so that this configuration may represent only a campus solution.

[Figure: Schematic diagram of an X-link disaster recovery configuration with GS – production host and standby host, each connected to both Symmetrix systems (sources and targets) and to both GS units GSU1 and GSU2]

Fig. 4-2: X-link disaster recovery configuration with GS

As compared to a disaster tolerant architecture without GS, the X-link disaster recovery configuration with GS does not open up any new aspects unless global storage is used for mission-critical data.

GS volumes (in dual GS partitions) can only be used for mission-critical data if logically dependent data (such as transaction and logging data) is not spread over GS volumes on the one hand and storage subsystems on the other, because such dependent data cannot form a consistency group (see section 2.11).

If the write data of an application whose associated pubsets reside in a Symmetrix subsystem is buffered using the DAB in GS, the pubsets in the Symmetrix subsystem and the DAB caches in GS must be combined to form a consistency group in order to ensure synchronous disaster protection for the application (all data in GS originating from a DAB write cache resides only temporarily in GS and is displaced from GS once written asynchronously by the DAB to the associated Symmetrix volumes). The need for this consistency group is explained as follows. In this configuration there are two conceivable failure variants in which data inconsistencies could result (either writing to target units continues whilst GSU2 becomes obsolete, or writing to GSU2 continues and the target units become obsolete):

1. The SRDF link fails and the applications continue to run; then the entire production DC fails. In this case, data inconsistencies can arise in the standby DC if data continues to be written back to the source units from the GS caches of the DAB (but no longer to the target units due to the failed SRDF link).

2. The production host link to GSU2 fails and the applications continue to run; then the entire production DC fails. As a result, the data on the target units would be more current than the data in GSU2.

(2) Up to a maximum of 280 m is possible, but this must be clarified, on request, on a case-by-case basis.


The above considerations show that the attributes of a consistency group for application data are not necessarily present in a parallel HIPLEX using DAB write caches in GS for files in a Symmetrix SRDF cluster. For case 1 above, domino mode is available in the Symmetrix system. This prevents further IO operations on an SRDF unit as soon as one of the three components (source unit, target unit or SRDF link) fails. To prevent inconsistencies in case 2 and to establish a consistency group, domino mode is also required in GS for the DAB caches (see section 2.8.2). More information on DAB domino mode is provided in the DAB User Guide [2].

Domino mode in the DAB causes all write IO operations to be terminated with error after failure of a GS unit of a dual GS (and thus also in the event of a link failure). As in case 1, the application halts or terminates. Once the cause of the error has been ascertained and the risk of a disaster can be excluded, domino mode can be disabled and work can continue temporarily with only one GSU. Whereas SRDF domino mode can be set for each logical volume, DAB domino mode affects all DAB caches in global storage.

As a consequence, both domino mode for SRDF and domino mode for the DAB caches in GS should be used in order to ensure all-round, reliable disaster recovery in a parallel HIPLEX in which GS is deployed as a DAB write cache. Currently, no domino mode is available for GS volumes. X-link configurations which use GS for mission-critical data are therefore characterized by the following:

• Synchronous SRDF mirroring with domino mode
• Current data in the standby data center in the event of a disaster
• Cross-cabling of mirrored EMC systems, shared pubset cluster (MSCF)
• “Short” physical distance between production and standby data centers (up to 70 m)
• Downtime after detection of a disaster in the region of approx. 45 min
• Symmetric disaster tolerant architecture is possible
• Optional deployment of HIPLEX AF with automatic failure detection and semi-automatic restart
• Use of DAB write caching in GS with domino mode is possible

As a result of domino mode, interruption of the application must be accepted if SRDF links fail and if the link between the work host and a GSU fails. The same applies if the standby Symmetrix system or the remote GSU fails. In such cases, domino mode must be disabled for SRDF or DAB and, if necessary, the applications must be restarted. If the fault cannot be eliminated quickly, work must continue without domino mode (or, depending on the type of failure, without SRDF mirroring or without dual GS) and the above-mentioned danger of inconsistencies or data loss in the event of a disaster during this period must be accepted. This danger can be accepted only once the fault has been pinpointed and it is certain that the fault was not part of an emerging disaster.

Domino mode excludes the risk of data loss and inconsistencies in rolling disaster scenarios and also forms a consistency group of disks together with GS caches. The probability of failure of such a group is higher than that of an SRDF pair without domino mode and GS cache. Consequently, a reduction of high availability must be taken into account. When weighing up the pros and cons of domino mode, the following basic statements should be considered.

1. Almost every single failure of the relevant Symmetrix data containers or of global storage leads to termination of the applications – and thus impairs availability. Because this is generally followed, after analysis by an engineer, by switchover to the standby system, the degree of impairment of availability depends on the length of time needed to perform the analysis and to switch systems.

2. In return, the highest safety level is achieved at maximum performance: no conceivable failure can result in inconsistent data in the standby system, and IO bottlenecks are prevented by global storage.

If the impairment of high availability mentioned in 1 above is not acceptable, write caching in GS for mission-critical data must be dispensed with.
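The operational rule of thumb behind statement 1 can be made explicit. The following Python fragment is only an illustration of the decision described in this section (it is not part of HIPLEX AF or SHC-OSD, and the component and action names are assumptions): after domino mode has halted all write IOs, either the fault is limited and domino mode is disabled so that work continues temporarily on one leg, or a disaster is assumed and failover is initiated.

def handle_domino_stop(failed_component: str, production_dc_alive: bool) -> str:
    """Return the action to take after domino mode has stopped all write IO operations."""
    if not production_dc_alive:
        # The halt was part of an emerging (rolling) disaster: fail over to the
        # standby DC, which still holds consistent data.
        return "FAILOVER_TO_STANDBY_DC"
    if failed_component in ("SRDF_LINKS", "STANDBY_SYMMETRIX", "REMOTE_GSU", "GSU_LINK"):
        # Fault is limited: disable SRDF/DAB domino mode, restart the applications and
        # accept the temporary risk of inconsistencies should a disaster follow now.
        return "DISABLE_DOMINO_AND_CONTINUE_ON_ONE_LEG"
    return "ANALYSE_FURTHER"

# Example: all SRDF links are down, but the production DC itself is intact.
print(handle_domino_stop("SRDF_LINKS", production_dc_alive=True))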

4.2.1 Use and mode of operation of HIPLEX AF

Domino mode for SRDF mirroring and for the use of DAB in global storage is a prerequisite for the deployment of HIPLEX AF for disaster recovery. Further steps must be implemented in the switch unit to support automatic restart by HIPLEX AF. For example, DAB domino mode must first be disabled on the standby host to permit work to continue with just one GS unit in all circumstances. Then, due to the switchover of the DAB caches from the production host to the standby host, each EGC0502 message of the GS manager must be answered automatically when the data pubsets buffered by the DAB in global storage are imported.

Automatic restart with failover to the standby Symmetrix by HIPLEX AF is critical for the following reasons. When domino mode is used, failure of the standby Symmetrix, of all SRDF links, of one of the two GS units or of a link between a host and a GS unit always results in application failure. It is the responsibility of the system administrator or service engineer to decide whether the failure is limited and will therefore not lead to failure of the entire DC (or of the work host alone, or of a Symmetrix system at the production site). If it is decided that the fault is limited and domino mode is disabled by means of a system administration command, inconsistent data could be activated in the standby DC if this decision were wrong and HIPLEX AF were to subsequently trigger automatic switchover in the event of a disaster. This somewhat critical issue can be defused by starting a procedure – in the context of the decision whether to continue working after the first failure – which disables domino mode and also ensures that, if further faults occur, HIPLEX AF does not trigger automatic failover to the SRDF target units.

If domino mode is used, it is possible to select a semi-automatic disaster recovery concept and to supplement monitoring of the switch units in such a way that a service engineer or system administrator is informed in the event of failure of all SRDF links, of one of the Symmetrix subsystems, or of a GS unit. The engineer or administrator first checks the situation before deciding whether to let the application continue running without domino mode (because the fault has been identified and can be recovered) or whether to initiate failover because a disaster is involved. With some additional effort, and as suggested above, a fully automatic disaster recovery concept with HIPLEX AF resources is feasible.

In the failback procedure, a parameter must be set for test purposes. This disables domino mode prior to the restart of the SRDF source units and re-enables it at the end of failback, because the source units are “disabled” by domino mode in the event of a failure (refer also to section 2.8.2). If a real Symmetrix failure occurs, this parameter is generally set by a service engineer. Similarly, a corresponding command for DAB domino mode can be incorporated in the switch unit.

4.2.2 Special characteristics of failover and failback

Failover takes place as described in section 5. As soon as the target units in the standby Symmetrix are activated, the data pubsets can be imported and the data in GSU2 still to be written back to disk is automatically written to the target units by the DAB during import. Pending transfers for GS-cached private disks are performed during first-time disk allocation. From a caching perspective, no additional measures are required. If domino mode is used, it must be disabled in the DAB before the target units are allocated. Domino mode for Symmetrix need only be disabled for failover if the work Symmetrix is still running, e.g. if switchover is made for test purposes.

4.3 U-link disaster recovery configuration

Like X-link configurations, U-link configurations comprise a work and a standby data center so configured that if one of the data centers fails totally, the other is able to run all applications of both data centers. In other words, the computing power, disk capacity and network connection are almost identical in the two data centers. As described in section 4.1, hardware resources should also be redundant. Here, the term “U-link” means that there are no cross-connections between the production host and the standby Symmetrix nor between the standby host and the work Symmetrix (and therefore, in particular, no shared pubset cluster). Cross-cabling can be dispensed with particularly where long distances are involved over which cross-site application IO operations cannot take place with the required performance (in contrast to SRDF transfers).

Due to the absence of cross-connections there are no shared pubsets for HIPLEX AF and the systems can monitor each other only by communicating over the network links of the hosts (BCAM/MSCF). The option of working with both Symmetrix subsystems from both hosts is not available and, as a result, the high-availability features are somewhat reduced as compared to X-link configurations. In summary, U-link disaster recovery configurations are characterized by the following.

• Synchronous SRDF mirroring
• Current data in the standby data center in the event of a disaster
• Suitable for distances of 10 km to 200 km between production and standby data center
• Downtime in the event of a disaster of approx. 30 min
• WDM technology (or WAN connection) for SRDF links
• Use of HIPLEX AF with (semi-)automatic failure handling is possible but requires two separate, independent BCAM links (HIPLEX MSCF V3.0)
• Configuration can be operated symmetrically
• Prerequisite: Application hotspots are still executable even with SRDF time delays (see section 3.4.3)


[Figure: Schematic diagram of a U-link configuration – production host with work Symmetrix (sources) in one DC, standby host with standby Symmetrix (targets) in the other; synchronous SRDF between the Symmetrix systems routed over WDM / DWDM stations]

Fig. 4-3: U-link disaster recovery configuration
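The distance figures quoted for synchronous configurations can be made plausible with a simple propagation-delay estimate. The following sketch assumes roughly 5 microseconds of signal propagation per kilometre of fibre (one way) and at least one round trip per mirrored write IO; protocol handling, repeaters and queueing come on top, so the values are lower bounds only.

def srdf_write_penalty_ms(distance_km: float, round_trips_per_write: int = 1) -> float:
    """Lower-bound estimate of the latency added to each synchronously mirrored write."""
    one_way_ms_per_km = 0.005          # ~5 microseconds per km of fibre (assumption)
    return 2 * one_way_ms_per_km * distance_km * round_trips_per_write

for km in (10, 50, 100, 200):
    print(f"{km:>4} km: >= {srdf_write_penalty_ms(km):.1f} ms added per synchronous write IO")
# 200 km thus adds at least about 2 ms to every mirrored write, which is why application
# hotspots must be checked against these SRDF time delays (see section 3.4.3).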

4.3.1 Use and mode of operation of HIPLEX AF

Because no shared pubsets are possible in this configuration, only the HIPLEX AF monitoring facility on the work system has access to the data of the switch unit for the applications. For this reason, an additional auxiliary switch unit is defined in the standby system. This monitors the work host and the Symmetrix subsystem only and performs failover to the standby Symmetrix in the event of a disaster. Once the target units have been activated, the data of the production switch unit is available and the unit can be started. Monitoring of systems is carried out here only via the MSCF links and not by means of the watchdog function with shared pubsets (see section 2.12). Consequently, at least two redundant MSCF/BCAM links must be available and HIPLEX MSCF V3.0 must be deployed in order to support automated monitoring. Which criteria trigger (semi-)automatic failover must ultimately be decided by the system operator.

Definitions of switch units for the applications are stored on a data pubset or on the system pubset in the work Symmetrix which, like the application data, is mirrored using SRDF. Due to the absence of cross-cabling, switchover without failover (i.e. switchover to the standby host without switchover to the standby Symmetrix, refer also to Fig. 5-2) is not possible. Automatic switchover in the event of a system failure (not a disaster) therefore always entails activation of the target units and thus complete failover (and subsequent failback). As this usually involves greater effort than restarting the system, we suggest that preference be given to a semi-automatic mechanism in U-link configurations with HIPLEX AF. In other words, the switch units should be programmed in such a way that a system administrator is informed and failover and switchover are performed after an appropriate prompt. Of course, fully automated disaster recovery is also a possible option. Switchover without failover for test purposes is not an available option in U-link configurations due to the lack of cross-cabling.

4.3.2 Special characteristics of failover and failback

There is no shared pubset cluster and therefore no actions of any kind are needed in conjunction with shared pubsets (e.g. no establishment of a shared pubset cluster in the case of manual failback). If HIPLEX AF V3 is used for automatic failure detection and failure handling, the monitoring facility of an auxiliary switch unit on the standby host detects failure of the production host and of the work Symmetrix and performs failover to the standby Symmetrix as described in the previous section. The auxiliary switch unit then starts the production switch unit whose data is accessible after failover.


Failback can be automated by means of HIPLEX AF (refer also to chapter 6). It can also be started by an auxiliary switch unit in the work system. In summary, failover and failback with HIPLEX AF are similar, in this configuration, to failover and failback for “synchronous X-link configurations” but the preparations needed are rather more extensive.

4.4 Asynchronous U-link disaster recovery configuration

There are, of course, many different variants of asynchronous disaster recovery configurations because their prime characteristic is simply that, in the event of a disaster, the data available in the standby data center is always consistent but generally not up-to-date. As a result, the variant involving regular tape transport (see section 3.4) also falls into the category of asynchronous disaster recovery configurations. The asynchronous U-link configuration presented here corresponds largely to the synchronous U-link configuration in Fig. 4-3. Here, synchronous SRDF is replaced with asynchronous SRDF, known as adaptive copy mode (refer also to section 2.8.1). This enables the potential negative performance impact (see section 3.4.3) of synchronous SRDF mirroring on critical hotspot applications to be avoided. This configuration, which is functionally weaker than the synchronous U-link configuration, should therefore be given preference only when this is necessary for performance reasons (e.g. very long distances between the two sites). Due to asynchronous SRDF operation, it may be possible to replace the WDM connection by an ATM connection with less bandwidth.

[Figure: Schematic diagram of an asynchronous U-link configuration – production host with work Symmetrix (sources) and standby host with standby Symmetrix (targets) in separate DCs; asynchronous SRDF between the Symmetrix systems over a WAN/WDM connection]

Fig. 4-4: Asynchronous U-link disaster recovery configuration

Because data alignment in asynchronous SRDF operation is not continuous and does not take place in the correct order of the write IO operations, the applications must be halted at regular intervals or the databases (Oracle) must be switched into “backup mode” in order to ensure consistent pubset data on the target units. Once the SRDF targets have been aligned in this application break, and therefore no longer have invalid tracks, this synchronized, consistent state can be simply “frozen” if BCV volumes are used for the target units. The application can be restarted after detaching the BCV mirror volumes from the target units in the standby data center. The duration of such an interruption will be a matter of minutes only.

Fig. 4-5 below shows a solution of this kind with multi BCVs and their processing sequence when synchronization points are periodically written. Here, two so-called multi BCVs per target unit are used. BCV 1 and BCV 2 alternately contain the current status of the target unit and the last frozen status. Changeover between BCV 1 and BCV 2 consists of a difference alignment (Symmetrix “Multi BCVs” function) and therefore requires little time. In order to create consistent data at regular intervals, the applications must be halted until the target units and source units are synchronized and the current BCV is detached from the target unit. Mirroring of the target unit with the second BCV is then resumed and the application can continue to run.


[Figure: Asynchronous SRDF with multi BCVs – local site: source (pubset S); remote site: target mirrored via asynchronous SRDF, with multi BCVs BCV 1 and BCV 2. Processing sequence per synchronization point (short application break): 1. synchronization point, application interruption; 2. SRDF synchronization; 3. detachment of the current BCV; 4. application resumes; 5. resumption of copy to the second BCV]

Fig. 4-5: Asynchronous SRDF with multi BCVs

In this configuration there is therefore an additional disk requirement (over and above SRDF target units and RAID1 mirrors) of two BCVs per user data disk. However, working with two BCVs has the advantage that the data to be frozen is always available on one of the BCVs. If a second BCV is to be dispensed with, a larger time window is needed because the BCV must be attached, synchronized, and then detached for each alignment. This kind of asynchronous U-link disaster recovery configuration is characterized by the following.

• Asynchronous SRDF mirroring (adaptive copy mode)
• Data at the time of the last BCV “freeze” is in the standby data center in the event of a disaster
• Maximum geographic distances without noticeable loss of performance
• Downtime in the event of a disaster of approx. 30-45 min
• Use of HIPLEX AF with (semi-)automatic failure handling is possible but requires two separate, independent BCAM links
• Configuration can be operated symmetrically
• Additional disk requirements due to multi-mirroring
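The periodic synchronization-point procedure of Fig. 4-5 lends itself to automation. The following Python sketch only illustrates the ordering of the five steps and the alternation between the two BCVs; none of the command strings are verified SHC-OSD syntax – in practice the cycle would be implemented with the SHC-OSD multi-mirroring commands and the database backup-mode switches described in [1] and in the database documentation.

def bs2000(command: str) -> None:
    """Placeholder for submitting a BS2000/SHC-OSD command (hypothetical helper)."""
    print(f"submit: {command}")

def synchronization_point(target_unit: str, current_bcv: str, next_bcv: str) -> str:
    """One cycle of the multi-BCV procedure; returns the BCV that now holds the frozen state."""
    bs2000("HOLD-APPLICATION")                              # 1. halt application / DB backup mode (placeholder)
    bs2000(f"WAIT-UNTIL-NO-INVALID-TRACKS {target_unit}")   # 2. SRDF synchronization (placeholder)
    bs2000(f"DETACH-BCV {current_bcv} FROM {target_unit}")  # 3. freeze the consistent state (placeholder)
    bs2000("RESUME-APPLICATION")                            # 4. application continues (placeholder)
    bs2000(f"ATTACH-BCV {next_bcv} TO {target_unit}")       # 5. difference alignment onto the other BCV (placeholder)
    return current_bcv

# Alternate between BCV 1 and BCV 2 so that one of them always holds a frozen, consistent state.
bcvs = ["BCV1", "BCV2"]
for cycle in range(4):                                      # e.g. every few hours in production
    frozen = synchronization_point("TARGET-R2", bcvs[cycle % 2], bcvs[(cycle + 1) % 2])
    print(f"cycle {cycle}: consistent state frozen on {frozen}")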

EMC offers this type of solution under the label Data Mobility (SRDF/DM). EMC also offers a further, newer variant of an asynchronous disaster recovery configuration, SRDF/A. It requires Symmetrix models of type DMX and has the advantage of needing less bandwidth than synchronous SRDF, making it a budget-priced disaster recovery configuration; in the event of a disaster it also delivers far more up-to-date critical customer data than the asynchronous configuration described above. Further information can be found in [13]. For consultancy on and implementation of an SRDF/A disaster recovery configuration in a BS2000 environment, please contact your FSC Competence Center or one of the authors.

4.4.1 Use and mode of operation of HIPLEX AF

The mode of operation of HIPLEX AF is similar to that of synchronous U-link configurations. However, we again recommend that automated disaster recovery should not be used. In addition to the reason given in section 4.3.1, there is the fact that, due to asynchronicity, each switchover involves reactivating frozen data in the standby data center and discarding current data of the work data center that might still be readable. Consequently, each switchover entails loss of data. For this reason, automatic failure detection should not be responsible for switchover – which is a major intervention. With asynchronous configurations, it is therefore always appropriate to allow a qualified member of staff to assess the particular situation before failover is performed. HIPLEX AF can, of course, be used as a monitoring instance and to support “failover at the touch of a button” but it is best to refrain from automatic switchover in the event of a disaster.


4.4.2 Special characteristics of failover and failback

As described in chapter 5, the SRDF target units are activated when failover takes place. Here it is ascertained whether write jobs are still pending for target units (invalid tracks). If so – and this situation is normally to be assumed in production operation – multi-mirroring for the target units must first be terminated. In the next step, the (inconsistent) target units are reconstructed with the help of the BCVs backed up at the previous synchronization point. The process is then continued as described in chapter 5. Even when HIPLEX AF is used, these additional steps can be implemented as procedures which are started by the switch unit.

In this configuration, failover can be accelerated by continuing to work with the BCVs containing the last consistent backup level and temporarily not using the target units. To do this, the BCVs are attached and the data pubsets are then imported on the standby host. Consequently, the target units and the BCVs last attached remain unused. At an appropriate time, shortly before failback at the latest, operation must be interrupted and a switchover must be made to the target units which have previously been aligned with the data of the current BCVs.
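As an illustration of these additional failover steps, the following sketch shows the order of operations only; the commands are placeholders and not verified SHC-OSD syntax (see [1] for the actual multi-mirroring and restore commands), and the unit names are invented for the example.

def bs2000(command: str) -> None:
    """Placeholder for submitting a BS2000/SHC-OSD command (hypothetical helper)."""
    print(f"submit: {command}")

def prepare_targets_for_failover(target_units, frozen_bcvs, invalid_tracks_pending: bool) -> None:
    """Rebuild the (possibly inconsistent) target units from the last frozen BCVs."""
    if invalid_tracks_pending:                        # the normal case in production operation
        for unit in target_units:
            bs2000(f"STOP-MULTI-MIRRORING {unit}")    # placeholder: terminate BCV mirroring first
        for unit, bcv in zip(target_units, frozen_bcvs):
            bs2000(f"RESTORE {unit} FROM {bcv}")      # placeholder: copy the frozen BCV back to the target unit
    # afterwards the process continues as in chapter 5 (import pubsets, start applications)

prepare_targets_for_failover(["R2-0100"], ["BCV1-0100"], invalid_tracks_pending=True)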

4.5 Combined disaster recovery configurations

If an existing, local high-availability configuration is to be extended in order to build a disaster tolerant architecture, it is possible to add a further host and a further Symmetrix system to the configuration at a remote site, responsible only for disaster recovery. Additional mirroring of user data to the third Symmetrix can then be implemented via intermediate BCVs. This configuration is known as a cascaded disaster recovery configuration. Fig. 4-6 shows a configuration of this kind. For each user data volume there are associated SRDF target units and BCVs. The variant shown illustrates an asynchronous disaster recovery configuration because synchronization of data with the remote target unit is not performed until the local BCV is detached. In other words, this configuration represents the cascading of a synchronous X-link configuration with an asynchronous U-link configuration.

[Figure: Combined (cascaded) disaster recovery configuration – Site I: Host A and Host B (MSCF) with Symm 1 (R1 source units) and Symm 2 (R2 target units plus BCV/R1); Site II: optional Host C with Symm 3 (R2 target units, optional BCVs); SRDF from Symm 1 to Symm 2 locally and from the BCV/R1 units in Symm 2 to Symm 3 over WDM]

Fig. 4-6: Combined disaster recovery configuration

Each SRDF target unit is also mirrored by a BCV from which, at defined times, SRDF synchronization is performed on a remote target unit in Symmetrix 3. If a disaster occurs at site I, there is the option of resuming production operation on host C with Symmetrix 3 (with data from the last synchronization point). To guard against the situation in which the Symmetrix systems at site I fail during synchronization of Symmetrix 2 and Symmetrix 3, with the result that Symmetrix 3 contains inconsistent data, the target units in Symmetrix 3 can also be backed up by BCVs which must be detached from the target units prior to each synchronization and then re-attached afterwards. If BCVs are not used in Symmetrix 3, tape backup must be available instead. The benefits of this configuration variant are as follows.


• Definitive high-availability configuration permitting, for example, maintenance work on a shutdown Symmetrix system during production without switching production to resources of a remote standby DC.

• No performance loss due to synchronous SRDF over long distances; mirroring to the remote standby data center is asynchronous.

4.6 Configurations with more than one Symmetrix subsystem

If more than one Symmetrix subsystem is attached to a production host, the distribution of data over the various subsystems should be planned very carefully to achieve a consistent disaster recovery concept. The problem areas of such a configuration with regard to disaster recovery are obvious. In a rolling disaster, the Symmetrix subsystems could fail at different times. If logically dependent data is distributed over two Symmetrix subsystems, a failure scenario of this kind could lead to inconsistent data – if, for example, the SRDF links of one Symmetrix subsystem to its target system fail and, some time later, the SRDF links of the other Symmetrix subsystem to its target system fail as well, while domino mode is not enabled. Writing to the source units then continues, but the two mirrored Symmetrix subsystems in the standby data center stop receiving write IO operations from the work data center at different times. As a result, the consistency of transactions executed for a database and the associated logging of the transactions in the standby data center could be at risk.

To eliminate this risk, either domino mode can be used (for all SRDF pairs with critical data) or, alternatively (and without compromising high availability), administrative steps could be taken to ensure that all data of an application is stored together on a single Symmetrix subsystem; in other words, a consistency group concept would be applied (see section 2.11). When it is a question of reacting to the failure of only one Symmetrix subsystem, automatic application switchover via HIPLEX AF is also less complicated if the Symmetrix subsystems are clearly separated. In principle, even X-link disaster recovery configurations with several Symmetrix subsystems per site are manageable using HIPLEX AF.
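The administrative alternative to domino mode – keeping all logically dependent data of an application within one Symmetrix subsystem – can be checked mechanically. The following Python sketch is merely an illustration of such a placement check; the volume-to-subsystem mapping would in practice be taken from SHC-OSD configuration displays, and all names and the data structures used here are assumptions.

from typing import Dict, List

def check_placement(app_volumes: Dict[str, List[str]],
                    volume_to_symmetrix: Dict[str, str]) -> List[str]:
    """Return the applications whose volumes are spread over more than one subsystem."""
    violations = []
    for app, volumes in app_volumes.items():
        subsystems = {volume_to_symmetrix[volume] for volume in volumes}
        if len(subsystems) > 1:
            violations.append(f"{app}: spread over {sorted(subsystems)}")
    return violations

# Example: database and log volumes of "FINANCES" are split over two subsystems,
# so either they are moved together or domino mode is enabled for their SRDF pairs.
volume_map = {"DB01": "SYMM-A", "DB02": "SYMM-A", "LOG01": "SYMM-B", "WH01": "SYMM-B"}
applications = {"FINANCES": ["DB01", "DB02", "LOG01"], "WAREHOUSE": ["WH01"]}
for problem in check_placement(applications, volume_map):
    print("placement violation:", problem)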

4.7 Separation of applications by VM2000

To conclude this chapter we would like to point to the potential benefits of using VM2000, also with regard to disaster recovery – regardless of the configuration types discussed here. Because different applications often have different system requirements, it can be very useful to give each production application its own guest system (VM). For purposes of disaster protection, a “mirror VM” is set up for each application in the standby data center and is supplied with the appropriate values for CPU and memory requirements. An application that can therefore run on its own VM in both data centers can be referred to as a switch unit, like switch units in HIPLEX AF, even if HIPLEX AF is not used. The guest system can then run continuously or is not activated unless a disaster occurs (refer also to section 3.6.1). This results in greater clarity in the event of failover. If the host in the second data center is used productively and not only as a test system or exclusively as a standby host, we recommend this approach in order to logically separate applications which would possibly operate with totally different system parameters and different equipment levels.

By way of example, Fig. 4-7 shows the distribution of three applications (Finances, Personnel, Warehouse) in a symmetric disaster tolerant architecture. In normal operation the first two run in the work DC and the third in the standby DC (from the perspective of the “Warehouse” application, the work and standby DC are reversed). The VMs shown with gray shading (the “disaster” VMs) are unused in normal operation or are used only as test systems. If, for maintenance work or to test disaster recovery, only “Personnel” is to be switched, VM2000 ensures that the other two applications are not affected, assuming sufficient CPU power is available on the standby host.


[Figure reconstructed as text: VM distribution over work host and standby host]

Work host:
• VM1 Monitor: host name …, IP addr. 1.2.3.4, CPU … %, MM … MB
• VM2 Finances: host name …, IP addr. 1.2.3.5, virt. host name …, virt. IP addr. 1.2.3.10, CPU … %, MM … MB
• VM3 Personnel: host name …, IP addr. 1.2.3.6, virt. host name …, virt. IP addr. 1.2.3.20, CPU … %, MM … MB
• VM4 Warehouse (disaster): host name …, IP addr. 1.2.3.7, virt. host name …, virt. IP addr. 1.2.3.30, CPU … %, MM … MB

Standby host:
• VM1 Monitor: host name …, IP addr. 1.2.3.14, CPU … %, MM … MB
• VM2 Finances (disaster): host name …, IP addr. 1.2.3.15, virt. host name …, virt. IP addr. 1.2.3.10, CPU … %, MM … MB
• VM3 Personnel (disaster): host name …, IP addr. 1.2.3.16, virt. host name …, virt. IP addr. 1.2.3.20, CPU … %, MM … MB
• VM4 Warehouse: host name …, IP addr. 1.2.3.17, virt. host name …, virt. IP addr. 1.2.3.30, CPU … %, MM … MB

Fig. 4-7: Application separation using VM2000


5 Failover process

Failover is the switching of applications from the production DC to the standby DC in the event of a disaster. Failover is performed when a disaster has occurred; in other words, when operation of the applications in the production DC is no longer possible due to one of the situations named in section 1.1. This is the case if the entire DC fails or can no longer be used, but also if only one Symmetrix system containing critical data fails in the production DC and operation of the production host with the mirrored data of a standby Symmetrix is not a viable alternative due to lack of performance (because of the long distance involved).

[Figure: Total failure of the production DC – production host and work Symmetrix (sources) have failed; the applications are restarted on the standby host with the SRDF target units, optionally controlled by HIPLEX AF]

Fig. 5-1: Failover in the event of total failure

In the following, we briefly describe the automatic processes performed when HIPLEX AF is used for X-link configurations and the series of actions during manual switchover to the standby DC which would otherwise also be performed by HIPLEX AF. It is, of course, also possible to manually start HIPLEX AF-controlled failover “at the touch of a button” (this failover is therefore based on semi-automatic disaster recovery). The failover and failback processes described below should not be seen as complete technical guidelines. They are based on the project and test experience of the authors and are meant only to give an impression of what is involved.

5.1 Automatic failover with HIPLEX AF

If HIPLEX AF is used (recommended primarily for X-link configurations) and is configured for automatic disaster recovery, the HIPLEX AF monitoring facility on the standby host triggers MSCF reconfiguration and initiates failover to the standby Symmetrix because it has detected Symmetrix failure by means of SHC-OSD messages (refer also to section 2.13). If the entire work Symmetrix system or the entire production DC fails, the following, simplified automatic process is triggered:

1. The SYSRES monitor, the part of the monitoring facility of HIPLEX AF that monitors the first disk of the home pubset, detects disk failure and terminates BS2000 in the work DC.

2. The monitoring facility of HIPLEX AF on the standby system detects Symmetrix failure by means of SHC-OSD messages as well as failure of the work system by means of MSCF messages.

3. The monitoring facility on the standby system activates the target units in the standby Symmetrix. Depending on the definition in the switch unit, either all or a preset list of target units is activated.

4. The monitoring facility on the standby system imports the data pubsets, activates any virtual hosts and starts the applications and their associated resources.

If failure of the BCAM link and failure of the work Symmetrix do not occur at the same time (within a time interval specified by an MSCF parameter) or if “fail reconfiguration” with HIPLEX MSCF does not start automatically (detailed description in [4]), MSCF failure handling must be started manually on the standby host because HIPLEX MSCF must not automatically assume total failure if failures do not occur simultaneously. The HIPLEX AF monitoring facility on the standby host then starts the applications.
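The condition under which total failure is assumed automatically can be summarized as a small decision rule. The following sketch is an illustration of the behaviour described above, not of HIPLEX MSCF internals; the state names and the way the time interval is evaluated are assumptions, and the authoritative description is in [4].

from typing import Optional

def failover_decision(symmetrix_fail_t: Optional[float], host_fail_t: Optional[float],
                      mscf_interval_s: float) -> str:
    """Illustrative decision rule; the authoritative behaviour is described in [4]."""
    if symmetrix_fail_t is None or host_fail_t is None:
        return "NO_TOTAL_FAILURE"                       # handle as a partial failure (section 5.1.1)
    if abs(symmetrix_fail_t - host_fail_t) <= mscf_interval_s:
        return "AUTOMATIC_FAILOVER"                     # steps 3 and 4 above run automatically
    return "MANUAL_MSCF_FAILURE_HANDLING_REQUIRED"      # operator confirms, then the switch unit starts

# Example: Symmetrix failure reported 5 s after the host failure, interval 60 s.
print(failover_decision(symmetrix_fail_t=105.0, host_fail_t=100.0, mscf_interval_s=60.0))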

5.1.1 More failure scenarios

In addition to total failure, further failure scenarios that can also be handled automatically are conceivable. Failure of the work host, as illustrated in the figures below, is generally handled as shown in Fig. 5-2 if HIPLEX AF is used. The applications are restarted on the standby host but with the source units in the work Symmetrix.

[Figure: Failure of the work host – HIPLEX AF switches the applications over to the standby host, which continues to use the source units in the work Symmetrix via the cross-cabling]

Fig. 5-2: Failure of work host – switchover

If a configuration over a long distance is involved and IO performance is below desired levels, failover to the standby Symmetrix must be carried out as shown in Fig. 5-3.

[Figure: Failure of the work host over a long distance – HIPLEX AF switches the applications to the standby host and additionally performs failover to the target units in the standby Symmetrix]

Fig. 5-3: Failure of work host – failover + switchover


If only the Symmetrix fails in the work DC, the application continues on the standby host, as in Fig. 5-1. BS2000 also ceases to run on the work host in this case. If HIPLEX AF is used, the work system is automatically terminated.

Failure of standby host and/or Symmetrix
If production applications also run on the standby host with source units in the standby Symmetrix, the roles of both hosts and Symmetrix systems are swapped from the perspective of the applications. However, the sequence of events that take place when the host and/or the Symmetrix fail in the standby data center is the same.

Failure of links
If, as already described in section 4.1, all links of the individual components are redundant and are routed differently (i.e. no “single point of failure”), a link failure is handled by the particular driver software and does not represent a failure scenario. Redundant links are recommended for disaster recovery.

5.2 Manual failover

In disaster tolerant architectures without HIPLEX AF automation, an emergency plan or procedures described in an emergency procedure manual (see section 3.1.2) are put in place in the event of a disaster; such an emergency procedure manual obviously has to be drawn up specifically for each data center and application. The description of a manual failover given here therefore covers only the basic steps. These steps are based on synchronous X-link disaster recovery configurations and apply broadly for all configurations in chapter 4. Possible differences are described in the configuration-specific chapters.

A prerequisite for non-automated failover is what is known as “disaster declaration”. There should be one or more persons (persons-in-charge of emergencies) who are authorized and competent to decide whether it is necessary to switch applications over to the standby system or whether only a temporary failure of specific hardware components is involved. When events generally recognized as disasters are involved (major fire or flooding), the decision is an easy one – but what should be done, for example, in the case of a power outage?

Sequence of steps for manual failover
1. Switch off components still running in the work DC. Before restarting an application in the standby DC, it should be ensured that the work host is not still active in the failed DC. If possible, the hosts and possibly also the Symmetrix systems in the work DC should be switched off. If this is not possible because, for example, the DC is not accessible, network connections to the outside world must be interrupted so that the virtual host or application is not available twice in the network after restart in the standby DC.
2. If an MSCF cluster is in place, MSCF reconfiguration must be initiated. In other words, MSCF messages are answered in such a way that the BCAM connection is terminated and, if necessary, a master change is carried out for the shared pubset.
3. If virtual hosts and dynamic routing (as described earlier in this paper) are not used, the network components must now be reconfigured (DNS entries, VLAN assignment or reconfiguration of routing) to permit user access to the standby host. However, this is independent of the following steps.
4. Activate the SRDF target units in the standby DC by means of SHC-OSD commands.
5. If a cold standby concept is deployed (in which the home pubsets of the failed systems are also mirrored by SRDF), the (guest) systems must now be started by means of IPL from the target units.
6. Because the disks of the data pubsets were still occupied by the work system, they must be released by means of an UNLOCK-DISK command.
7. Import mirrored data pubsets from the target units.
8. Check application resources: the application support staff or system administrator check the availability and completeness of the data pubsets.
9. Network switchover: activate the virtual hosts.
10. Application restart: start the applications on the standby host after the above steps.

The above steps can, of course, be largely automated and invoked in procedures. How long can a failover last? There is no one answer to this question. In tests, we always reached item 8 within a few minutes. However, how long application restart and database recovery take depends on the type of application and the associated software components as well as on the IO load at the time of failure. The period up to declaration of a disaster and shutdown of the resources of the work DC depends on local circumstances and cannot be planned with absolute precision. The downtime of 30-45 min specified in chapter 4 is, however, a good guide.
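As noted above, steps 4 to 10 can largely be collected into procedures. The following Python sketch shows one possible ordering per application; it is an illustration only – apart from UNLOCK-DISK, IMPORT-PUBSET and the pubset/disk names being modelled on this paper, the command strings and all operands are placeholders whose exact syntax must be taken from the SHC-OSD [1] and BS2000 command manuals.

def bs2000(command: str) -> None:
    """Placeholder for submitting a BS2000 command (hypothetical helper)."""
    print(f"submit: {command}")

def manual_failover(pubsets, pubset_disks, virtual_host, start_procedure, cold_standby=False):
    bs2000("ACTIVATE-TARGET-UNITS")                     # step 4: SHC-OSD commands (placeholder)
    if cold_standby:
        bs2000("IPL-GUEST-SYSTEM-FROM-TARGET-UNITS")    # step 5 (placeholder)
    for disk in pubset_disks:                           # step 6: disks still occupied by the work system
        bs2000(f"UNLOCK-DISK UNIT={disk}")              # operand syntax is an assumption
    for pubset in pubsets:                              # step 7: import the mirrored data pubsets
        bs2000(f"IMPORT-PUBSET PUBSET={pubset}")
    print("step 8: application support checks availability and completeness of the pubsets")
    bs2000(f"ACTIVATE-VIRTUAL-HOST {virtual_host}")     # step 9 (placeholder)
    bs2000(f"ENTER-PROCEDURE {start_procedure}")        # step 10 (placeholder operands)

manual_failover(["DATA1", "DATA2"], ["D100", "D101"], "VHOST01", "$TSOS.START.FINANCES")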


6 Failback process

In contrast to failover, which is a response to an unforeseeable disaster, failback of applications to original resources can be planned. The application interruption caused by failback can be scheduled to reduce associated adverse effects to a minimum. Failback cannot be performed until all failed components and links have been recovered to such an extent that the applications can be transferred back to the work DC. First, an EMC engineer puts the Symmetrix systems in the work DC into a usable SRDF state. All channel connections and all remote links are deactivated, all associated cables are re-connected, and the Symmetrix systems are restarted. Surface tests are then carried out on all disks. The SRDF links are set up but SRDF operation does not yet begin. If the entire DC was damaged, failback commences once all hosts and associated components have been re-cabled and tested, and all network connections have been reestablished and also tested. A test version for each production application should already have run in the recovered DC prior to failback. How long failback takes depends largely on the amount of data involved, but the total time needed is considerably more than is required for failover.

6.1 Failback with HIPLEX AF

If HIPLEX AF is used, it assumes responsibility for restarting the applications and, before doing this, can also perform failback to the Symmetrix systems in the work DC. In the event of a disaster, failback is not generally carried out fully automatically because activities must be coordinated on the work and standby host and on the Symmetrix subsystems. If HIPLEX AF is used, the system on the standby host will always run with its own home pubset (not on target units of the work system). It need not therefore be shut down. Nevertheless, system pubsets can also be mirrored using SRDF if specific data on these pubsets is required in the event of a disaster.

Sequence of basic steps for HIPLEX AF-based failback
1. The applications and associated software resources on the standby host are terminated by means of a HIPLEX AF command. This also exports the data pubsets and, if necessary, deactivates virtual hosts.
2. The target units in the standby Symmetrix are prepared for SRDF operation and are therefore deactivated. This can be done by means of the HIPLEX AF failback procedure. It is done by the EMC engineer if new Symmetrix systems were installed.
3. If new Symmetrix systems were installed in the work DC, an EMC engineer must now start resynchronization of target and source units. If not, this can be done with the failback procedure of HIPLEX AF. The work system must not already be running when synchronization starts (if necessary, the host remains shut down) because synchronization is from target units to source units and the source units must not be accessed during synchronization.
4. If virtual hosts and dynamic routing (as described earlier in this paper) are not used, the network components must be reconfigured (DNS entries, VLAN assignment or reconfiguration of routing) to ensure user access to the work host.
5. Once synchronization has started for all disks, the (guest) system on the work host can be started. When this is done, the BCAM files are generally started automatically by command file, the MSCF cluster is set up, and HIPLEX AF monitoring is activated.
6. Switch units for the applications can now be started. If necessary, these also activate the virtual hosts.
7. The applications are checked and released for use. If necessary, test procedures are started beforehand.

The duration of such a failback depends mainly on the volume of data changed in the meantime and on the bandwidth of the SRDF link. In our experience, downtime is from one to a few hours.
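Steps 3 and 5 imply a wait condition: the work system may only be IPLed once target-to-source resynchronization is running for all disks. The following sketch illustrates such a polling wait; the status function and the state names are placeholders for evaluating the corresponding SHC-OSD status displays [1].

import time

def remote_copy_state(volume: str) -> str:
    """Placeholder: would evaluate an SHC-OSD status display for the volume (state names assumed)."""
    return "SYNCHRONIZING"

def wait_until_resynchronization_started(volumes, poll_seconds: int = 30) -> None:
    pending = set(volumes)
    while pending:
        pending = {v for v in pending if remote_copy_state(v) not in ("SYNCHRONIZING", "SYNCHRONOUS")}
        if pending:
            time.sleep(poll_seconds)
    print("resynchronization running for all disks - the work host (guest) system may now be IPLed")

wait_until_resynchronization_started(["D100", "D101", "D102"])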

6.2 Manual failback

The activities associated with manual failback are listed below. Some of these activities can be automated by means of procedures. Activities 1 – 4 should be performed on the standby host. If the standby host is used only in the event of a disaster, the system can then be shut down.

Sequence of basic steps for manual failback
1. The application and associated software resources in the standby DC are terminated.

2. The target units in the standby Symmetrix are prepared for SRDF operation and are thus deactivated. This is done by SHC-OSD command or procedure on the standby host.

3. If virtual hosts are used, they are deactivated in the standby DC.


4. The data pubsets are exported. If the home pubset is also mirrored using SRDF, the system must be shut down after export.

5. If virtual hosts and dynamic routing (as described earlier in this paper) are not used, the network components must be reconfigured (DNS entries, VLAN assignment or reconfiguration of routing) to ensure user access to the work host.

6. If new Symmetrix systems were installed in the work DC, an EMC engineer must now restart resynchronization of target and source units. If not, this can be done by SHC-OSD commands or procedures.

7. Once synchronization has started for all disks, the (guest) system on the work host can be started and then the BCAM files.

8. If a HIPLEX-MSCF cluster is used, this can now be reestablished.

9. The data pubsets are re-attached. If the disks are still occupied by the SHC-OSD of the standby system (by action 2 above), it may be necessary to issue an UNLOCK-DISK command for the source units before the pubsets can be imported.

10. A member of application support staff performs a standard check of the status of the applications and associated resources such as databases and files (logging and, if necessary, manual closing of files).

11. If virtual hosts are used, they are re-activated in the work DC.

12. The applications can be started on the work host.

Corresponding to this White Paper there is a succeeding White Paper [14] which, analogously to the disaster recovery concepts described here, gives an overview of the possibilities for building disaster recovery configurations with business servers of the SX series. In particular, it highlights the new aspects arising from the different hardware platform, from the coexistence of the two operating systems BS2000 and Solaris and, last but not least, from the storage systems of type FibreCAT.


7 Related publications and online references

[1] User Guide “SHC-OSD V4.0A Symmetrix Host Component”, U41000-JZ125-4-76, January 2003
http://manuals.ts.fujitsu.com/servers/bs2_man/man_us/mscf/v3_0/mscf.pdf

[2] User Guide “DAB V8.0 Disk Access Buffer”, U2431-J-Z125-13-76, March 2002
http://manuals.ts.fujitsu.com/servers/bs2_man/man_us/dab/v8_0/dab.pdf

[3] Product Manual “HIPLEX AF V3.0 – High-Availability of Applications in BS2000/OSD”, U24401-J-Z125-3-76, July 2002
http://manuals.ts.fujitsu.com/servers/bs2_man/man_us/hiplexaf/v3_0/hiplexaf.pdf

[4] User Guide “HIPLEX MSCF V3.0 – BS2000 Processor Networks”, U3615-JZ125-8-76, March 2002
http://manuals.ts.fujitsu.com/servers/bs2_man/man_us/mscf/v3_0/mscf.pdf

[5] HIPLEX: The BS2000/OSD Cluster, PDF document
http://extranet.ts.fujitsu.com/products/bs2000/media/pf/product-facts.html

[6] IT Baseline Protection Manual / Standard Security Safeguards, Bundesamt für Sicherheit in der Informationstechnik (German Information Security Agency)
http://www.bsi.bund.de/gsh/english/menue.htm

[7] Wavelength Division Multiplex, Sebastian Groller, seminar paper, computer and operating systems
http://www.uni-weimar.de/~grolla/docs/wdm/index.html

[8] Introduction to SAN Distance Solutions, IBM Redbook SG24-6408-00, J. Tate, R. Khattar, K.W. Lee, S. Richardson, January 2002
http://www.redbooks.ibm.com/

[9] Understanding the Performance Characteristics of Synchronous Remote Copy, Dr. H. Pat Artis, Performance Associates, Inc., 1998
http://www.perfassoc.com/publishedpapers.html

[10] Storage connections over long distances, Fujitsu White Paper, Simon Kastenmüller

[11] EMC Support Matrix
http://www.emc.com/interoperability/index.jsp

[12] DIBs, Data Buffers, and Distance: Understanding the Performance Characteristics of ESCON Links, Dr. H. Pat Artis, Performance Associates, Inc., 1998
http://www.perfassoc.com/publishedpapers.html

[13] Global Recovery Demonstration: SRDF/A and PRIMECLUSTER, February 2004
http://www.emc.com/pdf/news/h1063_srdfa_primecluster_ldv.pdf

[14] FSC White Paper: Disaster Recovery Concepts for SX servers, PDF document
http://extranet.ts.fujitsu.com/products/bs2000/media/wp/white-paper.html

Delivery subject to availability, specifications subject to change without notice, correction of errors and omissions excepted. All conditions quoted (TCs) are recommended cost prices in EURO excl. VAT (unless stated otherwise in the text). All hardware and software names used are brand names and/or trademarks of their respective holders.

Copyright © Fujitsu, 01/2006

Published by department:

Dr. Michael Wester
Phone: ++49 (0)89 636 47561
Fax: ++49 (0)89 636 49974
[email protected]
http://ts.fujitsu.com/bs2000

Extranet:

http://extranet.ts.fujitsu.com/bs2doc

