© 2010 IBM Corporation
Hanseatic Mainframe Summit 2010 – Wednesday, September 1, 2010
High Availability Level 3: GDPS
Martin Söckel ([email protected]) – 734 42 32
Trademarks
The following are trademarks of the International Business Machines Corporation in the United States and/or other countries:
IBM* IBM (logo)* ibm.com* AIX* DB2* DS6000 DS8000 Dynamic Infrastructure* ESCON* FlashCopy* GDPS* HyperSwap Parallel Sysplex* POWER5 Redbooks* Sysplex Timer* System p* System z* Tivoli* z/OS* z/VM*
* Registered trademarks of IBM Corporation
The following are trademarks or registered trademarks of other companies:
Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States and/or other countries. Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both, and is used under license therefrom. Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both. Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. InfiniBand is a trademark and service mark of the InfiniBand Trade Association. Intel, the Intel logo, Intel Inside, the Intel Inside logo, Intel Centrino, the Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. UNIX is a registered trademark of The Open Group in the United States and other countries. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. ITIL is a registered trademark, and a registered community trademark, of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office. IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency, which is now part of the Office of Government Commerce.
All other products may be trademarks or registered trademarks of their respective companies.
Notes: Performance is in Internal Throughput Rate (ITR) ratio based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput improvements equivalent to the performance ratios stated here. IBM hardware products are manufactured from new parts, or new and serviceable used parts; regardless, our warranty terms apply. All customer examples cited or described in this presentation are presented as illustrations of the manner in which some customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics will vary depending on individual customer configurations and conditions. This publication was produced in the United States. IBM may not offer the products, services, or features discussed in this document in other countries, and the information may be subject to change without notice. Consult your local IBM business contact for information on the products or services available in your area. All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only. Information about non-IBM products is obtained from the manufacturers of those products or their published announcements. IBM has not tested those products and cannot confirm the performance, compatibility, or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. Prices are subject to change without notice. Contact your IBM representative or Business Partner for the most current pricing in your geography.
• Addresses Planned / Unplanned Hardware / Software Outages
• Flexible, Non-disruptive Growth
– Capacity beyond the largest CPC
– Scales better than SMPs
• Dynamic Workload / Resource Management
• Built-in Redundancy
• Capacity Upgrade on Demand
• Capacity Backup
• Hot-Pluggable I/O

[Diagram: a single system grows into a Parallel Sysplex of 1 to 32 systems]
[Diagram: GDPS spanning Site 1 and Site 2]
• Addresses Site Failure and Site Maintenance
• Disk / Tape Remote Copy
– Eliminates SPOF
– No / Some Data Loss
• Application Independent
System z Continuous Availability
Agenda
• Why GDPS?
• GDPS Family of Offerings
– Continuous Availability of Data within a Data Center
– Continuous Availability (CA) / Disaster Recovery (DR) in a Metropolitan Region
– DR at Extended Distance
– CA Regionally and DR Out-of-Region
• Extensions to Heterogeneous Platforms
• GDPS Customer Experiences
• Summary
Note: the slide deck contains more slides than will be presented – they are included for your reference.
Vision for a Business Continuity Solution

Business Continuity…
• Provides both Continuous Availability and Disaster Recovery for heterogeneous, distributed IT business applications
– "Recover my business rather than my platform technology"
• Ensures successful recovery via automated processes
– Pre-canned, pre-tested
– Can be handled by less-skilled operators
• Allows management at the "business impact" level
– Granular
– Customer flexibility on the desired RTO / RPO quality of service
• Leverages existing customer/IBM investments and technology
System z is the Business Continuity management hub
Non-Disaster Events

Typical planned outages:
• Backups
• PTF and OS installs
• Application maintenance
• Hardware/software upgrades

Typical unplanned, non-disaster outages:
• Application failure
• Operator error
• Local power outages
• Network failure
• Hardware failure

Disaster Events

Typical disaster outages:
• Outages caused by natural disasters or other catastrophes that damage the production facilities beyond usability (e.g., fire, flood, earthquake, bombing)
• Outages that require a recovery procedure at an off-site location
• Failure of the regional power grid
• Outages not caused by computer hardware or software defects

A disaster is a rare event that most customers will never experience but must plan for.
Market Need for Business Continuity
Aspects of IT Business Continuity

High Availability
• A resilient IT infrastructure that masks individual (single) component failures
• The infrastructure continues to provide access to applications
• Often provided by resilient hardware

Continuous Operations
• No need to take applications down for ongoing IT procedures
– Scheduled backups
– Planned maintenance
• Ability to keep continuous access to applications when everything is working properly

Disaster Recovery
• Ability to recover from unplanned outages at a different site
– Usually on different hardware
• Performed after something has gone wrong on a site-wide basis

• Operations continue after a disaster
• Costs are predictable and manageable
• Protection of critical business data
• Recovery is predictable and reliable
Evolution into an Enterprise-Wide Solution

[Diagram: evolution from single server → clusters → multi-site clusters → end-to-end multi-site clusters]

System z: a single System z (Site 1) grows into a Parallel Sysplex, and then into GDPS spanning Site 1 and Site 2+.
Distributed systems (IBM AIX, Sun Solaris, HP-UX, Linux, Windows): a single Windows or UNIX server (Site 1) grows into a cluster, and then into GDOC spanning Site 1 and Site 2+, connected by IP networks and replicated disk.

GDPS = Geographically Dispersed Parallel Sysplex; GDOC = Geographically Dispersed Open Clusters
What is GDPS?
• Integrated / automated solution
• Manages application and data availability within and across sites
– Monitors systems and disk & tape subsystems
– Manages planned and unplanned activities (system/disk maintenance or failure, site maintenance or failure)
• Builds on proven high-availability technologies
– Clustering
– Remote copy (disk and tape)
– Automation
• Easy-to-use interface
– Panel/GUI interface
– Policy-based commands (see the sketch below)
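To make "policy-based commands" concrete: a planned action bundles many manual steps into one pre-tested, repeatable script. The following minimal Python sketch shows the shape of such an orchestration; the Sysplex/Mirror objects and step names are illustrative assumptions, not actual GDPS script syntax.

from dataclasses import dataclass, field

@dataclass
class Mirror:
    suspended: bool = False
    def suspend(self):                 # stop remote copy at a consistent point
        self.suspended = True
    def recover_secondary(self):       # make the secondary copies usable
        assert self.suspended, "suspend/freeze must precede recovery"

@dataclass
class Sysplex:
    site1: list
    running: dict = field(default_factory=dict)
    def stop(self, name): self.running[name] = None
    def restart(self, name, site): self.running[name] = site

def planned_site_switch(plex, mirror):
    """One scripted, repeatable sequence instead of many manual steps."""
    for s in plex.site1:
        plex.stop(s)                   # quiesce the site-1 z/OS images
    mirror.suspend()
    mirror.recover_secondary()
    for s in plex.site1:
        plex.restart(s, site=2)        # re-IPL the workload in site-2

planned_site_switch(Sysplex(site1=["P1", "P2"]), Mirror())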
PPRC, XRC and GM Overview
Metro Mirror (PPRC)
• Synchronous remote data mirroring
• Metropolitan distance
• System z and distributed data
• GDPS/PPRC provides data consistency

z/OS Global Mirror (XRC)
• Asynchronous remote data mirroring
• Unlimited distance support
• System z data
• The System Data Mover (SDM) provides data consistency

Global Mirror
• Asynchronous remote data mirroring
• Unlimited distance support
• System z and distributed data
• Global Mirror provides data consistency
The GDPS® family of offerings
The right level of business continuity protection for your business

GDPS: an end-to-end continuous availability and disaster recovery solution:
• Automated recovery removes people as a single point of failure
• A single point of control for heterogeneous data across the enterprise
• Continuous Availability of Data within a Data Center – single data center; applications remain active; near-continuous availability to data (GDPS/PPRC HM)
• Continuous Availability / Disaster Recovery in a Metropolitan Region – two data centers; systems remain active; automated D/R across site or storage failure; no data loss (GDPS/PPRC HM, GDPS/PPRC)
• Disaster Recovery at Extended Distance – two data centers; automated disaster recovery; "seconds" of data loss (GDPS/GM, GDPS/XRC)
• Continuous Availability Regionally and Disaster Recovery at Extended Distance – three data centers; data availability; no data loss; extended distances (GDPS/MGM, GDPS/MzGM)
Tiers of Disaster Recovery: Level Setting GDPS

Best D/R practice is to blend tiers of solutions in order to maximize application coverage at the lowest possible cost. One size, one technology, or one methodology does not fit all applications: mission-critical, somewhat-critical, and not-so-critical applications each justify a different tier.

[Chart: value versus time to recover (15 min, 1–4 hr, 4–8 hr, 8–12 hr, 12–16 hr, 24 hr, days), ranging from point-in-time backup through a dedicated remote hot site to an active secondary site]

• Tier 1 – PTAM
• Tier 2 – PTAM, hot site
• Tier 3 – Electronic vaulting
• Tier 4 – Batch/online database shadowing & journaling, repetitive PiT copies, fuzzy-copy disk mirroring
• Tier 5 – Software two-site, two-phase commit (transaction integrity); or repetitive PiT copies with small data loss
• Tier 6 – Near-zero or zero data loss remote disk mirroring, helping with data integrity and data consistency
• Tier 7 – Near-zero or zero data loss: highly automated takeover on a complex-wide or business-wide basis, using remote disk mirroring

GDPS positioning:
• GDPS/PPRC HyperSwap Manager – RTO depends on customer automation; RPO 0
• GDPS/XRC, GDPS/GM – RTO < 1 hr; RPO seconds
• GDPS/PPRC – RTO < 1 hr; RPO 0

Notes:
• RTO does not include decision time
• Subsystem-specific recovery times vary
• Tiers based on SHARE Group, 1992; PTAM = Pickup Truck Access Method

RTO = Recovery Time Objective – how long can you be without service?
RPO = Recovery Point Objective – how much data must be recreated? (Both are formalized below.)
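A hedged formalization of the two objectives (standard definitions, consistent with the slide):

\[
\mathrm{RPO} = t_{\text{failure}} - t_{\text{last recoverable copy}},
\qquad
\mathrm{RTO} = t_{\text{service restored}} - t_{\text{failure}}
\]

Synchronous mirroring keeps the last recoverable copy current with the failure, hence RPO = 0; asynchronous mirroring trails by the replication lag, hence RPO of "seconds" or more.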
Agenda
• Why GDPS?
• GDPS Family of Offerings
– Continuous Availability of Data within a Data Center
– Continuous Availability (CA) / Disaster Recovery (DR) in a Metropolitan Region
– DR at Extended Distance
– CA Regionally and DR Out-of-Region
• Extensions to Heterogeneous Platforms
• GDPS Customer Experiences
• Summary
z/OS Parallel Sysplex – High Availability

[Diagram: two z/OS images (z/OS-A, z/OS-B) with Comm Server, TORs, AORs, DB2 and CPSM behind a VTAM/TCP/IP network, coupled through CFs and shared disk]

• Removes SPOF¹ of
– Server
– LPAR
– Subsystems
• Planned and unplanned outages
• Single system image
• Dynamic session balancing
• Dynamic transaction routing
• Disk is a SPOF

Designed for application availability of 99.999%

¹ Single Point of Failure
Disk Mirroring

• Replicated disk
– Synchronous remote copy
• Provides a second copy of data

However …
• Does this guarantee the usability of the second copy?
• How long does a switch take?

[Diagram: the same Parallel Sysplex, with Metro Mirror (PPRC) copying the primary disk to a secondary]

Demand for rapid database availability
Need for Time Consistency

• The start of one write may be time-dependent on the completion of a previous write
– Database & log
– Index & data components
– The time sequence could be exposed
• GDPS automation ensures consistency
– Across any number of primary subsystems
• Consistency enables Restart instead of Recovery
– Recovery – measured in hours or days: restore image-copy tapes, apply log changes
– Restart – measured in minutes: standard restart process
• Even if the second copy can be trusted, a disk switch is disruptive for the entire workload
• Protection against mirroring failures

Systems can be restarted using a FROZEN, consistent copy; recovery is not needed.

Example with the primary log (PLOG), primary database (PDB) and their secondaries (SLOG, SDB): [1] log update, [2] DB update, [3] mark DB update complete. A secondary holding [1] or [1,2] or [1,2,3] is OK, but [1,3] is NOT OK (see the sketch below).
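The [1]/[2]/[3] rule is easy to demonstrate. A minimal sketch (illustrative, not GDPS or DB2 internals): a consistency-preserving freeze may cut the write sequence anywhere, but it only ever drops a suffix, so the secondary can always be restarted.

def db_update(mirror, record, value):
    mirror.write("LOG", ("intent", record, value))   # [1] log update
    mirror.write("DB",  (record, value))             # [2] DB update
    mirror.write("LOG", ("complete", record))        # [3] mark DB update complete

class ConsistentMirror:
    """Keeps a strict prefix of the write sequence: [1] or [1,2] or [1,2,3].
    A gap such as [1,3] -- 'update complete' without the data -- cannot occur."""
    def __init__(self, freeze_after):
        self.writes, self.freeze_after = [], freeze_after
    def write(self, device, payload):
        if len(self.writes) < self.freeze_after:     # frozen: the suffix is dropped
            self.writes.append((device, payload))

m = ConsistentMirror(freeze_after=2)
db_update(m, "acct-42", 100.0)
print(m.writes)   # [1,2]: restartable; recovery from image copies is not needed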
HyperSwap – the Technology

• Extends Parallel Sysplex availability to disk subsystems
• Substitutes the Metro Mirror (PPRC) secondary for the primary device
– Automatic – no operator interaction
– Fast – can swap a large number of devices
– Non-disruptive – applications keep running
– Includes volumes with sysres, page data sets, catalogs
• Disk is no longer a single point of failure

[Diagram: applications keep their UCBs while the swap repoints them from the PPRC primary (P) to the secondary (S) – modeled in the sketch below]

UCB = Unit Control Block

Comprehensive application and data availability solution
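A rough model of the swap itself (an assumption-laden sketch, not z/OS internals): applications keep a stable handle, the UCB, and HyperSwap repoints every handle from the PPRC primary to the secondary in one serialized step, which is why no application has to be restarted.

class UCB:
    """Stand-in for a unit control block: the handle I/O is issued through."""
    def __init__(self, device):
        self.device = device

def hyperswap(ucbs, secondary_of):
    # performed under serialization, so no I/O sees a half-swapped configuration
    for ucb in ucbs:
        ucb.device = secondary_of[ucb.device]

app_handle = UCB("primary-0A00")    # hypothetical device identifiers
hyperswap([app_handle], {"primary-0A00": "secondary-0B00"})
print(app_handle.device)            # secondary-0B00 -- the application kept its UCB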
PPRC Failover / Failback (FO/FB)

• In failover mode, the new primary volumes (at the remote site) record changes
• The state of the volumes at the local site is preserved as it was when the failover was initiated
• Only the changes made since the failover need to be resynchronized, not the entire volumes

Faster resynchronization, less resource-consuming – facilitates D/R testing in a production environment (see the sketch below).

[Diagram: Freeze → Failover (PPRC suspended, change recording (CR) active) → Failback (PPRC full duplex)]
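Why failback is fast: only the tracks touched while suspended travel back. A minimal sketch of change-recording-based resynchronization (illustrative, not disk-subsystem microcode):

class Volume:
    def __init__(self, ntracks):
        self.tracks = [None] * ntracks
        self.changed = set()             # change-recording (CR) bitmap

    def write(self, track, data):
        self.tracks[track] = data
        self.changed.add(track)          # remember what diverged since failover

def failback(new_primary, old_primary):
    """Copy only the tracks written since failover, then resume full duplex."""
    for t in sorted(new_primary.changed):
        old_primary.tracks[t] = new_primary.tracks[t]
    new_primary.changed.clear()

site2, site1 = Volume(1000), Volume(1000)
site2.write(7, "x"); site2.write(42, "y")   # failover mode: site-2 is primary
failback(site2, site1)                       # resynchronizes 2 tracks, not 1,000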
GDPS – Freeze Policy Options

FREEZE & GO
• Freeze the secondary disk configuration
• Allow applications to continue
– Optimizes for remote restartability
– Least impact on application availability
– May lose data in case of a real disaster

FREEZE & STOP
• Freeze the secondary disk configuration
• Stop all z/OS images
– Optimizes for remote restartability
– Impacts application availability
– No data loss on a primary-site disaster

FREEZE & COND
• Freeze the secondary disk configuration
• Determine the reason for the suspend
– Secondary hardware problem: Freeze & Go
– Any other reason: Freeze & Stop

SWAP, [GO | STOP]
• If a swap trigger occurs and HyperSwap is enabled
– Swap the primary / secondary disks
– If the swap cannot complete: STOP
• Else (freeze trigger)
– GO option: Freeze & Go
– STOP option: Freeze & Stop

(The dispatch logic is sketched below.)
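The four options boil down to one decision tree. A sketch of the dispatch (illustrative pseudologic, not GDPS code; the trigger classification and helper stubs are assumptions):

def try_hyperswap():           # assumed to succeed when the secondaries are healthy
    return True

def freeze_secondary():        # secure a consistent secondary copy
    pass

def secondary_hw_problem():    # did only the mirror, not the primary, fail?
    return True

def on_mirror_event(trigger, policy, hyperswap_enabled=True):
    if trigger == "swap" and hyperswap_enabled and policy.startswith("SWAP"):
        if try_hyperswap():
            return "running on the former secondary disks"
        return "STOP: all z/OS images stopped"     # swap failed: protect the data
    freeze_secondary()                             # always freeze first
    if policy.endswith("GO"):
        return "GO: applications continue"         # availability over RPO
    if policy.endswith("COND") and secondary_hw_problem():
        return "GO: applications continue"         # false alarm: only the mirror failed
    return "STOP: all z/OS images stopped"         # guarantees zero data loss

print(on_mirror_event("freeze", "FREEZE&COND"))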
GDPS – Freeze Policy

• GO – continue running with the primary disks
– Risk of data loss if the mirroring failure was the signal of a true rolling disaster
– High availability if the mirroring failure was a false trigger
• STOP – stop all systems before further data is written
– The only way to guarantee zero data loss
– Risk of availability loss if the mirroring failure was a false alert
Planned Disk Reconfiguration

PLANNED ACTION
• Without HyperSwap: shut down the systems, remove them from the sysplex, terminate & reverse/suspend PPRC, restart the systems – approx. 1–2 hours
• With HyperSwap: Freeze, PPRC failover, swap the primary & secondary PPRC UCBs, systems continue – 15 seconds (6,545 volume pairs across 46 LSS)

[Diagram: couple data sets (CDS_p, CDS_a) and PPRC volume pairs going from duplex to suspended as the primary (P) and secondary (S) roles swap]
Unplanned Disk Reconfiguration

• Without HyperSwap: TAKEOVER PROMPT, then TAKEOVER STARTED (assess time); Freeze, systems quiesced (if Freeze & Stop); remove the systems from the sysplex, recover the secondary PPRC volumes, restart the systems – 30–60 minutes or more (assess time not included)
• With HyperSwap: Freeze, PPRC failover, swap the primary & secondary PPRC UCBs, systems continue – 13 seconds (6,545 volume pairs across 46 LSS)

[Diagram: couple data sets and PPRC volume pairs going from duplex to suspended as the roles swap]
GDPS/PPRC: a Continuous Availability and/or Disaster Recovery Solution – Metropolitan Distance

Planned and unplanned exceptional conditions

[Diagram: Site 1 and Site 2, each with its own network, running a multi-site Parallel Sysplex]

• Manages the multi-site Parallel Sysplex: processors, CBU, CFs, couple data sets
• Manages disk remote copy (System z & Open LUN)
• Manages tape remote copy (PtP VTS)
• Exploits the HyperSwap & FlashCopy functions
• Automated, customized procedures to execute planned and unplanned actions (z/OS, CF, disk, tape, site)
• Improves availability of heterogeneous System z business operations
GDPS/PPRC active/standby – Single-Site Workload
Near-continuous availability solution

Continuous access to data via HyperSwap; highly automated site failover in the event of catastrophic systems, multiple-component, or data-center failure.

[Diagram: site-1 with the active production systems and the PTS/CTS sysplex/GDPS CF; site-2, up to 100 km away (with RPQ up to 200 km), with the standby production systems, the BTS sysplex/GDPS CF and the controlling (K) system; primary (P) and secondary (S) disks and couple data sets (CDS) mirrored between the sites; GRS, XCF, DB2 Lock, SCA and GBP structures in the CFs]

• CF signal latency impact due to the distance
• Unplanned & planned HyperSwap; RPO 0 sec
• CBU / OOCoD
• RTO < 1 h; systems and applications (the failed site-1 workload) need to be restarted on the secondary disks
GDPS/PPRC active/active – Multi-Site Workload
Configuration with the most potential for continuous availability

Continuous access to data via HyperSwap; highly automated site failover – the failing site-1 operating systems and applications need to be recycled.

[Diagram: active production systems in both site-1 and site-2, 10–20 km apart; PTS/CTS and BTS sysplex/GDPS CFs; controlling systems K1 and K2; primary (P) and secondary (S) disks and couple data sets mirrored between the sites]

• CF signal latency impact due to the distance
• Unplanned & planned HyperSwap; RPO 0 sec
• CBU / OOCoD
• RTO a couple of minutes; site-2 continues to run, and the failed site-1 workload needs to be restarted
Cross-site Sysplex – GDPS/PPRC active/active – the GDPS CA Model

[Diagram: production systems P1–P4 and controlling systems K1/K2 spread across site-1 and site-2; duplexed CF structures (DB2 Lock, SCA, GBP-p|s; GRS, XCF, …); PTS/CTS in site-1, BTS (arbiter) in site-2; PPRC between the primary (P) and secondary (S) disks and couple data sets]

Prerequisites:
• All critical data must be PPRCed and HyperSwap-enabled
• All critical CF structures must be duplexed
• Applications must be Parallel Sysplex-enabled
• The time reference must not be lost

Failures in one site do not cause applications running in the other site to fail:
• RTO = 0; the surviving site continues to run, and the failed site's workload needs to be restarted to release shared resources
• RPO = 0; no data loss
• Potential application impact due to CF signal latency (access rate, distance) – see the rule of thumb below
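The latency impact can be estimated with a rule of thumb (assuming roughly 200 km/ms signal propagation in fibre and one protocol round trip; switching and protocol overheads come on top):

\[
t_{\text{round trip}} \approx \frac{2d}{200\ \text{km/ms}}
\qquad\Longrightarrow\qquad
d = 100\ \text{km} \;\Rightarrow\; t_{\text{round trip}} \approx 1\ \text{ms}
\]

This delay is added to every synchronous CF request and every PPRC write, which is why the impact grows with both distance and access rate.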
Agenda
• Why GDPS?
• GDPS Family of Offerings
– Continuous Availability of Data within a Data Center
– Continuous Availability (CA) / Disaster Recovery (DR) in a Metropolitan Region
– DR at Extended Distance
– CA Regionally and DR Out-of-Region
• Extensions to Heterogeneous Platforms
• GDPS Customer Experiences
• Summary
GDPS/XRC
Long-distance failover for System z only

• XRC manages secondary consistency across any number of primary subsystems
• All writes are time-stamped and sorted before being committed to the secondary devices (see the sketch below)
• GDPS/XRC manages XRC failover automation, FlashCopy, CBU, and PtP VTS

[Diagram: production site (mono, base, or Parallel Sysplex; SUSE Linux Enterprise Server) with the XRC primary disks; z/OS Global Mirror (XRC) to the recovery site, where the SDM systems, their journals, the XRC secondary disks and GDPS/XRC reside. Note: the needed "insurance copies" are not shown, for clarity's sake.]

• Manages XRC, the SDMs, and FlashCopy
• Virtually unlimited distance
• Once initiated, a totally automated failover:
– Recovery of the secondary disks
– Activation of CBU
– Shutdown of the SDM / discretionary LPARs
– Reconfiguration of the recovery-site servers
– Restart of the production systems in the recovery site
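The time-stamp sorting is the heart of XRC consistency. A minimal sketch (illustrative, not SDM code) of merging writes from several primary subsystems and applying them in global time-stamp order, so the secondary never holds a write whose predecessor is missing:

import heapq

def apply_in_timestamp_order(streams, secondary):
    """streams: one iterable of (timestamp, volume, data) per primary subsystem,
    each already in time-stamp order, as XRC write time-stamping guarantees."""
    for ts, volume, data in heapq.merge(*streams):
        secondary[volume] = data          # committed in strictly increasing order

secondary = {}
apply_in_timestamp_order(
    [[(1, "A", "a1"), (4, "A", "a2")],    # writes from primary subsystem 1
     [(2, "B", "b1"), (3, "C", "c1")]],   # writes from primary subsystem 2
    secondary,
)
print(secondary)                          # {'A': 'a2', 'B': 'b1', 'C': 'c1'}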
GDPS/Global Mirror (GDPS/GM)
Long-distance failover for System z and Open

• The production site can contain single z/OS systems, Open Systems, and systems in a sysplex
• All data (z/OS and Open Systems) is mirrored using Global Mirror
• K-sys activities:
– Manages multiple Global Mirror sessions
– Sends device information, scripts and alerts to the R-sys
• R-sys activities:
– Secondary disk recovery, CBU activation, activating backup LPARs, IPLing systems
• Time-consistent data at the recovery site
• Bandwidth is determined by the RPO (an illustrative sizing follows below)

[Diagram: production site with the Blue Sysplex (zP1–zP3), the Red Sysplex (zP4–zP5), non-sysplex zP6 and Open Systems O7/O8 sharing disk subsystems, plus the K-sys; Global Mirror over unlimited distance to the recovery site, where the R-sys recovers the secondary disks and discretionary/CBU capacity hosts the backup; K-sys and R-sys communicate via NetView]
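How "bandwidth is determined by the RPO" translates into sizing, with assumed numbers (not from the slide): if the peak write rate W exceeds the replication bandwidth B for a burst of length T, the backlog – and with it the RPO – grows to roughly

\[
\text{RPO} \approx \frac{(W-B)\,T}{B}
= \frac{(100-60)\ \text{MB/s} \times 600\ \text{s}}{60\ \text{MB/s}}
\approx 400\ \text{s}
\]

so the link must be sized against sustained peak write rates, not averages, to hold an RPO of "seconds".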
Agenda
• Why GDPS?
• GDPS Family of Offerings
– Continuous Availability of Data within a Data Center
– Continuous Availability (CA) / Disaster Recovery (DR) in a Metropolitan Region
– DR at Extended Distance
– CA Regionally and DR Out-of-Region
• Extensions to Heterogeneous Platforms
• GDPS Customer Experiences
• Summary
Topologies for GDPS 3-site solutions

A GDPS 3-site solution combines GDPS/PPRC or GDPS/PPRC HM with one of GDPS/XRC or GDPS/GM.

Three physical sites
• The most expensive option
• Mostly considered by companies who already own two data centers within synchronous distance
• The user-access network and resilient replication connectivity are a large portion of the cost

Two physical (three logical) sites
• A relatively common approach
• Some companies effectively have this topology, with site-1 and site-2 in the same building on two different floors
• With HyperSwap, this provides the ability to transparently handle disk-subsystem failures

Campus & remote site
• Provides a compromise between two synchronous copies in a single machine room and two distinct sites
• The cost of user and replication networking can be reduced while still keeping a level of resilience
GDPS/MzGM and GDPS/MGM

• MzGM: Metro Mirror + z/OS Global Mirror – multi-target (A → B, A → C)
• MGM: Metro Mirror + Global Mirror – cascading (A → B → C)

GDPS provides similar functionality for both solutions.
z/OS Metro Global Mirror topologies

Metro Mirror = PPRC, z/OS Global Mirror = XRC

[Diagram: local (site-1) disk A with FlashCopy FA, intermediate (site-2) disk B with FlashCopy FB, and remote (site-R) disk C with FlashCopy FC; Metro Mirror A → B and z/OS Global Mirror to C; elements marked mandatory, recommended, or optional]

• Connectivity for the SDMs to the Metro Mirror secondary devices in case of a primary-site failure
• Ability to run the SDMs in either the local or the intermediate site, for return after a disaster
• Additional copy (FB) in the intermediate site for testing and resynchronization protection
• Additional copy (FC) for testing disaster recovery in the remote site
• Additional copy (FA) in the local site if a symmetrical configuration is required
• The SDM is now enabled for zIIP
GDPS/MzGM – multi-target: A → B, A → C

Metro Mirror = PPRC, z/OS Global Mirror = XRC

GDPS/PPRC is designed to provide continuous availability in the event of catastrophic campus-site failures involving multiple components; the GDPS/XRC capability provides out-of-region data and application resilience and protection against a local site disaster.

• No data loss
• System z only
• Peak bandwidth (no RPO impact)
• B-to-C network connectivity required for IR (incremental resynch)
• Mitigates system logger overhead (XRC+)
• If A fails, A is restarted in B, and reconfiguration is needed to restore DR
• If B fails, no reconfiguration is needed to restore DR
GDPS/MzGM configuration

[Diagram: site-1 (P1, K1, VM1, CF1) and site-2 (P2, K2, VM2, CF2) with Metro Mirror (PPRC) between the A disks (site-1) and B disks (site-2); z/OS Global Mirror (XRC) from A to the C disks in site-R, where the SDM and Kx systems run with their own CF, FlashCopy F and backup capacity; incremental resync B → C; CKD disks; ETR or STP time reference; SNA connectivity; mandatory and recommended elements as in the topology chart]

• Optional: CFs / production systems in site-2
• No data loss; System z only
• Very tight multi-vendor integration E2E
• B-to-C network connectivity required for IR
• Coordinated DR
• Common consistency E2E
Metro Global Mirror topologies

[Diagram: local (site-1) disk A with FlashCopy FA, intermediate (site-2) disk B with FlashCopy FB, and remote (site-R) disk C with FlashCopy FC and additional copy D; Metro Mirror A → B and Global Mirror B → C; elements marked mandatory, recommended, or optional]

• Connectivity from the local to the remote site for cases of intermediate-site failure, or for a switch of the roles of the local and intermediate sites
• Additional copy (FB) in the intermediate site for testing and resynchronization protection, and for reverse GM from the remote site
• Additional copy (FC) for testing disaster recovery in the remote site
• Additional copy (FA) in the local site if a symmetrical configuration is required
GDPS/MGM – cascading: A → B → C

Metro Mirror = PPRC

GDPS/PPRC is designed to provide continuous availability in the event of catastrophic campus-site failures involving multiple components; the GDPS/GM capability provides out-of-region data and application resilience and protection against a local site disaster.

• No data loss
• System z & open
• Scalable bandwidth (trade-off against RPO)
• A-to-C network connectivity required for IR (incremental resynch)
• If A fails, A is restarted in B, and DR is maintained
• If B fails, reconfiguration is needed to restore DR
GDPS/MGM configuration

[Diagram: site-1 (P1, (P2), K1 (Kg), non-z servers, CF1) and site-2 (P2, (P1), K2, non-z servers, CF2, CF3) with Metro Mirror (PPRC) between the A and B disks; Global Mirror from B to the C disks in site-R, with FlashCopy D and F copies, the R systems, Kg and backup capacity; incremental resync A → C; Kg connected via IP; ETR or STP time reference; mandatory and recommended elements as in the topology chart]

• Optional: CFs / production systems in site-2
• Non-z: UNIX®, Linux, Linux on z, Windows®
• No data loss; System z & open
• A-to-C network connectivity required for IR (incremental resynch)
• If A fails, HyperSwap to B, and DR is maintained
• If B fails, incremental resynch A → C, and the DR position is maintained
Agenda
• Why GDPS?
• GDPS Family of Offerings
– Continuous Availability of Data within a Data Center
– Continuous Availability (CA) / Disaster Recovery (DR) in a Metropolitan Region
– DR at Extended Distance
– CA Regionally and DR Out-of-Region
• Extensions to Heterogeneous Platforms
• GDPS Customer Experiences
• Summary
Open LUN Management

[Diagram: z/OS (GDPS via API/TSO) and WINTEL/UNIX servers connected over Ethernet; primary and secondary disk subsystems holding CKD and FBA volumes, mirrored with Metro Mirror over Fibre Channel or with Global Mirror]

• Data consistency across z/OS and Open Systems
• GDPS single point of control
• Supports cross-platform or platform-level Freeze
– Suspends are reported through SNMP alerts
• Available in GDPS/PPRC, GDPS/PPRC HM and GDPS/GM

Helps provide enterprise-wide disaster recovery with data consistency.
GDPS/PPRC Multiplatform Resiliency for System z (xDR) – Architectural Building Blocks

[Diagram: GDPS on top of NetView and SA z/OS under z/OS, and SA MP on Linux on z under z/VM, all on System z hardware]

A Business Continuity solution for z/OS and Linux applications on System z:
• Leverages existing and proven solutions
– GDPS
– SA z/OS
– IBM Tivoli System Automation for Multiplatforms (SA MP)
• Coordinated cross-platform business resiliency for operating systems (OS) running on System z hardware
• Integration point of z/OS and Linux on System z

Linux on System z can also run natively in its own partition.
xDR Guest Linux Clusters

[Diagram: a z/VM system hosting a proxy node (SA MP Linux) plus a cluster of SA MP Linux nodes running SAP, one of them the master; NetView EAS, SA z/OS and GDPS on z/OS; TCP/IP connections between the Linux guests and NetView]

• Multiple z/VM systems are supported
– z/VM guests running Linux & SA MP
• A proxy guest is used by GDPS to communicate commands to z/VM and to monitor for disk errors
– It is a cluster of just one node
• A second cluster is made up of multiple nodes
– One node is the master
– All guests run an application workload, e.g., SAP
• TCP/IP connections between the Linux guests and the NetView Event Automation Address Space (EAS) are used to inform GDPS of
– their existence,
– their ongoing presence, and
– any change in their status (sketched below)
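The guest-to-EAS channel can be pictured as a simple heartbeat. A sketch with a hypothetical host, port, and JSON wire format (assumptions; the real GDPS / SA MP protocol is not documented here):

import json
import socket
import time

def heartbeat(node, eas_host, eas_port, interval=30.0):
    """Register with the NetView EAS, then report liveness periodically;
    a missed beat is what tells GDPS that the node is gone."""
    with socket.create_connection((eas_host, eas_port)) as conn:
        conn.sendall(json.dumps({"node": node, "event": "register"}).encode() + b"\n")
        while True:
            conn.sendall(json.dumps({"node": node, "event": "alive",
                                     "ts": time.time()}).encode() + b"\n")
            time.sleep(interval)

# heartbeat("sapnode1", "netview-eas.example.com", 4711)   # hypothetical endpoint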
GDPS/xDR – outage use cases

[Diagram: site-1 with K1 (k-sys), z/OS P1, z/VM with Linux guests and a Linux guest proxy, native Linux and CF1; site-2 with K2 (k-sys backup), z/OS P2, native Linux, Linux guests and CF2; primary (P) and secondary (S) disks, Linux (L) volumes and couple data sets (CDS) mirrored between the sites; native-Linux swap disks stay local]

• Coordinated near-continuous availability and DR solution
• Planned & unplanned HyperSwap
• GDPS automates Linux startup
• CBU
• GDPS support for SLES (z/VM guest or native) and RHEL (z/VM guest)
• Requires Tivoli System Automation for Multiplatforms
Distributed Cluster Management (DCM)

• "End-to-end recovery solution"
– Provides management and coordination of
• planned and unplanned outages
• System z and distributed servers using clustering solutions
– Helps optimize operations across heterogeneous platforms
– Helps meet enterprise-level RTO and RPO
• DCM function added to GDPS
– The main management code runs in GDPS
– The GDPS DCM agent code runs on one of the distributed servers in each cluster
• DCM support for
– Veritas Cluster Server (VCS)
– Tivoli System Automation Application Manager (SA AppMan)
• IBM Geographically Dispersed Open Clusters (GDOC) services are highly recommended

Integrated, Automated, Industry-unique
GDPS (PPRC/XRC/GM) DCM for VCS

[Diagram: three configurations side by side]
• GDPS/PPRC – CA / DR within a metropolitan region: two data centers, systems remain active, designed for no data loss. GDPS-managed PPRC between site-1 and site-2 (K1/K2 controlling systems), with a VCS GCO cluster running the GDPS DCM agent and VCS-managed replication alongside.
• GDPS/XRC – DR at extended distance: rapid systems recovery with only "seconds" of data loss. GDPS-managed XRC to the SDM and K systems in site-2, with VCS GCO, the GDPS DCM agent and VCS-managed replication.
• GDPS/GM – DR at extended distance: rapid systems recovery with only "seconds" of data loss. GDPS-managed GM to the R and K systems in site-2, with VCS GCO, the GDPS DCM agent and VCS-managed replication.
Agenda
• Why GDPS?
• GDPS Family of Offerings
– Continuous Availability of Data within a Data Center
– Continuous Availability (CA) / Disaster Recovery (DR) in a Metropolitan Region
– DR at Extended Distance
– CA Regionally and DR Out-of-Region
• Extensions to Heterogeneous Platforms
• GDPS Customer Experiences
• Summary
GDPS/PPRC HyperSwap Manager – HyperSwap Experience
Signal Iduna, Hamburg, Germany (web site: www.signal-iduna.de), 10/2008

16-way Parallel Sysplex – active/active (CICS, DB2/IMS, WebSphere); z/OS 1.10, NV 1.4, SA 3.1, GDPS HM 3.5; GDPS freeze policy: SWAP,GO

Business requirements:
• No data loss (RPO = 0)
• Dynamic disk switching without system and/or application outage
• HyperSwap UIT < 60 seconds for planned and < 30 seconds for unplanned disk reconfigurations

Measurements (730 PPRC volume pairs across 10 LSS):
• Unplanned HyperSwap UIT: 8 sec
• Planned HS SUSPEND UIT: 8 sec
• Planned HS RESYNCH UIT: 5 sec (*)

UIT = User Impact Time (seconds); (*) GDPS controlling system only

[Diagram: site-1 and site-2, 10 km apart: z/OS P2n-1 and K1 versus z/OS P2n and K2 (n = 1 to 7), PTS/CTS and BTS CFs, primary (P) and secondary (S) IBM disks]
GDPS/PPRC – active/active – HyperSwap Experience
ARZ, Innsbruck, Austria (web site: www.arz.at), 12/2009

10-way Parallel Sysplex (CICS, DB2); z/OS 1.10, NV 5.3, SA 3.2, GDPS 3.5; GDPS freeze policy: SWAP,STOP; HyperSwap; CF duplexing; HDS disks; IBM TS7700 tape

Business requirements:
• No loss of committed data (RPO = 0) and continuous data availability
• Supported site maintenance without application outage
• No more than 10 minutes of disruption in the event of catastrophic systems or data-center failure (RTO < 10 min)

Measurements (4,679 PPRC volume pairs across 32 LSS):
• Planned HS Suspend UIT: 20 sec
• Planned HS Resynch UIT: 122 sec
• Unplanned HyperSwap UIT: 16 sec

UIT = User Impact Time (seconds); (*) GDPS controlling systems

[Diagram: site-1 and site-2, 2 km apart: z/OS systems (n = 4) and a k-sys (*) in each site, with ICFs, PTS and BTS]
GDPS/PPRC – active/active (CF in 3rd site) – HyperSwap Experience
Postbank, Bonn, Germany (web site: www.postbank.de), 12/2009

10-way Parallel Sysplex (CICS, SAP/DB2); campus with three logical sites (site-1, site-X, site-2); IBM disks; Remote Pair FlashCopy (planned 2010)

Business requirements:
• No loss of committed data (RPO = 0)
• Ability to recover catastrophic logical site failures involving multiple components (RTO < 1 hour)
• Support of site maintenance without application outage
• Creating a FlashCopy of the DB2 volumes once a day, using the DB2 Backup System utility and Remote Pair FlashCopy
• Dumping the FlashCopy volumes to tape as a secondary backup

Measurements (1,611 PPRC volume pairs across 16 LSS):
• Planned HS RESYNCH: 8 sec ¹
• Unplanned HS (site failure): 26 sec ¹

¹ User Impact Time (seconds)

[Diagram: PTS with ICF sysplex/GDPS K1 in site-1, the CF (arbiter) in site-X, BTS with ICF sysplex/GDPS K2 in site-2; HyperSwap between the primary (P) and secondary (S/L) disks]
GDPS/PPRC xDR – active/active – HyperSwap Experience
SDV, Nuremberg, Germany (web site: www.sdv-it.de), 11/2009

10-way z/OS Parallel Sysplex (CICS, DB2, WebSphere) & two z/VM clusters; z/OS 1.11, NV 5.3, SA 3.2, GDPS 3.6, z/VM 5.3; HDS disks

Business requirements:
• No data loss (RPO = 0)
• Continuous data availability for z/OS and for Linux hosted by z/VM
• Coordinated disaster recovery for heterogeneous System z applications (RTO < 1 hour)

Measurements:
• Planned HS SUSPEND UIT: 4–5 sec
• z/OS: 1,224 PPRC pairs across 36 LSS
• z/VM: 578 PPRC pairs (374 | 204) across 17 LSS (11 | 6)

UIT = User Impact Time (seconds)

[Diagram: site-1 and site-2, 1.2 km apart: the z/OS sysplex with K1/K2 CMC systems, two z/VM systems (z/VM1, z/VM2) with Linux guests and proxies (p1, p2), PTS/CTS and BTS CFs, primary (P) and secondary (S) disks; planned & unplanned HyperSwap]
GDPS/PPRC – active/active – Simulated Site Failure Experience
UBS, Zurich, Switzerland (web site: www.ubs.com), 7/2008

14-way Parallel Sysplex (CICS/DB2, SAP, WebSphere MQ)

Business requirements:
• No loss of committed data (RPO 0 sec)
• A few minutes' service impact in the event of catastrophic systems, multiple-component, or data-center failure (RTO a couple of minutes)
• Single-component maintenance or failure without application outage

Operational requirement:
• Recovery (RTO) without any manual intervention

Measurements per GDPSplex:
• 6-way, 162 PPRC pairs: planned HS SUSPEND UIT 11 sec; unplanned HyperSwap UIT 6 sec
• 14-way, 6,303 PPRC pairs (64 LSS, 344 TB): planned HS SUSPEND UIT 41 sec; unplanned HyperSwap UIT 14 sec; simulated site-1 failure RTO (*) 3 min 35 sec
• 6-way, 956 PPRC pairs: planned HS SUSPEND UIT 20 sec; unplanned HyperSwap UIT 33 sec; simulated site-1 failure RTO (*) 5 min 05 sec

UIT = User Impact Time; (*) service impact time including middleware recovery

[Diagram: site-1 and site-2, 10 km apart (DWDM): a z/OS k-sys and six production systems per site with CFs; HyperSwap between the primary (P) and secondary (S) disks and Linux (L) volumes]
GDPS/MzGM Experience – Simulated Regional Disaster
UBS, Zurich, Switzerland (web site: www.ubs.com), 11/2009

Parallel Sysplex/GDPS model (CICS/DB2, SAP, WebSphere MQ)

[Diagram: GDPS/PPRC between site-1 (disks A, K1, CFs) and site-2 (disks B, K2, CFs) in Switzerland, 10 km apart via Metro Mirror (PPRC); z/OS Global Mirror (XRC) over 6,600 km to site-R in the USA, with the GDPS/XRC CF, Kx and SDM systems and FlashCopy (FC) targets; incremental resync planned for 1Q'10; HDS and IBM disks. Note: the needed "insurance copies" and backup LPARs are not shown, for clarity's sake.]

Business requirements:
• RPO < 20 min (for regional events)
• RTO < 4 h failover time to the recovery site in the event of a regional disaster (wide-scale disruption)

Operational requirement:
• Recovery (RTO) with only two manual interventions – FlashCopy and XRC Recover, followed by IPLing the backup systems

Measurements per GDPS/PPRC plex:
• 4-way, 465 PPRC pairs, 402 XRC pairs (5 LSS): FC & XRC recover time 7 min; simulated regional D/R (RTO) 45 min; average data in flight (RPO) 3–5 sec
• 2-way, 201 PPRC pairs, 196 XRC pairs (2 LSS): FC & XRC recover time 7 min; simulated regional D/R (RTO) 45 min; average data in flight (RPO) 3–5 sec
GDPS/MGM Experience – HyperSwap and Global Mirror
SIX Group, Zurich, Switzerland (web site: www.six-group.com), 1/2010

Parallel Sysplex/GDPS model (CICS, DB2, MQ); z/OS 1.10, NV 5.3, SA 3.2, GDPS 3.6

[Diagram: the z/OS sysplex with CF and the Kp/Kg controlling systems spanning site-1 and site-2 (IBM disks A and B, Metro Mirror (PPRC) over 8 km); Global Mirror (GM) over 60 km to site-R (disks C and D, FlashCopy, the Rg recovery system and backup LPARs). Kp = controlling system (PPRC), Kg = controlling system (GM), Rg = recovery system (GM). Note: the needed "insurance copies" and system infrastructure volumes are not shown, for clarity's sake.]

Business requirements:
• Local RPO 0 sec (no data loss); local RTO < 1 h failover time in the event of catastrophic systems, multiple-component, or site failure
• Single-component maintenance or failure without application outage
• Remote RPO < 10 min (data loss); disk recover time of a couple of minutes
• Remote RTO < 2 h failover time for regional D/R events

Measurements (two configurations: 3 | 4 GDPS/PPRC systems; freeze policy SWAP,GO | SWAP,STOP; 4 CKD | 3 CKD + 2 FB LSS; PPRC volumes 248 CKD + 31 FB | 3,369 CKD + 31 FB; Global Mirror volumes 267 CKD | 3,294 CKD):
• Planned HS Suspend UIT: < 5 sec
• Simulated regional D/R and failover to site-R: < 60 min
• Global Mirror recovery: 131 sec
• Average data in flight (remote RPO): < 5 sec
Agenda
• Why GDPS?
• GDPS Family of Offerings
– Continuous Availability of Data within a Data Center
– Continuous Availability (CA) / Disaster Recovery (DR) in a Metropolitan Region
– DR at Extended Distance
– CA Regionally and DR Out-of-Region
• Extensions to Heterogeneous Platforms
• GDPS Customer Experiences
• Summary
GDPS Value Proposition

The ultimate availability solution – value, vision, experience:

Customer Acceptance
• 530+ GDPS licenses installed in 38 countries worldwide
• Proven technology, automated and repeatable results
• Complete implementation by experienced consultants

Open Industry Standards
• GDPS supports industry-accepted, open replication architectures (PPRC, XRC, GM, and FC)
• Architectures licensed by all enterprise storage vendors
• (New) GDPS qualification program (IBM and Hitachi)

Product Maturity
• Generally available since 1998
• Suite of products
• E2E capability
• Several years of System z production experience
• CA/DR best of breed
• Continually enhanced

Investment Protection
• Easily upgradeable
• Common code base for each product

Customer Focus
• GDPS Design Council
• Synergy with IBM development labs
• Incorporates several IBM patents
• Dedicated development & solution test lab
• New version/release every year

Support / Commitment
• Fully supported via the standard IBM support structure
• Fixes through normal System z channels

"Using the GDPS/PPRC HyperSwap technology is a significant step forward in achieving continuous availability. The benefits in our GDPS environments are that planned switches of the disk configuration took 13–33 seconds without application outage. The user impact time of unplanned disk reconfigurations was 9–16 seconds, with 16 seconds to swap a configuration of over 4,600 PPRC volume pairs. Without HyperSwap, planned and unplanned reconfigurations had resulted in a service outage of almost two hours in our Sysplex/GDPS with 10 systems."
– Wolfgang Dungl, Manager of Availability, Capacity and Performance Management; Wolfgang Schott, GDPS Project Manager, iT-AUSTRIA
Grow with GDPS

• Introduced in 1998 – over 10 years of experience
• More than 530 licenses in 38 countries; dozens of references
• GDPS is installed in many of the largest banks in the world

Reference customers (grouped as on the original slide):
• GDPS/MzGM – Cedacri S.p.A., Seceti, UBS, ICBC, Garanti Bank, Royal Bank of Canada, Wells Fargo, Commerzbank
• GDPS HyperSwap Manager / zGM – American Express
• GDPS/XRC – Barclays Bank, Key Bank, Principal Financial Group, Regions Financial Corp., Sun Trust Bank, Credit Agricole, Intesa Sanpaolo, BPVN
• GDPS/MGM – Baloise, Six Group
• GDPS HyperSwap Manager – Signal Iduna, dm-drogerie markt, Generali Informatik Services, La Caixa, Central Bank of Turkey, Credit Suisse, Danske Bank, Deere & Company, Finanz Informatik
• GDPS/PPRC – Bancaja, Banca Popolare di Milano, Bank of Montreal, Deutsche Bank, Bankinter, BRZ, ARZ, Halifax Bank of Scotland, Monte Paschi di Siena, iT-Austria, Postbank, Royal Bank of Scotland, Svenska Handelsbanken, Toronto Dominion Bank, UBS, Sparda Bank (SDV), GAD
• and more …
Summary

GDPS – freedom of choice:
• HyperSwap
• Vendor independent
• Heterogeneous
• Remote copy management
• Planned actions
• 3-site
• Skill transfer
• Automated recovery