© 2010 IBM Corporation
Hanseatic Mainframe Summit 2010 – Wednesday, September 1, 2010
High Availability Level 3: GDPS
Martin Söckel ([email protected]) – 734 42 32
Trademarks
The following are trademarks of the International Business Machines Corporation in the United States and/or other countries:
IBM* IBM (logo)* ibm.com* AIX* DB2* DS6000 DS8000 Dynamic Infrastructure* ESCON* FlashCopy* GDPS* HyperSwap Parallel Sysplex* POWER5 Redbooks* Sysplex Timer* System p* System z* Tivoli* z/OS* z/VM*
* Registered trademarks of IBM Corporation
The following are trademarks or registered trademarks of other companies:
Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States and/or other countries. Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both, and is used under license therefrom. Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both. Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. InfiniBand is a trademark and service mark of the InfiniBand Trade Association. Intel, the Intel logo, Intel Inside, the Intel Inside logo, Intel Centrino, the Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. UNIX is a registered trademark of The Open Group in the United States and other countries. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. ITIL is a registered trademark, and a registered community trademark, of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office. IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency, which is now part of the Office of Government Commerce.
All other products may be trademarks or registered trademarks of their respective companies.
Notes: Performance is in Internal Throughput Rate (ITR) ratio based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput improvements equivalent to the performance ratios stated here. IBM hardware products are manufactured from new parts, or new and serviceable used parts; regardless, our warranty terms apply. All customer examples cited or described in this presentation are presented as illustrations of the manner in which some customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics will vary depending on individual customer configurations and conditions. This publication was produced in the United States. IBM may not offer the products, services, or features discussed in this document in other countries, and the information may be subject to change without notice. Consult your local IBM business contact for information on the products or services available in your area. All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only. Information about non-IBM products is obtained from the manufacturers of those products or their published announcements. IBM has not tested those products and cannot confirm the performance, compatibility, or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. Prices are subject to change without notice. Contact your IBM representative or Business Partner for the most current pricing in your geography.
• Addresses Planned / Unplanned Hardware / Software Outages
• Flexible, Non-disruptive Growth
– Capacity beyond the largest CPC
– Scales better than SMPs
• Dynamic Workload / Resource Management
• Built-in Redundancy
• Capacity Upgrade on Demand
• Capacity Backup
• Hot-Pluggable I/O

[Diagram: a single system grows into a Parallel Sysplex of 1 to 32 systems]
[Diagram: GDPS spanning Site 1 and Site 2]
• Addresses Site Failure and Site Maintenance
• Disk / Tape Remote Copy
– Eliminates SPOF
– No / Some Data Loss
• Application Independent
System z Continuous Availability
Agenda
• Why GDPS?
• GDPS Family of Offerings
– Continuous Availability of Data within a Data Center
– Continuous Availability (CA) / Disaster Recovery (DR) in a Metropolitan Region
– DR at Extended Distance
– CA Regionally and DR Out-of-Region
• Extensions to Heterogeneous Platforms
• GDPS Customer Experiences
• Summary
Note: the slide deck contains more slides than will be presented – they are included for your reference.
Vision for a Business Continuity Solution

Business Continuity…
• Provides both Continuous Availability and Disaster Recovery for heterogeneous, distributed IT business applications
– "Recover my business rather than my platform technology"
• Ensures successful recovery via automated processes
– Pre-canned, pre-tested
– Can be handled by less-skilled operators
• Allows management at the "business impact" level
– Granular
– Customer flexibility on the desired RTO / RPO quality of service
• Leverages existing customer/IBM investments and technology
System z is the Business Continuity management hub
Non-Disaster Events

Typical planned outages:
• Backups
• PTF and OS installs
• Application maintenance
• Hardware/software upgrades

Typical unplanned, non-disaster outages:
• Application failure
• Operator error
• Local power outages
• Network failure
• Hardware failure

Disaster Events

Typical disaster outages:
• Outages caused by natural disasters or other catastrophes that damage the production facilities beyond usability (e.g., fire, flood, earthquake, bombing)
• Outages that require a recovery procedure at an off-site location
• Failure of the regional power grid
• Outages not caused by computer hardware or software defects

A disaster is a rare event that most customers will never experience but must plan for.
Market Need for Business Continuity
Aspects of IT Business Continuity

High Availability
• A resilient IT infrastructure that masks individual (single) component failures
• The infrastructure continues to provide access to applications
• Often provided by resilient hardware

Continuous Operations
• No need to take applications down for ongoing IT procedures
– Scheduled backups
– Planned maintenance
• Ability to keep continuous access to applications when everything is working properly

Disaster Recovery
• Ability to recover from unplanned outages at a different site
– Usually on different hardware
• Performed after something has gone wrong on a site-wide basis

• Operations continue after a disaster
• Costs are predictable and manageable
• Protection of critical business data
• Recovery is predictable and reliable
Evolution into an Enterprise-Wide Solution

[Diagram: evolution from single server → clusters → multi-site clusters → end-to-end multi-site clusters]

System z: a single System z (Site 1) grows into a Parallel Sysplex, and then into GDPS spanning Site 1 and Site 2+.
Distributed systems (IBM AIX, Sun Solaris, HP-UX, Linux, Windows): a single Windows or UNIX server (Site 1) grows into a cluster, and then into GDOC spanning Site 1 and Site 2+, connected by IP networks and replicated disk.

GDPS = Geographically Dispersed Parallel Sysplex; GDOC = Geographically Dispersed Open Clusters
What is GDPS?
• Integrated / automated solution
• Manages application and data availability within and across sites
– Monitors systems and disk & tape subsystems
– Manages planned and unplanned activities (system/disk maintenance or failure, site maintenance or failure)
• Builds on proven high-availability technologies
– Clustering
– Remote copy (disk and tape)
– Automation
• Easy-to-use interface
– Panel/GUI interface
– Policy-based commands (see the sketch below)
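To make "policy-based commands" concrete: a planned action bundles many manual steps into one pre-tested, repeatable script. The following minimal Python sketch shows the shape of such an orchestration; the Sysplex/Mirror objects and step names are illustrative assumptions, not actual GDPS script syntax.

from dataclasses import dataclass, field

@dataclass
class Mirror:
    suspended: bool = False
    def suspend(self):                 # stop remote copy at a consistent point
        self.suspended = True
    def recover_secondary(self):       # make the secondary copies usable
        assert self.suspended, "suspend/freeze must precede recovery"

@dataclass
class Sysplex:
    site1: list
    running: dict = field(default_factory=dict)
    def stop(self, name): self.running[name] = None
    def restart(self, name, site): self.running[name] = site

def planned_site_switch(plex, mirror):
    """One scripted, repeatable sequence instead of many manual steps."""
    for s in plex.site1:
        plex.stop(s)                   # quiesce the site-1 z/OS images
    mirror.suspend()
    mirror.recover_secondary()
    for s in plex.site1:
        plex.restart(s, site=2)        # re-IPL the workload in site-2

planned_site_switch(Sysplex(site1=["P1", "P2"]), Mirror())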
PPRC, XRC and GM Overview
Metro Mirror (PPRC)
• Synchronous remote data mirroring
• Metropolitan distance
• System z and distributed data
• GDPS/PPRC provides data consistency

z/OS Global Mirror (XRC)
• Asynchronous remote data mirroring
• Unlimited distance support
• System z data
• The System Data Mover (SDM) provides data consistency

Global Mirror
• Asynchronous remote data mirroring
• Unlimited distance support
• System z and distributed data
• Global Mirror provides data consistency
The GDPS® family of offerings
The right level of business continuity protection for your business

GDPS: an end-to-end continuous availability and disaster recovery solution:
• Automated recovery removes people as a single point of failure
• A single point of control for heterogeneous data across the enterprise
• Continuous Availability of Data within a Data Center – single data center; applications remain active; near-continuous availability to data (GDPS/PPRC HM)
• Continuous Availability / Disaster Recovery in a Metropolitan Region – two data centers; systems remain active; automated D/R across site or storage failure; no data loss (GDPS/PPRC HM, GDPS/PPRC)
• Disaster Recovery at Extended Distance – two data centers; automated disaster recovery; "seconds" of data loss (GDPS/GM, GDPS/XRC)
• Continuous Availability Regionally and Disaster Recovery at Extended Distance – three data centers; data availability; no data loss; extended distances (GDPS/MGM, GDPS/MzGM)
Tiers of Disaster Recovery: Level Setting GDPS

Best D/R practice is to blend tiers of solutions in order to maximize application coverage at the lowest possible cost. One size, one technology, or one methodology does not fit all applications: mission-critical, somewhat-critical, and not-so-critical applications each justify a different tier.

[Chart: value versus time to recover (15 min, 1–4 hr, 4–8 hr, 8–12 hr, 12–16 hr, 24 hr, days), ranging from point-in-time backup through a dedicated remote hot site to an active secondary site]

• Tier 1 – PTAM
• Tier 2 – PTAM, hot site
• Tier 3 – Electronic vaulting
• Tier 4 – Batch/online database shadowing & journaling, repetitive PiT copies, fuzzy-copy disk mirroring
• Tier 5 – Software two-site, two-phase commit (transaction integrity); or repetitive PiT copies with small data loss
• Tier 6 – Near-zero or zero data loss remote disk mirroring, helping with data integrity and data consistency
• Tier 7 – Near-zero or zero data loss: highly automated takeover on a complex-wide or business-wide basis, using remote disk mirroring

GDPS positioning:
• GDPS/PPRC HyperSwap Manager – RTO depends on customer automation; RPO 0
• GDPS/XRC, GDPS/GM – RTO < 1 hr; RPO seconds
• GDPS/PPRC – RTO < 1 hr; RPO 0

Notes:
• RTO does not include decision time
• Subsystem-specific recovery times vary
• Tiers based on SHARE Group, 1992; PTAM = Pickup Truck Access Method

RTO = Recovery Time Objective – how long can you be without service?
RPO = Recovery Point Objective – how much data must be recreated? (Both are formalized below.)
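A hedged formalization of the two objectives (standard definitions, consistent with the slide):

\[
\mathrm{RPO} = t_{\text{failure}} - t_{\text{last recoverable copy}},
\qquad
\mathrm{RTO} = t_{\text{service restored}} - t_{\text{failure}}
\]

Synchronous mirroring keeps the last recoverable copy current with the failure, hence RPO = 0; asynchronous mirroring trails by the replication lag, hence RPO of "seconds" or more.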
Agenda
• Why GDPS?
• GDPS Family of Offerings
– Continuous Availability of Data within a Data Center
– Continuous Availability (CA) / Disaster Recovery (DR) in a Metropolitan Region
– DR at Extended Distance
– CA Regionally and DR Out-of-Region
• Extensions to Heterogeneous Platforms
• GDPS Customer Experiences
• Summary
z/OS Parallel Sysplex – High Availability

[Diagram: two z/OS images (z/OS-A, z/OS-B) with Comm Server, TORs, AORs, DB2 and CPSM behind a VTAM/TCP/IP network, coupled through CFs and shared disk]

• Removes SPOF¹ of
– Server
– LPAR
– Subsystems
• Planned and unplanned outages
• Single system image
• Dynamic session balancing
• Dynamic transaction routing
• Disk is a SPOF

Designed for application availability of 99.999%

¹ Single Point of Failure
Disk Mirroring

• Replicated disk
– Synchronous remote copy
• Provides a second copy of data

However …
• Does this guarantee the usability of the second copy?
• How long does a switch take?

[Diagram: the same Parallel Sysplex, with Metro Mirror (PPRC) copying the primary disk to a secondary]

Demand for rapid database availability
Need for Time Consistency

• The start of one write may be time-dependent on the completion of a previous write
– Database & log
– Index & data components
– The time sequence could be exposed
• GDPS automation ensures consistency
– Across any number of primary subsystems
• Consistency enables Restart instead of Recovery
– Recovery – measured in hours or days: restore image-copy tapes, apply log changes
– Restart – measured in minutes: standard restart process
• Even if the second copy can be trusted, a disk switch is disruptive for the entire workload
• Protection against mirroring failures

Systems can be restarted using a FROZEN, consistent copy; recovery is not needed.

Example with the primary log (PLOG), primary database (PDB) and their secondaries (SLOG, SDB): [1] log update, [2] DB update, [3] mark DB update complete. A secondary holding [1] or [1,2] or [1,2,3] is OK, but [1,3] is NOT OK (see the sketch below).
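The [1]/[2]/[3] rule is easy to demonstrate. A minimal sketch (illustrative, not GDPS or DB2 internals): a consistency-preserving freeze may cut the write sequence anywhere, but it only ever drops a suffix, so the secondary can always be restarted.

def db_update(mirror, record, value):
    mirror.write("LOG", ("intent", record, value))   # [1] log update
    mirror.write("DB",  (record, value))             # [2] DB update
    mirror.write("LOG", ("complete", record))        # [3] mark DB update complete

class ConsistentMirror:
    """Keeps a strict prefix of the write sequence: [1] or [1,2] or [1,2,3].
    A gap such as [1,3] -- 'update complete' without the data -- cannot occur."""
    def __init__(self, freeze_after):
        self.writes, self.freeze_after = [], freeze_after
    def write(self, device, payload):
        if len(self.writes) < self.freeze_after:     # frozen: the suffix is dropped
            self.writes.append((device, payload))

m = ConsistentMirror(freeze_after=2)
db_update(m, "acct-42", 100.0)
print(m.writes)   # [1,2]: restartable; recovery from image copies is not needed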
HyperSwap – the Technology

• Extends Parallel Sysplex availability to disk subsystems
• Substitutes the Metro Mirror (PPRC) secondary for the primary device
– Automatic – no operator interaction
– Fast – can swap a large number of devices
– Non-disruptive – applications keep running
– Includes volumes with sysres, page data sets, catalogs
• Disk is no longer a single point of failure

[Diagram: applications keep their UCBs while the swap repoints them from the PPRC primary (P) to the secondary (S) – modeled in the sketch below]

UCB = Unit Control Block

Comprehensive application and data availability solution
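A rough model of the swap itself (an assumption-laden sketch, not z/OS internals): applications keep a stable handle, the UCB, and HyperSwap repoints every handle from the PPRC primary to the secondary in one serialized step, which is why no application has to be restarted.

class UCB:
    """Stand-in for a unit control block: the handle I/O is issued through."""
    def __init__(self, device):
        self.device = device

def hyperswap(ucbs, secondary_of):
    # performed under serialization, so no I/O sees a half-swapped configuration
    for ucb in ucbs:
        ucb.device = secondary_of[ucb.device]

app_handle = UCB("primary-0A00")    # hypothetical device identifiers
hyperswap([app_handle], {"primary-0A00": "secondary-0B00"})
print(app_handle.device)            # secondary-0B00 -- the application kept its UCB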
PPRC Failover / Failback (FO/FB)

• In failover mode, the new primary volumes (at the remote site) record changes
• The state of the volumes at the local site is preserved as it was when the failover was initiated
• Only the changes made since the failover need to be resynchronized, not the entire volumes

Faster resynchronization, less resource-consuming – facilitates D/R testing in a production environment (see the sketch below).

[Diagram: Freeze → Failover (PPRC suspended, change recording (CR) active) → Failback (PPRC full duplex)]
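Why failback is fast: only the tracks touched while suspended travel back. A minimal sketch of change-recording-based resynchronization (illustrative, not disk-subsystem microcode):

class Volume:
    def __init__(self, ntracks):
        self.tracks = [None] * ntracks
        self.changed = set()             # change-recording (CR) bitmap

    def write(self, track, data):
        self.tracks[track] = data
        self.changed.add(track)          # remember what diverged since failover

def failback(new_primary, old_primary):
    """Copy only the tracks written since failover, then resume full duplex."""
    for t in sorted(new_primary.changed):
        old_primary.tracks[t] = new_primary.tracks[t]
    new_primary.changed.clear()

site2, site1 = Volume(1000), Volume(1000)
site2.write(7, "x"); site2.write(42, "y")   # failover mode: site-2 is primary
failback(site2, site1)                       # resynchronizes 2 tracks, not 1,000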
GDPS – Freeze Policy Options

FREEZE & GO
• Freeze the secondary disk configuration
• Allow applications to continue
– Optimizes for remote restartability
– Least impact on application availability
– May lose data in case of a real disaster

FREEZE & STOP
• Freeze the secondary disk configuration
• Stop all z/OS images
– Optimizes for remote restartability
– Impacts application availability
– No data loss on a primary-site disaster

FREEZE & COND
• Freeze the secondary disk configuration
• Determine the reason for the suspend
– Secondary hardware problem: Freeze & Go
– Any other reason: Freeze & Stop

SWAP, [GO | STOP]
• If a swap trigger occurs and HyperSwap is enabled
– Swap the primary / secondary disks
– If the swap cannot complete: STOP
• Else (freeze trigger)
– GO option: Freeze & Go
– STOP option: Freeze & Stop

(The dispatch logic is sketched below.)
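The four options boil down to one decision tree. A sketch of the dispatch (illustrative pseudologic, not GDPS code; the trigger classification and helper stubs are assumptions):

def try_hyperswap():           # assumed to succeed when the secondaries are healthy
    return True

def freeze_secondary():        # secure a consistent secondary copy
    pass

def secondary_hw_problem():    # did only the mirror, not the primary, fail?
    return True

def on_mirror_event(trigger, policy, hyperswap_enabled=True):
    if trigger == "swap" and hyperswap_enabled and policy.startswith("SWAP"):
        if try_hyperswap():
            return "running on the former secondary disks"
        return "STOP: all z/OS images stopped"     # swap failed: protect the data
    freeze_secondary()                             # always freeze first
    if policy.endswith("GO"):
        return "GO: applications continue"         # availability over RPO
    if policy.endswith("COND") and secondary_hw_problem():
        return "GO: applications continue"         # false alarm: only the mirror failed
    return "STOP: all z/OS images stopped"         # guarantees zero data loss

print(on_mirror_event("freeze", "FREEZE&COND"))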
GDPS – Freeze Policy

• GO – continue running with the primary disks
– Risk of data loss if the mirroring failure was the signal of a true rolling disaster
– High availability if the mirroring failure was a false trigger
• STOP – stop all systems before further data is written
– The only way to guarantee zero data loss
– Risk of availability loss if the mirroring failure was a false alert
Planned Disk Reconfiguration

PLANNED ACTION
• Without HyperSwap: shut down the systems, remove them from the sysplex, terminate & reverse/suspend PPRC, restart the systems – approx. 1–2 hours
• With HyperSwap: Freeze, PPRC failover, swap the primary & secondary PPRC UCBs, systems continue – 15 seconds (6,545 volume pairs across 46 LSS)

[Diagram: couple data sets (CDS_p, CDS_a) and PPRC volume pairs going from duplex to suspended as the primary (P) and secondary (S) roles swap]
Unplanned Disk Reconfiguration

• Without HyperSwap: TAKEOVER PROMPT, then TAKEOVER STARTED (assess time); Freeze, systems quiesced (if Freeze & Stop); remove the systems from the sysplex, recover the secondary PPRC volumes, restart the systems – 30–60 minutes or more (assess time not included)
• With HyperSwap: Freeze, PPRC failover, swap the primary & secondary PPRC UCBs, systems continue – 13 seconds (6,545 volume pairs across 46 LSS)

[Diagram: couple data sets and PPRC volume pairs going from duplex to suspended as the roles swap]
GDPS/PPRC: a Continuous Availability and/or Disaster Recovery Solution – Metropolitan Distance

Planned and unplanned exceptional conditions

[Diagram: Site 1 and Site 2, each with its own network, running a multi-site Parallel Sysplex]

• Manages the multi-site Parallel Sysplex: processors, CBU, CFs, couple data sets
• Manages disk remote copy (System z & Open LUN)
• Manages tape remote copy (PtP VTS)
• Exploits the HyperSwap & FlashCopy functions
• Automated, customized procedures to execute planned and unplanned actions (z/OS, CF, disk, tape, site)
• Improves availability of heterogeneous System z business operations
GDPS/PPRC active/standby – Single-Site Workload
Near-continuous availability solution

Continuous access to data via HyperSwap; highly automated site failover in the event of catastrophic systems, multiple-component, or data-center failure.

[Diagram: site-1 with the active production systems and the PTS/CTS sysplex/GDPS CF; site-2, up to 100 km away (with RPQ up to 200 km), with the standby production systems, the BTS sysplex/GDPS CF and the controlling (K) system; primary (P) and secondary (S) disks and couple data sets (CDS) mirrored between the sites; GRS, XCF, DB2 Lock, SCA and GBP structures in the CFs]

• CF signal latency impact due to the distance
• Unplanned & planned HyperSwap; RPO 0 sec
• CBU / OOCoD
• RTO < 1 h; systems and applications (the failed site-1 workload) need to be restarted on the secondary disks
GDPS/PPRC active/active – Multi-Site Workload
Configuration with the most potential for continuous availability

Continuous access to data via HyperSwap; highly automated site failover – the failing site-1 operating systems and applications need to be recycled.

[Diagram: active production systems in both site-1 and site-2, 10–20 km apart; PTS/CTS and BTS sysplex/GDPS CFs; controlling systems K1 and K2; primary (P) and secondary (S) disks and couple data sets mirrored between the sites]

• CF signal latency impact due to the distance
• Unplanned & planned HyperSwap; RPO 0 sec
• CBU / OOCoD
• RTO a couple of minutes; site-2 continues to run, and the failed site-1 workload needs to be restarted
Cross-site Sysplex – GDPS/PPRC active/active – the GDPS CA Model

[Diagram: production systems P1–P4 and controlling systems K1/K2 spread across site-1 and site-2; duplexed CF structures (DB2 Lock, SCA, GBP-p|s; GRS, XCF, …); PTS/CTS in site-1, BTS (arbiter) in site-2; PPRC between the primary (P) and secondary (S) disks and couple data sets]

Prerequisites:
• All critical data must be PPRCed and HyperSwap-enabled
• All critical CF structures must be duplexed
• Applications must be Parallel Sysplex-enabled
• The time reference must not be lost

Failures in one site do not cause applications running in the other site to fail:
• RTO = 0; the surviving site continues to run, and the failed site's workload needs to be restarted to release shared resources
• RPO = 0; no data loss
• Potential application impact due to CF signal latency (access rate, distance) – see the rule of thumb below
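The latency impact can be estimated with a rule of thumb (assuming roughly 200 km/ms signal propagation in fibre and one protocol round trip; switching and protocol overheads come on top):

\[
t_{\text{round trip}} \approx \frac{2d}{200\ \text{km/ms}}
\qquad\Longrightarrow\qquad
d = 100\ \text{km} \;\Rightarrow\; t_{\text{round trip}} \approx 1\ \text{ms}
\]

This delay is added to every synchronous CF request and every PPRC write, which is why the impact grows with both distance and access rate.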
Agenda
• Why GDPS?
• GDPS Family of Offerings
– Continuous Availability of Data within a Data Center
– Continuous Availability (CA) / Disaster Recovery (DR) in a Metropolitan Region
– DR at Extended Distance
– CA Regionally and DR Out-of-Region
• Extensions to Heterogeneous Platforms
• GDPS Customer Experiences
• Summary
GDPS/XRC
Long-distance failover for System z only

• XRC manages secondary consistency across any number of primary subsystems
• All writes are time-stamped and sorted before being committed to the secondary devices (see the sketch below)
• GDPS/XRC manages XRC failover automation, FlashCopy, CBU, and PtP VTS

[Diagram: production site (mono, base, or Parallel Sysplex; SUSE Linux Enterprise Server) with the XRC primary disks; z/OS Global Mirror (XRC) to the recovery site, where the SDM systems, their journals, the XRC secondary disks and GDPS/XRC reside. Note: the needed "insurance copies" are not shown, for clarity's sake.]

• Manages XRC, the SDMs, and FlashCopy
• Virtually unlimited distance
• Once initiated, a totally automated failover:
– Recovery of the secondary disks
– Activation of CBU
– Shutdown of the SDM / discretionary LPARs
– Reconfiguration of the recovery-site servers
– Restart of the production systems in the recovery site
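The time-stamp sorting is the heart of XRC consistency. A minimal sketch (illustrative, not SDM code) of merging writes from several primary subsystems and applying them in global time-stamp order, so the secondary never holds a write whose predecessor is missing:

import heapq

def apply_in_timestamp_order(streams, secondary):
    """streams: one iterable of (timestamp, volume, data) per primary subsystem,
    each already in time-stamp order, as XRC write time-stamping guarantees."""
    for ts, volume, data in heapq.merge(*streams):
        secondary[volume] = data          # committed in strictly increasing order

secondary = {}
apply_in_timestamp_order(
    [[(1, "A", "a1"), (4, "A", "a2")],    # writes from primary subsystem 1
     [(2, "B", "b1"), (3, "C", "c1")]],   # writes from primary subsystem 2
    secondary,
)
print(secondary)                          # {'A': 'a2', 'B': 'b1', 'C': 'c1'}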
GDPS/Global Mirror (GDPS/GM)
Long-distance failover for System z and Open

• The production site can contain single z/OS systems, Open Systems, and systems in a sysplex
• All data (z/OS and Open Systems) is mirrored using Global Mirror
• K-sys activities:
– Manages multiple Global Mirror sessions
– Sends device information, scripts and alerts to the R-sys
• R-sys activities:
– Secondary disk recovery, CBU activation, activating backup LPARs, IPLing systems
• Time-consistent data at the recovery site
• Bandwidth is determined by the RPO (an illustrative sizing follows below)

[Diagram: production site with the Blue Sysplex (zP1–zP3), the Red Sysplex (zP4–zP5), non-sysplex zP6 and Open Systems O7/O8 sharing disk subsystems, plus the K-sys; Global Mirror over unlimited distance to the recovery site, where the R-sys recovers the secondary disks and discretionary/CBU capacity hosts the backup; K-sys and R-sys communicate via NetView]
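How "bandwidth is determined by the RPO" translates into sizing, with assumed numbers (not from the slide): if the peak write rate W exceeds the replication bandwidth B for a burst of length T, the backlog – and with it the RPO – grows to roughly

\[
\text{RPO} \approx \frac{(W-B)\,T}{B}
= \frac{(100-60)\ \text{MB/s} \times 600\ \text{s}}{60\ \text{MB/s}}
\approx 400\ \text{s}
\]

so the link must be sized against sustained peak write rates, not averages, to hold an RPO of "seconds".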
Agenda
• Why GDPS?
• GDPS Family of Offerings
– Continuous Availability of Data within a Data Center
– Continuous Availability (CA) / Disaster Recovery (DR) in a Metropolitan Region
– DR at Extended Distance
– CA Regionally and DR Out-of-Region
• Extensions to Heterogeneous Platforms
• GDPS Customer Experiences
• Summary
Topologies for GDPS 3-site solutions

A GDPS 3-site solution combines GDPS/PPRC or GDPS/PPRC HM with one of GDPS/XRC or GDPS/GM.

Three physical sites
• The most expensive option
• Mostly considered by companies who already own two data centers within synchronous distance
• The user-access network and resilient replication connectivity are a large portion of the cost

Two physical (three logical) sites
• A relatively common approach
• Some companies effectively have this topology, with site-1 and site-2 in the same building on two different floors
• With HyperSwap, this provides the ability to transparently handle disk-subsystem failures

Campus & remote site
• Provides a compromise between two synchronous copies in a single machine room and two distinct sites
• The cost of user and replication networking can be reduced while still keeping a level of resilience
GDPS/MzGM and GDPS/MGM

• MzGM: Metro Mirror + z/OS Global Mirror – multi-target (A → B, A → C)
• MGM: Metro Mirror + Global Mirror – cascading (A → B → C)

GDPS provides similar functionality for both solutions.
z/OS Metro Global Mirror topologies

Metro Mirror = PPRC, z/OS Global Mirror = XRC

[Diagram: local (site-1) disk A with FlashCopy FA, intermediate (site-2) disk B with FlashCopy FB, and remote (site-R) disk C with FlashCopy FC; Metro Mirror A → B and z/OS Global Mirror to C; elements marked mandatory, recommended, or optional]

• Connectivity for the SDMs to the Metro Mirror secondary devices in case of a primary-site failure
• Ability to run the SDMs in either the local or the intermediate site, for return after a disaster
• Additional copy (FB) in the intermediate site for testing and resynchronization protection
• Additional copy (FC) for testing disaster recovery in the remote site
• Additional copy (FA) in the local site if a symmetrical configuration is required
• The SDM is now enabled for zIIP
GDPS/MzGM – multi-target: A → B, A → C

Metro Mirror = PPRC, z/OS Global Mirror = XRC

GDPS/PPRC is designed to provide continuous availability in the event of catastrophic campus-site failures involving multiple components; the GDPS/XRC capability provides out-of-region data and application resilience and protection against a local site disaster.

• No data loss
• System z only
• Peak bandwidth (no RPO impact)
• B-to-C network connectivity required for IR (incremental resynch)
• Mitigates system logger overhead (XRC+)
• If A fails, A is restarted in B, and reconfiguration is needed to restore DR
• If B fails, no reconfiguration is needed to restore DR
GDPS/MzGM configuration

[Diagram: site-1 (P1, K1, VM1, CF1) and site-2 (P2, K2, VM2, CF2) with Metro Mirror (PPRC) between the A disks (site-1) and B disks (site-2); z/OS Global Mirror (XRC) from A to the C disks in site-R, where the SDM and Kx systems run with their own CF, FlashCopy F and backup capacity; incremental resync B → C; CKD disks; ETR or STP time reference; SNA connectivity; mandatory and recommended elements as in the topology chart]

• Optional: CFs / production systems in site-2
• No data loss; System z only
• Very tight multi-vendor integration E2E
• B-to-C network connectivity required for IR
• Coordinated DR
• Common consistency E2E
Metro Global Mirror topologies

[Diagram: local (site-1) disk A with FlashCopy FA, intermediate (site-2) disk B with FlashCopy FB, and remote (site-R) disk C with FlashCopy FC and additional copy D; Metro Mirror A → B and Global Mirror B → C; elements marked mandatory, recommended, or optional]

• Connectivity from the local to the remote site for cases of intermediate-site failure, or for a switch of the roles of the local and intermediate sites
• Additional copy (FB) in the intermediate site for testing and resynchronization protection, and for reverse GM from the remote site
• Additional copy (FC) for testing disaster recovery in the remote site
• Additional copy (FA) in the local site if a symmetrical configuration is required
GDPS/MGM – cascading: A → B → C

Metro Mirror = PPRC

GDPS/PPRC is designed to provide continuous availability in the event of catastrophic campus-site failures involving multiple components; the GDPS/GM capability provides out-of-region data and application resilience and protection against a local site disaster.

• No data loss
• System z & open
• Scalable bandwidth (trade-off against RPO)
• A-to-C network connectivity required for IR (incremental resynch)
• If A fails, A is restarted in B, and DR is maintained
• If B fails, reconfiguration is needed to restore DR
GDPS/MGM configuration

[Diagram: site-1 (P1, (P2), K1 (Kg), non-z servers, CF1) and site-2 (P2, (P1), K2, non-z servers, CF2, CF3) with Metro Mirror (PPRC) between the A and B disks; Global Mirror from B to the C disks in site-R, with FlashCopy D and F copies, the R systems, Kg and backup capacity; incremental resync A → C; Kg connected via IP; ETR or STP time reference; mandatory and recommended elements as in the topology chart]

• Optional: CFs / production systems in site-2
• Non-z: UNIX®, Linux, Linux on z, Windows®
• No data loss; System z & open
• A-to-C network connectivity required for IR (incremental resynch)
• If A fails, HyperSwap to B, and DR is maintained
• If B fails, incremental resynch A → C, and the DR position is maintained
Agenda
• Why GDPS?
• GDPS Family of Offerings
– Continuous Availability of Data within a Data Center
– Continuous Availability (CA) / Disaster Recovery (DR) in a Metropolitan Region
– DR at Extended Distance
– CA Regionally and DR Out-of-Region
• Extensions to Heterogeneous Platforms
• GDPS Customer Experiences
• Summary
Open LUN Management

[Diagram: z/OS (GDPS via API/TSO) and WINTEL/UNIX servers connected over Ethernet; primary and secondary disk subsystems holding CKD and FBA volumes, mirrored with Metro Mirror over Fibre Channel or with Global Mirror]

• Data consistency across z/OS and Open Systems
• GDPS single point of control
• Supports cross-platform or platform-level Freeze
– Suspends are reported through SNMP alerts
• Available in GDPS/PPRC, GDPS/PPRC HM and GDPS/GM

Helps provide enterprise-wide disaster recovery with data consistency.
GDPS/PPRC Multiplatform Resiliency for System z (xDR) – Architectural Building Blocks

[Diagram: GDPS on top of NetView and SA z/OS under z/OS, and SA MP on Linux on z under z/VM, all on System z hardware]

A Business Continuity solution for z/OS and Linux applications on System z:
• Leverages existing and proven solutions
– GDPS
– SA z/OS
– IBM Tivoli System Automation for Multiplatforms (SA MP)
• Coordinated cross-platform business resiliency for operating systems (OS) running on System z hardware
• Integration point of z/OS and Linux on System z

Linux on System z can also run natively in its own partition.
xDR Guest Linux Clusters

[Diagram: a z/VM system hosting a proxy node (SA MP Linux) plus a cluster of SA MP Linux nodes running SAP, one of them the master; NetView EAS, SA z/OS and GDPS on z/OS; TCP/IP connections between the Linux guests and NetView]

• Multiple z/VM systems are supported
– z/VM guests running Linux & SA MP
• A proxy guest is used by GDPS to communicate commands to z/VM and to monitor for disk errors
– It is a cluster of just one node
• A second cluster is made up of multiple nodes
– One node is the master
– All guests run an application workload, e.g., SAP
• TCP/IP connections between the Linux guests and the NetView Event Automation Address Space (EAS) are used to inform GDPS of
– their existence,
– their ongoing presence, and
– any change in their status (sketched below)
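The guest-to-EAS channel can be pictured as a simple heartbeat. A sketch with a hypothetical host, port, and JSON wire format (assumptions; the real GDPS / SA MP protocol is not documented here):

import json
import socket
import time

def heartbeat(node, eas_host, eas_port, interval=30.0):
    """Register with the NetView EAS, then report liveness periodically;
    a missed beat is what tells GDPS that the node is gone."""
    with socket.create_connection((eas_host, eas_port)) as conn:
        conn.sendall(json.dumps({"node": node, "event": "register"}).encode() + b"\n")
        while True:
            conn.sendall(json.dumps({"node": node, "event": "alive",
                                     "ts": time.time()}).encode() + b"\n")
            time.sleep(interval)

# heartbeat("sapnode1", "netview-eas.example.com", 4711)   # hypothetical endpoint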
GDPS/xDR – outage use cases

[Diagram: site-1 with K1 (k-sys), z/OS P1, z/VM with Linux guests and a Linux guest proxy, native Linux and CF1; site-2 with K2 (k-sys backup), z/OS P2, native Linux, Linux guests and CF2; primary (P) and secondary (S) disks, Linux (L) volumes and couple data sets (CDS) mirrored between the sites; native-Linux swap disks stay local]

• Coordinated near-continuous availability and DR solution
• Planned & unplanned HyperSwap
• GDPS automates Linux startup
• CBU
• GDPS support for SLES (z/VM guest or native) and RHEL (z/VM guest)
• Requires Tivoli System Automation for Multiplatforms
Distributed Cluster Management (DCM)

• "End-to-end recovery solution"
– Provides management and coordination of
• planned and unplanned outages
• System z and distributed servers using clustering solutions
– Helps optimize operations across heterogeneous platforms
– Helps meet enterprise-level RTO and RPO
• DCM function added to GDPS
– The main management code runs in GDPS
– The GDPS DCM agent code runs on one of the distributed servers in each cluster
• DCM support for
– Veritas Cluster Server (VCS)
– Tivoli System Automation Application Manager (SA AppMan)
• IBM Geographically Dispersed Open Clusters (GDOC) services are highly recommended

Integrated, Automated, Industry-unique
GDPS (PPRC/XRC/GM) DCM for VCS

[Diagram: three configurations side by side]
• GDPS/PPRC – CA / DR within a metropolitan region: two data centers, systems remain active, designed for no data loss. GDPS-managed PPRC between site-1 and site-2 (K1/K2 controlling systems), with a VCS GCO cluster running the GDPS DCM agent and VCS-managed replication alongside.
• GDPS/XRC – DR at extended distance: rapid systems recovery with only "seconds" of data loss. GDPS-managed XRC to the SDM and K systems in site-2, with VCS GCO, the GDPS DCM agent and VCS-managed replication.
• GDPS/GM – DR at extended distance: rapid systems recovery with only "seconds" of data loss. GDPS-managed GM to the R and K systems in site-2, with VCS GCO, the GDPS DCM agent and VCS-managed replication.
Agenda
• Why GDPS?
• GDPS Family of Offerings
– Continuous Availability of Data within a Data Center
– Continuous Availability (CA) / Disaster Recovery (DR) in a Metropolitan Region
– DR at Extended Distance
– CA Regionally and DR Out-of-Region
• Extensions to Heterogeneous Platforms
• GDPS Customer Experiences
• Summary
GDPS/PPRC HyperSwap Manager – HyperSwap Experience
Signal Iduna, Hamburg, Germany (web site: www.signal-iduna.de), 10/2008

16-way Parallel Sysplex – active/active (CICS, DB2/IMS, WebSphere); z/OS 1.10, NV 1.4, SA 3.1, GDPS HM 3.5; GDPS freeze policy: SWAP,GO

Business requirements:
• No data loss (RPO = 0)
• Dynamic disk switching without system and/or application outage
• HyperSwap UIT < 60 seconds for planned and < 30 seconds for unplanned disk reconfigurations

Measurements (730 PPRC volume pairs across 10 LSS):
• Unplanned HyperSwap UIT: 8 sec
• Planned HS SUSPEND UIT: 8 sec
• Planned HS RESYNCH UIT: 5 sec (*)

UIT = User Impact Time (seconds); (*) GDPS controlling system only

[Diagram: site-1 and site-2, 10 km apart: z/OS P2n-1 and K1 versus z/OS P2n and K2 (n = 1 to 7), PTS/CTS and BTS CFs, primary (P) and secondary (S) IBM disks]
GDPS/PPRC – active/active – HyperSwap Experience
ARZ, Innsbruck, Austria (web site: www.arz.at), 12/2009

10-way Parallel Sysplex (CICS, DB2); z/OS 1.10, NV 5.3, SA 3.2, GDPS 3.5; GDPS freeze policy: SWAP,STOP; HyperSwap; CF duplexing; HDS disks; IBM TS7700 tape

Business requirements:
• No loss of committed data (RPO = 0) and continuous data availability
• Supported site maintenance without application outage
• No more than 10 minutes of disruption in the event of catastrophic systems or data-center failure (RTO < 10 min)

Measurements (4,679 PPRC volume pairs across 32 LSS):
• Planned HS Suspend UIT: 20 sec
• Planned HS Resynch UIT: 122 sec
• Unplanned HyperSwap UIT: 16 sec

UIT = User Impact Time (seconds); (*) GDPS controlling systems

[Diagram: site-1 and site-2, 2 km apart: z/OS systems (n = 4) and a k-sys (*) in each site, with ICFs, PTS and BTS]
GDPS/PPRC – active/active (CF in 3rd site) – HyperSwap Experience
Postbank, Bonn, Germany (web site: www.postbank.de), 12/2009

10-way Parallel Sysplex (CICS, SAP/DB2); campus with three logical sites (site-1, site-X, site-2); IBM disks; Remote Pair FlashCopy (planned 2010)

Business requirements:
• No loss of committed data (RPO = 0)
• Ability to recover catastrophic logical site failures involving multiple components (RTO < 1 hour)
• Support of site maintenance without application outage
• Creating a FlashCopy of the DB2 volumes once a day, using the DB2 Backup System utility and Remote Pair FlashCopy
• Dumping the FlashCopy volumes to tape as a secondary backup

Measurements (1,611 PPRC volume pairs across 16 LSS):
• Planned HS RESYNCH: 8 sec ¹
• Unplanned HS (site failure): 26 sec ¹

¹ User Impact Time (seconds)

[Diagram: PTS with ICF sysplex/GDPS K1 in site-1, the CF (arbiter) in site-X, BTS with ICF sysplex/GDPS K2 in site-2; HyperSwap between the primary (P) and secondary (S/L) disks]
GDPS/PPRC xDR – active/active – HyperSwap Experience
SDV, Nuremberg, Germany (web site: www.sdv-it.de), 11/2009

10-way z/OS Parallel Sysplex (CICS, DB2, WebSphere) & two z/VM clusters; z/OS 1.11, NV 5.3, SA 3.2, GDPS 3.6, z/VM 5.3; HDS disks

Business requirements:
• No data loss (RPO = 0)
• Continuous data availability for z/OS and for Linux hosted by z/VM
• Coordinated disaster recovery for heterogeneous System z applications (RTO < 1 hour)

Measurements:
• Planned HS SUSPEND UIT: 4–5 sec
• z/OS: 1,224 PPRC pairs across 36 LSS
• z/VM: 578 PPRC pairs (374 | 204) across 17 LSS (11 | 6)

UIT = User Impact Time (seconds)

[Diagram: site-1 and site-2, 1.2 km apart: the z/OS sysplex with K1/K2 CMC systems, two z/VM systems (z/VM1, z/VM2) with Linux guests and proxies (p1, p2), PTS/CTS and BTS CFs, primary (P) and secondary (S) disks; planned & unplanned HyperSwap]
GDPS/PPRC – active/active – Simulated Site Failure Experience
UBS, Zurich, Switzerland (web site: www.ubs.com), 7/2008

14-way Parallel Sysplex (CICS/DB2, SAP, WebSphere MQ)

Business requirements:
• No loss of committed data (RPO 0 sec)
• A few minutes' service impact in the event of catastrophic systems, multiple-component, or data-center failure (RTO a couple of minutes)
• Single-component maintenance or failure without application outage

Operational requirement:
• Recovery (RTO) without any manual intervention

Measurements per GDPSplex:
• 6-way, 162 PPRC pairs: planned HS SUSPEND UIT 11 sec; unplanned HyperSwap UIT 6 sec
• 14-way, 6,303 PPRC pairs (64 LSS, 344 TB): planned HS SUSPEND UIT 41 sec; unplanned HyperSwap UIT 14 sec; simulated site-1 failure RTO (*) 3 min 35 sec
• 6-way, 956 PPRC pairs: planned HS SUSPEND UIT 20 sec; unplanned HyperSwap UIT 33 sec; simulated site-1 failure RTO (*) 5 min 05 sec

UIT = User Impact Time; (*) service impact time including middleware recovery

[Diagram: site-1 and site-2, 10 km apart (DWDM): a z/OS k-sys and six production systems per site with CFs; HyperSwap between the primary (P) and secondary (S) disks and Linux (L) volumes]
GDPS/MzGM Experience – Simulated Regional Disaster
UBS, Zurich, Switzerland (web site: www.ubs.com), 11/2009

Parallel Sysplex/GDPS model (CICS/DB2, SAP, WebSphere MQ)

[Diagram: GDPS/PPRC between site-1 (disks A, K1, CFs) and site-2 (disks B, K2, CFs) in Switzerland, 10 km apart via Metro Mirror (PPRC); z/OS Global Mirror (XRC) over 6,600 km to site-R in the USA, with the GDPS/XRC CF, Kx and SDM systems and FlashCopy (FC) targets; incremental resync planned for 1Q'10; HDS and IBM disks. Note: the needed "insurance copies" and backup LPARs are not shown, for clarity's sake.]

Business requirements:
• RPO < 20 min (for regional events)
• RTO < 4 h failover time to the recovery site in the event of a regional disaster (wide-scale disruption)

Operational requirement:
• Recovery (RTO) with only two manual interventions – FlashCopy and XRC Recover, followed by IPLing the backup systems

Measurements per GDPS/PPRC plex:
• 4-way, 465 PPRC pairs, 402 XRC pairs (5 LSS): FC & XRC recover time 7 min; simulated regional D/R (RTO) 45 min; average data in flight (RPO) 3–5 sec
• 2-way, 201 PPRC pairs, 196 XRC pairs (2 LSS): FC & XRC recover time 7 min; simulated regional D/R (RTO) 45 min; average data in flight (RPO) 3–5 sec
GDPS/MGM Experience – HyperSwap and Global Mirror
SIX Group, Zurich, Switzerland (web site: www.six-group.com), 1/2010

Parallel Sysplex/GDPS model (CICS, DB2, MQ); z/OS 1.10, NV 5.3, SA 3.2, GDPS 3.6

[Diagram: the z/OS sysplex with CF and the Kp/Kg controlling systems spanning site-1 and site-2 (IBM disks A and B, Metro Mirror (PPRC) over 8 km); Global Mirror (GM) over 60 km to site-R (disks C and D, FlashCopy, the Rg recovery system and backup LPARs). Kp = controlling system (PPRC), Kg = controlling system (GM), Rg = recovery system (GM). Note: the needed "insurance copies" and system infrastructure volumes are not shown, for clarity's sake.]

Business requirements:
• Local RPO 0 sec (no data loss); local RTO < 1 h failover time in the event of catastrophic systems, multiple-component, or site failure
• Single-component maintenance or failure without application outage
• Remote RPO < 10 min (data loss); disk recover time of a couple of minutes
• Remote RTO < 2 h failover time for regional D/R events

Measurements (two configurations: 3 | 4 GDPS/PPRC systems; freeze policy SWAP,GO | SWAP,STOP; 4 CKD | 3 CKD + 2 FB LSS; PPRC volumes 248 CKD + 31 FB | 3,369 CKD + 31 FB; Global Mirror volumes 267 CKD | 3,294 CKD):
• Planned HS Suspend UIT: < 5 sec
• Simulated regional D/R and failover to site-R: < 60 min
• Global Mirror recovery: 131 sec
• Average data in flight (remote RPO): < 5 sec
Agenda
• Why GDPS?
• GDPS Family of Offerings
– Continuous Availability of Data within a Data Center
– Continuous Availability (CA) / Disaster Recovery (DR) in a Metropolitan Region
– DR at Extended Distance
– CA Regionally and DR Out-of-Region
• Extensions to Heterogeneous Platforms
• GDPS Customer Experiences
• Summary
GDPS Value Proposition

The ultimate availability solution – value, vision, experience:

Customer Acceptance
• 530+ GDPS licenses installed in 38 countries worldwide
• Proven technology, automated and repeatable results
• Complete implementation by experienced consultants

Open Industry Standards
• GDPS supports industry-accepted, open replication architectures (PPRC, XRC, GM, and FC)
• Architectures licensed by all enterprise storage vendors
• (New) GDPS qualification program (IBM and Hitachi)

Product Maturity
• Generally available since 1998
• Suite of products
• E2E capability
• Several years of System z production experience
• CA/DR best of breed
• Continually enhanced

Investment Protection
• Easily upgradeable
• Common code base for each product

Customer Focus
• GDPS Design Council
• Synergy with IBM development labs
• Incorporates several IBM patents
• Dedicated development & solution test lab
• New version/release every year

Support / Commitment
• Fully supported via the standard IBM support structure
• Fixes through normal System z channels

"Using the GDPS/PPRC HyperSwap technology is a significant step forward in achieving continuous availability. The benefits in our GDPS environments are that planned switches of the disk configuration took 13–33 seconds without application outage. The user impact time of unplanned disk reconfigurations was 9–16 seconds, with 16 seconds to swap a configuration of over 4,600 PPRC volume pairs. Without HyperSwap, planned and unplanned reconfigurations had resulted in a service outage of almost two hours in our Sysplex/GDPS with 10 systems."
– Wolfgang Dungl, Manager of Availability, Capacity and Performance Management; Wolfgang Schott, GDPS Project Manager, iT-AUSTRIA
Grow with GDPS

• Introduced in 1998 – over 10 years of experience
• More than 530 licenses in 38 countries; dozens of references
• GDPS is installed in many of the largest banks in the world

Reference customers (grouped as on the original slide):
• GDPS/MzGM – Cedacri S.p.A., Seceti, UBS, ICBC, Garanti Bank, Royal Bank of Canada, Wells Fargo, Commerzbank
• GDPS HyperSwap Manager / zGM – American Express
• GDPS/XRC – Barclays Bank, Key Bank, Principal Financial Group, Regions Financial Corp., Sun Trust Bank, Credit Agricole, Intesa Sanpaolo, BPVN
• GDPS/MGM – Baloise, Six Group
• GDPS HyperSwap Manager – Signal Iduna, dm-drogerie markt, Generali Informatik Services, La Caixa, Central Bank of Turkey, Credit Suisse, Danske Bank, Deere & Company, Finanz Informatik
• GDPS/PPRC – Bancaja, Banca Popolare di Milano, Bank of Montreal, Deutsche Bank, Bankinter, BRZ, ARZ, Halifax Bank of Scotland, Monte Paschi di Siena, iT-Austria, Postbank, Royal Bank of Scotland, Svenska Handelsbanken, Toronto Dominion Bank, UBS, Sparda Bank (SDV), GAD
• and more …
Summary

GDPS – freedom of choice:
• HyperSwap
• Vendor independent
• Heterogeneous
• Remote copy management
• Planned actions
• 3-site
• Skill transfer
• Automated recovery