
SAP COMMUNITY NETWORK SDN - sdn.sap.com | BPX - bpx.sap.com | BA - boc.sap.com

© 2014 SAP AG 1

Enable IBM DB2 High Availability Disaster Recovery (HADR) for DB2 pureScale in an SAP System Environment

Applies to

SAP NetWeaver 7.0 or higher on DB2 10.5 for Linux, UNIX, and Windows.

Summary

DB2 provides different features to enable database high availability (HA). The DB2 pureScale Feature is designed for extreme capacity, application transparency, and continuous high availability. However, prior to DB2 10.5, there was no true disaster recovery capability for DB2 pureScale. The combination of HADR and DB2 pureScale in DB2 10.5 provides continuous availability within a cluster and protects the database against a total failure of the primary cluster.

Authors: Catherine Vu, Edgar Maniago, Ali Mehedi

Company: IBM Canada Inc., SAP Canada Inc.

Created on: 14 March 2014

Author Bio

Catherine Vu is a member of the IBM SAP Integration and Support (ISIS) team, which plays a critical role in certifying every DB2 Fix Pack and every new major DB2 release with SAP applications before their general availability. In addition, she is responsible for providing development support to SAP on IBM DB2 for Linux, UNIX, and Windows customers. Before joining the ISIS team, Catherine gained many years of DB2 experience working in the DB2 Development team and the Technical Enablement team.

Since joining SAP in 2005, Edgar Maniago, a Software Engineer, has been a member of the IBM SAP Integration and Support Centre located in the Toronto IBM Lab. He currently tests, develops, and integrates new features of DB2 with SAP. Through his role in SAP development support and as a Customer Advocate for IBM, Edgar assists SAP consultants and customers with activities such as troubleshooting and performance optimization.

Ali Mehedi, currently a member of the IBM and SAP Integration and Support team, is a Software Developer with two years of experience in test tools development, one year in LAMP development, and three years in DB2 for LUW and SAP integration. He is a certified DB2 for LUW DBA and is experienced with SAP BASIS in Windows, AIX, and Linux environments.


Table of Contents

Applies to
Summary
Author Bio
1. Introduction
2. Planning
2.1 Reference Documentation
2.2 Terminology
2.3 Restrictions for HADR in a DB2 pureScale Environment
2.4 Setup Requirements for HADR
2.4.1 Hardware Requirements
2.4.2 Software Requirements
3. Preparation
3.1 Configuration of Test Systems Used in This Document
3.2 Creating Identical Users on All Nodes
3.3 Manually Setting Up Passwordless Access for User Root
4. Installation
4.1 Setting Up the Standby Database Server
4.1.1 Setting Up Identical Users and Groups
4.1.2 Installing the DB2 Software
4.1.3 Creating the DB2 pureScale Instance
4.1.4 Creating the GPFS File Systems
4.1.5 Adding Members and CFs
4.1.6 Verifying the New Installation on the Standby Cluster and Checking the Configuration Settings
5. HADR Setup
5.1 Creating the Standby Database
5.2 HADR Port
5.3 Configuring the Primary and Standby
5.4 Starting HADR
5.5 Monitoring HADR
6. HADR Operations
6.1 Starting HADR
6.1.1 Running on a Standard Database (HADR Not Enabled)
6.1.2 Running on the Primary Database
6.1.3 Running on the Standby Database
6.2 Stopping HADR
6.2.1 Running on the Primary Database
6.2.2 Running on the Standby Database
6.3 Changing the Preferred Replay Member
6.3.1 Changing the Current Replay Member
6.3.2 Changing the Standby Cluster’s Preferred Replay Member
6.3.3 Changing the Primary Cluster’s Preferred Replay Member
6.4 Role Switch
6.5 Failover
7. DB2 pureScale Topology Changes With HADR
7.1 Adding Members
7.2 Dropping Members
8. HADR Rolling Update
8.1 Checking the Rolling Update Status
8.2 Updating the Standby Cluster
8.2.1 Installing the DB2 Fix Pack Update on the Secondary CF
8.2.2 Installing the DB2 Fix Pack Update on the Primary CF
8.2.3 Installing the DB2 Fix Pack Update on the Members
8.2.4 Checking the Update
8.3 Updating the Primary Cluster
8.3.1 Installing the DB2 Fix Pack Update on the Secondary CF
8.3.2 Installing the DB2 Fix Pack Update on the Primary CF
8.3.3 Installing the DB2 Fix Pack Update on the Members
8.3.4 Checking the Update
8.4 Committing the Update
8.4.1 Committing the Update on the Standby Cluster
8.4.2 Committing the Update on the Primary Cluster
9. Maintenance and Troubleshooting
9.1 Maintaining the Database Configuration on the Standby
9.2 Checking Tablespace States on the Standby After Load Operations
9.3 Automatic HADR Congestion Detection
9.4 Failed HADR Start
9.4.1 The Standby Database is Unavailable
9.4.2 Mismatch in db2level
9.4.3 Duplicate Port Number
9.4.4 The Standby Database is not in Rollforward-Pending or Rollforward-in-Process Mode
9.4.5 HADR_TIMEOUT
9.5 Recover from Failed Takeover
9.6 Failed Primary Reintegration
9.7 HADR Data Collection for Support
Conclusion


1. Introduction

DB2 pureScale is a clustered database system that offers continuous availability, scalability, and exceptional application transparency. Because DB2 pureScale uses a shared-disk architecture, it is designed to remain continuously available and to provide full access to data even if multiple components fail simultaneously: if one member fails, the data on the shared disk storage remains available to the active members for processing database requests. You can scale a DB2 pureScale environment by adding as many members as needed to an existing database instance. Furthermore, DB2 pureScale requires little change to the SAP environment.

Prior to DB2 10.5, DB2 pureScale did not have a DB2-provided disaster recovery solution. A solution outside of DB2, for example, based on storage level replication, was needed to protect a DB2 pureScale cluster against a disaster. As of DB2 10.5, you can use DB2 High Availability Disaster Recovery (HADR) to protect a DB2 pureScale cluster against disasters and thus further enhance reliability and reduce the risks of unplanned outages.

HADR protects against data loss by replicating data changes from a source database, the primary, to a target database, the standby. DB2 log records are shipped from the primary to the standby database during runtime, and the standby replays them to stay synchronized with the primary. Because the primary and standby servers each have their own disks, the standby database server can be located at a different site from the primary database server.

HADR for DB2 pureScale is configured and managed much like HADR in single-partition, non-pureScale DB2 environments: a backup taken from the primary database is restored to set up the standby database, and the HADR configuration parameters are set on both the primary and the standby databases. The main difference is that the HADR primary and standby databases are each DB2 pureScale clusters.

Both the primary and the standby cluster have multiple members and at least one cluster caching facility, located on different physical machines. When HADR is active, every member of the primary cluster sends its DB2 logs to the standby cluster, where a single member receives and replays all of the logs shipped from the primary cluster.

This document describes how to set up High Availability Disaster Recovery (HADR) for DB2 pureScale in an SAP environment. Furthermore, it includes an overview of HADR maintenance tasks and operations.


2. Planning

2.1 Reference Documentation

SAP Documentation

Planning and Installation - SAP Enhancement Package 1 for SAP NetWeaver 7.3:

http://service.sap.com/installnw73

DB2 HADR Documentation

Initializing HADR:

http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.admin.ha.doc/doc/t0011725.html

HADR overview:

http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.admin.ha.doc/doc/c0011267.html

IBM developerWorks wiki page for HADR features:

https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/DB2HADR

DB2 pureScale Documentation

Preparing to install the DB2 pureScale Feature:

http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.qb.server.doc/doc/c0060061.html

How to Guide: Running an SAP System on IBM DB2 pureScale:

http://scn.sap.com/docs/DOC-14446

Use of the DB2 pureScale Feature with SAP Applications:

http://scn.sap.com/docs/DOC-39567

HADR for DB2 pureScale Documentation

HADR setup in a DB2 pureScale environment:

http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.admin.ha.doc/doc/c0061085.html

Restrictions for HADR in DB2 pureScale environments:

http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.admin.ha.doc/doc/c0061049.html

DB2 pureScale topology changes and high availability disaster recovery (HADR):

http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.admin.ha.doc/doc/c0061087.html


HADR takeover operations in a DB2 pureScale environment:

http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.admin.ha.doc/doc/c0061129.html

2.2 Terminology

The following section defines common terms used in this document. For definitions of terms not listed below, search the DB2 Information Center:

http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.glossary.doc/doc/glossary.html

Standard database

In an HADR context, a standard database is a database that is not configured for HADR. It is neither the primary nor the standby database.

Preferred replay member

In HADR for DB2 pureScale, only one member is chosen and used to replay the logs on the standby database. The member where the “START HADR” command is issued is designated as the preferred replay member.

Assisted Remote Catchup (ARCU)

If a member of the primary database fails to deliver its own logs to the HADR standby database, another member of the primary database can read the failed member’s logs and send them to the standby database. A single member can assist multiple members.

Role switch

A role switch is also called a non-forced takeover. The “TAKEOVER HADR” command is issued on the standby cluster; as a result, the primary cluster becomes the standby cluster, and the standby cluster becomes the primary cluster. A role switch can be performed only if the database is in peer state, in remote catchup (RCU) state when the synchronization mode is SUPERASYNC, or in assisted remote catchup state.

Failover

In the event of software, hardware, or network failure on the primary cluster, the standby cluster is forced to take over and become the primary. To perform failover, the “TAKEOVER HADR” command with the “BY FORCE” option can be issued on the standby database. This is allowed only if the standby database is in peer state, disconnected peer state, remote catchup pending state, or remote catchup state.

Failback

In the context of HADR, failback is an operation of switching the roles of the primary and the standby back to their original roles after a failover has occurred. That is, the original primary, which has acted as the standby after the failover, returns to being the primary cluster. Similarly, the original standby, which during the failover became the primary, returns to being the standby.

Catchup state


Catchup state is the initial state when HADR is started. In this state, the standby database tries to retrieve all log data generated on the primary database and apply it to the standby database. The standby can retrieve log files locally or remotely from the primary through the HADR network connection.

Peer state

HADR enters peer state when the logs on the primary have been shipped to and replayed on the standby, so the log files on the standby database match those on the primary database.
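The takeover preconditions in the role switch and failover definitions can be checked before anyone issues TAKEOVER HADR. The following is a minimal bash sketch, not an IBM-supplied tool: the function names are hypothetical, the state strings are assumed to match the HADR_STATE values printed by `db2pd -db <DB> -hadr`, and because we are not certain how assisted remote catchup is surfaced, the sketch assumes it is reported as REMOTE_CATCHUP and lets the caller flag it explicitly. Nothing here calls DB2; it only prints the CLP command that would fit.

```bash
#!/usr/bin/env bash
# Hypothetical pre-checks before issuing TAKEOVER HADR on the standby.
# ASSUMPTION: state and sync-mode strings match the HADR_STATE and
# HADR_SYNCMODE values shown by "db2pd -db <DB> -hadr".

# Role switch (non-forced takeover): PEER; REMOTE_CATCHUP in SUPERASYNC mode;
# or remote catchup that another primary member is assisting (pass "yes").
can_role_switch() {
  local state="$1" syncmode="$2" assisted="${3:-no}"
  case "$state" in
    PEER) return 0 ;;
    REMOTE_CATCHUP)
      [ "$syncmode" = "SUPERASYNC" ] || [ "$assisted" = "yes" ] ;;
    *) return 1 ;;
  esac
}

# Failover (forced takeover): allowed only in the four states listed above.
can_force_takeover() {
  case "$1" in
    PEER|DISCONNECTED_PEER|REMOTE_CATCHUP_PENDING|REMOTE_CATCHUP) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the CLP command matching the requested takeover type, or fail.
# Usage: takeover_cmd DB STATE SYNCMODE [switch|force] [assisted]
takeover_cmd() {
  local db="$1" state="$2" syncmode="$3" mode="${4:-switch}" assisted="${5:-no}"
  if [ "$mode" = "force" ]; then
    can_force_takeover "$state" || return 1
    echo "db2 takeover hadr on db $db by force"
  else
    can_role_switch "$state" "$syncmode" "$assisted" || return 1
    echo "db2 takeover hadr on db $db"
  fi
}
```

For example, `takeover_cmd ACO PEER ASYNC` prints `db2 takeover hadr on db ACO`, while `takeover_cmd ACO LOCAL_CATCHUP ASYNC force` prints nothing and returns a nonzero status. Keeping the rules in one place makes it harder to attempt a forced takeover from a state in which DB2 would reject it anyway.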

2.3 Restrictions for HADR in a DB2 pureScale Environment

There are a number of restrictions for HADR in a DB2 pureScale environment:

The only two HADR synchronization modes supported in a DB2 pureScale environment are ASYNC and SUPERASYNC. The hadr_syncmode configuration parameter specifies the mode.

Because only ASYNC or SUPERASYNC mode can be used, a peer window is not supported.

Only one standby database is allowed for HADR for DB2 pureScale.

The topology of the primary and standby must be identical: the primary cluster must have the same number of members as the standby cluster, although the number of cluster caching facilities (CFs) can differ. If a member is added to or dropped from the primary cluster, it must also be added to or dropped from the standby cluster.

The “reads on standby” (RoS) feature is not supported.

IBM Tivoli System Automation for Multiplatforms (SA MP) is not used to manage automated failover between the clusters; it is only used to manage high availability within the local cluster. The reason is that a DB2 pureScale cluster is already highly available by design, so in a DB2 pureScale environment HADR is intended to serve only as a disaster recovery (DR) solution.

Network address translation (NAT) between the primary and standby sites is not supported.
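The restrictions above lend themselves to a quick pre-flight check before HADR is configured. The bash function below is a minimal sketch, not an IBM utility: its name and argument order are hypothetical, you are assumed to collect the values yourself (for example, HADR_SYNCMODE and HADR_PEER_WINDOW from `db2 get db cfg for ACO`, and the member counts from `db2instance -list` on each cluster), and treating any nonzero HADR_PEER_WINDOW as a violation is our reading of the peer-window restriction.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check of the HADR-on-pureScale restrictions.
# It does not call DB2; pass in values you have collected yourself.
# Usage: check_hadr_restrictions SYNCMODE PEER_WINDOW STANDBY_COUNT \
#          PRIMARY_MEMBERS STANDBY_MEMBERS
check_hadr_restrictions() {
  local syncmode="$1" peer_window="$2" standbys="$3"
  local primary_members="$4" standby_members="$5"
  local errors=0

  # Only ASYNC and SUPERASYNC are supported with pureScale.
  case "$syncmode" in
    ASYNC|SUPERASYNC) ;;
    *) echo "unsupported HADR_SYNCMODE: $syncmode" >&2; errors=$((errors + 1)) ;;
  esac
  # ASSUMPTION: "peer window not supported" is checked as HADR_PEER_WINDOW = 0.
  if [ "$peer_window" -ne 0 ]; then
    echo "peer window is not supported; expected HADR_PEER_WINDOW 0, got $peer_window" >&2
    errors=$((errors + 1))
  fi
  # Exactly one standby database.
  if [ "$standbys" -ne 1 ]; then
    echo "exactly one standby database is supported, found $standbys" >&2
    errors=$((errors + 1))
  fi
  # Member counts must match (CF counts may differ, so they are not checked).
  if [ "$primary_members" -ne "$standby_members" ]; then
    echo "member count differs: primary=$primary_members standby=$standby_members" >&2
    errors=$((errors + 1))
  fi
  [ "$errors" -eq 0 ]
}
```

With the test system in this document (two members per cluster, one standby), `check_hadr_restrictions ASYNC 0 1 2 2` succeeds silently; each violated rule is reported on standard error, and the function returns nonzero if any rule fails.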

2.4 Setup Requirements for HADR

2.4.1 Hardware Requirements

The following hardware requirements must be fulfilled to set up HADR in a DB2 pureScale environment:

Database Servers

An HADR pair is made up of two separate DB2 pureScale clusters: one cluster is used as the primary and the other as the standby. Each cluster is made up of multiple members and at least one cluster caching facility.

Both the primary and standby clusters must meet the requirements for installing and setting up a DB2 pureScale environment:

Installation prerequisites for DB2 pureScale (AIX): http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.qb.server.doc/doc/r0054850.html

Pre-installation checklist for DB2 pureScale (AIX): http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.qb.server.doc/doc/r0056077.html

Installation prerequisites for DB2 pureScale (Linux): http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.qb.server.doc/doc/r0057441.html

Pre-installation checklist for DB2 pureScale (Linux): http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.qb.server.doc/doc/r0057204.html

NOTE

For HADR with DB2 pureScale, we recommend that you configure more CPU power and memory for the preferred replay member, because this single member replays the logs of all members of the primary cluster.

The member topologies on the primary and standby must be the same. However, if resources are constrained, you can configure multiple logical members on the same host.

SAP Servers

Make sure that you meet the hardware requirements listed in the relevant SAP Installation Guide, which can be found at https://service.sap.com/instguides.

2.4.2 Software Requirements

Supported Operating Systems

The installation of DB2 pureScale is only supported on AIX on POWER7 and on Linux on x86.

DB2 Version

You must install IBM DB2 10.5 or higher to be able to set up HADR with DB2 pureScale. General Parallel File System (GPFS) and SA MP, which are used within a cluster, are installed during the DB2 installation.

SAP Release

DB2 pureScale is only supported for SAP systems based on SAP NetWeaver 7.0 SR3 or higher.

NOTE

We recommend that you apply the latest available kernel patch to your system. For more information on how to apply an SAP kernel patch, see SAP Note 19466.

To be able to use the new functionality of DB2 10.5 with SAP NetWeaver Application Server ABAP, you must install the minimum required support packages; see SAP Note 1851853.
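The DB2 version requirement can be verified on every host of both clusters before you continue. Section 9.4.2 lists a db2level mismatch as a cause of a failed HADR start, so the sketch also compares the primary and standby levels. It is a minimal bash sketch under stated assumptions: the function names are hypothetical, and the parser assumes `db2level` prints a line of the form `Informational tokens are "DB2 v10.5.0.3", ...`; if your output differs, adjust the sed expression. Only the version string is compared, not build tokens.

```bash
#!/usr/bin/env bash
# Extract a version such as "10.5.0.3" from db2level output on stdin.
# ASSUMPTION: db2level prints 'Informational tokens are "DB2 vX.Y.Z.F", ...'.
# Uses POSIX sed (no grep -o) so it also works with the AIX userland.
db2_version() {
  sed -n 's/.*"DB2 v\([0-9][0-9.]*\)".*/\1/p' | head -n 1
}

# Succeed if the given version (e.g. 10.5.0.3) is at least 10.5.
supports_purescale_hadr() {
  local major minor
  IFS=. read -r major minor _ <<< "$1"
  [ -n "$major" ] && [ -n "$minor" ] || return 1
  [ "$major" -gt 10 ] || { [ "$major" -eq 10 ] && [ "$minor" -ge 5 ]; }
}

# Succeed if primary and standby report the same (non-empty) version.
same_db2level() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}
```

On each host you might run `db2level | db2_version` as the instance owner, check the result with `supports_purescale_hadr`, and then compare one primary host with one standby host using `same_db2level`.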


3. Preparation

In this document we use a test system for illustration. Step-by-step instructions and screen outputs are provided from the test system.

3.1 Configuration of Test Systems Used in This Document

The figure below illustrates an architectural overview of HADR for DB2 pureScale in an SAP environment, which is used for this document:

Figure 1: Architectural overview of test systems used in this document

The systems used for this document are as follows:

Database Servers


Primary cluster: DB2 pureScale hosted on AIX LPARs coralpib52 (member 0), coralpib85 (member 1), coralpib51 (primary caching facility), coralpib86 (secondary caching facility)

Standby cluster: DB2 pureScale hosted on AIX LPARs coralpib49 (member 0), coralpib83 (member 1), coralpib50 (primary caching facility), coralpib84 (secondary caching facility)

On all hosts, the DB2 instance owner is db2aco.

On all hosts, TCP port 60000 is used for communication between the HADR primary and standby clusters.

The database name is ACO.

SAP Servers

Central services instance for ABAP (ASCS Instance) is hosted on Linux machine vmsaple14.

Primary application server instance is hosted on Linux machine vmsaple14.

Additional application servers are hosted on Linux machines vmsaple15 and vmsaple16.
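With the test systems above, the primary's HADR_TARGET_LIST would name the standby cluster's members, and the standby's would name the primary's members (the HADR parameters themselves are set in section 5.3). The helper below is a minimal bash sketch that builds such a value. The function name is hypothetical, and the brace-and-pipe format reflects our understanding of how a pureScale cluster is listed in HADR_TARGET_LIST, so verify it against the HADR configuration documentation for your DB2 level before use. Only member hosts are listed; the CF hosts are omitted.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build a pureScale-style HADR_TARGET_LIST value,
# i.e. the remote cluster's member hosts as host:port entries separated
# by "|" and enclosed in braces.
# ASSUMPTION: this is the target-list format for a pureScale remote cluster.
# Usage: build_target_list PORT MEMBER_HOST...
build_target_list() {
  local port="$1"; shift
  local list="" host
  for host in "$@"; do
    # Add a "|" separator only after the first entry.
    list="${list:+$list|}$host:$port"
  done
  printf '{%s}\n' "$list"
}
```

For the primary cluster in this document, `build_target_list 60000 coralpib49 coralpib83` yields `{coralpib49:60000|coralpib83:60000}`, which could then be passed as `db2 "update db cfg for ACO using HADR_TARGET_LIST $(build_target_list 60000 coralpib49 coralpib83)"`. The double quotes keep the shell from interpreting the braces and the pipe character.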

3.2 Creating Identical Users on All Nodes

The users must belong to the same groups. The numeric user IDs and group IDs must be identical across all servers. In addition, they must use the same shell and path to the home directory.

In this document, we use SAP’s installation tool, the software provisioning manager, to make sure that all users are created with correct properties on all members and CFs for both primary and standby clusters.

The following users must be available on all nodes:

Database administration user db2<SID>: db2aco

DB2 connect user sap<SID>: sapaco

SAP system administration user <SID>adm: acoadm

For more information about users and groups, refer to the SAP Installation Guide on SAP Service Marketplace at https://service.sap.com/instguides.
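The requirement above calls for identical numeric user IDs, group IDs, shells, and home directories on every node. If you want to double-check the result of the software provisioning manager, a small awk-based filter can compare the attributes collected from each host. This is a hypothetical helper: the function name and the one-line-per-user input format `HOST USER UID GID SHELL HOME` are our assumptions, and gathering those values from each node is left to you. The IDs in the usage note are made-up sample values.

```bash
#!/usr/bin/env bash
# Hypothetical consistency check for users across all nodes.
# Reads lines of the form: HOST USER UID GID SHELL HOME
# Prints each user whose UID, GID, shell, or home directory differs between
# hosts (first mismatch only) and exits 0 only if every user is identical.
check_identical_users() {
  awk '
    NF != 6 { next }                      # skip blank or malformed lines
    {
      attrs = $3 " " $4 " " $5 " " $6    # UID GID SHELL HOME
      if (!($2 in ref)) {
        ref[$2] = attrs; refhost[$2] = $1
      } else if (ref[$2] != attrs && !($2 in reported)) {
        reported[$2] = 1; bad++
        printf "%s differs: %s has [%s], %s has [%s]\n", $2, refhost[$2], ref[$2], $1, attrs
      }
    }
    END { exit (bad > 0) }
  '
}
```

For example, feeding it the lines `coralpib52 db2aco 1001 300 /bin/csh /db2/db2aco` and `coralpib49 db2aco 1002 300 /bin/csh /db2/db2aco` reports that db2aco differs and returns a nonzero status.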

3.3 Manually Setting Up Passwordless Access for User Root

For the DB2 pureScale Feature, the root user must be able to log on to all nodes without a password using Open Secure Shell (OpenSSH). The instance owner also requires passwordless SSH access; however, the DB2 installation process sets this up if it is not already configured for the instance owner.

The following steps can be used to set up passwordless SSH access for the root user within a DB2 pureScale primary or standby cluster:

1. On each node in a cluster, log on as root user.

2. Generate a public and private key pair:

ssh-keygen -t dsa

3. When prompted for input, press Enter to accept the default values so that no passphrase is set. Two files are generated in ~/.ssh: the private key id_dsa and the public key id_dsa.pub.


4. Transfer the public key to other nodes:

scp ~/.ssh/id_dsa.pub root@<hostname>:~/id_dsa.pub

5. Append the public keys of all nodes in the cluster to the authorized_keys file in ~/.ssh, for example:

cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

6. Change the permissions of the authorized_keys file:

chmod 644 ~/.ssh/authorized_keys

For more information about installing and setting up OpenSSH, refer to the Information Center topic at http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.qb.server.doc/doc/t0055342.html
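Step 4 copies every public key to the same file name, ~/id_dsa.pub, on the target host. When several nodes send their keys to one host, each copy can overwrite the previous one before it is appended. The following is a minimal sketch that assumes a bash-compatible shell and the DSA key from step 2. It only prints the scp commands for one sending node and does not run them. Each command uses a per-sender file name, which is our own assumption. The sketch also includes a check that root can reach a host without a password prompt. The function names plan_key_exchange and check_passwordless are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print (not run) the scp commands that send this node's root public key
# to every other node of the same cluster. The per-sender target name
# ~/id_dsa_<sender>.pub is an assumption made so that keys from different nodes
# do not overwrite each other on the receiving host.
plan_key_exchange() {
  local self="$1" host
  shift
  for host in "$@"; do
    # Skip the local node; its own key is appended locally in step 5.
    [ "$host" = "$self" ] && continue
    printf 'scp ~/.ssh/id_dsa.pub root@%s:~/id_dsa_%s.pub\n' "$host" "$self"
  done
}

# Returns 0 if root can run a command on <host> without a password prompt.
# BatchMode=yes makes ssh fail instead of asking for a password.
check_passwordless() {
  ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$1" true
}
```

For example, plan_key_exchange coralpib52 coralpib52 coralpib85 coralpib51 coralpib86 lists the three copies to make from coralpib52. On each receiving node, the collected keys can then be appended with cat ~/id_dsa_*.pub >> ~/.ssh/authorized_keys. You can then verify the setup from each node, for example with check_passwordless coralpib85.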


4. Installation

The following sections describe how to install SAP and set up the DB2 pureScale clusters for both the primary and standby database:

1. Install the SAP ABAP Central Services (ASCS) instance.

2. Install the database instance on the primary cluster.

3. Install the SAP primary application server.

4. Install the DB2 software on the standby.

5. Create the DB2 pureScale instance on the standby cluster.

6. Add members and caching facilities (CFs) to the standby cluster.

7. Verify that the DB2 pureScale instance was set up successfully on the standby.

Steps 1 to 3: For more information about the installation of a distributed system with the layout of ASCS, database and primary application server instance, refer to your relevant SAP Installation Guide on SAP Service Marketplace at https://service.sap.com/instguides.

For more details on installing SAP with DB2 pureScale in step 2, see the document “Database Installation Guide: Running an SAP System on IBM DB2 10.1 with the DB2 pureScale Feature” at http://scn.sap.com/docs/DOC-14446.

Steps 4 to 7 are described in the sections below.

4.1 Setting Up the Standby Database Server

An HADR pair is made up of two separate DB2 pureScale clusters. As described earlier, the HADR primary cluster is a DB2 pureScale cluster with multiple members and CFs, and the HADR standby cluster is also a DB2 pureScale cluster. The following section describes the procedure for setting up a standby cluster in HADR for DB2 pureScale:

1. Create users and groups across all nodes of the standby cluster.

2. Install the DB2 software on one of the hosts of the standby cluster.

3. Create a DB2 pureScale instance with one member and one CF.

4. Create the GPFS file system.

5. Extend the standby cluster by adding another member and CF.

6. Verify the standby cluster.

4.1.1 Setting Up Identical Users and Groups

DB2 and SAP systems require identical numeric user and group IDs on the primary and standby clusters. This ensures that when the roles are switched and the standby cluster becomes the primary, communication between the SAP ASCS instance and the new primary database instance is not interrupted.

Create the users and groups either manually or with SAP’s software provisioning manager. If you create them manually on all hosts in the standby cluster, also copy the SAP environment files (for example .dbenv.sh/.dbenv.csh, .sapenv.sh/.sapenv.csh, and .login from the home directory of <sid>adm) from the primary install-initiating host (coralpib52) to the standby install-initiating host (coralpib49).

The procedure outlined below is for creating the users and groups on all hosts in the standby cluster using the software provisioning manager.

Prerequisites

Ensure that the user and group names and numeric IDs used on the primary cluster do not conflict with existing users and groups on the standby cluster.

Procedure

1. Start the SAP installer. On the Welcome screen, choose “Operating System Users and Groups” under “Preparations”, as shown in Figure 6 below.

2. When prompted, enter the correct user ID and group ID for users <sid>adm, sap<sid>, and db2<sid>. Make sure they have the same values as the corresponding users and groups on the primary cluster.

Figure 6: Creation of users and groups
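You can confirm that the numeric IDs match after the installer has run. Read the values on the primary cluster (for example with id -u db2aco and id -g db2aco on coralpib52), then compare them on every standby host. The following is a minimal sketch that assumes a bash-compatible shell. check_user_ids is a hypothetical helper name. It only checks the UID and the primary GID, not the login shell or the home directory path.

```shell
#!/usr/bin/env bash
# Sketch: check that <user> exists on this host with the expected numeric UID
# and primary GID (values read beforehand on the primary cluster).
# Return codes: 0 = match, 1 = mismatch, 2 = user not found.
check_user_ids() {
  local user="$1" want_uid="$2" want_gid="$3" uid gid
  if ! uid=$(id -u "$user" 2>/dev/null); then
    echo "MISSING: $user" >&2
    return 2
  fi
  gid=$(id -g "$user")
  if [ "$uid" != "$want_uid" ] || [ "$gid" != "$want_gid" ]; then
    echo "MISMATCH: $user uid=$uid gid=$gid (expected $want_uid/$want_gid)" >&2
    return 1
  fi
  echo "OK: $user uid=$uid gid=$gid"
}
```

Run it on each standby host for db2aco, sapaco, and acoadm, passing the values read on the primary.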


4.1.2 Installing the DB2 Software

After creating the required users and groups, install the DB2 software on the first node of the standby cluster, the install-initiating host (coralpib49). During the DB2 installation, make sure to include the DB2 pureScale Feature. For more information about installing DB2, see the DB2 Information Center at http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.qb.server.doc/doc/t0008875.html

NOTE

We recommend that you use the DB2 setup wizard instead of db2_install. The db2_install command is deprecated and might be removed in a future release.

Make sure to specify an installation path that matches the installation path in the primary cluster and that is not located on a shared file system.

4.1.3 Creating the DB2 pureScale Instance

The topology of the DB2 pureScale instance must be the same in the primary and standby clusters. That is, both clusters must have the same number of members and the same member IDs.

Prerequisites

The root user on one host must be able to log on to the other hosts in the cluster without supplying a password.

In order to run the db2icrt command, make sure that the DB2 software is installed on the first node of the standby cluster (coralpib49).

Make sure the disks are available for creating the instance-shared directory, the tiebreaker, and the other GPFS file systems.

Procedure

As the root user, run the following command to create a DB2 pureScale instance with one member and one CF:

<install_path>/instance/db2icrt -d -instance_shared_dev <shared_disk_name> -tbdev <tiebreaker_disk_name> -cf <CFHostName> -cfnet <CFNetname> -m <MemberHostname> -mnet <MemberNetname> -a SERVER_ENCRYPT -u db2<SID> db2<SID>

EXAMPLE

/db2/db2aco/db2_software/instance/db2icrt -d -instance_shared_dev /dev/hdisk1 -tbdev /dev/hdisk3 -cf coralpib50 -cfnet coralpib50-ib0 -m coralpib49 -mnet coralpib49-ib0 -a SERVER_ENCRYPT -u db2aco db2aco
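This command is long, and a typo in a device name or netname is easy to make. One option is to assemble the command from named values and review it before running it as root. The following sketch uses only the flags shown in the example above. build_db2icrt_cmd and check_devices_exist are hypothetical helper names. The device check only tests that the paths exist; it does not verify that the disks are shared or unused.

```shell
#!/usr/bin/env bash
# Sketch: print the db2icrt command from the example above with the values
# filled in, so it can be reviewed before it is run as root.
# Args: install path, shared disk, tiebreaker disk, CF host, CF netname,
#       member host, member netname, instance owner (also used as fenced user).
build_db2icrt_cmd() {
  local install_path="$1" shared_dev="$2" tb_dev="$3" cf_host="$4" cf_net="$5" \
        member_host="$6" member_net="$7" inst="$8"
  printf '%s/instance/db2icrt -d -instance_shared_dev %s -tbdev %s -cf %s -cfnet %s -m %s -mnet %s -a SERVER_ENCRYPT -u %s %s\n' \
    "$install_path" "$shared_dev" "$tb_dev" "$cf_host" "$cf_net" \
    "$member_host" "$member_net" "$inst" "$inst"
}

# Returns non-zero if any of the given device paths does not exist on this host.
check_devices_exist() {
  local dev rc=0
  for dev in "$@"; do
    if [ ! -e "$dev" ]; then
      echo "device not found: $dev" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

On coralpib49, check_devices_exist /dev/hdisk1 /dev/hdisk3 followed by build_db2icrt_cmd /db2/db2aco/db2_software /dev/hdisk1 /dev/hdisk3 coralpib50 coralpib50-ib0 coralpib49 coralpib49-ib0 db2aco reproduces the example command above.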

4.1.4 Creating the GPFS File Systems

The following section describes how to create GPFS file systems using the db2cluster command.


Prerequisites

You must have installed the IBM DB2 pureScale Feature in order to use the db2cluster command.

You must have shared disks between the hosts of the standby cluster to create the other file systems.

Procedure

For this document, we use only one GPFS file system for the database containers due to resource limitations of the test systems.

1. As root user on the host that will be member 0, create the GPFS file system for the SAP database using the following command:

<install_path>/bin/db2cluster -cfs -create -filesystem db2data -disk <disk_name> -mount /db2/<SID>

EXAMPLE

/db2/db2aco/db2_software/bin/db2cluster -cfs -create -filesystem db2data -disk /dev/hdisk2 -mount /db2/ACO

2. To check the list of GPFS file systems created, run the following command:

db2cluster -cfs -list -filesystem

EXAMPLE

db2aco> db2cluster -cfs -list -filesystem
FILE SYSTEM NAME                  MOUNT POINT
--------------------------------- -------------------------
db2data                           /db2/ACO
db2fs1                            /db2sd_20131028100420

3. As root user, change the ownership of /db2/<SID>:

EXAMPLE

cd /db2

chown db2aco:dbacoadm ACO

4. Switch to user db2aco and create the SAP database directories, the DB2 log directory, the DB2 log archive directory, and the db2dump directory:

EXAMPLE

su - db2aco

coralpib49:db2aco> cd /db2/ACO

coralpib49:db2aco> mkdir sapdata1

coralpib49:db2aco> mkdir sapdata2


coralpib49:db2aco> mkdir sapdata3

coralpib49:db2aco> mkdir sapdata4

coralpib49:db2aco> mkdir log_dir

coralpib49:db2aco> mkdir log_archive

coralpib49:db2aco> mkdir db2dump
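Scripts can also read the file system listing from step 2, and the directories from step 4 can be created in one pass. The following is a minimal sketch that assumes a bash-compatible shell and the two-column listing layout shown above. mount_point_of and create_db_dirs are hypothetical helper names, and the directory list simply mirrors the example.

```shell
#!/usr/bin/env bash
# Read "db2cluster -cfs -list -filesystem" output on stdin and print the mount
# point of the file system named in $1 (first column = name, second = mount
# point). Returns 1 if the name is not listed.
mount_point_of() {
  awk -v want="$1" '
    $1 == want { print $2; found = 1 }
    END { exit (found ? 0 : 1) }'
}

# Create the SAP data, log, log archive, and db2dump directories below $1.
# mkdir -p makes a second run harmless. Refuses an empty base directory so a
# failed lookup does not create the directories below "/".
# Run as the instance owner after root has changed the ownership (step 3).
create_db_dirs() {
  local base="$1" d
  [ -n "$base" ] || return 1
  for d in sapdata1 sapdata2 sapdata3 sapdata4 log_dir log_archive db2dump; do
    mkdir -p "$base/$d" || return 1
  done
}
```

As db2aco, either call create_db_dirs /db2/ACO directly or first look up the mount point with db2cluster -cfs -list -filesystem | mount_point_of db2data.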

4.1.5 Adding Members and CFs

There are two ways to add a member and a secondary CF to an existing DB2 pureScale instance. You can use SAP’s software provisioning manager (option “DB2 pureScale Feature – Topology Management”), or you can add them manually with DB2 commands. For more information about adding members and a secondary CF with SAP’s software provisioning manager, see the database installation guide “Running an SAP System on IBM DB2 10.1 with the pureScale Feature” on SAP Service Marketplace.

Run the following commands from an existing member to add an additional member and a secondary CF, as shown in the example below:

As root user, run db2iupdt to add a member:

<install_path>/instance/db2iupdt -d -add -m <memberHostName> -mnet <memberNetname> db2<SID>

As root user, run db2iupdt to add the secondary CF:

<install_path>/instance/db2iupdt -d -add -cf <CFHostName> -cfnet <CFNetname> db2<SID>

EXAMPLE

root@coralpib49:/> /db2/db2aco/db2_software/instance/db2iupdt -d -add -m coralpib83 -mnet coralpib83-ib0 db2aco

root@coralpib49:/> /db2/db2aco/db2_software/instance/db2iupdt -d -add -cf coralpib84 -cfnet coralpib84-ib0 db2aco

4.1.6 Verifying the New Installation on the Standby Cluster and Checking the Configuration Settings

After setting up a DB2 pureScale instance for the standby cluster, perform the following steps to make sure the instance was installed and configured successfully:

1. Run db2start to start the DB2 pureScale cluster.

2. Run db2instance -list to get the list of members and CFs created in your standby cluster.

3. Verify that all members and CFs have been started. One CF should be in PRIMARY state and the other CF in either PEER or CATCHUP state. The following example shows what the db2instance -list output looks like:


Example

coralpib49:db2aco 18> db2instance -list
ID   TYPE    STATE    HOME_HOST   CURRENT_HOST  ALERT  PARTITION_NUMBER  LOGICAL_PORT  NETNAME
---  ------  -------  ----------  ------------  -----  ----------------  ------------  --------------
0    MEMBER  STARTED  coralpib49  coralpib49    NO     0                 0             coralpib49-ib0
1    MEMBER  STARTED  coralpib83  coralpib83    NO     0                 0             coralpib83-ib0
128  CF      PRIMARY  coralpib50  coralpib50    NO     -                 0             coralpib50-ib0
129  CF      CATCHUP  coralpib84  coralpib84    NO     -                 0             coralpib84-ib0

HOSTNAME    STATE   INSTANCE_STOPPED  ALERT
----------  ------  ----------------  -----
coralpib84  ACTIVE  NO                NO
coralpib50  ACTIVE  NO                NO
coralpib83  ACTIVE  NO                NO
coralpib49  ACTIVE  NO                NO

4. Log on to the SAP application server vmsaple15 as user <sid>adm and test the database connection to the primary cluster with R3trans -d. A return code of 0000 indicates that the connection was successful.

5. Compare the database manager configuration parameters from the standby instance to the primary instance and make sure they match.

6. Start the SAP system on the primary application server, vmsaple14.
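Steps 3 and 5 above can be partly automated. The following sketch assumes a bash-compatible shell and the column layout shown in the db2instance -list example. check_instance_list tests that output for the conditions in step 3. compare_dbm_cfg compares two saved copies of the db2 get dbm cfg output, for example one taken on coralpib52 and one on coralpib49. Both helper names are hypothetical. CF states that begin with CATCHUP are accepted as a precaution. Parameters that DB2 sets automatically, or that are legitimately host-specific, may still appear as differences and need to be reviewed by hand.

```shell
#!/usr/bin/env bash
# Step 3: read "db2instance -list" output on stdin and check that every MEMBER
# is STARTED, exactly one CF is PRIMARY, any other CF is PEER or CATCHUP, and
# every host in the host table is ACTIVE. Prints one line per problem and
# returns 0 only if none were found.
check_instance_list() {
  awk '
    $2 == "MEMBER" && $3 != "STARTED" { print "member " $1 " is " $3; bad = 1 }
    $2 == "CF" {
      if ($3 == "PRIMARY") primary++
      else if ($3 != "PEER" && $3 !~ /^CATCHUP/) { print "CF " $1 " is " $3; bad = 1 }
    }
    # Host table rows have 4 columns; skip the header and the separator line.
    NF == 4 && $1 != "HOSTNAME" && $1 !~ /^-/ && $2 != "ACTIVE" {
      print "host " $1 " is " $2; bad = 1
    }
    END {
      if (primary != 1) { print "expected one PRIMARY CF, found " (primary + 0); bad = 1 }
      exit (bad ? 1 : 0)
    }'
}

# Step 5: compare two files holding saved "db2 get dbm cfg" output, ignoring
# leading/trailing blanks and empty lines. Returns 0 if identical, 1 if they
# differ (diff prints the differences), 2 on error.
compare_dbm_cfg() {
  local tmpdir rc
  tmpdir=$(mktemp -d) || return 2
  if ! sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d' "$1" > "$tmpdir/a" ||
     ! sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d' "$2" > "$tmpdir/b"; then
    rm -rf "$tmpdir"
    return 2
  fi
  diff "$tmpdir/a" "$tmpdir/b"
  rc=$?
  rm -rf "$tmpdir"
  return "$rc"
}
```

Typical use is db2instance -list | check_instance_list on the standby. For the comparison, save db2 get dbm cfg to a file on each cluster, copy both files to one host, and run compare_dbm_cfg on them.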


5. HADR Setup

Setting up HADR for DB2 pureScale is very similar to setting up HADR without pureScale, which makes it simple to set up, configure, and manage. The procedure for configuring and deploying HADR in a DB2 pureScale environment is as follows:

1. Back up the primary.

2. Restore the backup on the standby.

3. Configure the primary for HADR.

4. Configure the standby for HADR.

5. Start the standby.

6. Start the primary.

7. Verify that the HADR setup was successful.

The following sections explain these steps in more detail.

5.1 Creating the Standby Database

As with non-pureScale DB2 HADR, there are also two ways to create a standby database for HADR in a DB2 pureScale environment:

You can take an offline or online backup of the primary database to an NFS mounted file system that is accessible to both primary and standby. Then, you can restore the backup image to the standby instance. The following example illustrates the procedure of creating the standby database:

Example

1. On the primary cluster, run the following command to take a backup:

db2 backup db ACO to /db2/ACO/backup compress

2. Copy the primary database backup from /db2/ACO/backup to the standby cluster at /home/home_coralpib52.

3. On the standby cluster, restore the backup image as follows:

db2 restore db ACO from /home/home_coralpib52

Alternatively, you can initialize the standby by using a split mirror of the primary database with the following command:

db2inidb <dbname> AS standby

NOTE

The standby database must have the same name as the primary database.


If you do an online backup, ensure that the standby database remains in rollforward-pending or rollforward-in-progress mode. Do not issue the ROLLFORWARD DATABASE command on the standby after restoring or after the split mirror initialization.

Do not use the TABLESPACE, INTO, REDIRECT, or WITHOUT ROLLING FORWARD options when running the RESTORE DATABASE command on the standby cluster.

If you use the split mirror approach, do not use the SNAPSHOT or MIRROR options with the db2inidb command.

5.2 HADR Port

Add an HADR port to the /etc/services file on all hosts for the primary and standby clusters. The port is used for the HADR primary/standby communication.

EXAMPLE

Add the following entry to the /etc/services file on all hosts:

hadr_port 60000/tcp
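The entry has to exist on every host of both clusters, so it helps if adding it is idempotent and refuses to reuse a port that is already assigned. The following is a minimal sketch that assumes a bash-compatible shell. Its defaults are the service name hadr_port and port 60000 from the example. add_hadr_service is a hypothetical name. The file argument defaults to /etc/services, and modifying that file requires root.

```shell
#!/usr/bin/env bash
# Sketch: add "<name> <port>/tcp" to a services file only if it is not there yet.
# Defaults follow the example above (file /etc/services, hadr_port, 60000).
# Returns 1 without changing the file if the port is already used by another name.
add_hadr_service() {
  local file="${1:-/etc/services}" name="${2:-hadr_port}" port="${3:-60000}"
  # Entry already present: nothing to do.
  if grep -Eq "^[[:space:]]*${name}[[:space:]]+${port}/tcp([[:space:]]|$)" "$file"; then
    return 0
  fi
  # Same port/protocol already assigned to another (non-commented) service name.
  if grep -Eq "^[^#]*[[:space:]]${port}/tcp([[:space:]]|$)" "$file"; then
    echo "port ${port}/tcp is already assigned in $file" >&2
    return 1
  fi
  printf '%s\t%s/tcp\n' "$name" "$port" >> "$file"
}
```

As root on each of the eight hosts, add_hadr_service /etc/services hadr_port 60000 adds the entry once, and repeated runs leave the file unchanged.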

5.3 Configuring the Primary and Standby

Member-level configuration

On both the primary and the standby database, set the following parameters for each member using the following command:

db2 "update db cfg for <dbname> member <memberNum> using HADR_LOCAL_HOST <memhostname> HADR_LOCAL_SVC <memservicename>"

Cluster-level configuration

On both the primary and the standby database, set the following configuration parameters:

db2 "update db cfg for <dbname> using HADR_TARGET_LIST {<memhostname1>:<memservicename1>|<memhostnameN>:<memservicenameN>} HADR_SYNCMODE <syncmode> HADR_REMOTE_HOST {<memhostname1>:<memservicename1>|<memhostnameN>:<memservicenameN>} HADR_REMOTE_INST <instanceName>"

NOTE

The HADR_TARGET_LIST and HADR_REMOTE_HOST parameters list the members of the remote cluster. The members are separated by a vertical bar (|), and the list must be enclosed in braces {}.

If the HADR_REMOTE_HOST, HADR_REMOTE_SVC, and HADR_REMOTE_INST configuration parameters are not set by the user, they will be automatically configured by DB2.

HADR_SYNCMODE must be either ASYNC or SUPERASYNC. If it is not set, the synchronization mode defaults to ASYNC.

EXAMPLE

On the primary, coralpib52, run the following commands to set up HADR:

db2 "update db cfg for ACO member 0 using HADR_LOCAL_HOST coralpib52 HADR_LOCAL_SVC 60000"

db2 "update db cfg for ACO member 1 using HADR_LOCAL_HOST coralpib85 HADR_LOCAL_SVC 60000"

db2 "update db cfg for ACO using HADR_TARGET_LIST {coralpib49:60000|coralpib83:60000} HADR_SYNCMODE ASYNC HADR_REMOTE_HOST {coralpib49:60000|coralpib83:60000} HADR_REMOTE_INST db2aco"

On the standby, coralpib49, run the following commands to set up HADR:

db2 "update db cfg for ACO member 0 using HADR_LOCAL_HOST coralpib49 HADR_LOCAL_SVC 60000"

db2 "update db cfg for ACO member 1 using HADR_LOCAL_HOST coralpib83 HADR_LOCAL_SVC 60000"

db2 "update db cfg for ACO using HADR_TARGET_LIST {coralpib52:60000|coralpib85:60000} HADR_SYNCMODE ASYNC HADR_REMOTE_HOST {coralpib52:60000|coralpib85:60000} HADR_REMOTE_INST db2aco"
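The brace-enclosed, vertical-bar-separated member lists are easy to mistype. The following sketch builds such a list from a port and the remote host names, then prints the cluster-level update command for one side of the pair. It mirrors the examples above and assumes a bash-compatible shell. The helper names build_target_list and hadr_cluster_cfg_cmd are hypothetical. Following the NOTE above, the sketch rejects synchronization modes other than ASYNC and SUPERASYNC.

```shell
#!/usr/bin/env bash
# Build a brace-enclosed, |-separated target list such as
# {coralpib49:60000|coralpib83:60000} from a port and a list of remote hosts.
build_target_list() {
  local port="$1" out="" host
  shift
  for host in "$@"; do
    out="${out:+$out|}${host}:${port}"
  done
  printf '{%s}\n' "$out"
}

# Print the cluster-level "update db cfg" command for one side of the pair.
# Args: database, remote target list, sync mode, remote instance name.
hadr_cluster_cfg_cmd() {
  local db="$1" targets="$2" syncmode="$3" inst="$4"
  case "$syncmode" in
    ASYNC|SUPERASYNC) ;;
    *) echo "unsupported HADR_SYNCMODE for DB2 pureScale: $syncmode" >&2; return 1 ;;
  esac
  printf 'db2 "update db cfg for %s using HADR_TARGET_LIST %s HADR_SYNCMODE %s HADR_REMOTE_HOST %s HADR_REMOTE_INST %s"\n' \
    "$db" "$targets" "$syncmode" "$targets" "$inst"
}
```

On the primary, for example, hadr_cluster_cfg_cmd ACO "$(build_target_list 60000 coralpib49 coralpib83)" ASYNC db2aco prints the last command of the primary example above. On the standby, pass coralpib52 and coralpib85 instead.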

Changes to these parameters take effect on database activation. If the database is already online, you can have changes take effect by stopping and restarting HADR on the primary database.

5.4 Starting HADR

The START HADR command can be run on any member. The member that issues the START HADR command is designated as the preferred replay member.

Note: We recommend that you give the preferred replay member more CPU power and memory, because only this one member on the standby replays logs. For the same reason, we recommend more CPU and memory for the member selected as the preferred replay member on the primary cluster. That way, when the current primary becomes the standby after a role switch, the preferred replay member on the new standby performs as well as the previous replay member.

As with non-pureScale HADR environments, the standby database must be started first. When START HADR is issued on the standby, the restored standby database must be in rollforward-pending or rollforward-in-progress mode. To start HADR, issue the following commands:

On the standby:

db2 START HADR on db <db_name> as standby

On the primary:

db2 START HADR on db <db_name> as primary

5.5 Monitoring HADR

The DB2 monitoring interfaces provide details of the HADR configuration and status. In DB2 pureScale, HADR information can be gathered in two ways:

MON_GET_HADR table function

db2pd command

MON_GET_HADR table function

In a DB2 pureScale environment, the MON_GET_HADR table function can only be run on the primary database. The output shows all log streams, one for each primary member. The STANDBY_MEMBER column always shows the same member, because all primary members connect to a single standby replay member.

The following example is the output of the MON_GET_HADR table function:

EXAMPLE

db2aco> db2 "select LOG_STREAM_ID, PRIMARY_MEMBER, STANDBY_MEMBER, HADR_STATE from table (mon_get_hadr(-2))"

LOG_STREAM_ID PRIMARY_MEMBER STANDBY_MEMBER HADR_STATE
------------- -------------- -------------- ----------
1             1              0              PEER
0             0              0              PEER

LOG_STREAM_ID identifies the log stream. Each member writes its own log stream, which carries that member's number. PRIMARY_MEMBER is the member on the primary cluster that is currently processing the log stream and sending it to the standby cluster. STANDBY_MEMBER is the standby member that is replaying the log stream.

From the first row of the above output, we see that the logs created by member 1 (LOG_STREAM_ID 1) are being processed by member 1 on the primary cluster. They are being replayed by member 0 (STANDBY_MEMBER) on the standby cluster.

An assisted remote catchup log stream has HADR_STATE set to REMOTE_CATCHUP and HADR_FLAGS set to ASSISTED_REMOTE_CATCHUP. Only the assisting member sends such a log stream to the standby cluster. In the following example, the log stream created by member 1 is being sent to the standby cluster by the assisting member, member 0.

db2aco> db2 "select LOG_STREAM_ID, PRIMARY_MEMBER, STANDBY_MEMBER, HADR_STATE, HADR_FLAGS from table (mon_get_hadr(-2))"

LOG_STREAM_ID PRIMARY_MEMBER STANDBY_MEMBER HADR_STATE     HADR_FLAGS
------------- -------------- -------------- -------------- -----------------------
1             0              0              REMOTE_CATCHUP ASSISTED_REMOTE_CATCHUP
0             0              0              PEER
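For routine checks, you can filter the query output so that only log streams that are not in PEER state are reported. The following sketch assumes a bash-compatible shell and the column order used in the queries above. report_non_peer is a hypothetical name. The sketch identifies data rows by their two numeric leading columns, so the header, the separator, and the CLP "record(s) selected" line are skipped.

```shell
#!/usr/bin/env bash
# Read MON_GET_HADR query output on stdin (columns LOG_STREAM_ID, PRIMARY_MEMBER,
# STANDBY_MEMBER, HADR_STATE, optional HADR_FLAGS) and print every log stream
# whose HADR_STATE is not PEER. Returns 1 if at least one such stream exists.
report_non_peer() {
  awk '
    # Data rows start with two numeric columns (log stream ID, primary member).
    NF >= 4 && $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ {
      if ($4 != "PEER") {
        printf "log stream %s (primary member %s): %s %s\n", $1, $2, $4, $5
        bad = 1
      }
    }
    END { exit (bad ? 1 : 0) }'
}
```

On the primary, run the second query above and pipe its output into report_non_peer. For the assisted remote catchup example, it reports log stream 1 as REMOTE_CATCHUP with flag ASSISTED_REMOTE_CATCHUP.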


db2pd command

This command can be issued on either the primary or the standby database. By default, it reports only the log streams processed locally on the member where it is issued. To see the log streams of all members, specify the -allmembers option. The following examples show the output of the db2pd command issued on member 0 of the primary and of the standby:

EXAMPLE

On member 0, primary database:

db2aco> db2pd -hadr -db ACO

Database Member 0 -- Database ACO -- Active -- Up 0 days 06:22:58 -- Date 2013-11-04-23.30.27.849337

HADR_ROLE = PRIMARY

REPLAY_TYPE = PHYSICAL

HADR_SYNCMODE = ASYNC

STANDBY_ID = 1

LOG_STREAM_ID = 0

HADR_STATE = PEER

HADR_FLAGS =

PRIMARY_MEMBER_HOST = coralpib52

PRIMARY_INSTANCE = db2aco

PRIMARY_MEMBER = 0

STANDBY_MEMBER_HOST = coralpib49

STANDBY_INSTANCE = db2aco

STANDBY_MEMBER = 0

HADR_CONNECT_STATUS = CONNECTED

HADR_CONNECT_STATUS_TIME = 11/04/2013 17:12:26.637494 (1383603146)

HEARTBEAT_INTERVAL(seconds) = 30

HADR_TIMEOUT(seconds) = 120

TIME_SINCE_LAST_RECV(seconds) = 0

PEER_WAIT_LIMIT(seconds) = 0

LOG_HADR_WAIT_CUR(seconds) = 0.000

LOG_HADR_WAIT_RECENT_AVG(seconds) = 0.000008

LOG_HADR_WAIT_ACCUMULATED(seconds) = 5.154

LOG_HADR_WAIT_COUNT = 103004

SOCK_SEND_BUF_REQUESTED,ACTUAL(bytes) = 0, 262088

SOCK_RECV_BUF_REQUESTED,ACTUAL(bytes) = 0, 262088

PRIMARY_LOG_FILE,PAGE,POS = S0000899.LOG, 5972, 64564150489

STANDBY_LOG_FILE,PAGE,POS = S0000899.LOG, 5972, 64564150489


HADR_LOG_GAP(bytes) = 0

STANDBY_REPLAY_LOG_FILE,PAGE,POS = S0000899.LOG, 5972, 64564150489

STANDBY_RECV_REPLAY_GAP(bytes) = 0

PRIMARY_LOG_TIME = 11/04/2013 23:30:17.000000 (1383625817)

STANDBY_LOG_TIME = 11/04/2013 23:30:17.000000 (1383625817)

STANDBY_REPLAY_LOG_TIME = 11/04/2013 23:30:17.000000 (1383625817)

STANDBY_RECV_BUF_SIZE(pages) = 2048

STANDBY_RECV_BUF_PERCENT = 0

STANDBY_SPOOL_LIMIT(pages) = 982800

STANDBY_SPOOL_PERCENT = 5

PEER_WINDOW(seconds) = 0

READS_ON_STANDBY_ENABLED = N

In the above output, HADR_STATE shows that the standby database on host coralpib49 is currently in PEER state.

On member 0, standby database:

coralpib49:db2aco 10> db2pd -hadr -db ACO

Database Member 0 -- Database ACO -- Standby -- Up 0 days 00:03:17 -- Date 2013-10-02-17.17.14.940473

HADR_ROLE = STANDBY

REPLAY_TYPE = PHYSICAL

HADR_SYNCMODE = ASYNC

STANDBY_ID = 0

LOG_STREAM_ID = 0

HADR_STATE = REMOTE_CATCHUP

HADR_FLAGS =

PRIMARY_MEMBER_HOST = coralpib52

PRIMARY_INSTANCE = db2aco

PRIMARY_MEMBER = 0

STANDBY_MEMBER_HOST = coralpib49

STANDBY_INSTANCE = db2aco

STANDBY_MEMBER = 0

HADR_CONNECT_STATUS = CONNECTED

HADR_CONNECT_STATUS_TIME = 10/02/2013 17:17:00.418300 (1380748620)

HEARTBEAT_INTERVAL(seconds) = 30

HADR_TIMEOUT(seconds) = 120

TIME_SINCE_LAST_RECV(seconds) = 0

PEER_WAIT_LIMIT(seconds) = 0

LOG_HADR_WAIT_CUR(seconds) = 0.000

LOG_HADR_WAIT_RECENT_AVG(seconds) = 0.000000

LOG_HADR_WAIT_ACCUMULATED(seconds) = 0.000

LOG_HADR_WAIT_COUNT = 0

SOCK_SEND_BUF_REQUESTED,ACTUAL(bytes) = 0, 262088


SOCK_RECV_BUF_REQUESTED,ACTUAL(bytes) = 0, 262088

PRIMARY_LOG_FILE,PAGE,POS = S0000112.LOG, 7911, 12028094245

STANDBY_LOG_FILE,PAGE,POS = S0000112.LOG, 6330, 12021652500

HADR_LOG_GAP(bytes) = 0

STANDBY_REPLAY_LOG_FILE,PAGE,POS = S0000112.LOG, 6092, 12020680614

STANDBY_RECV_REPLAY_GAP(bytes) = 0

PRIMARY_LOG_TIME = 10/02/2013 17:18:48.000000 (1380748728)

STANDBY_LOG_TIME = 10/02/2013 17:00:27.000000 (1380747627)

STANDBY_REPLAY_LOG_TIME = 10/02/2013 17:00:10.000000 (1380747610)

STANDBY_RECV_BUF_SIZE(pages) = 2048

STANDBY_RECV_BUF_PERCENT = 1

STANDBY_SPOOL_LIMIT(pages) = 982800

STANDBY_SPOOL_PERCENT = 0

PEER_WINDOW(seconds) = 0

READS_ON_STANDBY_ENABLED = N
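For scripting, you can pull individual fields out of the db2pd output. The sketch below also gives a rough indicator of how far the standby is behind: it subtracts the standby replay position from the primary log position. In the standby output above, that is 12028094245 - 12020680614 = 7413631 bytes, roughly 7 MB of log not yet replayed while the stream is in REMOTE_CATCHUP state. The sketch assumes a bash-compatible shell and the NAME = value layout shown above. The helper names db2pd_field, log_pos, and replay_lag_bytes are hypothetical. With -allmembers, only the first matching field is used.

```shell
#!/usr/bin/env bash
# Print the value of one "NAME = value" field from "db2pd -hadr" output read on
# stdin. Returns 1 if the field is not present.
db2pd_field() {
  awk -v key="$1" '
    index($0, "=") > 0 {
      name = substr($0, 1, index($0, "=") - 1)
      gsub(/^[ \t]+|[ \t]+$/, "", name)
      if (name == key) {
        value = substr($0, index($0, "=") + 1)
        gsub(/^[ \t]+|[ \t]+$/, "", value)
        print value
        found = 1
        exit
      }
    }
    END { exit (found ? 0 : 1) }'
}

# Third comma-separated item of a "...LOG_FILE,PAGE,POS" field: the log byte position.
log_pos() {
  db2pd_field "$1" | awk -F',' '{ gsub(/[ \t]/, "", $3); print $3 }'
}

# Primary log position minus standby replay position, in bytes (rough lag indicator).
replay_lag_bytes() {
  local out p s
  out=$(cat)
  p=$(printf '%s\n' "$out" | log_pos 'PRIMARY_LOG_FILE,PAGE,POS')
  s=$(printf '%s\n' "$out" | log_pos 'STANDBY_REPLAY_LOG_FILE,PAGE,POS')
  [ -n "$p" ] && [ -n "$s" ] || return 1
  echo $((p - s))
}
```

For example, db2pd -hadr -db ACO | db2pd_field HADR_STATE prints the state of the local log stream, and db2pd -hadr -db ACO | replay_lag_bytes prints the position difference.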


6. HADR Operations

The following sections provide details on how to run basic HADR operations such as starting and stopping HADR, role switches, and failovers. In addition, this section covers quiescing the SAP connections to minimize the impact of a role switch or failover.

6.1 Starting HADR

The START HADR ON DATABASE <SID> command starts HADR operations. The result depends on where the command is run, on the options specified, and on the current state of the database. In addition, the member on which the command is run is designated as the preferred replay member.

6.1.1 Running on a Standard Database (HADR Not Enabled)

If the START HADR ON DATABASE <SID> AS PRIMARY command is run on an inactive or active standard database (HADR not enabled):

o The standard database will be activated as the HADR primary database.

o The current member will be started and designated as the preferred replay member.

o The HADR primary database will attempt to connect to the HADR standby database. If it cannot, an error is returned and the primary database cannot be started.

o If the BY FORCE option is specified, the HADR primary database does not attempt to connect to the HADR standby database. However, it accepts a connection from a valid standby database once the standby database is available.

If the START HADR ON DATABASE <SID> AS STANDBY command is run on an inactive standard database (HADR not enabled):

o The standard database will be activated as the HADR standby database.

o The current member is designated as the preferred replay member.

o The HADR standby database will attempt to connect to the HADR primary until a connection is successfully established.

If the START HADR ON DATABASE <SID> AS STANDBY command is run on an active standard database (HADR not enabled):

o An error message is returned. The database must be in rollforward-pending state in order to run the START HADR … AS STANDBY command.

6.1.2 Running on the Primary Database

If the START HADR ON DATABASE <SID> AS PRIMARY command is run on an inactive primary database:

o The HADR primary database is activated.

o The current member is started and designated as the preferred replay member.

o The HADR primary database will attempt to connect to the HADR standby database. If it cannot, an error is returned and the primary database is not started.


o If the BY FORCE option is specified, the HADR primary database does not attempt to connect to the HADR standby database. However, it accepts a connection from a valid standby database once the standby database is available.

If the START HADR ON DATABASE <SID> AS PRIMARY command is run on an active primary database:

o A warning is returned since the primary is already active.

If the START HADR ON DATABASE <SID> AS STANDBY command is run on an inactive primary database:

o After a failover, this reintegrates the failed primary into the HADR pair as the new standby database.

If the START HADR ON DATABASE <SID> AS STANDBY command is run on an active primary database:

o An error message is returned since it is already an active primary database.

6.1.3 Running on the Standby Database

If the START HADR ON DATABASE <SID> AS PRIMARY command is run on an inactive or active standby database:

o An error message is returned. This command cannot be issued on a standby database. Perform a role switch to make it the primary database.

If the START HADR ON DATABASE <SID> AS STANDBY command is run on an inactive standby database:

o The HADR standby database is started.

If the START HADR ON DATABASE <SID> AS STANDBY command is run on an active standby database:

o A warning message is returned since this is already the active standby database.
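The outcomes listed in sections 6.1.1 to 6.1.3 can be summarized as a lookup table. The following sketch encodes them as a shell case statement. It does not call DB2; it only restates the rules above in compact form. start_hadr_outcome is a hypothetical name. Its first argument is the current role (standard, primary, or standby), the second is active or inactive, and the third is the AS PRIMARY or AS STANDBY target.

```shell
#!/usr/bin/env bash
# Summary of the START HADR rules above as a lookup table.
# Args: current role (standard|primary|standby), state (active|inactive),
#       requested role (primary|standby).
# Return codes: 0 = OK or warning, 1 = error returned by DB2, 2 = unknown input.
start_hadr_outcome() {
  local role="$1" state="$2" as="$3"
  case "$role/$state/$as" in
    standard/*/primary)        echo "OK: activated as HADR primary; must reach the standby unless BY FORCE" ;;
    standard/inactive/standby) echo "OK: activated as HADR standby; retries until the primary is reachable" ;;
    standard/active/standby)   echo "ERROR: database must be in rollforward-pending state"; return 1 ;;
    primary/inactive/primary)  echo "OK: primary activated; must reach the standby unless BY FORCE" ;;
    primary/active/primary)    echo "WARNING: primary is already active" ;;
    primary/inactive/standby)  echo "OK: failed primary reintegrated as the new standby" ;;
    primary/active/standby)    echo "ERROR: database is already an active primary"; return 1 ;;
    standby/*/primary)         echo "ERROR: not allowed on a standby; perform a role switch"; return 1 ;;
    standby/inactive/standby)  echo "OK: standby started" ;;
    standby/active/standby)    echo "WARNING: standby is already active" ;;
    *)                         echo "ERROR: unknown combination '$role/$state/$as'"; return 2 ;;
  esac
}
```

For example, start_hadr_outcome primary inactive standby describes the reintegration of a failed primary after a failover.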

6.2 Stopping HADR

Stopping HADR operations may sometimes be necessary, for example when dropping a member on the primary or changing the preferred replay member on the primary. The outcome depends on where the command is run and on the current state of the database.

If you only want to stop the database and keep its current role as primary or standby, issue the DEACTIVATE DATABASE command instead. Running the STOP HADR command turns the database into a standard database.

The following order of operations is the recommended procedure to shut down the HADR pair:


1. Deactivate the primary database.

2. Stop DB2 on the primary cluster.

3. Deactivate the standby database.

4. Stop DB2 on the standby cluster.
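The recommended order can be kept as a checklist. The following sketch only prints the four steps for a given pair of hosts. Each line has the form "<host>: <command>", and the command is meant to be run on that host as the instance owner. plan_hadr_shutdown is a hypothetical name. db2 deactivate db and db2stop are the command-line forms of the deactivate and stop steps listed above.

```shell
#!/usr/bin/env bash
# Print the recommended HADR shutdown order as a checklist.
# Args: database name, a primary cluster host, a standby cluster host.
# Each line is "<host>: <command>", to be run there as the instance owner.
plan_hadr_shutdown() {
  local db="$1" primary_host="$2" standby_host="$3"
  printf '%s: db2 deactivate db %s\n' "$primary_host" "$db"  # 1. deactivate the primary database
  printf '%s: db2stop\n' "$primary_host"                      # 2. stop DB2 on the primary cluster
  printf '%s: db2 deactivate db %s\n' "$standby_host" "$db"  # 3. deactivate the standby database
  printf '%s: db2stop\n' "$standby_host"                      # 4. stop DB2 on the standby cluster
}
```

For the example systems, plan_hadr_shutdown ACO coralpib52 coralpib49 lists the commands in the order given above.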

6.2.1 Running on the Primary Database

If the STOP HADR ON DATABASE <SID> command is run on an active primary database:

o The logs stop being shipped to the standby database.

o The HADR engine dispatchable units are shut down.

o The database switches to a standard database and remains online, and transaction processing continues. The START HADR ON DATABASE <SID> AS PRIMARY command can later be run to switch the database back to a primary database.

If the STOP HADR ON DATABASE <SID> command is run on an inactive primary database:

o The database switches to a standard database and remains offline.


6.2.2 Running on the Standby Database

If the STOP HADR ON DATABASE <SID> command is run on an active standby database:

o An error message is returned. The database must first be deactivated with the DEACTIVATE DATABASE command before converting it to a standard database.

If the STOP HADR ON DATABASE <SID> command is run on an inactive standby database:

o The database switches to a standard database.

o The database is placed into rollforward-pending state.
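The STOP HADR outcomes in sections 6.2.1 and 6.2.2 form a small decision table. The Python sketch below only restates that documented behavior. The function name and the returned strings are hypothetical, and the function does not call DB2.

```python
def stop_hadr_outcome(role: str, active: bool) -> str:
    """Summarize what STOP HADR ON DATABASE <SID> does (sections 6.2.1/6.2.2)."""
    role = role.upper()
    if role == "PRIMARY":
        # Active primary: log shipping stops, HADR EDUs shut down,
        # and transaction processing continues on the standard database.
        return "standard database, online" if active else "standard database, offline"
    if role == "STANDBY":
        if active:
            # An active standby must first be stopped with DEACTIVATE DATABASE.
            return "error: deactivate the database first"
        return "standard database, rollforward pending"
    raise ValueError(f"unknown HADR role: {role!r}")


if __name__ == "__main__":
    for role in ("PRIMARY", "STANDBY"):
        for active in (True, False):
            state = "active" if active else "inactive"
            print(role, state, "->", stop_hadr_outcome(role, active))
```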


6.3 Changing the Preferred Replay Member

When the standby database is started, log replay is performed by only one member. The member on which the START HADR ON DATABASE <SID> AS PRIMARY or AS STANDBY command was executed is designated as the preferred replay member. The other members can be online, but they do not take part in log replay.

Although a preferred replay member is designated, log replay can be performed by another member. If the preferred replay member is not available, another member takes over replay to keep the standby database in peer state. When the preferred replay member becomes available again, replay does not automatically switch back to it and continues on the current replay member.
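The replay-member rules above can be modeled as a small function. This Python sketch is a hypothetical illustration of the documented behavior, not DB2's internal algorithm. The article does not say which non-preferred member is chosen when several are online, so the sketch simply assumes the lowest member ID.

```python
from typing import Iterable, Optional


def next_replay_member(current: Optional[int], preferred: int,
                       online: Iterable[int]) -> Optional[int]:
    """Return the standby member expected to perform log replay.

    current   -- member replaying now, or None if replay has not started
    preferred -- the designated preferred replay member
    online    -- IDs of the standby members that are currently online
    """
    online_set = set(online)
    if current is not None and current in online_set:
        # Replay stays where it is; it does not move back to the preferred member.
        return current
    if preferred in online_set:
        # A new replay member is needed and the preferred one is available.
        return preferred
    if online_set:
        # Assumption: any other online member may be chosen; lowest ID here.
        return min(online_set)
    return None  # no online member, so replay cannot continue


if __name__ == "__main__":
    # Roughly the section 6.3.1 scenario, assuming member 0 is preferred.
    print(next_replay_member(0, preferred=0, online=[0, 1]))  # 0
    print(next_replay_member(0, preferred=0, online=[1]))     # 1
    print(next_replay_member(1, preferred=0, online=[0, 1]))  # 1
```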


6.3.1 Changing the Current Replay Member

Use the following procedure to move replay processing to another member without changing which member is designated as the preferred replay member. This can be useful when maintenance must be performed on the preferred replay member, or whenever the current replay member needs to be switched.

Procedure

1. Determine the current replay member by running db2pd -db <SID> -hadr on the primary cluster. STANDBY_MEMBER_HOST shows the current replay member. In the example below, the current replay member is host coralpib49:

Example

db2aco> db2pd -db aco -hadr
Database Member 1 -- Database ACO -- Active -- Up 13 days 04:42:16 -- Date 2013-11-25-22.44.41.945396
HADR_ROLE = PRIMARY
REPLAY_TYPE = PHYSICAL
HADR_SYNCMODE = ASYNC
STANDBY_ID = 1
LOG_STREAM_ID = 1
HADR_STATE = PEER
HADR_FLAGS =
PRIMARY_MEMBER_HOST = coralpib85
PRIMARY_INSTANCE = db2aco
PRIMARY_MEMBER = 1
STANDBY_MEMBER_HOST = coralpib49
STANDBY_INSTANCE = db2aco
STANDBY_MEMBER = 0
HADR_CONNECT_STATUS = CONNECTED
HADR_CONNECT_STATUS_TIME = 11/25/2013 22:40:29.561179 (1385437229)
HEARTBEAT_INTERVAL(seconds) = 30
HADR_TIMEOUT(seconds) = 120
TIME_SINCE_LAST_RECV(seconds) = 12

2. On the standby cluster, activate the database on the member that is to become the current replay member. In the example below, the database is activated on host coralpib83:

Example

coralpib83:db2aco 1> db2 activate db aco
SQL1490W Activate database is successful, however, the database has already been activated on one or more nodes.

3. Stop the current replay member (member 0 on coralpib49 in this example). Log replay automatically moves to another member. Here, once member 0 is stopped, replay continues on member 1, host coralpib83. If several members are available, one is selected, with preference given to the preferred replay member. Replay continues as long as at least one member is online.

Example

coralpib83:db2aco 6> db2stop 0 force
11/25/2013 22:35:25 0 0 SQL1064N DB2STOP processing was successful.
SQL1064N DB2STOP processing was successful.

4. Check the results by running db2pd -db <SID> -hadr on the primary cluster to confirm that the current replay member has switched. In the example below, the replay host is now coralpib83, which is not the preferred replay member, as the STANDBY_REPLAY_NOT_ON_PREFERRED flag indicates.

Example

db2aco> db2pd -db aco -hadr
Database Member 1 -- Database ACO -- Active -- Up 13 days 04:35:33 -- Date 2013-11-25-22.37.58.969511
HADR_ROLE = PRIMARY
REPLAY_TYPE = PHYSICAL
HADR_SYNCMODE = ASYNC
STANDBY_ID = 1
LOG_STREAM_ID = 1
HADR_STATE = PEER
HADR_FLAGS = STANDBY_REPLAY_NOT_ON_PREFERRED
PRIMARY_MEMBER_HOST = coralpib85
PRIMARY_INSTANCE = db2aco
PRIMARY_MEMBER = 1
STANDBY_MEMBER_HOST = coralpib83
STANDBY_INSTANCE = db2aco
STANDBY_MEMBER = 1
HADR_CONNECT_STATUS = CONNECTED
HADR_CONNECT_STATUS_TIME = 11/25/2013 22:37:26.947200 (1385437046)
HEARTBEAT_INTERVAL(seconds) = 30
HADR_TIMEOUT(seconds) = 120
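Steps 1 and 4 both read the current replay member from db2pd output. The following minimal Python sketch automates that check. It assumes the single-record "KEY = VALUE" layout shown above. Output covering several log streams (for example with -allmembers) would first have to be split per record. The function names are hypothetical.

```python
def parse_db2pd_hadr(text: str) -> dict:
    """Collect the 'KEY = VALUE' lines of one db2pd -hadr record into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip banner lines such as "Database Member 1 -- ..."
            fields[key.strip()] = value.strip()
    return fields


def replay_status(text: str) -> tuple:
    """Return (current replay host, is replay running on the preferred member?)."""
    fields = parse_db2pd_hadr(text)
    flags = fields.get("HADR_FLAGS", "").split()
    return (fields["STANDBY_MEMBER_HOST"],
            "STANDBY_REPLAY_NOT_ON_PREFERRED" not in flags)


# Excerpt of the step 4 output.
STEP4_OUTPUT = """\
HADR_ROLE = PRIMARY
HADR_STATE = PEER
HADR_FLAGS = STANDBY_REPLAY_NOT_ON_PREFERRED
PRIMARY_MEMBER_HOST = coralpib85
STANDBY_MEMBER_HOST = coralpib83
STANDBY_MEMBER = 1
"""

if __name__ == "__main__":
    # On a live system the text could be captured with the subprocess module:
    # subprocess.run(["db2pd", "-db", "aco", "-hadr"],
    #                capture_output=True, text=True).stdout
    print(replay_status(STEP4_OUTPUT))  # ('coralpib83', False)
```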

6.3.2 Changing the Standby Cluster’s Preferred Replay Member

The following procedure can be used to change the designation of the preferred replay member on the standby cluster. The preferred replay member can only be changed with the START HADR command.

Procedure

1. On any member in the standby cluster, deactivate the database.

Example

coralpib83:db2aco 10> db2 deactivate db aco

DB20000I The DEACTIVATE DATABASE command completed successfully.

2. On the member that will be the new preferred replay member, issue the START HADR ON DATABASE <SID> AS STANDBY command.

Example

coralpib83:db2aco 11> db2 start hadr on database aco as standby

DB20000I The command completed successfully.

6.3.3 Changing the Primary Cluster’s Preferred Replay Member

Use the following procedure to change the preferred replay member designated on the primary cluster. This is the member that replays the logs when the primary cluster becomes the standby after a role switch or failover.

The preferred replay member can only be changed with the STOP HADR and START HADR commands. The designation does not affect current processing. It takes effect only if the primary cluster becomes the standby cluster in a role switch or failover.

Procedure

1. Issue the STOP HADR command from any member in the primary cluster. This stops the HADR service but keeps the database online, so transaction processing is not affected.

Example

db2aco> db2 stop hadr on database aco

DB20000I The STOP HADR ON DATABASE command completed successfully.

2. On the member that will be the new preferred replay member, issue the START HADR ON DATABASE <SID> AS PRIMARY command. The designation only takes effect after a failover or role switch of the cluster.

Example

db2aco> db2 start hadr on database aco as primary

DB20000I The START HADR ON DATABASE command completed successfully.


6.4 Role Switch

Role switch is also called graceful takeover or non-forced takeover. During a role switch, the standby database and the primary database switch roles. This allows the former primary cluster to be shut down for maintenance without an outage.

To start a role switch, issue the TAKEOVER HADR command from any standby member, either the replay member or a non-replay member. After the command completes, the former primary cluster becomes the standby cluster and the former standby cluster becomes the primary. The database is started only on the replay member of the new primary cluster. If no client connects, issue the ACTIVATE DATABASE command to start the database on the other members.

After the role switch, db2dsdriver.cfg must be updated. Otherwise, the SAP application servers keep trying to connect to the former primary cluster.

The following sections describe how to perform a role switch and change the db2dsdriver.cfg.

Prerequisites

Before initiating a role switch, check the following conditions to ensure a role switch can complete successfully:

In a DB2 pureScale environment, a role switch is allowed only if all log streams are in one of the following states:

Peer state

Remote Catchup (RCU) state if the synchronization mode is SUPERASYNC

Assisted remote catchup (ARCU) state, regardless of the synchronization mode

Check the log gap between the primary and standby clusters. If log streams are in RCU or ARCU state, make sure the gap is small before issuing TAKEOVER HADR. A large gap lengthens the takeover.

No primary member may have member crash recovery (MCR) pending or in progress.

A role switch is not allowed during group crash recovery. In that case, a failover is necessary.

All SAP application servers must be able to ping the servers of the standby cluster.
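The prerequisites above can be expressed as a check to run before issuing TAKEOVER HADR. This Python sketch is illustrative only:

- Log stream states use this article's abbreviations (PEER, RCU, ARCU) instead of being parsed from db2pd.
- The crash-recovery conditions are passed in as flags.
- The max_gap_bytes threshold is a hypothetical value that you choose, because the article only asks for a small log gap.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LogStream:
    stream_id: int
    state: str              # this article's labels: PEER, RCU, ARCU, ...
    log_gap_bytes: int = 0  # for example HADR_LOG_GAP(bytes) from db2pd


def role_switch_blockers(streams: List[LogStream], syncmode: str,
                         mcr_on_primary: bool = False,
                         group_crash_recovery: bool = False,
                         max_gap_bytes: Optional[int] = None) -> List[str]:
    """Return reasons why a (non-forced) TAKEOVER HADR should not be issued yet."""
    problems = []
    if group_crash_recovery:
        problems.append("group crash recovery: a failover (BY FORCE) is needed")
    if mcr_on_primary:
        problems.append("member crash recovery pending or in progress on the primary")
    for s in streams:
        state = s.state.upper()
        allowed = (state == "PEER"
                   or state == "ARCU"
                   or (state == "RCU" and syncmode.upper() == "SUPERASYNC"))
        if not allowed:
            problems.append(f"stream {s.stream_id}: {state} not allowed "
                            f"in {syncmode.upper()} mode")
        elif (state in ("RCU", "ARCU") and max_gap_bytes is not None
              and s.log_gap_bytes > max_gap_bytes):
            # A large gap in catchup states lengthens the takeover.
            problems.append(f"stream {s.stream_id}: log gap {s.log_gap_bytes} "
                            f"bytes exceeds {max_gap_bytes}")
    return problems


if __name__ == "__main__":
    streams = [LogStream(0, "PEER"), LogStream(1, "RCU", log_gap_bytes=4096)]
    print(role_switch_blockers(streams, "ASYNC"))       # one blocker: RCU in ASYNC
    print(role_switch_blockers(streams, "SUPERASYNC"))  # []
```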

Procedure

1. Check the HADR state and the log gap between the primary and standby clusters by issuing db2pd as follows:

db2pd -hadr -db <db_name>

2. Either shut down the SAP application servers or dismantle the SAP connections (see SAP Note 1434153). This is necessary for the changes to db2dsdriver.cfg to take effect. See the example below for more details.

3. If the prerequisites are fulfilled, issue the TAKEOVER HADR command on any standby member:

db2 takeover hadr on db <db_name>

The primary now becomes the standby.


4. Edit db2dsdriver.cfg and update the <server> XML tags with the host names of the servers in the new primary cluster. See the example below for more details.

5. Start SAP or force a reload of the db2dsdriver.cfg file. See the example below for more details.

6. If maintenance or upgrade activity is necessary on the new standby, perform the following:

a. Deactivate its database:

db2 deactivate db <db_name>

b. Stop the instance on the new standby cluster:

db2stop

c. Perform the upgrade or maintenance activity.

d. After completing the maintenance operations, start the instance (db2start) and activate the database on the new standby so that log replay continues.

The following example shows db2pd output taken on the test systems during a role switch, together with the corresponding edits to db2dsdriver.cfg.

EXAMPLE

Because a role switch is performed, the client affinity file db2dsdriver.cfg must be changed, and the SAP application servers must reload it. Either shut down all SAP application servers before the takeover, or dismantle their connections as described in SAP Note 1434153 – “DB6: Dismantling DB connections for short maintenance tasks.” In this example, the connections are dismantled.

Before performing the role switch, create the empty file db6_dbsl_quiesce_def_connections in the directory DIR_GLOBAL:

touch /usr/sap/<SID>/SYS/global/db6_dbsl_quiesce_def_connections

Use transaction RZ11 to change the value of any DBSL profile parameter that can be changed dynamically, for example:

dbs/db6/dbsl_trace=0

This parameter can be changed dynamically, and setting it as shown does not change its default value. The change makes all work processes check for the file db6_dbsl_quiesce_def_connections. If they find the file, they begin to quiesce their connections.
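Creating and later removing the marker file can also be scripted. The following Python sketch assumes the file name and DIR_GLOBAL location given above, and it must run as a user that is allowed to write there. It only manages the file. The dynamic DBSL parameter change in RZ11 is still needed before the work processes look for it. The demo uses a temporary directory instead of the real DIR_GLOBAL, and the function names are hypothetical.

```python
import tempfile  # used only by the demo below
from pathlib import Path

# File name as given in the text; DIR_GLOBAL is /usr/sap/<SID>/SYS/global.
QUIESCE_FILE = "db6_dbsl_quiesce_def_connections"


def quiesce_connections(dir_global: Path) -> Path:
    """Create the empty marker file, like `touch` does."""
    marker = Path(dir_global) / QUIESCE_FILE
    # Work processes only check for it after a dynamic DBSL parameter change.
    marker.touch(exist_ok=True)
    return marker


def resume_connections(dir_global: Path) -> None:
    """Remove the marker file so that the work processes reconnect."""
    (Path(dir_global) / QUIESCE_FILE).unlink(missing_ok=True)  # Python 3.8+


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        marker = quiesce_connections(Path(tmp))
        print(marker.exists())  # True
        resume_connections(Path(tmp))
        print(marker.exists())  # False
```

Calling resume_connections after db2dsdriver.cfg has been updated corresponds to the rm command shown later in this example.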

On standby member 0, issue the TAKEOVER HADR command:

coralpib49:db2aco 105> db2 takeover hadr on db ACO
DB20000I The TAKEOVER HADR ON DATABASE command completed successfully.

Once the replay member on the old standby has replayed all logs received from the old primary, the standby cluster becomes the new primary cluster and vice versa. The HADR state then changes to PEER.

coralpib49:db2aco 10> db2pd -hadr -db ACO
Database Member 0 -- Database ACO -- Active -- Up 0 days 00:11:39 -- Date 2013-11-01-00.50.41.656854
HADR_ROLE = PRIMARY
REPLAY_TYPE = PHYSICAL
HADR_SYNCMODE = ASYNC
STANDBY_ID = 1
LOG_STREAM_ID = 0
HADR_STATE = PEER
HADR_FLAGS =
PRIMARY_MEMBER_HOST = coralpib49
PRIMARY_INSTANCE = db2aco
PRIMARY_MEMBER = 0
STANDBY_MEMBER_HOST = coralpib52
STANDBY_INSTANCE = db2aco
STANDBY_MEMBER = 0
HADR_CONNECT_STATUS = CONNECTED
HADR_CONNECT_STATUS_TIME = 11/01/2013 00:48:32.368765 (1383281312)
...
PRIMARY_LOG_FILE,PAGE,POS = S0000894.LOG, 1130, 64210591932
STANDBY_LOG_FILE,PAGE,POS = S0000894.LOG, 1130, 64210591354
HADR_LOG_GAP(bytes) = 0
STANDBY_REPLAY_LOG_FILE,PAGE,POS = S0000894.LOG, 1130, 64210591354
STANDBY_RECV_REPLAY_GAP(bytes) = 0
PRIMARY_LOG_TIME = 11/01/2013 00:50:18.000000 (1383281418)
STANDBY_LOG_TIME = 11/01/2013 00:50:18.000000 (1383281418)
STANDBY_REPLAY_LOG_TIME = 11/01/2013 00:50:18.000000 (1383281418)
STANDBY_RECV_BUF_SIZE(pages) = 2048
STANDBY_RECV_BUF_PERCENT = 0
STANDBY_SPOOL_LIMIT(pages) = 982800
STANDBY_SPOOL_PERCENT = 0
PEER_WINDOW(seconds) = 0
READS_ON_STANDBY_ENABLED = N


After the takeover has completed, and before SAP can connect, the client affinity file db2dsdriver.cfg must be changed. This file is usually located in the /usr/sap/<SID>/SYS/global/db6 directory.

Edit the file and change the <server> XML tags to reflect the new primary cluster’s servers.

<alternateserverlist>
  <server name="server_1" hostname="coralpib52.torolab.ibm.com" port="5912" />
  <server name="server_2" hostname="coralpib85.torolab.ibm.com" port="5912" />
</alternateserverlist>

Change the above server tags to the host names of the new primary cluster’s members.

<alternateserverlist>
  <server name="server_1" hostname="coralpib49.torolab.ibm.com" port="5912" />
  <server name="server_2" hostname="coralpib83.torolab.ibm.com" port="5912" />
</alternateserverlist>
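If you script the db2dsdriver.cfg edit, the change amounts to replacing the hostname attribute of each <server> entry. The following Python sketch applies this to the fragment above using the standard library's xml.etree.ElementTree. The function name and the name-to-host mapping are illustrative assumptions. ElementTree does not preserve XML comments, so keep a backup of the original file.

```python
import xml.etree.ElementTree as ET

# The <alternateserverlist> fragment from before the role switch.
BEFORE = """<alternateserverlist>
  <server name="server_1" hostname="coralpib52.torolab.ibm.com" port="5912" />
  <server name="server_2" hostname="coralpib85.torolab.ibm.com" port="5912" />
</alternateserverlist>"""


def retarget_servers(cfg_xml: str, new_hosts: dict) -> str:
    """Point each named <server> entry at a new host; ports stay unchanged.

    new_hosts maps a server name (for example "server_1") to a host name.
    """
    root = ET.fromstring(cfg_xml)
    for server in root.iter("server"):
        name = server.get("name")
        if name in new_hosts:
            server.set("hostname", new_hosts[name])
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(retarget_servers(BEFORE, {
        "server_1": "coralpib49.torolab.ibm.com",
        "server_2": "coralpib83.torolab.ibm.com",
    }))
```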

Resume the SAP connections by removing the file db6_dbsl_quiesce_def_connections from the directory DIR_GLOBAL:

rm /usr/sap/<SID>/SYS/global/db6_dbsl_quiesce_def_connections

The SAP work processes will automatically connect to the database according to the affinity list specified in the db2dsdriver.cfg file.

6.5 Failover

If the entire primary cluster fails, for example, in the case of a disaster, an HADR failover is necessary. Failover means a switch of roles by force.

To perform a failover, issue the TAKEOVER HADR command with the BY FORCE option from any standby member. As with a role switch, the standby becomes the new primary after the command completes. However, the role of the old primary cluster does not change. To prevent a split-brain situation, the entire old primary cluster is disabled if it still has any connection to the standby cluster.

Prerequisites

Forced takeover is allowed only if all log streams are in one of the following states: RCU pending, RCU, ARCU, PEER, or DISCONNECTED PEER. In DB2 10.5, failover is no longer blocked while log archive retrieval is in progress.


The databases must not be in local catchup (LCU) state. Otherwise, the failover does not proceed.

The SAP application servers must be able to ping the members of the standby cluster.
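The forced-takeover prerequisite on log stream states can be expressed as a simple check. In this hypothetical Python sketch, states use this article's labels (RCU_PENDING, RCU, ARCU, PEER, DISCONNECTED_PEER) rather than the exact strings reported by db2pd or MON_GET_HADR.

```python
# States in which TAKEOVER HADR ... BY FORCE is allowed (this article's labels).
FORCED_TAKEOVER_STATES = {"RCU_PENDING", "RCU", "ARCU", "PEER", "DISCONNECTED_PEER"}


def forced_takeover_check(stream_states: dict) -> dict:
    """Return the log streams that block a forced takeover.

    stream_states maps a log stream ID to its state label. An empty result
    means every stream is in an allowed state. LCU (local catchup) is
    deliberately absent from the allowed set.
    """
    return {sid: state for sid, state in stream_states.items()
            if state.upper() not in FORCED_TAKEOVER_STATES}


if __name__ == "__main__":
    print(forced_takeover_check({0: "PEER", 1: "DISCONNECTED_PEER"}))  # {}
    print(forced_takeover_check({0: "LCU", 1: "RCU"}))                 # {0: 'LCU'}
```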

Procedure

1. Shut down all the SAP application servers.

2. Issue the TAKEOVER HADR command with the BY FORCE option on any standby member:

db2 takeover hadr on db <db_name> by force

The standby now becomes the primary, while the old primary remains in the same role, but is offline.

3. Edit db2dsdriver.cfg, which is usually located in /usr/sap/<SID>/SYS/global/db6, and update the <server> XML tags with the host names of the servers in the new primary cluster. See the example below for more details.

4. Start the SAP application servers.

The example below contains the output of db2pd taken on the new primary after issuing the takeover by force command.

After the old primary cluster has been repaired, reintegrate it as the new standby. Start the instance with db2start, and then issue the db2 start hadr on db <db_name> as standby command. Only do this if there was no data loss during the failover, that is, if the log streams did not diverge between the primary and standby. Otherwise, you may need to re-create the standby database from a backup or split mirror of the new primary database.

EXAMPLE

Stop all SAP application servers.

Issue the TAKEOVER HADR command with the BY FORCE option on the standby cluster.

coralpib49:db2aco 105> db2 takeover hadr on db ACO by force

DB20000I The TAKEOVER HADR ON DATABASE command completed successfully.

coralpib49:db2aco 21> db2pd -hadr -db ACO -allmembers
Database Member 0 -- Database ACO -- Active -- Up 0 days 01:29:18 -- Date 2013-11-05-13.14.39.358497
HADR_ROLE = PRIMARY
REPLAY_TYPE = PHYSICAL
HADR_SYNCMODE = ASYNC
STANDBY_ID = 1
LOG_STREAM_ID = 0
HADR_STATE = DISCONNECTED
HADR_FLAGS =
PRIMARY_MEMBER_HOST = coralpib49
PRIMARY_INSTANCE = db2aco
PRIMARY_MEMBER = 0
STANDBY_MEMBER_HOST = coralpib52
STANDBY_INSTANCE = db2aco
STANDBY_MEMBER = 0
HADR_CONNECT_STATUS = DISCONNECTED
HADR_CONNECT_STATUS_TIME = 11/05/2013 13:13:58.247380 (1383675238)
HEARTBEAT_INTERVAL(seconds) = 30
HADR_TIMEOUT(seconds) = 120
TIME_SINCE_LAST_RECV(seconds) = 0
PEER_WAIT_LIMIT(seconds) = 0
LOG_HADR_WAIT_CUR(seconds) = 0.000
LOG_HADR_WAIT_RECENT_AVG(seconds) = 0.000000
LOG_HADR_WAIT_ACCUMULATED(seconds) = 0.000
LOG_HADR_WAIT_COUNT = 0
SOCK_SEND_BUF_REQUESTED,ACTUAL(bytes) = 0, 16384
SOCK_RECV_BUF_REQUESTED,ACTUAL(bytes) = 0, 16384
PRIMARY_LOG_FILE,PAGE,POS = S0000901.LOG, 0, 64673337710
STANDBY_LOG_FILE,PAGE,POS = S0000000.LOG, 0, 0
HADR_LOG_GAP(bytes) = 0
STANDBY_REPLAY_LOG_FILE,PAGE,POS = S0000000.LOG, 0, 0
STANDBY_RECV_REPLAY_GAP(bytes) = 0
PRIMARY_LOG_TIME = 11/05/2013 13:15:10.000000 (1383675310)
STANDBY_LOG_TIME = NULL
STANDBY_REPLAY_LOG_TIME = NULL
PEER_WINDOW(seconds) = 0

coralpib49:db2aco 22> db2 "select LOG_STREAM_ID, PRIMARY_MEMBER, STANDBY_MEMBER, HADR_STATE from table (mon_get_hadr(-2))"

LOG_STREAM_ID PRIMARY_MEMBER STANDBY_MEMBER HADR_STATE
------------- -------------- -------------- -----------------------
            0              0              0 DISCONNECTED

  1 record(s) selected.

Edit db2dsdriver.cfg and change the <server> XML tags to reflect the new primary cluster’s servers.

<alternateserverlist>
  <server name="server_1" hostname="coralpib52.torolab.ibm.com" port="5912" />
  <server name="server_2" hostname="coralpib85.torolab.ibm.com" port="5912" />
</alternateserverlist>

Change the above server tags to the host names of the new primary cluster’s members.

<alternateserverlist>
  <server name="server_1" hostname="coralpib49.torolab.ibm.com" port="5912" />
  <server name="server_2" hostname="coralpib83.torolab.ibm.com" port="5912" />
</alternateserverlist>

Start SAP.


7. DB2 pureScale Topology Changes With HADR

DB2 pureScale provides extreme scalability: members can be added to or removed from the cluster as business needs change. HADR for DB2 pureScale continues to support these topology changes.

Whether an outage is necessary depends on the topology change:

Members can be added online.

Dropping members requires an outage and reinitialization of the standby cluster.

Adding CFs requires an outage.

7.1 Adding Members

In general, the primary and standby clusters must have the same number of members. When a member is added to the primary cluster and activated, an add member log record is written. If the standby does not have the corresponding member when it replays this log record, the standby database shuts down. Therefore, add the member on the standby before adding it on the primary.

Procedure

1. On a host that already belongs to the standby cluster, run the following as root user:

db2iupdt -add -m <hostname> -mnet <member_net_name> -mid <member_id> db2<sid>

2. Update the member-specific database configuration parameters on the standby cluster:

db2 update database configuration for <SID> member <member_id> using hadr_local_host <standby_member_host>

db2 update database configuration for <SID> member <member_id> using hadr_local_svc <standby_member_port>

3. On a host that already belongs to the primary cluster, run the following as root user:

db2iupdt -add -m <hostname> -mnet <member_net_name> -mid <member_id> db2<sid>

where <member_id> is the same member ID that was specified when the member was added to the standby cluster.

4. Update the member-specific database configuration parameters on the primary cluster:

db2 update database configuration for <SID> member <member_id> using hadr_local_host <primary_member_host>

db2 update database configuration for <SID> member <member_id> using hadr_local_svc <primary_member_port>

5. Update the db2dsdriver.cfg file, which is usually located in the /usr/sap/<SID>/SYS/global/db6 directory. Add a <server> tag for the new member, and add the new server to the serverorder attribute of each <list> tag in the affinity list, as shown in the following example.

<alternateserverlist>
  <server name="server_1" hostname="coralpib52.torolab.ibm.com" port="5912" />
  <server name="server_2" hostname="coralpib85.torolab.ibm.com" port="5912" />
  <server name="NEW_SERVER" hostname="NEW_HOST.torolab.ibm.com" port="5912" />
</alternateserverlist>
<affinitylist>
  <list name="list1" serverorder="server_1,server_2,NEW_SERVER" />
  <list name="list2" serverorder="server_2,server_1,NEW_SERVER" />
</affinitylist>

6. For the application servers to see the changes, the file must be reloaded, either by restarting all application servers or by dynamically reloading the client affinity file.

The client affinity file can be dynamically reloaded in one of the following ways:

Start a DB2 pureScale-enabled DBA Cockpit and go to DB2 pureScale Feature > Client Affinity. On the Connected Clients screen, choose the Reload Client Affinity File on all Application Servers button, as shown in the following figure:

Alternatively, run transaction RZ11 and change the value of parameter “dbs/db6/dbsl_dyn_notification” to “RELOAD_DB2DSDRIVER_FILE”. Select “Switch on all servers” so that all application servers reload the db2dsdriver.cfg file. After a successful reload, the developer traces show the following:

C Wed Nov 6 12:07:04 2013
C Profile parameter SWITCH: dbs/db6/dbsl_dyn_notification='RELOAD_DB2DSDRIVER_FILE'
C Triggering reload of db2dsdriver.cfg file.
C SQLReloadConfig returned with rc=0

7. Activate the new member, either by connecting to the database or by explicitly activating the database from the new member. Activating the database on the primary writes the add member log record. If the member has not been added to the standby by the time the standby replays that record, the standby database shuts down.
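Step 5 can be scripted with Python's xml.etree.ElementTree. The sketch below is illustrative:

- It works on a trimmed fragment wrapped in an <acr> element, which is the section of db2dsdriver.cfg that holds the client affinity lists.
- It appends the new server to the end of every serverorder list, as in the example above.
- The function and server names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed fragment; in a full db2dsdriver.cfg both lists sit inside <acr>.
BEFORE = """<acr>
  <alternateserverlist>
    <server name="server_1" hostname="coralpib52.torolab.ibm.com" port="5912" />
    <server name="server_2" hostname="coralpib85.torolab.ibm.com" port="5912" />
  </alternateserverlist>
  <affinitylist>
    <list name="list1" serverorder="server_1,server_2" />
    <list name="list2" serverorder="server_2,server_1" />
  </affinitylist>
</acr>"""


def add_member_server(cfg_xml: str, name: str, hostname: str, port: int) -> str:
    """Add a <server> entry and append it to every affinity list."""
    root = ET.fromstring(cfg_xml)
    servers = root.find(".//alternateserverlist")
    if servers is None:
        raise ValueError("no <alternateserverlist> element found")
    if any(s.get("name") == name for s in servers.iter("server")):
        raise ValueError(f"server {name!r} is already defined")
    ET.SubElement(servers, "server",
                  {"name": name, "hostname": hostname, "port": str(port)})
    # The new member goes last in each serverorder, as in the example above.
    for lst in root.iterfind(".//affinitylist/list"):
        order = [s for s in lst.get("serverorder", "").split(",") if s]
        order.append(name)
        lst.set("serverorder", ",".join(order))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(add_member_server(BEFORE, "NEW_SERVER", "NEW_HOST.torolab.ibm.com", 5912))
```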

7.2 Dropping Members

You cannot drop a member on the standby. If a member is dropped on the primary, the standby cluster will need to be reinitialized from a backup. Note that the primary will not be highly available during this process.

Procedure

On the primary cluster:

1. Stop HADR using the STOP HADR command.

2. Stop the DB2 pureScale instance by issuing db2stop.

3. Run the following command as root user from a host that will remain in the primary cluster:

db2iupdt -drop -m <member_host_name> db2<sid>

4. Take an offline database backup.

On the standby cluster:

1. Deactivate the database.

2. Drop the database.

3. Run the following command as root user from a host that will remain in the standby cluster:

db2iupdt -drop -m <member_host_name> db2<sid>

4. Reinitialize the standby by restoring the backup of the primary.

5. Start HADR on the standby.

6. Start HADR on the primary.


8. HADR Rolling Update

As in a non-HADR environment, DB2 Fix Pack updates can be applied online. In an HADR environment, the standby cluster is updated before the primary cluster.

On the standby cluster, the secondary CF is updated first, followed by the primary CF. To help minimize the interruption of the standby log replay, the current replay member should be the last member updated.

On the primary cluster, the secondary CF is updated first, followed by the primary CF. The members can be updated in any order.

After the Fix Pack update has been applied successfully on both the standby and primary clusters, the update must be committed. Commit it first on the standby cluster and then on the primary cluster.

Process Overview:

1. Check the rolling update status.

2. Update the Standby Cluster:

a. Install the DB2 Fix Pack update on the secondary CF.

b. Install the DB2 Fix Pack update on the primary CF.

c. Install the DB2 Fix Pack update on the members.

d. Check the update.

3. Update the Primary Cluster:

a. Install the DB2 Fix Pack update on the secondary CF.

b. Install the DB2 Fix Pack update on the primary CF.

c. Install the DB2 Fix Pack update on the members.

d. Check the update.

4. Commit the update:

a. Commit the update on the standby cluster.

b. Commit the update on the primary cluster.
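The ordering rules in this overview can be captured in a short plan generator. This Python sketch only produces the order of the steps described in this section. It does not install anything, and the member IDs in the usage example are hypothetical.

```python
from typing import List, Sequence, Tuple


def rolling_update_plan(standby_members: Sequence[int], replay_member: int,
                        primary_members: Sequence[int]) -> List[Tuple[str, str]]:
    """Order the Fix Pack steps for an HADR pair (illustration only)."""
    if replay_member not in standby_members:
        raise ValueError("the replay member must be a standby member")
    # The current replay member is updated last to limit replay interruptions.
    standby_order = [m for m in standby_members if m != replay_member]
    standby_order.append(replay_member)
    plan: List[Tuple[str, str]] = []
    # Standby cluster first, then primary; on the primary, member order is free.
    for cluster, members in (("standby", standby_order),
                             ("primary", list(primary_members))):
        plan.append((cluster, "update secondary CF"))
        plan.append((cluster, "update primary CF"))
        plan.extend((cluster, f"update member {m}") for m in members)
        plan.append((cluster, "check with db2pd -ruStatus"))
    # Commit only after both clusters are updated, standby first.
    plan.append(("standby", "commit update"))
    plan.append(("primary", "commit update"))
    return plan


if __name__ == "__main__":
    for cluster, step in rolling_update_plan([0, 1], 0, [0, 1]):
        print(f"{cluster:8} {step}")
```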

8.1 Checking the Rolling Update Status

On both the standby and primary clusters, run the following command to display the rolling update status and to see the current state, version, and level of the clusters:

db2pd -ruStatus

The example output below shows that there is no online update running and the cluster is using DB2 10.5 FP1.

EXAMPLE

coralpib49:db2aco 2> db2pd -ruStatus


ROLLING UPDATE STATUS: Disk Value Memory Value

Record Type = INSTANCE
ID = 0
Code Level = V:10 R:5 M:0 F:1 I:0 SB:0 (0x0A05000100000000) Not Applicable
Architecture Level = V:10 R:5 M:0 F:1 I:0 SB:0 (0x0A05000100000000) Not Applicable
State = [NONE]
Last updated = 2013/10/29:13:48:01

Record Type = MEMBER
ID = 0
Code Level = V:10 R:5 M:0 F:1 I:0 SB:0 (0x0A05000100000000) V:10 R:5 M:0 F:1 I:0 SB:0 (0x0A05000100000000)
CECL = V:10 R:5 M:0 F:1 I:0 SB:0 (0x0A05000100000000) V:10 R:5 M:0 F:1 I:0 SB:0 (0x0A05000100000000)
Architecture Level = V:10 R:5 M:0 F:1 I:0 SB:0 (0x0A05000100000000) V:10 R:5 M:0 F:1 I:0 SB:0 (0x0A05000100000000)
CEAL = V:10 R:5 M:0 F:1 I:0 SB:0 (0x0A05000100000000) V:10 R:5 M:0 F:1 I:0 SB:0 (0x0A05000100000000)
Section Level = V:10 R:5 M:0 F:1 I:0 SB:0 (0x0A05000100000000) V:10 R:5 M:0 F:1 I:0 SB:0 (0x0A05000100000000)
State = [NONE]
Last updated = 2013/10/29:13:47:51
coralpib49.torolab.ibm.com: db2pd -ruStatus -localhost ... completed ok

Record Type = MEMBER
ID = 1
Code Level = V:10 R:5 M:0 F:1 I:0 SB:0 (0x0A05000100000000) V:10 R:5 M:0 F:1 I:0 SB:0 (0x0A05000100000000)
CECL = V:10 R:5 M:0 F:1 I:0 SB:0 (0x0A05000100000000) V:10 R:5 M:0 F:1 I:0 SB:0 (0x0A05000100000000)
Architecture Level = V:10 R:5 M:0 F:1 I:0 SB:0 (0x0A05000100000000) V:10 R:5 M:0 F:1 I:0 SB:0 (0x0A05000100000000)
CEAL = V:10 R:5 M:0 F:1 I:0 SB:0 (0x0A05000100000000) V:10 R:5 M:0 F:1 I:0 SB:0 (0x0A05000100000000)
Section Level = V:10 R:5 M:0 F:1 I:0 SB:0 (0x0A05000100000000) V:10 R:5 M:0 F:1 I:0 SB:0 (0x0A05000100000000)
State = [NONE]
Last updated = 2013/10/29:14:28:02
coralpib83.torolab.ibm.com: db2pd -ruStatus -localhost ... completed ok

Record Type = CF
ID = 128
Code Level = V:10 R:5 M:0 F:1 I:0 SB:0 (0x0A05000100000000) Not Applicable
Architecture Level = V:10 R:5 M:0 F:1 I:0 SB:0 (0x0A05000100000000) Not Applicable
State = [NONE]
Last updated = 2013/10/29:13:47:52

Record Type = CF
ID = 129
Code Level = V:10 R:5 M:0 F:1 I:0 SB:0 (0x0A05000100000000) Not Applicable
Architecture Level = V:10 R:5 M:0 F:1 I:0 SB:0 (0x0A05000100000000) Not Applicable
State = [NONE]
Last updated = 2013/10/29:14:50:24
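You can decode the hexadecimal values in the db2pd -ruStatus output to compare levels across records. The following Python sketch infers the byte layout from the sample output: the first four bytes hold the version, release, modification level, and fix pack (0x0A05000100000000 is shown as V:10 R:5 M:0 F:1). The remaining bytes are not decoded. The function names are hypothetical.

```python
import re

# First (disk) hex value on a "Code Level" line of db2pd -ruStatus output.
CODE_LEVEL_RE = re.compile(r"^\s*Code Level\s*=.*?\((0x[0-9A-Fa-f]{16})\)")


def decode_level(hex_level: str) -> tuple:
    """Return (version, release, modification, fix pack) for a level value.

    Assumption inferred from the sample output: the first four bytes hold
    V, R, M and F (0x0A05000100000000 is printed as V:10 R:5 M:0 F:1).
    """
    raw = int(hex_level, 16).to_bytes(8, "big")
    return tuple(raw[:4])


def code_levels(ru_status: str) -> list:
    """Decoded disk code level of every record, in output order."""
    matches = (CODE_LEVEL_RE.match(line) for line in ru_status.splitlines())
    return [decode_level(m.group(1)) for m in matches if m]


# Member records from the second example below (memory values omitted).
SAMPLE = """\
Record Type = MEMBER
ID = 0
Code Level = V:10 R:5 M:0 F:1 I:0 SB:0 (0x0A05000100000000)
Record Type = MEMBER
ID = 1
Code Level = V:10 R:5 M:0 F:3 I:0 SB:0 (0x0A05000300000000)
"""

if __name__ == "__main__":
    levels = code_levels(SAMPLE)
    print(levels)                # [(10, 5, 0, 1), (10, 5, 0, 3)]
    print(len(set(levels)) > 1)  # True: records report different code levels
```

Matching code levels alone do not show that the update has been committed. As the second example suggests, the CECL values and the instance State field also need to be checked.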

The example output of db2pd -ruStatus below shows an online update in progress.

Note that the state has changed to UPDATE IN PROGRESS and that one of the members and two of the CFs are running DB2 10.5 FP3.

EXAMPLE

coralpib83:db2aco 1> db2pd -ruStatus

ROLLING UPDATE STATUS: Disk Value Memory Value

Record Type = INSTANCE
ID = 0
Code Level = V:10 R:5 M:0 F:1 I:0 SB:0 (0x0A05000100000000) Not Applicable
Architecture Level = V:10 R:5 M:0 F:1 I:0 SB:0 (0x0A05000100000000) Not Applicable
State = [UPDATE IN PROGRESS]
Last updated = 2013/10/29:13:48:01


Record Type = MEMBER
ID = 0
Code Level = V:10 R:5 M:0 F:1 I:0 SB:0 (0x0A05000100000000) V:10 R:5 M:0 F:1 I:0 SB:0 (0x0A05000100000000)
CECL = V:10 R:5 M:0 F:1 I:0 SB:0 (0x0A05000100000000) V:10 R:5 M:0 F:1 I:0 SB:0 (0x0A05000100000000)
Architecture Level = V:10 R:5 M:0 F:1 I:0 SB:0 (0x0A05000100000000) V:10 R:5 M:0 F:1 I:0 SB:0 (0x0A05000100000000)
CEAL = V:10 R:5 M:0 F:1 I:0 SB:0 (0x0A05000100000000) V:10 R:5 M:0 F:1 I:0 SB:0 (0x0A05000100000000)
Section Level = V:10 R:5 M:0 F:1 I:0 SB:0 (0x0A05000100000000) V:10 R:5 M:0 F:1 I:0 SB:0 (0x0A05000100000000)
State = [NONE]
Last updated = 2013/10/29:13:47:51

coralpib49.torolab.ibm.com: db2pd -ruStatus -localhost ... completed ok

Record Type = MEMBER
ID = 1
Code Level = V:10 R:5 M:0 F:3 I:0 SB:0 (0x0A05000300000000) V:10 R:5 M:0 F:3 I:0 SB:0 (0x0A05000300000000)
CECL = V:10 R:5 M:0 F:1 I:0 SB:0 (0x0A05000100000000) V:10 R:5 M:0 F:1 I:0 SB:0 (0x0A05000100000000)
Architecture Level = V:10 R:5 M:0 F:3 I:0 SB:0 (0x0A05000300000000) V:10 R:5 M:0 F:3 I:0 SB:0 (0x0A05000300000000)
CEAL = V:10 R:5 M:0 F:1 I:0 SB:0 (0x0A05000100000000) V:10 R:5 M:0 F:1 I:0 SB:0 (0x0A05000100000000)
Section Level = V:10 R:5 M:0 F:3 I:0 SB:0 (0x0A05000300000000) V:10 R:5 M:0 F:3 I:0 SB:0 (0x0A05000300000000)
State = [NONE]
Last updated = 2013/11/12:13:57:36

coralpib83.torolab.ibm.com: db2pd -ruStatus -localhost ... completed ok

Record Type = CF
ID = 128
Code Level = V:10 R:5 M:0 F:3 I:0 SB:0 (0x0A05000300000000) Not Applicable
Architecture Level = V:10 R:5 M:0 F:3 I:0 SB:0 (0x0A05000300000000) Not Applicable
State = [NONE]
Last updated = 2013/11/12:12:20:53


Record Type = CF
ID = 129
Code Level = V:10 R:5 M:0 F:3 I:0 SB:0 (0x0A05000300000000) Not Applicable
Architecture Level = V:10 R:5 M:0 F:3 I:0 SB:0 (0x0A05000300000000) Not Applicable
State = [NONE]
Last updated = 2013/11/12:11:51:13
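When several hosts are involved, it can help to reduce the db2pd -ruStatus output to one line per record. The following Python sketch is only an illustration: it assumes the output has been captured as text in the layout shown above (a "Record Type = ..." line starting each record, with the disk value as the first level on each "Code Level" line). The function name summarize_ru_status is hypothetical.

```python
import re
from typing import List, Optional, Tuple

# First (disk) value on a "Code Level" line, for example
# "Code Level = V:10 R:5 M:0 F:3 I:0 SB:0 (0x0A05000300000000) ..."
CODE_LEVEL = re.compile(r"Code Level\s*=\s*V:(\d+) R:(\d+) M:(\d+) F:(\d+)")


def summarize_ru_status(text: str) -> Tuple[str, List[Tuple[str, Optional[int], str]]]:
    """Return the instance state and (record type, ID, "V.R.M.F") per record."""
    instance_state = "UNKNOWN"
    records: List[Tuple[str, Optional[int], str]] = []
    rec_type: Optional[str] = None
    rec_id: Optional[int] = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Record Type"):
            rec_type, rec_id = line.split("=", 1)[1].strip(), None
        elif line.startswith("ID") and "=" in line:
            rec_id = int(line.split("=", 1)[1])
        elif rec_type is not None and (m := CODE_LEVEL.match(line)):
            records.append((rec_type, rec_id, ".".join(m.groups())))
        elif rec_type == "INSTANCE" and line.startswith("State"):
            # The INSTANCE record's State shows the instance-wide update state.
            instance_state = line.split("=", 1)[1].strip().strip("[]")
    return instance_state, records
```

Applied to the update-in-progress example above, this sketch would report the instance state UPDATE IN PROGRESS, the instance record and member 0 at 10.5.0.1, and member 1 and both CFs at 10.5.0.3.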

8.2 Updating the Standby Cluster

When updating the standby cluster, install the DB2 Fix Pack update on the secondary CF first, followed by the primary CF. After updating the CFs, update the members. To minimize the interruption of the standby log replay, the current replay member should be updated last.

8.2.1 Installing the DB2 Fix Pack Update on the Secondary CF

As instance owner db2<sid>, determine which CF is the primary and which is the secondary by running db2instance -list.
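The CF roles can also be read from captured db2instance -list output. This minimal Python sketch assumes the column order ID, TYPE, STATE, ... printed by db2instance -list and that the primary CF is reported as PRIMARY in the STATE column; the function names are hypothetical.

```python
from typing import Dict, Tuple


def cf_states(listing: str) -> Dict[int, str]:
    """Map each CF ID to the STATE column of `db2instance -list` output."""
    states: Dict[int, str] = {}
    for line in listing.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID followed by the TYPE column.
        if len(fields) >= 3 and fields[0].isdigit() and fields[1] == "CF":
            states[int(fields[0])] = fields[2]
    return states


def primary_and_secondary_cf(listing: str) -> Tuple[int, int]:
    """Return (primary CF ID, secondary CF ID)."""
    states = cf_states(listing)
    primary = [cf for cf, state in states.items() if state == "PRIMARY"]
    secondary = [cf for cf, state in states.items() if state != "PRIMARY"]
    if len(primary) != 1 or len(secondary) != 1:
        raise ValueError(f"expected one primary and one secondary CF, got {states}")
    return primary[0], secondary[0]
```

The CF that is not reported as PRIMARY is the one on which the online installFixPack command is run first.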

As root user, run the following command on the secondary CF to install the new DB2 Fix Pack update online:

<media-dir>/installFixPack -p <new-install-path> -I db2<sid> -online -l <log-file>

where <media-dir> is the uncompressed installation image of the new Fix Pack and <new-install-path> is the full path to where the new Fix Pack will be installed.

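The following output is displayed, showing the target code and architecture levels and the TSA and GPFS versions that are installed, present on the media, and expected after the update.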

EXAMPLE

DB2 pureScale online update evaluation:

=======================================

Hostname:coralpib84

Instance name:db2aco

Target Installation path:/opt/IBM/db2/V10.5FP3

Target Code level = Version:10 Release:5 Modification:0 Fixpack:3

Target Architecture level = Version:10 Release:5 Modification:0 Fixpack:3

TSA version installed on this host : 3.2.2.5

TSA version present on the media : 3.2.2.5


TSA version after update : 3.2.2.5

GPFS version installed on the host: 3.5.0.7

GPFS version present on the media : 3.5.0.7

GPFS version after update : 3.5.0.7
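The installFixPack commands in this section are easy to mistype, and copying them from a formatted document can silently replace hyphens with dashes. Building the argument list in a script avoids both problems. The following Python sketch only assembles the options used in this section (-p, -I, -online, -l); the helper name and the example paths are hypothetical.

```python
from typing import List


def install_fixpack_cmd(media_dir: str, install_path: str, instance: str,
                        log_file: str, online: bool = False) -> List[str]:
    """Assemble an installFixPack command line as an argument list."""
    cmd = [media_dir.rstrip("/") + "/installFixPack", "-p", install_path, "-I", instance]
    if online:
        cmd.append("-online")  # online update, as used for the secondary CF above
    cmd += ["-l", log_file]
    return cmd


# Example with hypothetical paths: the online update of the secondary CF.
print(" ".join(install_fixpack_cmd("/media/db2_fp3", "/opt/IBM/db2/V10.5FP3",
                                   "db2aco", "/tmp/fp3_cf.log", online=True)))
```

Run as root on the target host, subprocess.run(cmd, check=True) would execute the command and raise CalledProcessError if installFixPack returns a non-zero exit code.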

8.2.2 Installing the DB2 Fix Pack Update on the Primary CF

After the new DB2 Fix Pack has been successfully installed on the secondary CF, the update can now be installed on the primary CF.

1. Prior to installing the Fix Pack update on the primary CF, stop the primary CF by logging in as the instance owner on the secondary CF and running the following command:

db2stop CF <primary CF id>

2. Install the new Fix Pack on the primary CF by running the following command as root user:

<media-dir>/installFixPack -p <new-install-path> -I db2<sid> -l <log-file>

where <media-dir> is the uncompressed DB2 installation image of the new Fix Pack and <new-install-path> is the full path to where the Fix Pack will be installed.

8.2.3 Installing the DB2 Fix Pack Update on the Members

Once the CFs have the new DB2 Fix Pack installed, the update can now be installed on the members. Be sure to update the current replay member last in order to minimize the interruption of the standby log replay.

Run the following command as root user on each member:

<media-dir>/installFixPack -p <new-install-path> -I db2<sid> -l <log-file>

where <media-dir> is the uncompressed DB2 installation image of the new Fix Pack and <new-install-path> is the full path to where the Fix Pack will be installed.
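The ordering rule for the standby members (all other members first, the current replay member last) can be expressed as a small Python helper. The replay member is treated as an input here; we assume it has been read from the standby's HADR monitoring output, for example the STANDBY_MEMBER field of db2pd -hadr on the standby. The function name is hypothetical.

```python
from typing import Iterable, List


def member_update_order(members: Iterable[int], replay_member: int) -> List[int]:
    """Order standby members for the Fix Pack update, replay member last."""
    member_set = set(members)
    if replay_member not in member_set:
        raise ValueError(f"replay member {replay_member} is not one of {sorted(member_set)}")
    # Update every other member first so that log replay is interrupted only once.
    return sorted(member_set - {replay_member}) + [replay_member]
```

For example, with members 0, 1, and 2 and member 0 replaying the logs, the order is 1, 2, 0.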

8.2.4 Checking the Update

After installing the update on all the CFs and members in the standby cluster, run the following command to check the status of the installations. If there are any errors, fix them before installing the update on the primary cluster.

<media-dir>/installFixPack -check_commit -p <new-install-path> -I db2<sid> -l <log-file>

where <media-dir> is the uncompressed installation image of the new Fix Pack and <new-install-path> is the full path to where the new Fix Pack is installed.

EXAMPLE

DBI1446I The installFixPack command is running.


The pre-commit verification process for an online fix pack update has started....
The checks for the pre-commit verification process have been completed successfully.
If you perform a commit, the new level will be = Version:10 Release:5 Modification:0 Fixpack:3
The execution completed successfully.
For more information see the DB2 installation log at "check_fp3.out".
DBI1070I Program installFixPack completed successfully.
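If the check is run from a script, its output can be examined automatically. The following Python sketch looks for the success message and the DBI1070I completion message shown in the example above and extracts the level that a commit would produce. It relies only on those message texts, which may be worded differently in other Fix Packs, and the function name is hypothetical.

```python
import re
from typing import Optional, Tuple

LEVEL = re.compile(r"Version:(\d+) Release:(\d+) Modification:(\d+) Fixpack:(\d+)")


def check_commit_result(output: str) -> Tuple[bool, Optional[str]]:
    """Return (checks passed, level after commit) from -check_commit output."""
    text = " ".join(output.split())  # tolerate messages wrapped across lines
    passed = "have been completed successfully" in text and "DBI1070I" in text
    m = LEVEL.search(text)
    level = ".".join(m.groups()) if m else None
    return passed, level
```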

8.3 Updating the Primary Cluster

Update the primary cluster after the standby cluster has been updated. The process is very similar to updating the standby cluster; the differences are that the primary CF does not have to be shut down and that the members can be updated in any order without affecting log replay.

8.3.1 Installing the DB2 Fix Pack Update on the Secondary CF

Similar to the standby cluster, the secondary CF needs to be updated first.

Run the following command on the secondary CF as root user:

<media-dir>/installFixPack -p <new-install-path> -I db2<sid> -online -l <log-file>

8.3.2 Installing the DB2 Fix Pack Update on the Primary CF

1. Prior to running the update on the primary CF, ensure that the secondary CF is in peer state. To check, run the following command as db2<sid> and verify that the “STATE” column of the secondary CF displays “PEER”.

db2instance -list

2. Once the secondary CF is in peer state, run the following command as root user to install the Fix Pack update on the primary CF:

<media-dir>/installFixPack -p <new-install-path> -I db2<sid> -l <log-file>
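The peer-state check in step 1 can be automated by polling until the secondary CF reports PEER. The following Python sketch keeps the state lookup abstract: get_secondary_state is a hypothetical callable that you would implement, for example by running db2instance -list and returning the STATE column of the CF that is not PRIMARY. Passing the sleep function in keeps the loop easy to test.

```python
import time
from typing import Callable


def wait_for_peer(get_secondary_state: Callable[[], str], attempts: int = 60,
                  interval_s: float = 10.0,
                  sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll the secondary CF state; return True once it reports PEER."""
    for attempt in range(attempts):
        if get_secondary_state() == "PEER":
            return True
        if attempt < attempts - 1:
            sleep(interval_s)  # not yet in peer state, for example still catching up
    return False
```

With the defaults, the loop gives up after 60 checks spaced 10 seconds apart, that is, after just under ten minutes.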

8.3.3 Installing the DB2 Fix Pack Update on the Members

Unlike the update of the standby cluster, updating a member of the primary cluster does not affect log replay on the standby cluster, so the members can be updated in any order.

Run the following command as root user on each member to install the Fix Pack update:

<media-dir>/installFixPack -p <new-install-path> -I db2<sid> -l <log-file>


8.3.4 Checking the Update

Before committing the changes, check to make sure the Fix Pack update was installed successfully on all CFs and members by running the following command:

<media-dir>/installFixPack -check_commit -p <new-install-path> -I db2<sid> -l <log-file>

EXAMPLE

DBI1446I The installFixPack command is running.

The pre-commit verification process for an online fix pack update has started....
The checks for the pre-commit verification process have been completed successfully.
If you perform a commit, the new level will be = Version:10 Release:5 Modification:0 Fixpack:3
The execution completed successfully.
For more information see the DB2 installation log at "/tmp/check_upg.out".
DBI1070I Program installFixPack completed successfully.

8.4 Committing the Update

Although the Fix Pack update has been installed on both clusters, they are still running at the old level. To use the newly installed Fix Pack, the update has to be committed: first on the standby cluster, then on the primary cluster.

8.4.1 Committing the Update on the Standby Cluster

1. Commit the update on the standby cluster by running the following command:

<media-dir>/installFixPack -commit_level -p <new-install-path> -I db2<sid> -l <log-file>

2. After running the commit, check the status by using the following command:

db2pd -ruStatus

8.4.2 Committing the Update on the Primary Cluster

1. Commit the update on the primary cluster by running the following command:

<media-dir>/installFixPack -commit_level -p <new-install-path> -I db2<sid> -l <log-file>

2. After running the commit, check the status by using the following command:

db2pd -ruStatus


9. Maintenance and Troubleshooting

9.1 Maintaining the Database Configuration on the Standby

In most cases, the primary and standby systems should have the same values for the database and database manager configuration parameters. Otherwise, HADR operations can return errors, or the new primary (the former standby) can run into performance problems after a takeover. However, updating a configuration parameter on the primary database does not automatically update it on the standby. Therefore, you have to change the configuration parameters on the standby manually to match the values on the primary database.

Depending on the parameter, a configuration change on the standby database either takes effect immediately (dynamic parameters) or requires the database to be restarted (static parameters). Parameters documented as “Configurable Online” in the DB2 Information Center are dynamic.

For an HADR parameter (see section “Configure the Primary and Standby” for HADR parameters), “STOP HADR” and “START HADR” commands can dynamically load newly configured values on the primary while the primary database remains online. On the standby, refreshing the effective value of HADR configuration parameters can be done by restarting HADR. However, the STOP HADR command is required to bring down the database on the standby.
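Because configuration changes are not propagated to the standby, a periodic comparison can catch drift. The following Python sketch compares two captured outputs of db2 get db cfg for <dbname>, one from each site, assuming each parameter line ends with the parameter name in parentheses followed by "=" and the value. It ignores HADR parameters that are expected to differ between the two sites; that exclusion list is our assumption and should be adapted. The same parsing should also work for db2 get dbm cfg output, which uses the same "(NAME) = value" layout.

```python
import re
from typing import Dict, Iterable, Optional, Tuple

# A parameter line ends with "(NAME) = value", for example
# " Log file size (4KB)                         (LOGFILSIZ) = 16380".
PARAM_LINE = re.compile(r"\(([A-Z0-9_]+)\)\s*=\s*(.*?)\s*$")

# HADR parameters that are expected to differ between the two sites (assumption).
SITE_SPECIFIC = {"HADR_LOCAL_HOST", "HADR_LOCAL_SVC", "HADR_REMOTE_HOST",
                 "HADR_REMOTE_SVC", "HADR_REMOTE_INST", "HADR_TARGET_LIST"}


def parse_cfg(text: str) -> Dict[str, str]:
    """Map parameter names to values from `db2 get db cfg` style output."""
    params: Dict[str, str] = {}
    for line in text.splitlines():
        m = PARAM_LINE.search(line)
        if m:
            params[m.group(1)] = m.group(2)
    return params


def cfg_differences(primary: str, standby: str,
                    ignore: Iterable[str] = SITE_SPECIFIC
                    ) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {name: (primary value, standby value)} for differing parameters."""
    p, s = parse_cfg(primary), parse_cfg(standby)
    skip = set(ignore)
    return {name: (p.get(name), s.get(name))
            for name in sorted(p.keys() | s.keys())
            if name not in skip and p.get(name) != s.get(name)}
```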

9.2 Checking Tablespace States on the Standby After Load Operations

A tablespace on the standby database can be marked as invalid and inaccessible after load operations on the primary database in the following cases:

The LOAD command is run with the COPY YES parameter on the primary database, but the load copy path cannot be accessed from the standby database.

The LOAD command is run with the NONRECOVERABLE parameter on the primary database, which leads to an inconsistent tablespace state on the standby database.

On the primary, LOAD with the COPY NO parameter is automatically converted to LOAD with NONRECOVERABLE, so the tablespace on the standby database is also marked as inconsistent.

An offline, not logged table move or conversion with DB6CONV calls LOAD with the NONRECOVERABLE option.

Therefore, we recommend that you perform LOAD with the COPY YES option and make sure that the path or device holding the load copy is accessible to the standby database. In addition, check the tablespace states on the standby database during and after load operations to make sure they remain normal. The following command shows the tablespace states on the standby:

db2pd -tablespaces -db <dbname>

The following example shows the output of db2pd with the -tablespaces option. The tablespace state is shown in the “State” field of the “Tablespace Statistics” table; a state of “0x00000000” indicates a normal state.

EXAMPLE


coralpib49:db2aco 7> db2pd -tablespaces -db ACO

Tablespace Statistics:
Address            Id  TotalPgs  UsablePgs  UsedPgs  PndFreePgs  FreePgs  HWM     Max HWM  State       MinRecTime  NQuiescers  PathsDropped  TrackmodState
0x0A000200A6177300 0   36864     36856      36802    0           54       36802   36802    0x00000000  0           0           No            n/a
0x0A000200A61844C0 1   118488    118480     108184   4           10292    113224  113224   0x00000000  1385496979  0           No            n/a
0x0A000200A6191680 2   3200      3192       1524     0           1668     1826    1826     0x00000000  1389727624  0           No            n/a
...
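This check can be scripted. The following Python sketch scans captured db2pd -tablespaces output and reports every tablespace in the “Tablespace Statistics” section whose State is not 0x00000000. It assumes that each row is printed on a single line, as on a sufficiently wide terminal, and that State is the tenth value in each row (the “Max HWM” heading consists of two words but holds a single value). The function name is hypothetical.

```python
from typing import List, Tuple


def abnormal_tablespaces(db2pd_output: str) -> List[Tuple[int, str]]:
    """Return (tablespace ID, state) for every non-zero State in the
    "Tablespace Statistics" section of `db2pd -tablespaces` output."""
    flagged: List[Tuple[int, str]] = []
    in_stats = False
    for line in db2pd_output.splitlines():
        stripped = line.strip()
        if stripped.endswith(":"):  # section header such as "Tablespace Statistics:"
            in_stats = stripped == "Tablespace Statistics:"
            continue
        fields = stripped.split()
        # Data rows start with the hexadecimal Address; State is the 10th field.
        if in_stats and len(fields) >= 10 and fields[0].startswith("0x"):
            state = fields[9]
            if int(state, 16) != 0:
                flagged.append((int(fields[1]), state))
    return flagged
```

An empty result means that every tablespace listed in the statistics section is in normal state.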

9.3 Automatic HADR Congestion Detection

Congestion occurs when the standby cannot receive log data or cannot replay log pages fast enough while the primary database keeps sending log pages, until the network pipeline is full and no additional log pages can be sent.

HADR congestion can be detected automatically, and when the congestion persists long enough, diagnostic data is collected automatically to help investigate the problem. The following command enables automatic HADR congestion detection and data collection:

db2fodc -hadr -db <dbname> -detect <detect_suboptions>

If no <detect_suboptions> are specified, the default suboptions and values shown in the following example are used. In the example, HADR congestion detection runs for one iteration. In each iteration, db2pd -hadr -db ACO is run every 30 seconds (interval=30) to check whether the HADR_CONNECT_STATUS field is “CONGESTED”. If 10 consecutive “CONGESTED” statuses are detected (triggercount=10), diagnostic data collection is invoked automatically. If no congestion is detected, checking continues until congestion detection is turned off with the command db2fodc -detect off.

EXAMPLE

db2aco> db2fodc -hadr -db ACO -detect

"db2fodc": Starting detection ...

db2fodc HADR congestion detect rules:

iteration=1 sleeptime=0(sec) triggercount=10 interval=30(sec) duration=-1(hour)

db2fodc:


Hostname: coralpib52 HADR congestion detect iteration: 1
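To see when collection would fire for a given series of readings, the detection rule can be modeled offline. The following Python sketch is our own model of the rule as described in this section (count consecutive “CONGESTED” values of HADR_CONNECT_STATUS and trigger at triggercount), not the db2fodc implementation; the function name is hypothetical.

```python
from typing import Iterable, Optional


def first_trigger(statuses: Iterable[str], triggercount: int = 10) -> Optional[int]:
    """Return the index of the sample at which triggercount consecutive
    CONGESTED readings have been seen, or None if that never happens."""
    consecutive = 0
    for index, status in enumerate(statuses):
        # Any reading other than CONGESTED resets the streak.
        consecutive = consecutive + 1 if status == "CONGESTED" else 0
        if consecutive >= triggercount:
            return index
    return None
```

With interval=30, multiplying the returned sample index by 30 gives the approximate number of seconds after the first reading at which collection would start.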

9.4 Failed HADR Start

A failure to start HADR is one of the most common errors in an HADR setup. Common causes are:

The standby database is unavailable.

Mismatch in db2level

Duplicate port number

The standby database is not in rollforward-pending or rollforward-in-process mode.

Missing configuration of one or more HADR parameters

HADR_TIMEOUT set too short

9.4.1 The Standby Database is Unavailable

Starting HADR on the primary fails with SQL1768N, reason code = 7, if the primary database cannot establish a connection to the HADR standby database within the time specified in the hadr_timeout database configuration parameter. The following reasons can lead to this error:

Network issues

Ensure that the machines in the primary cluster can communicate with the machines in the standby cluster. To verify the network, make sure that the machines of the primary and standby clusters can ping each other using the following command:

ping <hostname>

Standby database is not active

Ensure that “START HADR” as standby completed successfully before running “START HADR” as primary. To verify, run db2pd -db <database_name> -hadr on the standby cluster. The following example shows that the standby database is active:

EXAMPLE

coralpib49:db2aco 5> db2pd -hadr -db ACO

Database Member 0 -- Database ACO -- Standby -- Up 22 days 09:41:18 -- Date 2014-01-14-17.12.40.021384
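A successful ping only shows that ICMP traffic gets through; it does not show that a particular TCP port is reachable through firewalls. The following Python sketch attempts a TCP connection to a host and port of your choosing, for example the partner's HADR host and port. Whether the HADR port accepts a connection also depends on the partner's HADR state, so a refused connection alone does not prove a network problem. The function name is hypothetical.

```python
import socket


def tcp_reachable(host: str, port: int, timeout_s: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # refused, unreachable, timed out, or name resolution failed
        return False
```

If the HADR service is configured by name rather than by number, socket.getservbyname(name, "tcp") translates it to a port number using the local services file.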

9.4.2 Mismatch in db2level

“START HADR” can fail if different DB2 versions or Fix Pack levels are installed on the primary and the standby. To check whether the DB2 level matches between the primary database and the standby database, run the db2level command on each cluster.
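The comparison can be scripted once the db2level output of both clusters has been captured. The following Python sketch assumes that db2level prints an “Informational tokens” line containing a token of the form "DB2 vV.R.M.F" and compares only that token; the sample token values used for testing are illustrative, and the function names are hypothetical.

```python
import re
from typing import Optional

# Matches the release token, e.g. "DB2 v10.5.0.3" in the "Informational tokens" line.
RELEASE = re.compile(r'"DB2 v(\d+(?:\.\d+)+)"')


def db2_release(db2level_output: str) -> Optional[str]:
    """Return the release string (for example "10.5.0.3"), or None if absent."""
    m = RELEASE.search(db2level_output)
    return m.group(1) if m else None


def levels_match(primary_output: str, standby_output: str) -> bool:
    """Compare the DB2 release tokens of two captured db2level outputs."""
    primary, standby = db2_release(primary_output), db2_release(standby_output)
    if primary is None or standby is None:
        raise ValueError("release token not found in db2level output")
    return primary == standby
```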

9.4.3 Duplicate Port Number

In an HADR setup, you have to reserve a port number for HADR communication. If the port number specified in the hadr_local_svc database configuration parameter is used by another process, “START HADR” fails with SQL1768N, reason code = 5. To ensure that no duplicate port number is used, check /etc/services (UNIX) or %SystemRoot%\system32\drivers\etc\services (Windows) for duplicate entries of the same port number.
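The services file check can be done with a short script. The following Python sketch reports every port/protocol pair that is defined on more than one line of a services file, assuming the usual "name port/protocol [aliases] [# comment]" layout; the function name and the sample service names are hypothetical. Note that a clean services file does not prove the port is free, because a process can listen on a port that has no services entry.

```python
from collections import defaultdict
from typing import Dict, List


def duplicate_ports(services_text: str) -> Dict[str, List[str]]:
    """Return {"port/protocol": [service names]} for every port/protocol
    pair that is defined on more than one line of a services file."""
    names_by_port: Dict[str, List[str]] = defaultdict(list)
    for raw in services_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        fields = line.split()
        # Layout: <service name> <port>/<protocol> [aliases...]
        if len(fields) >= 2 and "/" in fields[1]:
            names_by_port[fields[1].lower()].append(fields[0])
    return {port: names for port, names in names_by_port.items() if len(names) > 1}
```

For example, duplicate_ports(open("/etc/services").read()) lists the conflicting entries on a UNIX host.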

9.4.4 The Standby Database is not in Rollforward-Pending or Rollforward-in-Process Mode

When creating the standby database from the primary backup image, the standby needs to be in rollforward-pending or rollforward-in-process mode as a result of a restore or split mirror. Otherwise, “START HADR” as a standby will return the error SQL1767N, reason code = 1.

9.4.5 HADR_TIMEOUT

HADR_TIMEOUT is the time (in seconds) that the primary or the standby waits for its partner database before it considers the connection failed. If HADR_TIMEOUT is set too short, HADR raises false alarms. Therefore, we recommend that you set HADR_TIMEOUT to at least 120 seconds.

9.5 Recover from Failed Takeover

If an HADR takeover fails, the steps needed to restore service depend on the type of takeover (forced or non-forced) and on whether the roles have already been switched. For the necessary recovery steps, see the following document:

https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/DB2HADR/page/HADR%20takeover?section=How_to_recover_from_failed_takeover

9.6 Failed Primary Reintegration

Primary reintegration is the action of rejoining the old primary as a standby after a failover, once its problem is resolved and it is back online. It is performed by issuing “START HADR” as standby on the old primary. Even if the command completes successfully, this does not mean that the reintegration has succeeded. If the output of “db2pd -hadr -db <dbname>” shows the new standby as inactive, check the db2diag.log for related error messages. If an error message in the db2diag.log indicates incompatible log streams between the primary and the standby, perform the following procedure to bring the old primary back into service as a standby. During this time, the system is not highly available.

1. Drop the database on the old primary as follows:

db2 drop db <dbname>

2. Delete archive logs on the old primary.

3. Perform an online backup from the new primary:

db2 backup db <dbname> online

4. Restore the database on the new standby from the backup taken above and update the HADR configuration for the new standby as described in section 5.3.

db2 restore db <dbname> from <backupdir>

5. Start HADR on the new standby:

db2 start hadr on db <dbname> as standby


9.7 HADR Data Collection for Support

If you encounter an HADR problem, you can manually collect HADR-related data for further investigation.

For general HADR problems, collect the following data:

o Error message returned to the console

o Steps taken to reproduce the problem

o db2support data for both the primary and the standby, collected as described in SAP Note 83819

o Output of db2pd -hadr -db <dbname> on both the primary and the standby

o Output of the MON_GET_HADR table function on the primary

If you suspect congestion in HADR, you can execute the following command to collect the data:

db2fodc -hadr -db <dbname>

For details and examples on how to run the above command on both the primary and the standby, see the IBM Information Center at http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.admin.trb.doc/doc/r0060755.html.


Conclusion

DB2 pureScale provides your SAP landscape with continuous availability. Since a DB2 pureScale system consists of several members, the workload can be distributed to the other members in the event of a planned or unplanned outage. If a member or CF is unavailable, the other members or CFs in the cluster can take over the workload, providing continuous availability.

HADR provides your landscape with disaster protection. With HADR, the database transaction logs are shipped to a standby database. The standby database will replay the logs on its database, providing a current and up-to-date copy of the primary database. In the event of an outage, planned or unplanned, the standby database can easily take over and provide your end-users with continuous availability. Prior to DB2 10.5, this feature was not available for DB2 pureScale systems.

As of DB2 10.5, the HADR feature is available for DB2 pureScale, providing your landscape with enhanced availability. The members of the primary DB2 pureScale cluster ship their logs to the preferred member of the standby cluster. This member replays all the logs, keeping the standby database current. If the entire primary cluster becomes unavailable due to a planned or unplanned outage, the standby DB2 pureScale cluster can take over and provide your end users with continuous availability.

From an SAP perspective, the setup of HADR is simple. After creating the standby cluster from a database backup, the db2dsdriver.cfg file has to be modified to include the standby cluster's members. As with a regular DB2 pureScale system, when a member in the primary cluster goes down, the connections are rerouted to the next member in the client affinity list. When the entire primary cluster is unavailable and a takeover is necessary, the connections are rerouted to the members of the standby cluster. In general, connections to the standby are not permitted since the database is in rollforward-pending status. However, during a takeover, the standby becomes the primary and the SAP work processes can connect.

As with a regular DB2 pureScale system, Fix Pack updates can be applied online, minimizing the need for outages during maintenance. SAP can quiesce the SAP work processes while a role switch or takeover is happening. This prevents the SAP ABAP applications from exiting and reduces the outage seen by end users.

Therefore, HADR for DB2 pureScale easily integrates with SAP and provides your landscape with high availability and protection against planned and unplanned outages.


Copyright

© Copyright 2014 SAP AG. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP AG. The information contained herein may be changed without prior notice.

Some software products marketed by SAP AG and its distributors contain proprietary software components of other software vendors.

Microsoft, Windows, Excel, Outlook, and PowerPoint are registered trademarks of Microsoft Corporation.

IBM, DB2, DB2 Universal Database, System i, System i5, System p, System p5, System x, System z, System z10, System z9, z10, z9, iSeries, pSeries, xSeries, zSeries, eServer, z/VM, z/OS, i5/OS, S/390, OS/390, OS/400, AS/400, S/390 Parallel Enterprise Server, PowerVM, Power Architecture, POWER6+, POWER6, POWER5+, POWER5, POWER, OpenPower, PowerPC, BatchPipes, BladeCenter, System Storage, GPFS, HACMP, RETAIN, DB2 Connect, RACF, Redbooks, OS/2, Parallel Sysplex, MVS/ESA, AIX, Intelligent Miner, WebSphere, Netfinity, Tivoli and Informix are trademarks or registered trademarks of IBM Corporation.

Linux is the registered trademark of Linus Torvalds in the U.S. and other countries.

Adobe, the Adobe logo, Acrobat, PostScript, and Reader are either trademarks or registered trademarks of Adobe Systems Incorporated in the United States and/or other countries.

Oracle is a registered trademark of Oracle Corporation.

UNIX, X/Open, OSF/1, and Motif are registered trademarks of the Open Group.

Citrix, ICA, Program Neighborhood, MetaFrame, WinFrame, VideoFrame, and MultiWin are trademarks or registered trademarks of Citrix Systems, Inc.

HTML, XML, XHTML and W3C are trademarks or registered trademarks of W3C®, World Wide Web Consortium, Massachusetts Institute of Technology.

Java is a registered trademark of Sun Microsystems, Inc.

JavaScript is a registered trademark of Sun Microsystems, Inc., used under license for technology invented and implemented by Netscape.

SAP, R/3, SAP NetWeaver, Duet, PartnerEdge, ByDesign, SAP Business ByDesign, and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP AG in Germany and other countries.

Business Objects and the Business Objects logo, BusinessObjects, Crystal Reports, Crystal Decisions, Web Intelligence, Xcelsius, and other Business Objects products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of Business Objects S.A. in the United States and in other countries. Business Objects is an SAP company.

All other product and service names mentioned are the trademarks of their respective companies. Data contained in this document serves informational purposes only. National product specifications may vary.

These materials are subject to change without notice. These materials are provided by SAP AG and its affiliated companies ("SAP Group") for informational purposes only, without representation or warranty of any kind, and SAP Group shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP Group products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty.

