Enabling Database High Availability Using DB2 HADR and IBM Tivoli SA MP in an SAP Environment

Applies to:

SAP NetWeaver 7.0 or higher on DB2 10.1 or higher for Linux, UNIX, and Windows.

Summary

Multiple improvements have been made to the DB2 High Availability Disaster Recovery (HADR) feature. DB2 Version 10.1 for Linux, UNIX, and Windows supports multiple standbys, providing customers with true database disaster recovery (DR) capability along with high availability (HA). IBM DB2 BLU Acceleration is also supported for HADR environments as of DB2 10.5 FP4. This paper describes the new features related to HADR and provides examples of how to enable single or multiple standbys in an SAP environment to achieve DR in HA systems.

Authors: Ali Mehedi, Catherine Vu, Edgar Maniago

Company: IBM Canada Inc. and SAP Canada Inc.

Created on: 21 November 2014

Author Bio

Ali Mehedi is a Software Developer at IBM with years of experience in test tools development, DB2 Administration, SAP NetWeaver installation, configuration and maintenance, and DB2 for LUW and SAP integration. He is a certified DBA of DB2 for LUW and well experienced with SAP BASIS in Windows, AIX, and Linux environments.

Since joining SAP in 2005, Edgar Maniago, a Software Engineer, has been a member of the IBM SAP Integration and Support Center located in the Toronto IBM Lab. He currently tests, develops, and integrates new features of DB2 with SAP. Through his role in SAP Development Support and as a Customer Advocate for IBM, Edgar assists SAP consultants and customers with activities such as troubleshooting and performance optimization.

Catherine Vu is a member of the IBM SAP Integration and Support team that plays a critical role in certifying every DB2 Fix Pack and every new major DB2 release with SAP applications before their general availability. In addition, she is responsible for providing development support to SAP on IBM DB2 for Linux, UNIX, and Windows customers. Before joining the IBM and SAP Integrations and Support team, Catherine had many years’ experience in DB2 when she was working in the DB2 Development team and the Technical Enablement team.


Table of Contents

1 Introduction
2 Planning
2.1 References
2.2 Technology
2.2.1 IBM Tivoli System Automation for Multiplatforms (SA MP)
2.2.2 HADR synchronization modes
2.2.3 Multiple standbys
2.2.4 Log spooling
2.2.5 HADR replay delay
2.2.6 Automatic failover: DB2 Automatic Client Reroute (ACR) vs. virtual IP address
2.2.7 DB2 HADR for DB2 BLU Acceleration
2.2.8 DB2 LOAD with COPY YES for BLU tables
2.3 Hardware and operating system requirements
2.4 DB2 database requirements
3 Preparation
3.1 Configuration of the test system
3.1.1 Hardware and operating system in the test systems
3.1.2 Required software downloads
3.2 Basic network setup
3.3 File system setup
3.4 Operating system users and groups
4 Installing the standby
4.1 Exporting the file systems
4.2 Turning on DB2 log archiving
4.3 Taking a backup of the primary
4.4 Performing a homogeneous system copy using SWPM
4.5 Configuring ports
4.6 Restoring the database from a backup
4.7 Configuring databases for HADR
4.8 Performing HADR checks
4.9 Starting HADR
4.10 Checking the HADR status using the db2pd tool
5. Enabling automatic failover using SA MP
5.1 Installing the SA MP software and license
5.2 Setting up the HADR cluster
5.2.1 Creating the cluster configuration file
5.2.2 Creating the database cluster
5.2.3 Displaying the database cluster
5.2.4. Enabling the SAP system with virtual database host name and IP address
5.3 HADR micro-outage feature test using the Graceful Maintenance Tool (GMT)
5.3.1 GMT Configuration
5.3.2 Micro-failover test
5.3.4 Testing a disaster scenario
6. Installing the auxiliary standby database instance
6.1 Mounting file systems
6.2 Updating port configurations
6.3 Performing a homogeneous system copy using SWPM
6.4 Configuring the HADR auxiliary standby database
7 Failover scenarios
7.1 Failover scenario #1: The primary is down
7.2 Failover scenario #2: Both the primary and principal standby are down
7.3 Failover scenario #3: The principal standby is down
8 Miscellaneous troubleshooting in an SA MP environment
8.1 HADR congestion
8.2 Manual creation and deletion of an SA MP cluster
8.3 SA MP cluster resource group
8.4 Collection of traces
8.5 HADR simulator
8.6 Split-brain condition
9 Conclusion
10 Related Content
Copyright


1 Introduction

First introduced in DB2 Version 8.2, the DB2 High Availability Disaster Recovery (HADR) database replication feature provides protection against database outages and site failures. In an HADR environment, the transaction logs from a source database, called the primary, are shipped via TCP/IP and replayed to a target database, called the standby. If the primary is offline or is lost due to a disaster, the standby can be made available as the new primary using a procedure called HADR failover.

HADR failover can be automated using IBM Tivoli System Automation for Multiplatforms (SA MP). This allows applications, such as SAP, to continue with zero disruption to user activities under ideal conditions.

Starting with DB2 10.1, DB2 HADR supports multiple standby databases. This makes IBM DB2 capable of providing a complete Disaster Recovery (DR), High Availability (HA) and Continuous Availability solution in a single and easily manageable feature. The following figure illustrates a DB2 HADR cluster that contains multiple standby databases, along with automatic failover implemented through SA MP:

Figure 1: DB2 HADR Topology with Multiple Standbys

Starting with DB2 10.5 Fix Pack 4, the DB2 HADR feature is also supported for databases containing column-organized (BLU) tables. IBM DB2 with BLU Acceleration is optimized for the SAP environment. The greater performance of DB2 BLU Acceleration combined with the improved HA and DR capabilities of DB2 HADR makes DB2 an ideal RDBMS for an SAP BW environment.

This document will cover the implementation of DB2 HADR by adding a principal and auxiliary standby to an existing SAP NetWeaver (ABAP) system on AIX, Solaris SPARC or Linux. Furthermore, it describes the implementation of automatic failover using SA MP as well as recovery from several failover scenarios.


2 Planning

DB2 HADR does not require a brand new SAP installation. The version-specific installation guide, for example, for SAP NetWeaver 7.31, can be found at http://help.sap.com/nw731/ .

In general, SAP NetWeaver installation guides can be found at http://service.sap.com/instguidesnw > <your SAP NetWeaver main release> > Installation > Installation – SAP NetWeaver Systems.

2.1 References

The following documents should be reviewed before reading this paper:

IBM DB2 High availability http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.admin.ha.doc/doc/c0006354.html

SAP Note 1555903 - DB6: Supported DB2 Database Features http://service.sap.com/sap/support/notes/1555903

SAP Note 1612105 - DB6: FAQ for DB2 High Availability Disaster Recovery (HADR) http://service.sap.com/sap/support/notes/1612105

SAP Note 960843 - DB6: High Availability for DB2 using SA MP http://service.sap.com/sap/support/notes/960843

SAP Note 1851832 - DB6: DB2 10.5 Standard Parameter Settings http://service.sap.com/sap/support/notes/1851832

2.2 Technology

The DB2 HADR feature comes with many configuration and performance tuning options for various business needs.

2.2.1 IBM Tivoli System Automation for Multiplatforms (SA MP)

IBM Tivoli SA MP is a high-availability cluster solution that provides several monitoring mechanisms to detect system failures and a set of rules to initiate the correct action without any user intervention. The set of rules is called a policy, which describes the relationships between applications and resources. This provides SA MP with extensive up-to-date information about the system landscape so that it can restart the resource on the current node or move the database instance to another cluster node.

Since DB2 V9.1, SAP has been partnering with IBM to provide SAP customers with a free two-node license of SA MP for the IBM DB2 database server. SAP has also published the installation guide “IBM DB2 High Availability Solution: IBM Tivoli System Automation for Multiplatforms” on how to set up a database cluster using SA MP. You can find the latest version of this guide on SAP Service Marketplace at http://service.sap.com/instguidesnw.

2.2.2 HADR synchronization modes

To guarantee that logs have reached the standby, the primary must wait for an acknowledgement (ACK) from the standby before it can commit a transaction. Depending on the network bandwidth and the distance between the primary server and the standby server, this wait can have a significant performance impact on the workload. If, on the other hand, the primary does not wait for an ACK, the standby can fall out of sync with the primary, which creates a risk of data loss in case of an outage.

The HADR synchronization mode, defined by the database configuration parameter HADR_SYNCMODE, controls the degree of protection against transaction loss during log shipping. The following table contains the available synchronization modes:


| SYNC mode | Definition | Standby ACK for log receive | Data protection |
| SYNC | Transactions are committed on the primary when the database logs are written to disk on the primary and the standby and when the primary has received an ACK from the standby. | Yes | Logs are guaranteed to be stored in both the primary and the standby. Therefore, this mode provides the greatest protection against transaction loss. |
| NEARSYNC | Transactions are committed on the primary when the database logs are written to disk on the primary and received in memory on the standby and when the primary has received an ACK from the standby. | Yes | Possibility of data loss if both the primary and the standby fail at the same time. The transactions that are in memory on the standby are lost. |
| ASYNC | Transactions are committed on the primary when the database logs are written to disk on the primary and sent to TCP/IP successfully. | No | Possibility of data loss if both transaction logs on the primary and “commit” record(s) in-flight to the standby are lost. |
| SUPERASYNC | Transactions are committed on the primary when the database logs are written to disk on the primary. | No | Possibility of data loss in the standby if a failover operation is required while there are missing log records. |

Table 1: DB2 HADR Synchronization Modes

Note: SAP recommends using NEARSYNC because it provides adequate data protection without a significant performance impact.
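As a minimal sketch of how this recommendation could be applied, the following shell helper prints the DB2 command that sets HADR_SYNCMODE for a database. It prints the command instead of running it so that it can be reviewed before being executed as the instance owner. The function name hadr_syncmode_cmd is hypothetical; D01 is the database name used in the examples of this paper.

```shell
# Dry-run helper (hypothetical name): prints the db2 command that sets the
# HADR synchronization mode instead of executing it.
hadr_syncmode_cmd() {
    db_name=$1
    mode=$2
    case "$mode" in
        SYNC|NEARSYNC|ASYNC|SUPERASYNC)
            echo "db2 update db cfg for $db_name using HADR_SYNCMODE $mode"
            ;;
        *)
            # Reject anything that is not one of the four modes in Table 1.
            echo "unsupported HADR_SYNCMODE: $mode" >&2
            return 1
            ;;
    esac
}

# Example: the mode recommended by SAP for database D01.
hadr_syncmode_cmd D01 NEARSYNC
```

The printed command would be run on each database that takes part in HADR, so that the mode is consistent after a role switch.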

2.2.3 Multiple standbys

Starting with DB2 10.1, DB2 HADR supports multiple standbys. The first standby database is called the principal standby. Any additional standby database is called an auxiliary standby. The transaction logs from the primary are shipped via TCP/IP to all the standbys and replayed there.

Note: A maximum of three standbys is allowed.

Depending on which synchronization mode is chosen, HADR can have a performance impact on the primary due to network latency during HADR log shipping. Better performance can be achieved by decreasing the distance between the primary and the standby servers, and by having them connected with a high performance LAN backbone. However, this introduces the risk of losing both the primary and the standby during a wide scale natural disaster such as flood, fire, etc.

Increasing the distance, on the other hand, increases the network latency, which causes performance degradation and a longer failover time. Therefore, with a single standby, there is a tradeoff between the database performance and the degree of DR capability. The HADR multiple standby feature solves this problem. The principal standby is used to achieve HA during outages and the auxiliary standby is to be used for DR purposes only.

Note: It is recommended to have the primary and the principal standby in the same building with a high performance LAN connection for faster log shipping and quicker failover during micro-outages. The auxiliary standby is only to be used for DR purposes and is recommended to be in a different location, preferably in a different city or country. There is no restriction on the maximum distance between the primary and the standby servers. The auxiliary standby is forced to use SUPERASYNC synchronization mode which has no synchronous dependency on replication to the standby. Therefore, there is minimal performance impact for having multiple standbys.

2.2.4 Log spooling

Logs that are sent to the standby are first stored in a memory area called the HADR log receive buffer. Its size is controlled by the DB2 registry variable DB2_HADR_BUF_SIZE. If the standby is slow in replaying the received logs, this buffer can fill up, which blocks the primary because it is waiting for an ACK from the standby.

The database configuration parameter HADR_SPOOL_LIMIT defines the maximum amount of log data that can be written to disk on the standby if it falls behind in log replay. This can improve HADR performance while providing better data protection. If the standby falls behind while replaying the logs and HADR_SPOOL_LIMIT is set, up to that amount of logs is written to disk on the standby without the primary having to wait for an ACK. The standby reads and replays these logs later, when it is able to do so. This feature helps to absorb sudden spikes of workload on the primary during peak business hours. However, a high HADR_SPOOL_LIMIT value, or the unlimited setting (-1), leads to a longer takeover time because the standby has to read and apply all the logs that have not yet been applied.
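HADR_SPOOL_LIMIT is specified in units of 4 KB pages, so a disk budget has to be converted before it is set. The following sketch converts a size in GB to pages and prints the resulting command without running it. The function name spool_limit_cmd and the 2 GB budget are illustrative assumptions, not values taken from this paper.

```shell
# Dry-run helper (hypothetical name): converts a spool size in GB to
# 4 KB pages and prints the db2 command instead of executing it.
spool_limit_cmd() {
    db_name=$1
    size_gb=$2
    # 1 GB = 1024 * 1024 KB; one page is 4 KB.
    pages=$((size_gb * 1024 * 1024 / 4))
    echo "db2 update db cfg for $db_name using HADR_SPOOL_LIMIT $pages"
}

# Example: allow up to 2 GB of spooled logs for database D01.
spool_limit_cmd D01 2
```

A smaller limit keeps takeover time short, while a larger one absorbs longer workload spikes, which is the tradeoff described above.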

2.2.5 HADR replay delay

Another way to protect data against accidental human errors is to use the HADR_REPLAY_DELAY database configuration parameter. If it is set, the standby waits for the duration defined by HADR_REPLAY_DELAY before replaying the logs it has received. This is useful if a user accidentally deletes data or a database object on the primary: as long as the error is noticed within the replay delay period, the deleted data or object can be recovered before the change is replayed on the standby.

Note: SAP recommends using HADR_SPOOL_LIMIT along with HADR_REPLAY_DELAY to accommodate the logs accumulated during the replay delay period.
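The relationship between the two parameters can be sketched with simple arithmetic: the logs that accumulate during the delay are roughly the log generation rate multiplied by the delay. The helper below prints both settings without running them. HADR_REPLAY_DELAY is specified in seconds and HADR_SPOOL_LIMIT in 4 KB pages; the function name, the one-hour delay, and the log rate of 2 MB/s are assumptions chosen for illustration only.

```shell
# Dry-run helper (hypothetical name). Inputs are assumptions:
#   delay_s        replay delay in seconds
#   log_rate_mb_s  average log generation rate on the primary in MB/s
replay_delay_cmds() {
    db_name=$1
    delay_s=$2
    log_rate_mb_s=$3
    # Logs accumulated during the delay window; 1 MB = 256 pages of 4 KB.
    spool_pages=$((delay_s * log_rate_mb_s * 256))
    echo "db2 update db cfg for $db_name using HADR_REPLAY_DELAY $delay_s"
    echo "db2 update db cfg for $db_name using HADR_SPOOL_LIMIT $spool_pages"
}

# Example: one-hour delay with an assumed log rate of 2 MB/s.
replay_delay_cmds D01 3600 2
```

The computed spool limit is only a lower bound for the delay window. Peak log rates can be higher than the average, so some headroom is advisable.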

2.2.6 Automatic failover: DB2 Automatic Client Reroute (ACR) vs. virtual IP address

If there is a change in HADR roles, that is, the standby has to take over as the primary during an outage, all the clients can reconnect to the new primary automatically by using one of the following options:

1. Using a virtual IP (VIP): The VIP is bound to the primary server’s network interfaces. After a takeover, the virtual IP is bound to the network interfaces of the standby server (the new primary server).

2. Using the DB2 Automatic Client Reroute (ACR) feature: The client is configured to know the two database servers. If the database client cannot connect to the configured primary server, the database client tries to connect to the configured standby (alternate) server.

Note: SAP recommends using the virtual IP address option with SA MP. More details can be found in SAP Note 1568539, DB6: HADR - Virtual IP or Automatic Client Reroute (http://service.sap.com/sap/support/notes/1568539).
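For completeness, the ACR option relies on an alternate server being registered for the database. The sketch below prints the corresponding DB2 command rather than running it. The function name is hypothetical, and the host name standbyhost and port 5912 are placeholders that must be replaced with the values of the actual landscape.

```shell
# Dry-run helper (hypothetical name): prints the command that registers an
# alternate server for Automatic Client Reroute instead of executing it.
acr_alternate_server_cmd() {
    db_name=$1
    alt_host=$2
    alt_port=$3
    echo "db2 update alternate server for database $db_name using hostname $alt_host port $alt_port"
}

# Example with placeholder values: point clients of D01 to the standby host.
acr_alternate_server_cmd D01 standbyhost 5912
```

With the virtual IP option that SAP recommends, this registration is not needed because clients always connect to the same address.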

Automatic failover is not supported between the primary and an auxiliary standby. Auxiliary standbys are to be used for DR purposes only. A manual takeover must be issued on one of the auxiliary standbys to switch it to the primary.
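A manual takeover is issued with the TAKEOVER HADR command on the standby that is to become the new primary. The following sketch prints the command in its graceful or forced form without running it. The function name and the mode keywords graceful and forced are assumptions made for this illustration.

```shell
# Dry-run helper (hypothetical name): prints the takeover command to be
# issued on the auxiliary standby instead of executing it.
hadr_takeover_cmd() {
    db_name=$1
    mode=${2:-graceful}
    case "$mode" in
        graceful)
            # Role switch while the current primary is still reachable.
            echo "db2 takeover hadr on db $db_name"
            ;;
        forced)
            # Failover when the primary cannot be reached.
            echo "db2 takeover hadr on db $db_name by force"
            ;;
        *)
            echo "unknown takeover mode: $mode" >&2
            return 1
            ;;
    esac
}

# Example: disaster case, the primary site is lost.
hadr_takeover_cmd D01 forced
```

Because an auxiliary standby runs in SUPERASYNC mode, a forced takeover can lose the transactions that had not yet been shipped, so it should be reserved for genuine disaster situations.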


2.2.7 DB2 HADR for DB2 BLU Acceleration

As of DB2 10.5 Fix Pack 4, the DB2 HADR feature can be used with databases containing BLU (column-organized) tables. Except for Reads on Standby (RoS), all HADR features including multiple standbys are supported for BLU tables without any additional requirements or settings.

2.2.8 DB2 LOAD with COPY YES for BLU tables

DB2 10.5 Fix Pack 4 also supports the DB2 LOAD command with the COPY YES option for BLU tables. Therefore, the LOAD command can be used in an HADR environment with the COPY YES option and changes will be propagated to the standby databases.

Example:

db2 'LOAD FROM <filename>.ixf OF IXF SAVECOUNT 20000 INSERT INTO <schema name>."<table name>" COPY YES TO <shared directory>'

The COPY YES option creates a mini backup image of the loaded data in the specified shared directory. The standby reads this backup image and replays the changes. Therefore, the shared directory must be accessible by the standby database. In case of a failover or after a database restore, this backup can be used to roll forward to the end of logs, which also includes the data loaded in the primary using the LOAD command.

Note: The LOAD command with the COPY NO option is not supported in HADR environments; only LOAD with the COPY YES option is supported. If data is loaded into tables on the HADR primary without the COPY YES option, the changes are not propagated to the standby. Moreover, the table is marked as unusable on all standbys because it becomes inconsistent with the table on the primary. The DB2_LOAD_COPY_NO_OVERRIDE registry variable can be set on the primary database to convert a load operation with the COPY NO option to a load operation with the COPY YES option. More information can be found at http://www-01.ibm.com/support/knowledgecenter/SSEPGG_10.5.0/com.ibm.db2.luw.admin.ha.doc/doc/c0011761.html?lang=no.

The HADR state is not affected by the LOAD operation. The DB2 list utilities command can be used to display load progress in the primary.

Example:

db2 list utilities show detail

ID = 5
Type = LOAD
Database Name = D01
Member Number = 0
Description = [LOADID: 27862.2014-10-21-16.40.39.069004.0 (4;4)] [*LOCAL.db2d01.141021204038] OFFLINE LOAD Unknown file type AUTOMATIC INDEXING REPLACE COPY YES SAPD01./BIC/FICBLU01-ALI
Start Time = 10/21/2014 16:40:39.082853
State = Executing
Invocation Type = User
Progress Monitoring:
Phase Number = 1
Description = SETUP
Total Work = 0 bytes
Completed Work = 0 bytes
Start Time = 10/21/2014 16:40:39.082859
Phase Number [Current] = 2
Description = ANALYZE
Total Work = 41282171 rows
Completed Work = 3949572 rows
Start Time = 10/21/2014 16:40:39.399368
Phase Number = 3
Description = LOAD
Total Work = 0 rows
Completed Work = 0 rows
Start Time = Not Started
Phase Number = 4
Description = BUILD
Total Work = 2 indexes
Completed Work = 0 indexes
Start Time = Not Started

The standby is updated after the load is complete on the primary. The db2diag.log file on the standby then contains entries such as the following, which show when the restore of the load copy for the table /BIC/FICBLU01-ALI started and completed.

Example:

2014-10-15-16.48.58.682164-240 I20140A498 LEVEL: Warning
PID : 4915624 TID : 27765 PROC : db2sysc 0
INSTANCE: db2d01 NODE : 000 DB : D01
APPHDL : 0-175 APPID: *LOCAL.DB2.141015193921
HOSTNAME: sapaix11
EDUID : 27765 EDUNAME: db2agent (D01) 0
FUNCTION: DB2 UDB, database utilities, sqludcpy, probe:548
DATA #1 : String, 74 bytes
Starting to restore a load copy.
SAPD01./BIC/FICBLU01-ALI.20141015160659

2014-10-15-16.52.59.701675-240 I26278A449 LEVEL: Warning
PID : 4915624 TID : 27765 PROC : db2sysc 0
INSTANCE: db2d01 NODE : 000 DB : D01
APPHDL : 0-175 APPID: *LOCAL.DB2.141015193921
HOSTNAME: sapaix11
EDUID : 27765 EDUNAME: db2agent (D01) 0
FUNCTION: DB2 UDB, database utilities, sqludcpy, probe:1142
MESSAGE : Load copy restore completed successfully.

Note: As long as the standby has access to the shared directory where the mini backup from LOAD with COPY YES is located, LOAD will complete even if a failover happens right after data is loaded into the primary.

If data is loaded into a table on the primary using the LOAD command without the COPY YES option or with the COPY NO option, the table is marked as unavailable on the standby.

Example:

SELECT COUNT(*) AS COUNT FROM SAPD01./BIC/FICBLU01-ALI

COUNT
-----------
SQL1477N For table "SAPD01./BIC/FICBLU01-ALI" an object "130" in table space
"20" cannot be accessed. SQLSTATE=55019

db2diag.log:

2014-09-03-21.18.24.500558-240 E1950A551 LEVEL: Warning
PID : 14025052 TID : 18507 PROC : db2sysc 0
INSTANCE: db2d01 NODE : 000 DB : D01
APPHDL : 0-8 APPID: *LOCAL.DB2.140904011422
HOSTNAME: sapaix11
EDUID : 18507 EDUNAME: db2redow (D01) 0
FUNCTION: DB2 UDB, data management, sqldMarkObjInErr, probe:1
MESSAGE : ADM5571W The "DATA" object with ID "130" in table space "20" for
table "TBSPACEID=20.TABLEID=130" is being marked as unavailable.
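One way to guard against this situation is the DB2_LOAD_COPY_NO_OVERRIDE registry variable mentioned earlier in this section. The sketch below prints a db2set command that would convert COPY NO loads into COPY YES loads writing to a shared directory. The function name and the directory /db2/D01/loadcopy are hypothetical, and the value syntax is assumed to follow the COPY YES clause of the LOAD command; it should be verified against the DB2 documentation for the installed release.

```shell
# Dry-run helper (hypothetical name): prints the db2set command instead of
# executing it. The target directory must be reachable from the standby.
load_copy_override_cmd() {
    copy_dir=$1
    echo "db2set DB2_LOAD_COPY_NO_OVERRIDE=\"COPY YES TO $copy_dir\""
}

# Example with a placeholder shared directory.
load_copy_override_cmd /db2/D01/loadcopy
```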

If a failover happens before the data load is complete on the primary, the tablespace containing the table is marked as Restore Pending on the standby when it becomes the new primary. The table is marked as Load Pending on the old primary, which is now the standby.

db2 list tablespaces show detail

Tablespace ID = 3
Name = D01#FACTI
Type = Database managed space
Contents = All permanent data. Large table space.
State = 0x0100
Detailed explanation:
Restore pending

Tablespace ID = 4
Name = D01#FACTD
Type = Database managed space
Contents = All permanent data. Large table space.
State = 0x0100
Detailed explanation:
Restore pending

To resolve this issue, the load operation must be terminated using the TERMINATE option from the database host where it was started.

Example:

db2 'LOAD FROM <filename>.ixf OF IXF TERMINATE INSERT INTO <schema name>."<table name>" COPY YES TO <shared directory>'

Note: If the LOAD command with the COPY YES option is used, this situation can be avoided in an HADR environment. To recover from a Restore Pending state, the tablespace must be restored from a tablespace level backup. Tablespace level backup is not supported for HADR. Therefore, to take a tablespace level backup, HADR must be disabled. Once the tablespace is restored, the standby must be refreshed with a new copy (using a full database backup) of the primary.

2.3 Hardware and operating system requirements

1. The operating system on the primary and the standby must have the same version, including patches.

2. A TCP/IP interface must be available between the HADR host machines, and a high-speed, high-capacity network is recommended. The network bandwidth required for HADR log shipping between the primary and the principal standby depends on the amount of log data generated in the primary per second at peak time. The minimum bandwidth can be calculated using the following paper: http://scn.sap.com/docs/DOC-56040

Note: The primary ships logs to the principal standby and to all active auxiliary standbys simultaneously. Therefore, the required network bandwidth is multiplied by the number of active standbys.

The HADR simulator tool described at the following link can also be used to determine the maximum network shipping rate between systems: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/DB2HADR/page/HADR%20simulator
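The scaling rule for multiple standbys can be expressed as a short calculation. The following Python sketch is illustrative only: the function name and the sample log rate are assumptions rather than values from this paper, and protocol overhead is ignored, so the result is a lower bound and not a sizing recommendation.

```python
def required_bandwidth_mbit_per_s(peak_log_mb_per_s, active_standbys):
    """Return a lower bound, in Mbit/s, on the bandwidth needed for HADR log shipping.

    peak_log_mb_per_s: log data generated in the primary at peak time (MB/s).
    active_standbys:   principal standby plus all active auxiliary standbys.
    """
    if peak_log_mb_per_s < 0 or active_standbys < 1:
        raise ValueError("log rate must be >= 0 and at least one standby is required")
    # The primary ships the same log stream to every active standby,
    # so the per-standby rate is multiplied by the number of standbys.
    # 1 MB/s corresponds to 8 Mbit/s.
    return peak_log_mb_per_s * 8 * active_standbys


# Hypothetical peak rate of 5 MB/s shipped to two standbys.
print(required_bandwidth_mbit_per_s(5.0, 2))  # 80.0
```

The peak log rate itself still has to be measured on the real system, for example with the paper or the simulator tool referenced above.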

2.4 DB2 database requirements

1. The versions of the database systems for the primary and standby must be identical; for example, both must be either 10.1 or 10.5.

2. During a rolling fix pack update, the modification level (for example, the fix pack level) of the database system for the standby can be temporarily higher than that of the primary in order to test the new level. Both databases should be on the same DB2 version and fix pack level for normal operations.

For more information about software and hardware requirements, see the following links:

System requirements for DB2 high availability disaster recovery (HADR) http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.admin.ha.doc/doc/c0011759.html

Installation and storage requirements for high availability disaster recovery (HADR) http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.admin.ha.doc/doc/c0012543.html

System requirements for IBM DB2 for Linux, UNIX, and Windows http://www-01.ibm.com/support/docview.wss?uid=swg27038033

3 Preparation

For the purpose of this exercise, we will enable HADR for an existing SAP NetWeaver system with a distributed installation.

3.1 Configuration of the test system

In the test system, the SAP Central Services Instance (ASCS) and the SAP Primary Application Server (Central Instance) reside on host saplxvm06. The database instance resides on host saplxvm07.

The goal is to enable the HADR feature by adding a standby database on host saplxvm08 (IP address 9.26.166.200). Furthermore, HADR will be configured for multiple standby databases by adding an auxiliary standby on host saplxvm09 (IP address 9.26.166.201). By the end of this exercise the system topology should look as shown in the following figure:

Figure 2: Topology of the HADR Test System

3.1.1 Hardware and operating system in the test systems

All database hosts are on separate hardware with identical configuration and on the same operating system level.

Example: For the test systems saplxvm07, saplxvm08, and saplxvm09, the OS and hardware configuration is compared using /proc/meminfo, /proc/cpuinfo and /etc/SuSE-release files. Each system has 4 “Intel(R) Xeon(R) CPU X5680 @ 3.33GHz” CPUs, 8 GB of RAM, and the following operating system:

SUSE Linux Enterprise Server 11 (x86_64)

VERSION = 11

PATCHLEVEL = 2

3.1.2 Required software downloads

1. DB2 10.5 for Linux, UNIX, and Windows Download from SAP Service Marketplace at https://service.sap.com/swdc. Refer to SAP Note 1851853 - DB6: Using DB2 10.5 with SAP Applications and SAP Note 1260217 - DB6: Software Components Contained in DB2 License from SAP for supported new features. IBM Tivoli SA MP for Linux is included in the DB2 10.5 for Linux, UNIX, and Windows image.

2. SAP NetWeaver 7.3 - Including Enhancement Package 1, Support Package Stack 09 Download from SAP Service Marketplace at https://service.sap.com/swdc. Find more information at SAP Support Portal at http://help.sap.com/nw731.

3. SAP Software Provisioning Manager (SWPM) 1.0 SP04 or higher Download from SAP Service Marketplace at http://service.sap.com/sltoolset -> Software Logistics Toolset 1.0 -> Software Provisioning Manager. See SAP Note 1680045 - Release Note for Software Provisioning Manager 1.0 for more information.

3.2 Basic network setup

Make sure that all hosts are able to communicate with each other via TCP/IP. Add the appropriate IP address to host name mappings to the /etc/hosts file of each host.

Example:

Sample content of /etc/hosts file (on saplxvm06, saplxvm07, saplxvm08, and saplxvm09):

saplxvm06:~ # cat /etc/hosts | grep -i saplxvm

9.26.166.198 saplxvm06.torolab.ibm.com saplxvm06

9.26.166.199 saplxvm07.torolab.ibm.com saplxvm07

9.26.166.200 saplxvm08.torolab.ibm.com saplxvm08

9.26.166.201 saplxvm09.torolab.ibm.com saplxvm09

Adding static IP addresses to hostname mappings in the hosts file removes the system’s DNS servers as a single point of failure. In case of a DNS failure, the clustered systems can still resolve the addresses of the other machines via the /etc/hosts file. From each host, ping all other hosts to check communication.
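Before pinging, it can be useful to confirm that every cluster host actually has a static entry in /etc/hosts. The following Python sketch shows one way to do that. The function names are hypothetical, the sample text repeats the entries listed above, and on a real host the text would be read from /etc/hosts instead.

```python
def parse_hosts(text):
    """Map every host name and alias in /etc/hosts-style text to its IP address."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping.setdefault(name, ip)  # keep the first entry for a name
    return mapping


def missing_hosts(text, required):
    """Return the required host names that have no entry in the hosts text."""
    mapping = parse_hosts(text)
    return [host for host in required if host not in mapping]


SAMPLE_HOSTS = """\
9.26.166.198 saplxvm06.torolab.ibm.com saplxvm06
9.26.166.199 saplxvm07.torolab.ibm.com saplxvm07
9.26.166.200 saplxvm08.torolab.ibm.com saplxvm08
9.26.166.201 saplxvm09.torolab.ibm.com saplxvm09
"""

print(missing_hosts(SAMPLE_HOSTS, ["saplxvm06", "saplxvm07", "saplxvm08", "saplxvm09"]))
```

An empty list means that all required hosts are mapped. The check says nothing about reachability, so the ping test is still needed.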

3.3 File system setup

DB2 HADR does not require shared storage devices. To avoid a single point of failure and data loss, the primary and all the standbys should have their own storage. It is recommended to have the same file system structure and storage devices for the primary and all the standbys. This reduces the probability of the standby falling behind during log replay or log spooling. It is also recommended to have the database log directory and database tablespace containers in separate file systems for each database.

3.4 Operating system users and groups

All SAP and DB2 related user IDs and group IDs from the primary and the SAP Application Server must also be available and free to use in the standby server.

Example:

The following command is used to collect group IDs and user IDs on the hosts saplxvm07 and saplxvm06. The same IDs will be used in the standby and auxiliary standby servers.

id ahaadm

Group Name   Group ID (GID)   Description
dbahaadm     401              SYSADM (system administrator) authority
dbahactl     402              SYSCTRL and SYSMON authority
dbahamnt     403              SYSMAINT and SYSMON authority
dbahamon     404              SYSMAINT and SYSMON authority
sapsys       390              SAP System Services group
sapinst      1000             Common group used by SWPM for all SAP system users.

Table 2: Groups required for an SAP NetWeaver installation

User Name   User ID (UID)   Description
ahaadm      301             SAP system administrator to run the SAP Central Services Instance
db2aha      302             SAP database administrator
sapaha      303             SAP ABAP database connect user
sapahadb    304             SAP JAVA database connect user
sapadm      305             The user sapadm is used for SAP Host Agent.
daaadm      306             SAP Diagnostics Agent administrator

Table 3: Users required for an SAP NetWeaver installation
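Once the users exist on the standby, the output of the id command can be compared between hosts. The following Python sketch parses one line of id output and reports differences. The function names and the sample output line are assumptions for illustration. The line follows the usual Linux id format and uses the IDs from the tables above, but the group membership shown was not captured from the test system.

```python
import re


def parse_id_output(line):
    """Parse one line of `id <user>` output into user, uid, gid, and groups."""
    uid = re.search(r"\buid=(\d+)\(([^)]+)\)", line)
    gid = re.search(r"\bgid=(\d+)\(([^)]+)\)", line)
    if not (uid and gid):
        raise ValueError("not a recognizable id output line")
    groups = {}
    match = re.search(r"\bgroups=(\S+)", line)
    if match:
        for number, name in re.findall(r"(\d+)\(([^)]+)\)", match.group(1)):
            groups[name] = int(number)
    return {"user": uid.group(2), "uid": int(uid.group(1)),
            "gid": int(gid.group(1)), "groups": groups}


def id_mismatches(primary, standby):
    """Return the keys whose values differ between two parsed id results."""
    return [key for key in ("user", "uid", "gid", "groups")
            if primary[key] != standby[key]]


# Illustrative line, not captured from the test system.
SAMPLE_ID = "uid=301(ahaadm) gid=390(sapsys) groups=390(sapsys),402(dbahactl),1000(sapinst)"
print(parse_id_output(SAMPLE_ID))
```

Running id for each user on both hosts and passing the two parsed results to id_mismatches gives an empty list when the IDs agree.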

4 Installing the standby

Before performing the steps in this section, complete the preparation described in section 3.

4.1 Exporting the file systems

The directories /sapmnt/<SID>/exe, /sapmnt/<SID>/profile, and /sapmnt/<SID>/global from the SAP Application Server must be mounted on the standby host.

Example:

On the host saplxvm06, the following lines must be added to the /etc/exports file:

/sapmnt/AHA/exe saplxvm07(rw,no_root_squash,async,insecure,no_subtree_check) saplxvm08(rw,no_root_squash,async,insecure,no_subtree_check) saplxvm09(rw,no_root_squash,async,insecure,no_subtree_check)

/sapmnt/AHA/global saplxvm07(rw,no_root_squash,async,insecure,no_subtree_check) saplxvm08(rw,no_root_squash,async,insecure,no_subtree_check) saplxvm09(rw,no_root_squash,async,insecure,no_subtree_check)

/sapmnt/AHA/profile saplxvm07(rw,no_root_squash,async,insecure,no_subtree_check) saplxvm08(rw,no_root_squash,async,insecure,no_subtree_check) saplxvm09(rw,no_root_squash,async,insecure,no_subtree_check)

The following commands must be executed on host saplxvm06 to confirm the export and allow access:

exportfs -a

exportfs

The following lines can be added to the /etc/fstab file on the standby host saplxvm08 so that the directories are automatically remounted after a system restart:

saplxvm06:/sapmnt/AHA/exe /sapmnt/AHA/exe nfs defaults 0 0

saplxvm06:/sapmnt/AHA/global /sapmnt/AHA/global nfs defaults 0 0

saplxvm06:/sapmnt/AHA/profile /sapmnt/AHA/profile nfs defaults 0 0

The following command can be used to mount all directories mentioned in /etc/fstab.

mount -a
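Because the /etc/exports and /etc/fstab entries follow a fixed pattern per SAP system ID, they can be generated instead of typed by hand. The following Python sketch is one possible way to do this. The function names are hypothetical, and the export options are copied from the example above. The sketch only builds the text lines and does not modify any system file.

```python
SAPMNT_DIRS = ("exe", "global", "profile")
EXPORT_OPTIONS = "rw,no_root_squash,async,insecure,no_subtree_check"


def exports_lines(sid, clients, options=EXPORT_OPTIONS):
    """Build one /etc/exports line per /sapmnt/<SID> directory for all clients."""
    lines = []
    for directory in SAPMNT_DIRS:
        targets = " ".join(f"{client}({options})" for client in clients)
        lines.append(f"/sapmnt/{sid}/{directory} {targets}")
    return lines


def fstab_lines(sid, server):
    """Build the matching /etc/fstab lines for an NFS client."""
    return [f"{server}:/sapmnt/{sid}/{d} /sapmnt/{sid}/{d} nfs defaults 0 0"
            for d in SAPMNT_DIRS]


for line in exports_lines("AHA", ["saplxvm07", "saplxvm08", "saplxvm09"]):
    print(line)
for line in fstab_lines("AHA", "saplxvm06"):
    print(line)
```

Each export entry is emitted on a single line, so all client hosts are attached to the exported directory they belong to.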

4.2 Turning on DB2 log archiving

DB2 requires log archiving to be turned on for the HADR setup. See the following link to the IBM Knowledge Center for options to turn on log archiving:

http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.admin.ha.doc/doc/c0051344.html

Example:

saplxvm07:db2aha > db2 update db cfg for AHA using LOGARCHMETH1 DISK:/db2/AHA/log_archive/

After enabling log archiving, a complete offline backup must be taken in order to take the database out of the backup pending state. To reduce production downtime during the offline backup, the backup can be split into multiple files. Using the COMPRESS option during backup increases the backup duration because compression time is added to the regular backup time.

Example:

saplxvm07:db2aha > db2 backup db aha to /db2/db2aha/backup, /db2/db2aha/backup, /db2/db2aha/backup, /db2/db2aha/backup

4.3 Taking a backup of the primary

An online or offline backup of the primary database is required, and the backup image will be used to create the standby.

Example:

saplxvm07:db2aha > db2 backup db aha online to /db2/db2aha/backup,/db2/db2aha/backup,/db2/db2aha/backup,/db2/db2aha/backup

4.4 Performing a homogeneous system copy using SWPM

To create the standby, it is recommended to perform a homogeneous system copy (database copy method) using SAP Software Provisioning Manager (SWPM). A custom installation must be selected so that the same user IDs and group IDs as in the primary server can be entered manually. SWPM also provides a few DB2 HADR-specific installation options. Specifically, IBM Tivoli System Automation for Multiplatforms for DB2 must be selected with HADR (High Availability Disaster Recovery) as Cluster Type. Users can also select the HADR synchronization mode and the HADR local and remote service names (or port numbers).

Note: “Synchronization mode” corresponds to the DB2 database configuration parameter HADR_SYNCMODE. “HADR local and remote service name” corresponds to the database configuration parameters HADR_LOCAL_SVC and HADR_REMOTE_SVC respectively.

A homogeneous system copy creates all users, sets up the environment, installs the database software, creates the instance on the standby, and then prompts the user to restore the database from a backup. This is when the backup taken in step 4.3 from the primary server must be restored in the standby server.

Note: To set up HADR, the standby must be in rollforward pending state. Therefore, SWPM is no longer required for the HADR setup and must be exited after restoring the database.

Example:

The following screens show SWPM HADR-related settings:

Figure 3: Start screen of SWPM

Figure 4: SWPM screen for IBM Tivoli System Automation for Multiplatforms (SA MP) for High Availability installation options

Note: The cluster configuration file is used to create a cluster for the automatic failover, which will be explained later in this document. In figure 4, the Generate cluster configuration files checkbox is not selected because SWPM will be exited after restoring the database and you will not reach the step to create the cluster configuration file. For a new installation of the primary, if this option is selected, the cluster configuration file (cluster_config.xml) will be generated in the directory /tmp.

Figure 5: DB2 High Availability Disaster Recovery options in SWPM

Note: The two port numbers will be assigned to the database configuration parameters HADR_LOCAL_SVC and HADR_REMOTE_SVC. This can be changed later.

Figure 6: SWPM message window to restore database for homogeneous system copy

Note: As described earlier, at this stage, SWPM must be stopped by clicking the “Cancel” button.

Figure 7: Stop SWPM.

Note: As described earlier, SWPM is no longer needed for the HADR setup. Stop it by clicking the “Stop” button.

4.5 Configuring ports

The database connection ports in the /etc/services file must be the same as in the primary host (saplxvm07).

Example:

sapdb2AHA 5912/tcp

AHA_HADR_1 5951/tcp # DB2 HADR log shipping

AHA_HADR_2 5952/tcp # DB2 HADR log shipping

sapmsAHA 3600/tcp # SAP System Message Server Port

DB2_db2aha 60006/tcp

DB2_db2aha_1 60007/tcp

DB2_db2aha_2 60008/tcp

DB2_db2aha_3 60009/tcp

DB2_db2aha_4 60010/tcp

DB2_db2aha_END 60011/tcp

Note: The port numbers configured for AHA_HADR_1 and AHA_HADR_2 are used for the HADR local and remote service name database configuration parameters (HADR_LOCAL_SVC, HADR_REMOTE_SVC) of the primary (saplxvm07) and the standby (saplxvm08). The same port number can be used for both parameters.
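Whether the relevant /etc/services entries really match on both hosts can be checked mechanically. The following Python sketch compares selected service names in two copies of the file. The function names are hypothetical, and the sample text repeats three entries from the example above. For simplicity, only one entry per service name is kept, so a name defined for both tcp and udp is not handled separately.

```python
def parse_services(text):
    """Map service names in /etc/services-style text to their port/protocol field."""
    entries = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        fields = line.split()
        if len(fields) >= 2:
            entries[fields[0]] = fields[1]
    return entries


def service_mismatches(primary_text, standby_text, names):
    """Return {name: (primary value, standby value)} for missing or differing entries."""
    primary = parse_services(primary_text)
    standby = parse_services(standby_text)
    return {name: (primary.get(name), standby.get(name))
            for name in names
            if name not in primary or primary.get(name) != standby.get(name)}


PRIMARY_SERVICES = """\
sapdb2AHA 5912/tcp
AHA_HADR_1 5951/tcp # DB2 HADR log shipping
AHA_HADR_2 5952/tcp # DB2 HADR log shipping
"""

print(service_mismatches(PRIMARY_SERVICES, PRIMARY_SERVICES,
                         ["sapdb2AHA", "AHA_HADR_1", "AHA_HADR_2"]))
```

An empty dictionary means that all checked names are defined identically in both files.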

4.6 Restoring the database from a backup

Before restoring the database, all required file systems must be created or mounted on the standby host (saplxvm08), matching those of the primary host (saplxvm07).

Example:

saplxvm07:db2aha 51> cd /db2/AHA

saplxvm07:db2aha 52> ls -lrt

total 32

drwxr-x--- 3 db2aha dbahaadm 4096 May 13 16:45 sapdata4

drwxr-x--- 3 db2aha dbahaadm 4096 May 13 16:45 sapdata3

drwxr-x--- 3 db2aha dbahaadm 4096 May 13 16:45 sapdata2

drwxr-x--- 3 db2aha dbahaadm 4096 May 13 16:45 sapdata1

drwxr-xr-x 3 db2aha dbahaadm 4096 May 13 16:45 db2aha

drwxr-xr-x 3 db2aha dbahaadm 4096 May 13 16:46 log_dir

drwxr-xr-x 3 db2aha dbahaadm 4096 May 15 15:40 log_archive

drwxr-xr-x 27 db2aha dbahaadm 4096 Jul 4 14:00 db2dump

The following directories are created (or mounted) on the standby host with the same ownership and permission as in the primary host above.

saplxvm08:db2aha 60> cd /db2/AHA

saplxvm08:db2aha 61> ls -lrt

total 32

drwxr-xr-x 3 db2aha dbahaadm 4096 May 15 16:43 db2aha

drwxr-xr-x 3 db2aha dbahaadm 4096 May 15 16:43 sapdata4

drwxr-xr-x 3 db2aha dbahaadm 4096 May 15 16:43 sapdata3

drwxr-xr-x 3 db2aha dbahaadm 4096 May 15 16:43 sapdata2

drwxr-xr-x 3 db2aha dbahaadm 4096 May 15 16:43 sapdata1

drwxr-xr-x 3 db2aha dbahaadm 4096 May 15 16:51 log_dir

drwxr-xr-x 3 db2aha dbahaadm 4096 May 15 16:51 log_archive

drwxr-xr-x 4 db2aha dbahaadm 4096 Jun 13 14:22 db2dump

The backup taken in section 4.3 of this document is restored in the standby host using the following command:

saplxvm08:db2aha 31> db2 restore db aha from /db2/db2aha/backup,/db2/db2aha/backup,/db2/db2aha/backup,/db2/db2aha/backup

DB20000I The RESTORE DATABASE command completed successfully.

Note: DB2 HADR requires the standby to be in rollforward pending mode. Therefore, the ROLLFORWARD DATABASE command must not be executed after restoring the database.

Example:

saplxvm08:db2aha 23> db2 get db cfg for aha|grep -i Rollforward

Rollforward pending = DATABASE

Note: During the database instance installation on the standby (saplxvm08), the parameters SAPDBHOST and j2ee/dbhost in the SAP DEFAULT.PFL profile were changed to the host name of the standby host (saplxvm08). For now, change both values back to the primary host (saplxvm07). Later, when the database virtual host is set up, change them to the virtual host name.

Example:

saplxvm08:db2aha 34> grep -i dbhost /sapmnt/AHA/profile/DEFAULT.PFL

SAPDBHOST = saplxvm07

j2ee/dbhost = saplxvm07

4.7 Configuring databases for HADR

To enable HADR, update the database configuration parameters for both the primary and the standby as shown in the following examples:

Example:

On host saplxvm07, as user db2aha, the following sample script is executed:

saplxvm07:db2aha 11> cat primary_hadr_cfg.sql

UPDATE DB CFG FOR AHA USING HADR_LOCAL_HOST saplxvm07;

UPDATE DB CFG FOR AHA USING HADR_LOCAL_SVC AHA_HADR_1;

UPDATE DB CFG FOR AHA USING HADR_REMOTE_HOST saplxvm08;

UPDATE DB CFG FOR AHA USING HADR_REMOTE_SVC AHA_HADR_2;

UPDATE DB CFG FOR AHA USING HADR_REMOTE_INST db2aha;

UPDATE DB CFG FOR AHA USING HADR_TIMEOUT 120;

UPDATE DB CFG FOR AHA USING HADR_SYNCMODE NEARSYNC;

UPDATE DB CFG FOR AHA USING HADR_SPOOL_LIMIT 1000;

UPDATE DB CFG FOR AHA USING HADR_PEER_WINDOW 240;

UPDATE DB CFG FOR AHA USING indexrec RESTART logindexbuild ON;

saplxvm07:db2aha 52> db2 -z primary_hadr_cfg.sql.log -tvf primary_hadr_cfg.sql

saplxvm07:db2aha 55> db2 get db cfg for aha | grep HADR

HADR database role = PRIMARY

HADR local host name (HADR_LOCAL_HOST) = saplxvm07

HADR local service name (HADR_LOCAL_SVC) = AHA_HADR_1

HADR remote host name (HADR_REMOTE_HOST) = saplxvm08

HADR remote service name (HADR_REMOTE_SVC) = AHA_HADR_2

HADR instance name of remote server (HADR_REMOTE_INST) = db2aha

HADR timeout value (HADR_TIMEOUT) = 120

HADR target list (HADR_TARGET_LIST) =

HADR log write synchronization mode (HADR_SYNCMODE) = NEARSYNC

HADR spool log data limit (4KB) (HADR_SPOOL_LIMIT) = 1000

HADR log replay delay (seconds) (HADR_REPLAY_DELAY) = 0

HADR peer window duration (seconds) (HADR_PEER_WINDOW) = 240

Note: The port numbers for the service names AHA_HADR_1 and AHA_HADR_2 are defined in the /etc/services file (see section 4.5). The actual port numbers (5951 and 5952) can also be used instead of the service names.

On host saplxvm08, as user db2aha, the following script is executed:

saplxvm08:db2aha 25> cat standby_hadr_cfg.sql

UPDATE DB CFG FOR AHA USING HADR_LOCAL_HOST saplxvm08;

UPDATE DB CFG FOR AHA USING HADR_LOCAL_SVC AHA_HADR_2;

UPDATE DB CFG FOR AHA USING HADR_REMOTE_HOST saplxvm07;

UPDATE DB CFG FOR AHA USING HADR_REMOTE_SVC AHA_HADR_1;

UPDATE DB CFG FOR AHA USING HADR_REMOTE_INST db2aha;

UPDATE DB CFG FOR AHA USING HADR_TIMEOUT 120;

UPDATE DB CFG FOR AHA USING HADR_SYNCMODE NEARSYNC;

UPDATE DB CFG FOR AHA USING HADR_SPOOL_LIMIT 1000;

UPDATE DB CFG FOR AHA USING HADR_PEER_WINDOW 240;

UPDATE DB CFG FOR AHA USING indexrec RESTART logindexbuild ON;

saplxvm08:db2aha 26> db2 -z standby_hadr_cfg.sql.out -tvf standby_hadr_cfg.sql

saplxvm08:db2aha 53> db2 get db cfg for aha | grep HADR

HADR database role = STANDBY

HADR local host name (HADR_LOCAL_HOST) = saplxvm08

HADR local service name (HADR_LOCAL_SVC) = AHA_HADR_2

HADR remote host name (HADR_REMOTE_HOST) = saplxvm07

HADR remote service name (HADR_REMOTE_SVC) = AHA_HADR_1

HADR instance name of remote server (HADR_REMOTE_INST) = db2aha

HADR timeout value (HADR_TIMEOUT) = 120

HADR target list (HADR_TARGET_LIST) =

HADR log write synchronization mode (HADR_SYNCMODE) = NEARSYNC

HADR spool log data limit (4KB) (HADR_SPOOL_LIMIT) = 1000

HADR log replay delay (seconds) (HADR_REPLAY_DELAY) = 0

HADR peer window duration (seconds) (HADR_PEER_WINDOW) = 240

Both the primary and the standby must be deactivated and reactivated for the changes to take effect. This requires downtime of the production system.
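In the two scripts above, the local and remote host and service settings of the primary and the standby mirror each other, while the timeout, synchronization mode, and peer window are set to the same values on both hosts. The following Python sketch checks this pattern on the output of "db2 get db cfg". The function names and the choice of parameters to compare are assumptions based on the example, not an official validation procedure.

```python
import re

# Matches lines such as "HADR timeout value (HADR_TIMEOUT) = 120".
_PARAM = re.compile(r"\((HADR_[A-Z_]+)\)\s*=\s*(.*)$")

MIRRORED = [("HADR_LOCAL_HOST", "HADR_REMOTE_HOST"),
            ("HADR_LOCAL_SVC", "HADR_REMOTE_SVC")]
IDENTICAL = ["HADR_TIMEOUT", "HADR_SYNCMODE", "HADR_PEER_WINDOW"]


def parse_hadr_cfg(text):
    """Extract HADR_* parameters from `db2 get db cfg` output."""
    cfg = {}
    for line in text.splitlines():
        match = _PARAM.search(line.strip())
        if match:
            cfg[match.group(1)] = match.group(2).strip()
    return cfg


def hadr_cfg_problems(primary, standby):
    """Return descriptions of settings that do not follow the example's pattern."""
    problems = []
    for local, remote in MIRRORED:
        if primary.get(local) != standby.get(remote):
            problems.append(f"primary {local} != standby {remote}")
        if standby.get(local) != primary.get(remote):
            problems.append(f"standby {local} != primary {remote}")
    for name in IDENTICAL:
        if primary.get(name) != standby.get(name):
            problems.append(f"{name} differs")
    return problems


PRIMARY_CFG = """\
HADR local host name (HADR_LOCAL_HOST) = saplxvm07
HADR local service name (HADR_LOCAL_SVC) = AHA_HADR_1
HADR remote host name (HADR_REMOTE_HOST) = saplxvm08
HADR remote service name (HADR_REMOTE_SVC) = AHA_HADR_2
HADR timeout value (HADR_TIMEOUT) = 120
HADR log write synchronization mode (HADR_SYNCMODE) = NEARSYNC
HADR peer window duration (seconds) (HADR_PEER_WINDOW) = 240
"""
STANDBY_CFG = """\
HADR local host name (HADR_LOCAL_HOST) = saplxvm08
HADR local service name (HADR_LOCAL_SVC) = AHA_HADR_2
HADR remote host name (HADR_REMOTE_HOST) = saplxvm07
HADR remote service name (HADR_REMOTE_SVC) = AHA_HADR_1
HADR timeout value (HADR_TIMEOUT) = 120
HADR log write synchronization mode (HADR_SYNCMODE) = NEARSYNC
HADR peer window duration (seconds) (HADR_PEER_WINDOW) = 240
"""

print(hadr_cfg_problems(parse_hadr_cfg(PRIMARY_CFG), parse_hadr_cfg(STANDBY_CFG)))
```

An empty list indicates that the two configurations are consistent with the pattern used in this section.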

4.8 Performing HADR checks

Before starting HADR, check the following:

1. The /etc/services file on both the standby and primary host contains the same port numbers. Example:

saplxvm07:~ # cat /etc/services | grep -i aha

sapdb2AHA 5912/tcp

AHA_HADR_1 5951/tcp # DB2 HADR log shipping

AHA_HADR_2 5952/tcp # DB2 HADR log shipping

sapmsAHA 3600/tcp # SAP System Message Server Port

DB2_db2aha 60006/tcp

DB2_db2aha_1 60007/tcp

DB2_db2aha_2 60008/tcp

DB2_db2aha_3 60009/tcp

DB2_db2aha_4 60010/tcp

DB2_db2aha_END 60011/tcp

2. The DB2 licenses on both the primary and the standby are valid and are not trial licenses. Use the “db2licm -l” command to verify. SA MP is not supported with DB2 temporary licenses. Apply a valid license using the “db2licm -a <license file name>” command as user db2aha.

3. The database manager configuration parameter SVCENAME is defined as sapdb2<SID> on both the primary and the standby hosts.

Example:

saplxvm08:db2aha 23> db2 get dbm cfg | grep -i svcename

TCP/IP Service name (SVCENAME) = sapdb2AHA

4. User sap<SID> is able to connect to the database using a valid password. Example:

saplxvm07:db2aha 59> db2 connect to aha user sapaha using ******

Database Connection Information

Database server = DB2/LINUXX8664 10.5.0

SQL authorization ID = SAPAHA

Local database alias = AHA

4.9 Starting HADR

HADR must be started on the standby first and then on the primary using the START HADR command.

Example:

On the standby host saplxvm08:

saplxvm08:db2aha 35> db2 deactivate db aha

DB20000I The DEACTIVATE DATABASE command completed successfully.

saplxvm08:db2aha 36> db2 start hadr on db aha as standby

DB20000I The START HADR ON DATABASE command completed successfully.

On the primary host saplxvm07:

saplxvm07:db2aha 51> db2 deactivate db aha

DB20000I The DEACTIVATE DATABASE command completed successfully.

saplxvm07:db2aha 52> db2 start hadr on db aha as primary

DB20000I The START HADR ON DATABASE command completed successfully.

HADR is now enabled and the standby will begin to replay the logs to catch up to the primary.

4.10 Checking the HADR status using the db2pd tool

The HADR status can be checked using the db2pd tool. The following example shows an output of db2pd from the primary.

Example:

saplxvm07:db2aha 75> db2pd -d AHA -HADR

Database Member 0 -- Database AHA -- Active -- Up 0 days 09:50:28 -- Date 2014-05-22-00.38.20.793285

HADR_ROLE = PRIMARY

REPLAY_TYPE = PHYSICAL

HADR_SYNCMODE = NEARSYNC

STANDBY_ID = 1

LOG_STREAM_ID = 0

HADR_STATE = PEER

HADR_FLAGS =

PRIMARY_MEMBER_HOST = saplxvm07

PRIMARY_INSTANCE = db2aha

PRIMARY_MEMBER = 0

STANDBY_MEMBER_HOST = saplxvm08

STANDBY_INSTANCE = db2aha

STANDBY_MEMBER = 0

HADR_CONNECT_STATUS = CONNECTED

HADR_CONNECT_STATUS_TIME = 05/21/2014 14:47:57.497164 (1400698077)

HEARTBEAT_INTERVAL(seconds) = 30

HADR_TIMEOUT(seconds) = 120

TIME_SINCE_LAST_RECV(seconds) = 3

PEER_WAIT_LIMIT(seconds) = 0

LOG_HADR_WAIT_CUR(seconds) = 0.000

LOG_HADR_WAIT_RECENT_AVG(seconds) = 0.000056

LOG_HADR_WAIT_ACCUMULATED(seconds) = 1.464

LOG_HADR_WAIT_COUNT = 36460

SOCK_SEND_BUF_REQUESTED,ACTUAL(bytes) = 0, 16384

SOCK_RECV_BUF_REQUESTED,ACTUAL(bytes) = 0, 87380

PRIMARY_LOG_FILE,PAGE,POS = S0000007.LOG, 4393, 2666673961

STANDBY_LOG_FILE,PAGE,POS = S0000007.LOG, 4390, 2666660069

HADR_LOG_GAP(bytes) = 0

STANDBY_REPLAY_LOG_FILE,PAGE,POS = S0000007.LOG, 4390, 2666660069

STANDBY_RECV_REPLAY_GAP(bytes) = 0

PRIMARY_LOG_TIME = 05/22/2014 00:38:17.000000 (1400733497)

STANDBY_LOG_TIME = 05/22/2014 00:37:21.000000 (1400733441)

STANDBY_REPLAY_LOG_TIME = 05/22/2014 00:37:21.000000 (1400733441)

STANDBY_RECV_BUF_SIZE(pages) = 2048

STANDBY_RECV_BUF_PERCENT = 0

STANDBY_SPOOL_LIMIT(pages) = 1000

STANDBY_SPOOL_PERCENT = 0

PEER_WINDOW(seconds) = 240

PEER_WINDOW_END = 05/22/2014 00:42:17.000000 (1400733737)

READS_ON_STANDBY_ENABLED = N

Refer to the following IBM Knowledge Center page for more details on the above values: http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.admin.cmd.doc/doc/r0011729.html
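For scripted monitoring, the db2pd output can be turned into key and value pairs. The following Python sketch parses output in the format shown above and applies a simple check. The function names are hypothetical, and treating the pair as healthy when HADR_STATE is PEER and HADR_CONNECT_STATUS is CONNECTED is an assumption made for this example. Other states can be perfectly normal, for instance while the standby is still catching up.

```python
def parse_db2pd_hadr(text):
    """Parse `db2pd -d <db> -hadr` output into a dict of field name to value."""
    fields = {}
    for line in text.splitlines():
        key, separator, value = line.partition("=")
        if separator and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def is_peer_and_connected(fields):
    """Return True if the standby is connected and in peer state."""
    return (fields.get("HADR_STATE") == "PEER"
            and fields.get("HADR_CONNECT_STATUS") == "CONNECTED")


# Excerpt of the output shown above.
SAMPLE_DB2PD = """\
HADR_ROLE = PRIMARY
HADR_SYNCMODE = NEARSYNC
HADR_STATE = PEER
HADR_FLAGS =
HADR_CONNECT_STATUS = CONNECTED
HADR_CONNECT_STATUS_TIME = 05/21/2014 14:47:57.497164 (1400698077)
HADR_LOG_GAP(bytes) = 0
"""

fields = parse_db2pd_hadr(SAMPLE_DB2PD)
print(is_peer_and_connected(fields), fields["HADR_LOG_GAP(bytes)"])
```

Lines without an equals sign, such as the header line that starts with "Database Member", are ignored by the parser.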

5 Enabling automatic failover using SA MP

The steps in section 4 describe how to enable HADR, but not automatic failover. Without automatic failover, if the primary goes down, a manual takeover operation must be performed from the standby, and all applications must be manually redirected to the standby, which is now the new primary. This can be done by changing the db2cli.ini file and the SAP profile (see section 5.2.4).

To enable automatic failover, SA MP must be installed and configured on both the primary and the standby hosts.

5.1 Installing the SA MP software and license

SWPM provides the option to install SA MP. SA MP must be installed on the primary and the standby hosts.

Example:

The following steps need to be performed to install and configure SA MP:

1. Check the DB2 image that can be downloaded from SAP Service Marketplace for the included SA MP software.

saplxvm07: cd <DB2-DVD-Mount-Point>/LINUXX86_64/ESE/disk1/db2/linuxamd64/tsamp/

saplxvm07: ls

db2cktsa installSAM integration license Linux prereqSAM README uninstallSAM

2. Install SA MP.

saplxvm07: ./prereqSAM

prereqSAM: All prerequisites for the ITSAMP installation are met on operating system

SUSE Linux Enterprise Server 11 (x86_64)

VERSION = 11

PATCHLEVEL = 2

saplxvm07: ./installSAM --noliccheck

3. Install the SA MP license as described in SAP Note 816773.

saplxvm07: samlicm -i ~/sam32.lic

saplxvm07: samlicm -s

Product: IBM Tivoli System Automation for Multiplatforms 3.2

Creation date: Wed 19 Aug 2009 12:00:01 AM EDT

Expiration date: Thu 31 Dec 2037 12:00:01 AM EST

4. Install the HA scripts on the primary and the standby by running the db2cptsa command.

saplxvm07:/db2/db2aha/db2_software/install/tsamp # ./db2cptsa

DBI1110I The DB2 High Availability (HA) scripts for the IBM Tivoli

System Automation for Multiplatforms (SA MP) were successfully

updated in /usr/sbin/rsct/sapolicies/db2.

Explanation:

You need DB2 HA scripts to use SA MP with the DB2 HA feature.

These DB2 HA scripts are located at /usr/sbin/rsct/sapolicies/db2. The

DB2 installer detects whether these DB2 HA scripts need to be installed

or updated.

The DB2 installer successfully updated the DB2 HA scripts.

User response:

No action is required.

5.2 Setting up the HADR cluster

SAP provides a tool called Cluster Setup Tool to easily set up the HADR cluster for automatic failover. The tool is included in the DB2 media provided by SAP and is located in <DB2-DVD-Mount-Point>/LINUXX86_64/SA_MP/scripts/. Instructions for the tool are provided in <DB2-DVD-Mount-Point>/LINUXX86_64/DB6_SAMP_InstGuide.pdf.

The latest version of the Cluster Setup Tool can be downloaded from SAP Note 960843 - DB6: High Availability for DB2 using SA MP: http://service.sap.com/sap/support/notes/960843

Example:

saplxvm07:~ # tar -xzvf samp_scripts_633_20140317.tgz

samp_scripts/

samp_scripts/sampdbcmd

samp_scripts/startdb

samp_scripts/startj2eedb

samp_scripts/stopdb

samp_scripts/stopj2eedb

samp_scripts/sapdb2cluster.sh

The sapdb2cluster.sh script is used to set up and create the cluster using the configuration file with all the required parameters. The script has the following options:

1. Create, Show or Edit Database Configuration

2. Create Database Cluster

3. Show Database Cluster State

4. Delete Database Cluster

For further information, refer to the SAP installation guide “IBM DB2 High Availability Solution: IBM Tivoli System Automation for Multiplatforms” that can be found on SAP Service Marketplace at http://service.sap.com/instguidesnw.

5.2.1 Creating the cluster configuration file

The sapdb2cluster.sh script must be run as root user from the primary or the standby host.

Example:

saplxvm07:~/samp_scripts # ./sapdb2cluster.sh -l sapdb2cluster.log -f sapdb2cluster.conf

By default, the configuration information is saved in the sapdb2cluster.conf file and the log is saved in the sapdb2cluster.log file.

Select option “1 - Create, Show or Edit Database Configuration” which displays values from the current configuration file if it exists in the current directory or prompts you for new values.

Example:

General Cluster Configuration

[1] SAP_SID = AHA

[2] SAP_CI_HOSTNAME = saplxvm06

[3] SAP_CI_INST_NR =

[4] TSA_DOMAIN_NAME = sap_ahadb2

[5] TSA_TIEBREAKER_IP_ADDRESS = 9.26.166.1

[6] TSA_DISK_HEARTBEAT = [OFF]

[7] TSA_REMOTE_CMD = ssh

Database Cluster Configuration

[8] DB2_HOSTNAME_LIST = saplxvm07

saplxvm08

[9] DB2_CLUSTER_TYPE = HADR

[10] DB2_INST_DIR = /db2/db2aha/db2_software

[11] DB2_DB2INSTANCE = db2aha

[12] DB2_HA_HOSTNAME = saplxvmsap

[13] DB2_HA_IP_ADDRESS = 9.26.166.97

[14] DB2_NETWORK_INTERFACE_LIST = eth0:9.26.166.199:255.255.254.0:saplxvm07

eth0:9.26.166.200:255.255.254.0:saplxvm08

[15] DB2_HA_IP_MASK = 255.255.254.0

[16] TSA_LICENSE_FILE = /root/sam32.lic

[17] TSA_USER_LIST = ahaadm

db2aha

[18] DB2_HADR_SYNC_MODE = NEARSYNC

[19] DB2_HADR_PORTS = AHA_HADR_1:5951/tcp

AHA_HADR_2:5952/tcp

[20] TSA_USER_GROUP_NAME = sagrp

[21] TSA_USER_GROUP_ID = 222

[22] DB2_COMM_PORTS = DB2_db2aha:60006/tcp

DB2_db2aha_1:60007/tcp

DB2_db2aha_2:60008/tcp

DB2_db2aha_3:60009/tcp

DB2_db2aha_4:60010/tcp

DB2_db2aha_END:60011/tcp

sapdb2AHA:5912/tcp

[23] DB2_GROUP_LIST = dbahaadm:401:true

dbahactl:402:true

dbahamnt:403:true

dbahamon:404:true

sapsys:390:true

sapinst:1000:true

[24] DB2_USER_LIST =

ahaadm:301:/home/ahaadm:true:/bin/csh:true:sapsys:sapinst:dbahactl

db2aha:302:/db2/db2aha:false:/bin/csh:true:dbahaadm:sapinst

sapaha:303:/home/sapaha:true:/bin/csh:true:dbahamon:dialout:video

sapadm:305:/home/sapadm:true:/bin/false:true:sapsys:dialout:video

Edit Database configuration

Press Enter to Exit or select a number to edit a parameter (e.g. 1 for SAP_SID):

The parameter TSA_DISK_HEARTBEAT enables SA MP Disk Heartbeat and is defined by the accessibility of the raw disks, logical volumes (LVID), multipath devices (MPATH), or physical volumes (PVID). This allows TSA to distinguish between a network failure and a node failure. Refer to the following link for more information: http://www-01.ibm.com/support/knowledgecenter/SSRM2X_4.1.0/com.ibm.samp.doc_4.1/sampugdiskheartbeat.html?cp=SSRM2X_4.1.0%2F0-4-5-1-1&lang=en

The parameter DB2_HOSTNAME_LIST takes the primary host name and the standby host name, separated by a comma. DB2_HA_HOSTNAME is used to assign the virtual host name. The virtual host name must not be in use. It is also required to supply a valid SA MP license location. In our example, the license file /root/sam32.lic is assigned to the variable TSA_LICENSE_FILE.

The remaining parameters should be assigned automatically from the system.
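The values proposed by the script can be given a quick plausibility check before the cluster is created. The following Python sketch splits a DB2_NETWORK_INTERFACE_LIST entry and tests whether the virtual IP address (DB2_HA_IP_ADDRESS) lies in the same subnet as the interface. The field layout device:ip:netmask:host is inferred from the example values above, the function names are hypothetical, and the expectation that the virtual IP shares the subnet of the interface is an assumption of this sketch. Only IPv4 entries are handled.

```python
import ipaddress


def parse_interface(entry):
    """Split 'device:ip:netmask:host' (layout inferred from the example) into parts."""
    device, ip, netmask, host = entry.split(":")
    return {"device": device, "ip": ip, "netmask": netmask, "host": host}


def same_subnet(virtual_ip, entry):
    """Return True if virtual_ip is inside the subnet of the interface entry."""
    iface = parse_interface(entry)
    # strict=False lets a host address plus netmask define the network.
    network = ipaddress.ip_network(f"{iface['ip']}/{iface['netmask']}", strict=False)
    return ipaddress.ip_address(virtual_ip) in network


INTERFACES = [
    "eth0:9.26.166.199:255.255.254.0:saplxvm07",
    "eth0:9.26.166.200:255.255.254.0:saplxvm08",
]

for entry in INTERFACES:
    print(parse_interface(entry)["host"], same_subnet("9.26.166.97", entry))
```

With the values from the example configuration, the check succeeds for both hosts.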

5.2.2 Creating the database cluster

Once the cluster configuration file has been generated, the cluster can be created by using option “2 - Create Database Cluster”. Logs are reported in the file sapdb2cluster.log.

Example:

Read configuration file

Check general configuration

Check database cluster configuration

Create cluster domain and nodes

Create SA MP domain

Check for SA MP software on cluster nodes

Check for SA MP on node saplxvm07 : OK

Check for SA MP on node saplxvm08 : OK

Prepare cluster nodes : OK

Create cluster domain sap_ahadb2 : OK

Create database cluster resources

Create database cluster (HADR)

Check DB2 cluster : OK

Prepare DB2 cluster

Grant SA MP control to ahaadm,db2aha on all nodes

Create user group sagrp on node saplxvm07 : OK

Grant SA MP control on node saplxvm07 : OK

Create user group sagrp on node saplxvm08 : OK

Grant SA MP control on node saplxvm08 : OK

Replicate DB2 HADR ports : OK

Disable DB2 Fault Monitor : OK

Configuring SAP for HA DB2 cluster

Modifying User Environment : OK

Replace startdb/stopdb scripts : OK

Update SAP profiles with virtual database hostname : OK

Update thin client configuration with virtual database hostname : OK

Setup HADR for DB2 cluster

Start database servers

Start database server on node saplxvm07 : OK

Start database server on node saplxvm08 : OK

Configure DB2 servers for HADR

Check HADR database roles for DB2 cluster : OK

Configure database server saplxvm07 for HADR : OK

Configure database server saplxvm08 for HADR : OK

Start database server saplxvm08 as STANDBY : OK

Start database server saplxvm07 as PRIMARY : OK

Activate databases

Activate database AHA on node saplxvm08 : OK

Activate database AHA on node saplxvm07 : OK

Wait for HADR cluster to become Peer state : OK

Check database cluster configuration : OK

Generate db2haicu configuration file (/tmp/cluster_config.xml) : OK

Execute db2haicu

Copying config file to cluster node saplxvm07 : OK

Copying config file to cluster node saplxvm08 : OK

Removing virtual IP address from cluster nodes : OK

Executing db2haicu at node saplxvm08 : OK

Executing db2haicu at node saplxvm07 : OK

Action finished. Press Enter to continue ...

The above output also describes the steps performed during cluster creation. The script configures both systems for HADR and generates the /tmp/cluster_config.xml configuration file on both systems. The script then uses the configuration file to execute the db2haicu command on both the primary and the standby to create the cluster. The output of the db2haicu command is stored in /tmp/cluster_config.log by the script.

Example:

The following is an example of the cluster_config.xml file:

<?xml version="1.0" encoding="UTF-8"?>

<DB2Cluster xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

xsi:noNamespaceSchemaLocation="db2ha.xsd"

clusterManagerName="TSA" version="2.2">

<ClusterDomain domainName="sap_ahadb2">

<Quorum quorumDeviceProtocol="network" quorumDeviceName="9.26.166.1"/>

<PhysicalNetwork physicalNetworkName="db2network" physicalNetworkProtocol="ip">

<Interface interfaceName="eth0" clusterNodeName="saplxvm07">

<IPAddress baseAddress="9.26.166.199" subnetMask="255.255.254.0"

networkName="db2network"/>

</Interface>

<Interface interfaceName="eth0" clusterNodeName="saplxvm08">

<IPAddress baseAddress="9.26.166.200" subnetMask="255.255.254.0"

networkName="db2network"/>

</Interface>

<LogicalSubnet baseAddress="9.26.166.0" subnetMask="255.255.254.0"

networkName="db2network"/>

</PhysicalNetwork>

<ClusterNode clusterNodeName="saplxvm07"/>

<ClusterNode clusterNodeName="saplxvm08"/>

</ClusterDomain>

<FailoverPolicy>

<HADRFailover />

</FailoverPolicy>

<DB2PartitionSet>

<DB2Partition dbpartitionnum="0" instanceName="db2aha" />

</DB2PartitionSet>

<HADRDBSet>

<HADRDB databaseName="AHA"

localInstance="db2aha"

remoteInstance="db2aha"

localHost="saplxvm07"

remoteHost="saplxvm08" />

<VirtualIPAddress baseAddress="9.26.166.97" subnetMask="255.255.254.0"

networkName="db2network" />

</HADRDBSet>

</DB2Cluster>
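As an illustration, the generated file can be inspected programmatically before or after db2haicu is run. The following is a minimal sketch, not part of sapdb2cluster.sh: it parses a trimmed copy of the example above (the XML declaration and the PhysicalNetwork details are omitted for brevity) with Python's standard library and extracts the cluster domain, nodes, quorum device, and virtual IP address. The function name summarize_cluster_config is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the cluster_config.xml example shown above.
CLUSTER_CONFIG = """\
<DB2Cluster xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:noNamespaceSchemaLocation="db2ha.xsd"
            clusterManagerName="TSA" version="2.2">
  <ClusterDomain domainName="sap_ahadb2">
    <Quorum quorumDeviceProtocol="network" quorumDeviceName="9.26.166.1"/>
    <ClusterNode clusterNodeName="saplxvm07"/>
    <ClusterNode clusterNodeName="saplxvm08"/>
  </ClusterDomain>
  <HADRDBSet>
    <HADRDB databaseName="AHA" localInstance="db2aha" remoteInstance="db2aha"
            localHost="saplxvm07" remoteHost="saplxvm08"/>
    <VirtualIPAddress baseAddress="9.26.166.97" subnetMask="255.255.254.0"
                      networkName="db2network"/>
  </HADRDBSet>
</DB2Cluster>
"""


def summarize_cluster_config(xml_text):
    """Return the key settings of a db2haicu XML input file (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    domain = root.find("ClusterDomain")
    hadr_db = root.find("HADRDBSet/HADRDB")
    vip = root.find("HADRDBSet/VirtualIPAddress")
    return {
        "domain": domain.get("domainName"),
        "nodes": [n.get("clusterNodeName") for n in domain.findall("ClusterNode")],
        "quorum": domain.find("Quorum").get("quorumDeviceName"),
        "database": hadr_db.get("databaseName"),
        "hadr_hosts": (hadr_db.get("localHost"), hadr_db.get("remoteHost")),
        "virtual_ip": vip.get("baseAddress"),
    }


summary = summarize_cluster_config(CLUSTER_CONFIG)
print(summary)
```

A check like this makes it easy to confirm that both HADR hosts also appear as cluster nodes before the cluster is created.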

5.2.3 Displaying the database cluster

Option “3 - Show Database Cluster State” will display the cluster status.

Example:

Figure 8: lssam output of SA MP cluster configuration for HADR

Note: With option 3, the lssam command is executed by the script to collect the status output.

The following terminologies help describe the lssam output:

Term             Description
Peer Domain      A cluster of servers or nodes
Resource         Fixed or floating hardware or software
Resource Group   A virtual group of resources
Equivalency      A fixed set of resources of the same class that provide the same functionality
Nominal State    The desired state of a resource. It can be online or offline. If changed, SA MP will bring a resource online or shut it down.

The cluster consists of three resource groups and three equivalency resource groups with the same functionalities. Equivalency resource groups allow SA MP to select any resource with the same functionality to perform an operation in case of a failure. The following table is an example of resource groups and their equivalency resource groups.

Example:

Resource Group                        Description
db2_db2aha_db2aha_AHA-rg              The database instance resource group consists of the primary and the standby instances, the application resource db2_db2aha_db2aha_AHA, and the service IP resource db2ip_9_26_166_97.
db2_db2aha_saplxvm07_0-rg             Database instance resource group for the host saplxvm07
db2_db2aha_saplxvm08_0-rg             Database instance resource group for the host saplxvm08
db2_db2aha_db2aha_AHA-rg_group-equ    Equivalency database instance resource group equivalent to db2_db2aha_db2aha_AHA-rg
db2_db2aha_saplxvm07_0-rg_group-equ   Equivalency database instance resource group for the host saplxvm07
db2_db2aha_saplxvm08_0-rg_group-equ   Equivalency database instance resource group for the host saplxvm08

In the above lssam output, the Online IBM.ResourceGroup:db2_db2aha_db2aha_AHA-rg Nominal=Online entry means that the resource group db2_db2aha_db2aha_AHA-rg is online and its nominal state is also online. Note that for the resource group db2_db2aha_db2aha_AHA-rg, both the standby and the primary resource applications use the same service IP, but only one of them is online. The lsrpdomain and the lsrpnode commands can be used to check the domain status.

Example:

saplxvm07:~/samp_scripts # lsrpdomain

Name OpState RSCTActiveVersion MixedVersions TSPort GSPort

sap_ahadb2 Online 3.1.4.4 No 12347 12348

saplxvm07:~/samp_scripts # lsrpnode

Name OpState RSCTVersion

saplxvm07 Online 3.1.4.4

saplxvm08 Online 3.1.4.4
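For scripted health checks, the lsrpnode output can be reduced to a list of nodes that are not online. The sketch below is an illustration only; it assumes the three-column layout shown above, and the helper name nodes_not_online is hypothetical.

```python
# Output of lsrpnode as shown in the example above.
LSRPNODE_OUTPUT = """\
Name      OpState RSCTVersion
saplxvm07 Online  3.1.4.4
saplxvm08 Online  3.1.4.4
"""


def nodes_not_online(lsrpnode_text):
    """Return the names of all nodes whose OpState is not 'Online'."""
    lines = [line for line in lsrpnode_text.splitlines() if line.strip()]
    rows = [line.split() for line in lines[1:]]  # skip the header row
    return [name for name, op_state, *_ in rows if op_state != "Online"]


print(nodes_not_online(LSRPNODE_OUTPUT))  # [] when every node is online
```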

Use ifconfig -a on the primary host to check that the virtual IP address is linked to the primary host IP address.

Example:

saplxvm07:~/samp_scripts # ifconfig -a

eth0 Link encap:Ethernet HWaddr 00:0C:29:1A:98:28

inet addr:9.26.166.199 Bcast:9.26.167.255 Mask:255.255.254.0

UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

RX packets:166592 errors:0 dropped:581 overruns:0 frame:0

TX packets:106491 errors:0 dropped:0 overruns:0 carrier:0

collisions:0 txqueuelen:1000

RX bytes:169945146 (162.0 Mb) TX bytes:18069161 (17.2 Mb)

eth0:0 Link encap:Ethernet HWaddr 00:0C:29:1A:98:28

inet addr:9.26.166.97 Bcast:9.26.167.255 Mask:255.255.254.0

UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

The link is defined by eth0:0 in the sample output above where 9.26.166.97 is the virtual IP address. The standby host IP configuration remains unchanged.

Note: During a failover, this IP link is removed from the primary host and a new link to the virtual IP address is created on the standby host.
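The virtual IP address can only be aliased onto eth0 if it belongs to the same subnet as the host address. As a worked check using the values from the ifconfig output above, the following sketch uses Python's standard ipaddress module; the variable names are illustrative.

```python
import ipaddress

# Host address and netmask of eth0 on saplxvm07, taken from the output above.
host_interface = ipaddress.ip_interface("9.26.166.199/255.255.254.0")
# Virtual IP address assigned to eth0:0.
virtual_ip = ipaddress.ip_address("9.26.166.97")

network = host_interface.network
print(network)                    # 9.26.166.0/23
print(network.broadcast_address)  # 9.26.167.255
print(virtual_ip in network)      # True
```

The computed broadcast address matches the Bcast value reported by ifconfig for both eth0 and eth0:0.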

For more information on SA MP clusters and resources, refer to https://www.ibm.com/developerworks/tivoli/library/tv-tivoli-system-automation/

5.2.4 Enabling the SAP system with virtual database host name and IP address

This step has already been completed by sapdb2cluster.sh.

On saplxvm06, the “SAPDBHOST” and “j2ee/dbhost” in the default profile and “hostname” in the “db2cli.ini” file are replaced by the virtual host name. In the following example, the virtual host name is “saplxvmsap”:

saplxvm06:db2aha 34> grep -i dbhost /sapmnt/AHA/profile/DEFAULT.PFL

SAPDBHOST = saplxvmsap

j2ee/dbhost = saplxvmsap

saplxvm06:~ # cat /usr/sap/AHA/SYS/global/db6/db2cli.ini

; Comment lines start with a semi-colon.

[AHA]

Database=AHA

protocol=tcpip

hostname=saplxvmsap

servicename=5912

[COMMON]

diagpath=/usr/sap/AHA/SYS/global/db6/db2dump

Once the database cluster is created, all connections to the database must be refreshed to pick up the change. The virtual host (saplxvmsap) is reflected as the database host in the SAP system.
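To confirm that the thin client configuration points at the virtual host, db2cli.ini can be read like any INI file. The sketch below is an illustration under the assumption that the file follows the layout shown above; it uses Python's standard configparser module, which treats lines starting with a semicolon as comments. The helper name database_host is hypothetical.

```python
import configparser

# Content of db2cli.ini as shown in the example above.
DB2CLI_INI = """\
; Comment lines start with a semi-colon.
[AHA]
Database=AHA
protocol=tcpip
hostname=saplxvmsap
servicename=5912

[COMMON]
diagpath=/usr/sap/AHA/SYS/global/db6/db2dump
"""


def database_host(ini_text, section):
    """Return the hostname entry of the given database section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return parser.get(section, "hostname")


print(database_host(DB2CLI_INI, "AHA"))  # saplxvmsap
```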

Figure 9: Dashboard screen in the DBA Cockpit (Web Dynpro user interface)

5.3 HADR micro-outage feature test using the Graceful Maintenance Tool (GMT)

The micro-outage feature of SAP on IBM DB2 for LUW can be used to pause SAP applications for a short period in order to perform a controlled failover without having to stop any SAP ABAP application servers. This allows administrators to perform certain database maintenance without significant downtime. The GMT makes optimal use of the micro-outage feature and provides an easy way to perform a controlled, graceful HADR failover.

The GMT can be downloaded from SAP Note 1530812 - DB6: Graceful Maintenance Tool and can be used from the primary or the standby host as root user. The GMT requires the ABAP routines attached to SAP Note 1907533 - ABAP Routines for Graceful Maintenance Tool (GMT) and SAP Note 1443426 - DB6: Graceful Cluster Switch.

Example:

saplxvm07:~/samp_scripts # tar -xzvf gmt_scripts_633_20140317.tgz

gmt_scripts/

gmt_scripts/exitDB2Restart.sh

gmt_scripts/exitFPActivate.sh

gmt_scripts/exitNoOp.sh

gmt_scripts/exitResumeBtcExternal.sh

gmt_scripts/exitSuspendBtcExternal.sh

gmt_scripts/sapdb2gmt.sh

The script (sapdb2gmt.sh) offers the following three self-explanatory options:

1. Create, Show or Edit GMT Configuration
2. Check Graceful Prerequisites
3. Init Graceful Maintenance Mode

Example:

saplxvm07:~/samp_scripts/gmt_scripts # ./sapdb2gmt.sh -l sapdb2gmt.log -f sapdb2gmt.conf

./sapdb2gmt.sh version 6.33 started on Fri May 23 14:58:13 EDT 2014

---Graceful Maintenance Tool (GMT) for SAP running on DB2 LUW (Version 6.33)---

1 - Create, Show or Edit GMT Configuration

2 - Check Graceful Prerequisites

3 - Init Graceful Maintenance Mode

e - Exit

Input:

5.3.1 GMT Configuration

The Graceful Maintenance Tool must be configured before it can be used. Option 1 is used to configure the tool and generate the configuration file (sapdb2gmt.conf). Logs are written to sapdb2gmt.log.

Example:

Input: 1

Show GMT configuration

General System Configuration

[1] SAP_SID = AHA

[2] TSA_REMOTE_CMD = ssh

Database Configuration

[3] DB2_INST_DIR = /db2/db2aha/db2_software

[4] DB2_DB2INSTANCE = db2aha

Database Graceful Maintenance Tool

[5] DB2_GMT_TIMEOUT = 100

[6] DB2_GMT_CMD = CLUSTER_FAILOVER

[7] DB2_GMT_AS_HOST = saplxvm06

[8] DB2_GMT_AS_NR = 1

[9] DB2_GMT_CLIENT = 001

[10] DB2_GMT_USER = DDIC

Edit GMT configuration

Press Enter to Exit or select a number to edit a parameter (e.g. 1 for SAP_SID):

SAP RFC Configuration Parameters

Parameter         Description                                             Value (example)
DB2_GMT_AS_HOST   Host name of the SAP primary application server         saplxvm06
DB2_GMT_AS_NR     Instance number of the SAP primary application server   1
DB2_GMT_USER      User for RFC calls                                      DDIC
DB2_GMT_CLIENT    Client to use for RFC calls                             001

After the configuration has been completed, Option 2 can be used to check all prerequisites for micro failover using the GMT.

5.3.2 Micro-failover test

Option 3 of GMT (3 - Init Graceful Maintenance Mode) initiates a DB2 HADR cluster failover.

Example:

Read configuration file

Check general configuration

Check database configuration : OK

Starting graceful maintenance mode

Enter Parameter for Graceful Maintenance Mode

[1] SAP_SID = AHA

[2] TSA_REMOTE_CMD = ssh

[3] DB2_GMT_TIMEOUT = 100

[4] DB2_GMT_SAP_BTC_GRACE_PERIOD = 60

[5] DB2_GMT_CMD = CLUSTER_FAILOVER

[6] DB2_GMT_SAP_COMM_MODE = SAPEVT

[7] DB2_GMT_SAP_EVENT_ACTIVATE = SAP_DBA_GMT_ACTIVATE

[8] DB2_GMT_SAP_EVENT_BTC_SUSPEND = SAP_DBA_GMT_SUSPEND_BATCH_JOBS

[9] DB2_GMT_SAP_EVENT_BTC_RESUME = SAP_DBA_GMT_RESUME_BATCH_JOBS

[10] DB2_GMT_SAP_SCRIPT_BTC_SUSPEND =

[11] DB2_GMT_SAP_SCRIPT_BTC_RESUME =

Check Cluster Prerequisites

Checking DB2 HADR Peer State

Checking DB2 HADR Peer State : OK

Checking DB2 HADR Peer State : OK

Check General Prerequisites

Clean quiesce file : OK

Checking SAP DBSL feature prerequisites: : OK

Checking for transactions running for more than 60 seconds : WARNING

*** WARNING: Found Long Running Transactions (Tue Jul 22 11:10:53 EDT 2014) ***

COMMENT APPL_NAME AGENT_ID UOW_START_TIME STATUS RUNTIME

SAP_USER SAP_APPL_SERVER SAP_WP_TYPE SAP_RE

---------- ------------ -------- ------------------- ------------ ----------- ---

--------- -------------------- ----------- ------ --------------

LONGRUNNER DB2ATS 3354 2014-07-22-11.00.00 LOCKWAIT 652

DB2AHA saplxvm07 -Task: -

1 record(s) selected.

WARNING: Long running transactions are active which might be canceled (rolled

back)!

*******************************************************************************

Checking SAP Stack Type : OK

Checking for Java Connections : OK

Checking database connection as db2aha : OK

Checking R3trans connection as ahaadm : OK

Checking ABAP functions (SAP Note 1907533) : OK

Warnings occurred. Do you want to proceed with graceful maintenance? [Yes|No]:

Yes

The tool displays a list of active transactions and asks for confirmation to proceed.

If “Yes” is selected, the script proceeds and waits at the step “Waiting for the Quiesce file (current: 12 s; timeout: 65 s)” until the SAP applications are paused.

Example:

Warnings occurred. Do you want to proceed with graceful maintenance? [Yes|No]:

Yes

Suspend SAP batch jobs

Create event SAP_DBA_GMT_SUSPEND_BATCH_JOBS via sapevt : OK

Suspend external batch jobs via exit script : OK [skipped]

Waiting grace period (31 s)

Note: This requires downtime. All connections to the database will be closed.

After automatically creating the quiesce file “/usr/sap/AHA/SYS/global/db6_dbsl_quiesce_def_connections”, the script displays a list of database connections and gives you the option either to forcefully close the connections and roll back all transactions, or to wait for the transactions to complete and the connections to be closed from the application side. After the cluster switch, the standby on host saplxvm08 becomes the new primary, while the primary on host saplxvm07 becomes the new standby.

Example:

Waiting grace period (0 s) : OK

Enable micro outage feature of DBSL

Create event SAP_DBA_GMT_ACTIVATE via sapevt : OK

Waiting for quiesce file (current: 3 s; timeout: 65 s) : OK

Closing database connections (9) : OK [skipped]

************** Open Connections (Tue Jul 22 11:46:10 EDT 2014) ***************

APPL_NAME AGENT_ID AUTHID UOW_START_TIME STATUS

SAP_USER SAP_APPL_SERVER SAP_WP_TYPE SAP_REPORT

-------------------- -------- ---------- ------------------- -------------------- -

----------- -------------------- ----------- ------------------------------

DB2ATS 3354 DB2AHA 2014-07-22-11.00.00 LOCKWAIT

DB2AHA saplxvm07 -Task: -

dw.sapAHA_DVEBMGS01 27 SAPAHA 2014-07-22-11.44.27 UOWWAIT

SAPSYS saplxvm06 DIA SAPMSSY2

dw.sapAHA_DVEBMGS01 28 SAPAHA 2014-07-22-11.42.27 UOWWAIT

SAPSYS saplxvm06 SPO SAPLSPOA

dw.sapAHA_DVEBMGS01 29 SAPAHA 2014-07-22-11.43.27 UOWWAIT

SAPSYS saplxvm06 DIA SAPMSSY2

dw.sapAHA_DVEBMGS01 32 SAPAHA 2014-07-22-11.44.27 UOWWAIT

SAPSYS saplxvm06 DIA SAPMSSY2

dw.sapAHA_DVEBMGS01 34 SAPAHA 2014-07-22-00.01.03 UOWWAIT

DDIC saplxvm06 BTC SAPMSSY2

dw.sapAHA_DVEBMGS01 35 SAPAHA 2014-07-22-11.43.27 UOWWAIT

SAPSYS saplxvm06 DIA SAPMSSY2

dw.sapAHA_DVEBMGS01 37 SAPAHA 2014-07-22-11.44.10 UOWWAIT

SAPSYS saplxvm06 DIA CL_ABSTRACT_SAML_PROTOCOL=====

dw.sapAHA_DVEBMGS01 39 SAPAHA 2014-07-22-11.39.27 UOWWAIT

SAPSYS saplxvm06 DIA SAPMSSY2

9 record(s) selected.

*******************************************************************************

Do you want to wait 60 seconds longer? [Yes|No]: Yes

Total Timeout is now 240 (Maximum allowed 400)

Closing database connections (6)

If all connections cannot be closed within 400 seconds, the tool will remind the user again to wait or to continue:

Example:

Total Timeout is now 240 (Maximum allowed 400)

Closing database connections (5) : OK [skipped]

\n ************** Open Connections (Tue Jul 22 11:48:30 EDT 2014) ***************

APPL_NAME AGENT_ID AUTHID UOW_START_TIME STATUS

SAP_USER SAP_APPL_SERVER SAP_WP_TYPE SAP_REPORT

-------------------- -------- ---------- ------------------- -------------------- -

----------- -------------------- ----------- ------------------------------

DB2ATS 3354 DB2AHA 2014-07-22-11.00.00 LOCKWAIT

DB2AHA saplxvm07 -Task: -

dw.sapAHA_DVEBMGS01 32 SAPAHA 2014-07-22-11.44.27 UOWWAIT

SAPSYS saplxvm06 DIA SAPMSSY2

dw.sapAHA_DVEBMGS01 34 SAPAHA 2014-07-22-00.01.03 UOWWAIT

DDIC saplxvm06 BTC SAPMSSY2

dw.sapAHA_DVEBMGS01 35 SAPAHA 2014-07-22-11.43.27 UOWWAIT

SAPSYS saplxvm06 DIA SAPMSSY2

dw.sapAHA_DVEBMGS01 37 SAPAHA 2014-07-22-11.44.10 UOWWAIT

SAPSYS saplxvm06 DIA CL_ABSTRACT_SAML_PROTOCOL=====

dw.sapAHA_DVEBMGS01 39 SAPAHA 2014-07-22-11.39.27 UOWWAIT

SAPSYS saplxvm06 DIA SAPMSSY2

6 record(s) selected.

*******************************************************************************

Do you want to wait 60 seconds longer? [Yes|No]: No

If “No” is selected, the tool will continue closing connections and prompt if the transactions can be forcefully rolled back.

Example:

Do you want to wait 60 seconds longer? [Yes|No]: No

Do you want to force applications (rollback of running transactions)?

[Yes|No|Continue]: Yes

Execute DB2 QUIESCE for AHA: : OK

Waiting for DB2 QUIESCE (current: 1 s; timeout: 10 s; connections: 0): OK

Execute Database Cluster Failover (HADR) : OK

Execute DB2 UNQUIESCE: : OK

Disable micro outage feature of DBSL : OK

Clean quiesce file : OK

Checking database connection as db2aha : OK

Checking R3trans connection as ahaadm : OK

Resume SAP batch jobs

Create event SAP_DBA_GMT_RESUME_BATCH_JOBS via sapevt : OK

Resume external batch jobs via exit script : OK [skipped]

Graceful Maintenance Mode Start : Tue Jul 22 11:42:24 EDT 2014

Graceful Maintenance Mode End : Tue Jul 22 11:50:05 EDT 2014

Action finished. Press Enter to continue ...
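The start and end timestamps printed by the tool give the total length of the maintenance window, including the time spent at the interactive prompts. As a worked example, the sketch below computes that duration from the two lines above. It is an illustration only: the helper name parse_gmt_timestamp is hypothetical, the time zone token is dropped on the assumption that both timestamps use the same zone, and the day and month names are assumed to be in English.

```python
from datetime import datetime


def parse_gmt_timestamp(text):
    """Parse a timestamp such as 'Tue Jul 22 11:42:24 EDT 2014'."""
    parts = text.split()
    # Drop the time zone token (index 4); both values share the same zone.
    without_zone = " ".join(parts[:4] + parts[5:])
    return datetime.strptime(without_zone, "%a %b %d %H:%M:%S %Y")


start = parse_gmt_timestamp("Tue Jul 22 11:42:24 EDT 2014")
end = parse_gmt_timestamp("Tue Jul 22 11:50:05 EDT 2014")
elapsed_seconds = int((end - start).total_seconds())
print(elapsed_seconds)  # 461 seconds, that is, 7 minutes 41 seconds
```

Note that this figure covers the whole graceful maintenance run in this test, not only the time during which the database was unavailable.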

The cluster should reflect the changes and can be displayed using the lssam command.

Example:

saplxvm07:ahaadm 5> lssam

Online IBM.ResourceGroup:db2_db2aha_db2aha_AHA-rg Nominal=Online

|- Online IBM.Application:db2_db2aha_db2aha_AHA-rs

|- Offline IBM.Application:db2_db2aha_db2aha_AHA-rs:saplxvm07

'- Online IBM.Application:db2_db2aha_db2aha_AHA-rs:saplxvm08

'- Online IBM.ServiceIP:db2ip_9_26_166_97-rs

|- Offline IBM.ServiceIP:db2ip_9_26_166_97-rs:saplxvm07

'- Online IBM.ServiceIP:db2ip_9_26_166_97-rs:saplxvm08

Online IBM.ResourceGroup:db2_db2aha_saplxvm07_0-rg Nominal=Online

'- Online IBM.Application:db2_db2aha_saplxvm07_0-rs

'- Online IBM.Application:db2_db2aha_saplxvm07_0-rs:saplxvm07

Online IBM.ResourceGroup:db2_db2aha_saplxvm08_0-rg Nominal=Online

'- Online IBM.Application:db2_db2aha_saplxvm08_0-rs

'- Online IBM.Application:db2_db2aha_saplxvm08_0-rs:saplxvm08

Online IBM.Equivalency:db2_db2aha_db2aha_AHA-rg_group-equ

|- Online IBM.PeerNode:saplxvm08:saplxvm08

'- Online IBM.PeerNode:saplxvm07:saplxvm07

Online IBM.Equivalency:db2_db2aha_saplxvm07_0-rg_group-equ

'- Online IBM.PeerNode:saplxvm07:saplxvm07

Online IBM.Equivalency:db2_db2aha_saplxvm08_0-rg_group-equ

'- Online IBM.PeerNode:saplxvm08:saplxvm08

Online IBM.Equivalency:db2network

|- Online IBM.NetworkInterface:eth0:saplxvm08

'- Online IBM.NetworkInterface:eth0:saplxvm07
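The lssam output can also be evaluated by a script, for example to confirm after a failover that the HADR resource and the service IP are online on the same node. The following is a minimal sketch that assumes the line layout shown above (state, resource class, resource name, node); the regular expression and the helper name online_nodes are illustrative and not part of SA MP.

```python
import re

# Excerpt of the lssam output shown above (first resource group only).
LSSAM_OUTPUT = """\
Online IBM.ResourceGroup:db2_db2aha_db2aha_AHA-rg Nominal=Online
|- Online IBM.Application:db2_db2aha_db2aha_AHA-rs
|- Offline IBM.Application:db2_db2aha_db2aha_AHA-rs:saplxvm07
'- Online IBM.Application:db2_db2aha_db2aha_AHA-rs:saplxvm08
'- Online IBM.ServiceIP:db2ip_9_26_166_97-rs
|- Offline IBM.ServiceIP:db2ip_9_26_166_97-rs:saplxvm07
'- Online IBM.ServiceIP:db2ip_9_26_166_97-rs:saplxvm08
"""

# Matches only per-node lines: <state> IBM.<class>:<resource>:<node>
NODE_LINE = re.compile(r"(\w+)\s+(IBM\.\w+):([^:\s]+):(\S+)\s*$")


def online_nodes(lssam_text):
    """Map each resource name to the node on which it is reported Online."""
    result = {}
    for line in lssam_text.splitlines():
        match = NODE_LINE.search(line)
        if match and match.group(1) == "Online":
            result[match.group(3)] = match.group(4)
    return result


placement = online_nodes(LSSAM_OUTPUT)
print(placement)
```

In the example, both the HADR application resource and the service IP resource are reported online on saplxvm08, which is the new primary.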

The virtual IP link from the old primary is removed and a new link is created in the new primary host (saplxvm08).

saplxvm08:~ # ifconfig eth0:0

eth0:0 Link encap:Ethernet HWaddr 00:0C:29:76:2C:2C

inet addr:9.26.166.97 Bcast:9.26.167.255 Mask:255.255.254.0

UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

During the cluster switch, the changes can be monitored live using the lssam -top command as user root from any of the hosts.

5.3.4 Testing a disaster scenario

If SA MP fails to connect to the primary, it will first try to start the primary before failing over to the standby. Therefore, force stopping the primary will not cause an automatic failover. An automatic failover will be triggered if the primary host is unplugged or the OS is stopped. The following example shows how to test a disaster scenario and trigger an automatic failover. In the example, the main DB2 engine process, db2sysc, will be renamed to simulate SA MP being unable to restart DB2 on the primary, causing SA MP to initiate a failover. This is similar to a kernel panic that will kill the db2sysc process, but the host is still available.

Example:

1. From a command line window, “lssam -top” is issued to monitor the cluster status live.

2. To bring down the primary without shutting down the server, the /db2/db2aha/sqllib/adm/db2sysc file is renamed to /db2/db2aha/sqllib/adm/db2sysc_backup and the “db2sysc” process is killed on the primary host. The rename is necessary, otherwise SA MP will just restart DB2 on the primary and a failover will not be triggered.

saplxvm07:db2aha 54> mv /db2/db2aha/sqllib/adm/db2sysc /db2/db2aha/sqllib/adm/db2sysc_backup

saplxvm07:db2aha 55> db2_kill -9 db2sysc

Application ipclean: Removing DB2 engine and client IPC resources for

db2aha.

3. The changes are reflected live in the lssam output:

Figure 10: Cluster status after the primary is down (lssam output)

After a successful takeover, the primary (saplxvm07) will switch roles with the standby (saplxvm08). Once the test is complete, the file /db2/db2aha/sqllib/adm/db2sysc_backup must be moved back to /db2/db2aha/sqllib/adm/db2sysc. Once the file is moved back, the old primary (saplxvm07) is automatically brought back up and activated as the new standby. This may take several minutes. The status can be monitored using the lssam -top command.

Figure 11: Cluster status after a successful takeover (lssam output)

Note: Moving the db2sysc file is only performed to simulate a disaster scenario and is not recommended.

6. Installing the auxiliary standby database instance

As described earlier, the auxiliary standby is for DR purposes only and should be used to protect data from widespread disasters. Adding an auxiliary standby is similar to adding a principal standby except for minor changes to the DB2 database configuration. The following sections show how to add the first auxiliary standby.

6.1 Mounting file systems

The directories /sapmnt/<SID>/exe, /sapmnt/<SID>/profile, and /sapmnt/<SID>/global from the SAP application server must be mounted on the auxiliary standby host.

Example:

saplxvm09:~ # mount | grep AHA

saplxvm06:/sapmnt/AHA/exe on /sapmnt/AHA/exe type nfs (rw,addr=9.26.166.198)

saplxvm06:/sapmnt/AHA/global on /sapmnt/AHA/global type nfs (rw,addr=9.26.166.198)

saplxvm06:/sapmnt/AHA/profile on /sapmnt/AHA/profile type nfs (rw,addr=9.26.166.198)

6.2 Updating port configurations

A new port number (AHA_HADR_3:5953) is defined in the /etc/services file of the primary, the standby, and all auxiliary standby servers. This port is used as the HADR local service name (HADR_LOCAL_SVC) database configuration parameter of the newly added auxiliary standby server (saplxvm09).

Example:

sapdb2AHA 5912/tcp

AHA_HADR_1 5951/tcp # DB2 HADR log shipping

AHA_HADR_2 5952/tcp # DB2 HADR log shipping

AHA_HADR_3 5953/tcp # DB2 HADR log shipping

sapmsAHA 3600/tcp # SAP System Message Server Port

DB2_db2aha 60006/tcp

DB2_db2aha_1 60007/tcp

DB2_db2aha_2 60008/tcp

DB2_db2aha_3 60009/tcp

DB2_db2aha_4 60010/tcp

DB2_db2aha_END 60011/tcp
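Because the same entries must exist on every HADR server, it can help to extract the HADR ports from each host's /etc/services file and compare them. The sketch below parses the example entries above; it is an illustration only, and the helper names parse_services and hadr_ports are hypothetical.

```python
# Relevant /etc/services entries from the example above.
SERVICES = """\
sapdb2AHA 5912/tcp
AHA_HADR_1 5951/tcp # DB2 HADR log shipping
AHA_HADR_2 5952/tcp # DB2 HADR log shipping
AHA_HADR_3 5953/tcp # DB2 HADR log shipping
sapmsAHA 3600/tcp # SAP System Message Server Port
DB2_db2aha 60006/tcp
DB2_db2aha_END 60011/tcp
"""


def parse_services(text):
    """Return {service name: (port, protocol)} for services-style lines."""
    entries = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # strip trailing comments
        if not line:
            continue
        name, port_proto = line.split()[:2]
        port, proto = port_proto.split("/")
        entries[name] = (int(port), proto)
    return entries


def hadr_ports(entries, sid):
    """Return {service name: port} for the <SID>_HADR_n services."""
    prefix = f"{sid}_HADR_"
    return {name: port for name, (port, _) in entries.items()
            if name.startswith(prefix)}


ports = hadr_ports(parse_services(SERVICES), "AHA")
print(ports)
# Every HADR service needs its own port.
print(len(set(ports.values())) == len(ports))
```

Running the same extraction on the primary, the principal standby, and the auxiliary standby and comparing the results is a quick way to spot a missing or mismatched entry.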

6.3 Performing a homogeneous system copy using SWPM

Sections 4.4 Homogeneous system copy using SWPM, 4.5 Configuring ports, and 4.6 Restoring the database from a backup of this document must be completed on the new auxiliary host (saplxvm09).

Note: The homogeneous system copy changes the SAPDBHOST and j2ee/dbhost variables to saplxvm09 in the SAP default profile /sapmnt/AHA/profile/DEFAULT.PFL. This needs to be manually changed back to the virtual host, saplxvmsap. The auxiliary standby is also in rollforward pending mode just like the principal standby.

6.4 Configuring the HADR auxiliary standby database

On the new auxiliary standby host (saplxvm09), as user db2aha, the following sample script is executed to configure the auxiliary standby database for HADR:

saplxvm09:db2aha 81> cat auxiliary_standby_hadr_cfg.sql

UPDATE DB CFG FOR AHA USING HADR_LOCAL_HOST saplxvm09;

UPDATE DB CFG FOR AHA USING HADR_LOCAL_SVC AHA_HADR_3;

UPDATE DB CFG FOR AHA USING HADR_REMOTE_HOST saplxvm07;

UPDATE DB CFG FOR AHA USING HADR_REMOTE_SVC AHA_HADR_1;

UPDATE DB CFG FOR AHA USING HADR_REMOTE_INST db2aha;

UPDATE DB CFG FOR AHA USING HADR_TIMEOUT 120;

UPDATE DB CFG FOR AHA USING HADR_TARGET_LIST

saplxvm07:AHA_HADR_1|saplxvm08:AHA_HADR_2;

UPDATE DB CFG FOR AHA USING HADR_SYNCMODE SUPERASYNC;

UPDATE DB CFG FOR AHA USING HADR_SPOOL_LIMIT 1000;

UPDATE DB CFG FOR AHA USING HADR_PEER_WINDOW 240;

UPDATE DB CFG FOR AHA USING indexrec RESTART logindexbuild ON;

saplxvm09:db2aha 37> db2 -z auxiliary_hadr_cfg.sql.log -tvf auxiliary_standby_hadr_cfg.sql

saplxvm09:db2aha 55> db2 get db cfg for aha | grep HADR

HADR database role = STANDBY

HADR local host name (HADR_LOCAL_HOST) = saplxvm09

HADR local service name (HADR_LOCAL_SVC) = AHA_HADR_3

HADR remote host name (HADR_REMOTE_HOST) = saplxvm07

HADR remote service name (HADR_REMOTE_SVC) = AHA_HADR_1

HADR instance name of remote server (HADR_REMOTE_INST) = db2aha

HADR timeout value (HADR_TIMEOUT) = 120

HADR target list (HADR_TARGET_LIST) =

saplxvm07:AHA_HADR_1|saplxvm08:AHA_HADR_2

HADR log write synchronization mode (HADR_SYNCMODE) = NEARSYNC

HADR spool log data limit (4KB) (HADR_SPOOL_LIMIT) = 1000

HADR log replay delay (seconds) (HADR_REPLAY_DELAY) = 0

HADR peer window duration (seconds) (HADR_PEER_WINDOW) = 240

Note: The HADR_TARGET_LIST parameter is where the other two HADR server host names and their port numbers are listed in pairs. The order means that the first host in the list is the principal standby and the second host is the auxiliary standby, and so on.
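In this document's setup, each server's HADR_TARGET_LIST contains the other two servers as host:service pairs separated by "|". The following sketch reproduces the three values used in this section from a single ordered server list. It illustrates the pattern only; the function name hadr_target_lists is hypothetical, and other orderings may be appropriate depending on which server should act as principal standby after a takeover.

```python
def hadr_target_lists(servers):
    """Build a HADR_TARGET_LIST value per host.

    servers is an ordered list of (host, service) pairs; each host's
    list contains all other servers in that same order.
    """
    return {
        host: "|".join(f"{h}:{svc}" for h, svc in servers if h != host)
        for host, _ in servers
    }


# Host names and service names used in this document.
SERVERS = [
    ("saplxvm07", "AHA_HADR_1"),  # primary
    ("saplxvm08", "AHA_HADR_2"),  # principal standby
    ("saplxvm09", "AHA_HADR_3"),  # auxiliary standby
]

target_lists = hadr_target_lists(SERVERS)
for host, value in target_lists.items():
    print(f"{host}: UPDATE DB CFG FOR AHA USING HADR_TARGET_LIST {value}")
```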

The HADR_TARGET_LIST database configuration parameter also needs to be updated in the primary and the standby.

Example:

saplxvm07:db2aha 190> db2 "UPDATE DB CFG FOR AHA USING HADR_TARGET_LIST saplxvm08:AHA_HADR_2|saplxvm09:AHA_HADR_3"

DB20000I The UPDATE DATABASE CONFIGURATION command completed successfully.

SQL1363W One or more of the parameters submitted for immediate modification

were not changed dynamically. For these configuration parameters, the database

must be shutdown and reactivated before the configuration parameter changes

become effective.

saplxvm07:db2aha 70> db2 get db cfg for aha | grep HADR

HADR database role = PRIMARY

HADR local host name (HADR_LOCAL_HOST) = saplxvm07

HADR local service name (HADR_LOCAL_SVC) = AHA_HADR_1

HADR remote host name (HADR_REMOTE_HOST) = saplxvm08

HADR remote service name (HADR_REMOTE_SVC) = AHA_HADR_2

HADR instance name of remote server (HADR_REMOTE_INST) = db2aha

HADR timeout value (HADR_TIMEOUT) = 120

HADR target list (HADR_TARGET_LIST) =

saplxvm08:AHA_HADR_2|saplxvm09:AHA_HADR_3

HADR log write synchronization mode (HADR_SYNCMODE) = NEARSYNC

HADR spool log data limit (4KB) (HADR_SPOOL_LIMIT) = 1000

HADR log replay delay (seconds) (HADR_REPLAY_DELAY) = 0

HADR peer window duration (seconds) (HADR_PEER_WINDOW) = 240

saplxvm08:db2aha 95> db2 "UPDATE DB CFG FOR AHA USING HADR_TARGET_LIST saplxvm07:AHA_HADR_1|saplxvm09:AHA_HADR_3"

DB20000I The UPDATE DATABASE CONFIGURATION command completed successfully.

SQL1363W One or more of the parameters submitted for immediate modification

were not changed dynamically. For these configuration parameters, the database

must be shutdown and reactivated before the configuration parameter changes

become effective.

saplxvm08:db2aha 55> db2 get db cfg for aha | grep HADR

HADR database role = STANDBY

HADR local host name (HADR_LOCAL_HOST) = saplxvm08

HADR local service name (HADR_LOCAL_SVC) = AHA_HADR_2

HADR remote host name (HADR_REMOTE_HOST) = saplxvm07

HADR remote service name (HADR_REMOTE_SVC) = AHA_HADR_1

HADR instance name of remote server (HADR_REMOTE_INST) = db2aha

HADR timeout value (HADR_TIMEOUT) = 120

HADR target list (HADR_TARGET_LIST) =

saplxvm07:AHA_HADR_1|saplxvm09:AHA_HADR_3

HADR log write synchronization mode (HADR_SYNCMODE) = NEARSYNC

HADR spool log data limit (4KB) (HADR_SPOOL_LIMIT) = 1000

HADR log replay delay (seconds) (HADR_REPLAY_DELAY) = 0

HADR peer window duration (seconds) (HADR_PEER_WINDOW) = 240

Note: As the above message indicates, converting from a single standby to multiple standbys requires the database to be deactivated and reactivated. This requires system downtime. It is recommended to quiesce the SAP system to close the connections temporarily instead of stopping it.

Example:

saplxvm09:db2aha 41> db2 deactivate db aha

DB20000I The DEACTIVATE DATABASE command completed successfully.

saplxvm08:db2aha 98> db2 deactivate db aha

DB20000I The DEACTIVATE DATABASE command completed successfully.

saplxvm07:db2aha 192> db2 deactivate db aha

DB20000I The DEACTIVATE DATABASE command completed successfully.

saplxvm09:db2aha 42> db2 start hadr on db aha as standby

DB20000I The START HADR ON DATABASE command completed successfully.

saplxvm08:db2aha 99> db2 activate db aha

DB20000I The ACTIVATE DATABASE command completed successfully.

saplxvm07:db2aha 193> db2 activate db aha

DB20000I The ACTIVATE DATABASE command completed successfully.

Once HADR is successfully activated, the db2pd -hadr command on the primary host lists all of the standbys:

Example:

saplxvm07:db2aha 200> db2pd -d aha -hadr

Database Member 0 -- Database AHA -- Active -- Up 0 days 00:01:49 -- Date 2013-12-03-16.16.07.536459

HADR_ROLE = PRIMARY

REPLAY_TYPE = PHYSICAL

HADR_SYNCMODE = NEARSYNC

STANDBY_ID = 1

LOG_STREAM_ID = 0

HADR_STATE = PEER

HADR_FLAGS =

PRIMARY_MEMBER_HOST = saplxvm07

PRIMARY_INSTANCE = db2aha

PRIMARY_MEMBER = 0

STANDBY_MEMBER_HOST = saplxvm08

STANDBY_INSTANCE = db2aha

STANDBY_MEMBER = 0

HADR_CONNECT_STATUS = CONNECTED

HADR_CONNECT_STATUS_TIME = 12/03/2013 16:14:20.709721 (1386105260)

HEARTBEAT_INTERVAL(seconds) = 30

HADR_TIMEOUT(seconds) = 120

TIME_SINCE_LAST_RECV(seconds) = 17

PEER_WAIT_LIMIT(seconds) = 0

LOG_HADR_WAIT_CUR(seconds) = 0.000

LOG_HADR_WAIT_RECENT_AVG(seconds) = 0.000000

LOG_HADR_WAIT_ACCUMULATED(seconds) = 0.000

LOG_HADR_WAIT_COUNT = 0

SOCK_SEND_BUF_REQUESTED,ACTUAL(bytes) = 0, 16384

SOCK_RECV_BUF_REQUESTED,ACTUAL(bytes) = 0, 87380

PRIMARY_LOG_FILE,PAGE,POS = S0000012.LOG, 0, 2849058785

STANDBY_LOG_FILE,PAGE,POS = S0000012.LOG, 0, 2849058785

HADR_LOG_GAP(bytes) = 0

STANDBY_REPLAY_LOG_FILE,PAGE,POS = S0000012.LOG, 0, 2849058785

STANDBY_RECV_REPLAY_GAP(bytes) = 0

PRIMARY_LOG_TIME = 12/03/2013 16:10:07.000000 (1386105007)

STANDBY_LOG_TIME = 12/03/2013 16:10:07.000000 (1386105007)

STANDBY_REPLAY_LOG_TIME = 12/03/2013 16:10:07.000000 (1386105007)

STANDBY_RECV_BUF_SIZE(pages) = 2048

STANDBY_RECV_BUF_PERCENT = 0

STANDBY_SPOOL_LIMIT(pages) = 1000

STANDBY_SPOOL_PERCENT = 0

PEER_WINDOW(seconds) = 240

PEER_WINDOW_END = 12/03/2013 16:19:51.000000 (1386105591)

READS_ON_STANDBY_ENABLED = N

HADR_ROLE = PRIMARY

REPLAY_TYPE = PHYSICAL

HADR_SYNCMODE = SUPERASYNC

STANDBY_ID = 2

LOG_STREAM_ID = 0

HADR_STATE = REMOTE_CATCHUP

HADR_FLAGS =

PRIMARY_MEMBER_HOST = saplxvm07

PRIMARY_INSTANCE = db2aha

PRIMARY_MEMBER = 0

STANDBY_MEMBER_HOST = saplxvm09

STANDBY_INSTANCE = db2aha

STANDBY_MEMBER = 0

HADR_CONNECT_STATUS = CONNECTED

HADR_CONNECT_STATUS_TIME = 12/03/2013 16:14:21.118983 (1386105261)

HEARTBEAT_INTERVAL(seconds) = 30

HADR_TIMEOUT(seconds) = 120

TIME_SINCE_LAST_RECV(seconds) = 16

PEER_WAIT_LIMIT(seconds) = 0

LOG_HADR_WAIT_CUR(seconds) = 0.000

LOG_HADR_WAIT_RECENT_AVG(seconds) = 0.000000

LOG_HADR_WAIT_ACCUMULATED(seconds) = 0.000

LOG_HADR_WAIT_COUNT = 0

SOCK_SEND_BUF_REQUESTED,ACTUAL(bytes) = 0, 16384

SOCK_RECV_BUF_REQUESTED,ACTUAL(bytes) = 0, 87380

PRIMARY_LOG_FILE,PAGE,POS = S0000012.LOG, 0, 2849058785

STANDBY_LOG_FILE,PAGE,POS = S0000012.LOG, 0, 2849058785

HADR_LOG_GAP(bytes) = 0

STANDBY_REPLAY_LOG_FILE,PAGE,POS = S0000012.LOG, 0, 2849058785

STANDBY_RECV_REPLAY_GAP(bytes) = 0

PRIMARY_LOG_TIME = 12/03/2013 16:10:07.000000 (1386105007)

STANDBY_LOG_TIME = 12/03/2013 16:10:07.000000 (1386105007)

STANDBY_REPLAY_LOG_TIME = 12/03/2013 16:10:07.000000 (1386105007)

STANDBY_RECV_BUF_SIZE(pages) = 2048

STANDBY_RECV_BUF_PERCENT = 0

STANDBY_SPOOL_LIMIT(pages) = 1000

STANDBY_SPOOL_PERCENT = 0

PEER_WINDOW(seconds) = 0

READS_ON_STANDBY_ENABLED = N
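Because db2pd repeats the full field list once per standby, the output lends itself to simple scripted checks. The following Python sketch is illustrative only and is not part of DB2 or SAP tooling. The function names parse_db2pd_hadr and summarize_standbys are hypothetical, and the sketch assumes that the output is the plain KEY = VALUE text shown above, with a new block starting at each HADR_ROLE line.

```python
def parse_db2pd_hadr(text):
    """Split `db2pd -hadr` text into one dict per standby block.

    Assumption: every block starts with a HADR_ROLE line, as in the
    output above. Lines without '=' (for example the banner) are skipped.
    """
    blocks = []
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        key = key.strip()
        if not sep or not key:
            continue
        if key == "HADR_ROLE":
            blocks.append({})
        if blocks:
            blocks[-1][key] = value.strip()
    return blocks


def summarize_standbys(text):
    """Return (standby host, sync mode, HADR state) for each block."""
    return [
        (b.get("STANDBY_MEMBER_HOST"), b.get("HADR_SYNCMODE"), b.get("HADR_STATE"))
        for b in parse_db2pd_hadr(text)
    ]


# Abbreviated sample taken from the output above.
SAMPLE = """\
Database Member 0 -- Database AHA -- Active -- Up 0 days 00:01:49
HADR_ROLE = PRIMARY
HADR_SYNCMODE = NEARSYNC
STANDBY_ID = 1
HADR_STATE = PEER
HADR_FLAGS =
STANDBY_MEMBER_HOST = saplxvm08
HADR_ROLE = PRIMARY
HADR_SYNCMODE = SUPERASYNC
STANDBY_ID = 2
HADR_STATE = REMOTE_CATCHUP
STANDBY_MEMBER_HOST = saplxvm09
"""

if __name__ == "__main__":
    for host, mode, state in summarize_standbys(SAMPLE):
        print(host, mode, state)
```

In practice the text would come from running db2pd -d aha -hadr on the primary. The sample string is embedded here only to keep the sketch self-contained.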

For more information on DB2’s HADR multiple standby database feature, see the IBM Knowledge Center: http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.admin.ha.doc/doc/c0059994.html

7 Failover scenarios

As mentioned earlier, the auxiliary standbys are only for disaster recovery (DR) purposes and automatic failover is not supported between the primary and auxiliary standbys.

The HADR setup used as the example in this paper has a primary, a principal standby, and an auxiliary standby. The database configuration parameter HADR_TARGET_LIST is set as follows on the primary, the principal standby, and the auxiliary standby:

saplxvm07:db2aha 71> db2 get db cfg for aha | grep HADR_TARGET_LIST

HADR target list (HADR_TARGET_LIST) = saplxvm08:AHA_HADR_2|saplxvm09:AHA_HADR_3

saplxvm08:db2aha 56> db2 get db cfg for aha | grep HADR_TARGET_LIST

HADR target list (HADR_TARGET_LIST) = saplxvm07:AHA_HADR_1|saplxvm09:AHA_HADR_3

saplxvm09:db2aha 58> db2 get db cfg for aha | grep HADR_TARGET_LIST

HADR target list (HADR_TARGET_LIST) = saplxvm07:AHA_HADR_1|saplxvm08:AHA_HADR_2

With these settings, if the primary (on host saplxvm07) goes down, the principal standby on host saplxvm08 becomes the new primary. Once the database on host saplxvm07 is available again, it becomes the principal standby, and saplxvm09 remains the auxiliary standby. After a failover, the goal should always be to bring back the failed host and return to the original configuration. The following subsections show failover scenarios and how to recover from them.
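The role of each entry in HADR_TARGET_LIST follows from its position: DB2 treats the first entry as the principal standby and any further entries as auxiliary standbys. The Python sketch below illustrates this with the values shown above. The function name parse_target_list is hypothetical, and the sketch assumes entries in the host:service form separated by the | character.

```python
def parse_target_list(value):
    """Return (principal, auxiliaries) from an HADR_TARGET_LIST value.

    Each entry is returned as a (host, service) tuple. The first entry
    is the principal standby; the remaining entries are auxiliaries.
    """
    entries = []
    for item in value.split("|"):
        item = item.strip()
        if not item:
            continue
        host, sep, service = item.rpartition(":")
        if not sep:  # entry without a service name or port
            host, service = item, ""
        entries.append((host, service))
    if not entries:
        return None, []
    return entries[0], entries[1:]


if __name__ == "__main__":
    # Value configured on the original primary, saplxvm07.
    print(parse_target_list("saplxvm08:AHA_HADR_2|saplxvm09:AHA_HADR_3"))
    # Value configured on saplxvm08, which applies once it is the primary.
    print(parse_target_list("saplxvm07:AHA_HADR_1|saplxvm09:AHA_HADR_3"))
```

The second call shows why saplxvm07 rejoins as the principal standby after a failover to saplxvm08: it is the first entry in the target list configured on saplxvm08.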

To test these failover scenarios, a DB2 software failure is simulated by killing the db2sysc process, which is the main DB2 process. The db2sysc executable is renamed beforehand so that DB2 cannot be restarted on that host. The effect is similar to a kernel panic: DB2 becomes inaccessible.

Note: During the test scenario, workload is generated using SAP transaction SGEN, and the cluster is monitored using lssam -top.

Example:

saplxvm07:db2aha 64> ps -ef | grep -i db2sysc

db2aha 4428 4426 2 10:18 ? 00:00:45 db2sysc 0

db2aha 32216 28174 0 10:48 pts/0 00:00:00 grep -i db2sysc

saplxvm07:db2aha 65> which db2sysc

/db2/db2aha/sqllib/adm/db2sysc

saplxvm07:db2aha 66> mv /db2/db2aha/sqllib/adm/db2sysc /db2/db2aha/sqllib/adm/db2sysc.backup

saplxvm07:db2aha 66>

saplxvm07:db2aha 66> kill -9 4428

saplxvm07:db2aha 66>
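When the test is scripted, the process ID has to be picked out of the ps output without matching the grep process itself. The following Python sketch shows one way to do that. It is an assumption-laden illustration: the function name db2sysc_pids is hypothetical, and the sketch assumes the eight-column ps -ef layout (UID, PID, PPID, C, STIME, TTY, TIME, CMD) seen in the example above.

```python
def db2sysc_pids(ps_output, instance_owner):
    """Return the PIDs of db2sysc processes owned by instance_owner.

    Only lines whose command starts with 'db2sysc' are accepted, so
    the 'grep -i db2sysc' line from the pipeline is ignored.
    """
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 7)  # keep the full command in fields[7]
        if len(fields) < 8:
            continue
        uid, pid, cmd = fields[0], fields[1], fields[7]
        if uid == instance_owner and cmd.split()[0] == "db2sysc":
            pids.append(int(pid))
    return pids


# Sample copied from the example above.
PS_SAMPLE = """\
db2aha 4428 4426 2 10:18 ? 00:00:45 db2sysc 0
db2aha 32216 28174 0 10:48 pts/0 00:00:00 grep -i db2sysc
"""

if __name__ == "__main__":
    print(db2sysc_pids(PS_SAMPLE, "db2aha"))
```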

7.1 Failover scenario #1: The primary is down

The following example describes the scenario when the primary is down. This scenario simulates a failure when DB2 is inaccessible. When the primary (on host saplxvm07) goes down, the standby (on host saplxvm08) will automatically take over. The SAP workload is temporarily interrupted but is automatically failed over to the standby database and continues without stopping the application. The following screenshots show changes in the cluster during the failover:

Figure 12: Cluster status during failover

As shown in the above figure, the cluster resource for host saplxvm07 is in the Pending Online state while TSA tries to restart DB2. However, because DB2 cannot be started (db2sysc was renamed), failover occurs, as shown in the figure below:

Figure 13: Cluster status after the primary went down and the standby took over

As shown in the figure above, failover has occurred: the resource group db2_db2aha_db2aha_AHA is set to Online, so the database is up. The figure also shows that IBM.Application:db2_db2aha_db2aha_AHA is now running on host saplxvm08 and that the VIP 9.26.166.97 is bound to the network interface on host saplxvm08.

The resource group IBM.ResourceGroup:db2_db2aha_saplxvm07 has the status Pending Online. This indicates that SA MP cannot start DB2 on saplxvm07 (since db2sysc has been renamed). Once DB2 can be started on saplxvm07, SA MP automatically detects this and assigns saplxvm07 as the principal standby.

The example below shows db2sysc being renamed back to its original name to make DB2 accessible again.

Example:

saplxvm07:db2aha 74> mv /db2/db2aha/sqllib/adm/db2sysc.backup /db2/db2aha/sqllib/adm/db2sysc

saplxvm07:db2aha 81> db2start

07/23/2014 11:34:10 0 0 SQL1063N DB2START processing was successful.

SQL1063N DB2START processing was successful.

After DB2 becomes accessible on host saplxvm07, SA MP assigns it as the principal standby. The auxiliary standby remains as it is. Once the failed system is back up and back in the cluster, it is in HADR catchup state: all outstanding logs from the current primary (saplxvm08) must be replayed before it reaches PEER state. To make saplxvm07 the primary again, an HADR takeover can be performed using the Graceful Maintenance Tool (GMT) as described in section 5.3.2 of this document.
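Before switching the roles back, it can be useful to confirm that the rejoined standby has actually caught up. The Python sketch below is a hypothetical pre-check, not a DB2 or SAP tool: the function name takeover_blockers is invented for illustration, and treating PEER plus CONNECTED as the readiness criterion for the principal standby is an assumption based on the description above. The input is a dictionary of db2pd -hadr fields as reported on the standby.

```python
def takeover_blockers(fields):
    """Return a list of reasons why a graceful takeover should wait.

    An empty list means the standby looks ready under the stated
    assumption (standby role, connected, PEER state).
    """
    blockers = []
    if fields.get("HADR_ROLE") != "STANDBY":
        blockers.append("database is not in the standby role")
    if fields.get("HADR_CONNECT_STATUS") != "CONNECTED":
        blockers.append("standby is not connected to the primary")
    if fields.get("HADR_STATE") != "PEER":
        blockers.append("HADR state is %s, not PEER" % fields.get("HADR_STATE"))
    return blockers


if __name__ == "__main__":
    catching_up = {
        "HADR_ROLE": "STANDBY",
        "HADR_CONNECT_STATUS": "CONNECTED",
        "HADR_STATE": "REMOTE_CATCHUP",
    }
    print(takeover_blockers(catching_up))
```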

7.2 Failover scenario #2: Both the primary and principal standby are down

The following scenario describes a situation where both the primary and the principal standby are unavailable. This is similar to a disaster recovery situation where the auxiliary standby must be brought online.

When the primary (on host saplxvm07) and the principal standby (on host saplxvm08) are both unavailable, all applications stop because no automatic failover to the auxiliary standby is available. A manual takeover must be initiated from the auxiliary standby database.

The following figure shows the SA MP resources when both the primary and the principal standby are down.

Figure 14: Cluster status after both the primary and the standby went down

The following steps must be performed to make the auxiliary standby the new primary and to start SAP:

1. Stop the SAP central instance and all application servers.

2. The parameter SAPDBHOST in the SAP profile /sapmnt/AHA/profile/DEFAULT.PFL needs to be updated to point to the auxiliary standby host saplxvm09. Currently, it points to the virtual host name saplxvmsap. The new values should look as follows:

saplxvm06:~ # grep -i dbhost /sapmnt/AHA/profile/DEFAULT.PFL

SAPDBHOST = saplxvm09

j2ee/dbhost = saplxvm09

3. The parameter Hostname in the CLI driver file db2cli.ini needs to be updated to point to the auxiliary standby host saplxvm09 instead of the virtual host saplxvmsap:

saplxvm06:ahaadm 92> cat /sapmnt/AHA/global/db6/db2cli.ini

; Comment lines start with a semi-colon.

[AHA]

Database=AHA

Protocol=tcpip

Hostname=saplxvm09

Servicename=5912

[COMMON]

Diagpath=/usr/sap/AHA/SYS/global/db6/db2dump

4. The TAKEOVER HADR command is executed on the auxiliary standby to make host saplxvm09 the new primary. The BY FORCE option must be used because the primary and the principal standby are not available.

saplxvm09:db2aha 11> db2pd -db aha -hadr | grep HADR_ROLE

HADR_ROLE = STANDBY

saplxvm09:db2aha 12> db2 TAKEOVER HADR ON DATABASE AHA BY FORCE

DB20000I The TAKEOVER HADR ON DATABASE command completed successfully.

saplxvm09:db2aha 13> db2pd -db aha -hadr | grep HADR_ROLE

HADR_ROLE = PRIMARY

saplxvm09:db2aha 17> db2pd -db aha -hadr | grep HADR_STATE

HADR_STATE = DISCONNECTED

5. The SAP central instance and application servers can be started.
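The host name changes in steps 2 and 3 are plain key = value edits, so they can be scripted to reduce typing errors during a disaster. The Python sketch below is illustrative only: the function name set_keys is hypothetical, it works on text that has already been read from the file, and it rewrites every occurrence of the named keys regardless of the section they appear in, which is a simplification. The files should be backed up before any scripted change is written back.

```python
import re


def set_keys(text, keys, new_value):
    """Rewrite 'key = value' lines for the given keys.

    Comments, section headers, spacing around '=' and all other
    lines are left untouched. Key matching is case-insensitive.
    """
    wanted = {k.lower() for k in keys}
    out = []
    for line in text.splitlines():
        m = re.match(r"^(\s*)([^=;#\s][^=]*?)(\s*=\s*)(.*)$", line)
        if m and m.group(2).lower() in wanted:
            line = m.group(1) + m.group(2) + m.group(3) + new_value
        out.append(line)
    return "\n".join(out) + ("\n" if text.endswith("\n") else "")


# Abbreviated samples based on the files shown above.
PROFILE = "SAPDBHOST = saplxvmsap\nj2ee/dbhost = saplxvmsap\n"
CLI_INI = (
    "; Comment lines start with a semi-colon.\n"
    "[AHA]\n"
    "Database=AHA\n"
    "Protocol=tcpip\n"
    "Hostname=saplxvmsap\n"
    "Servicename=5912\n"
)

if __name__ == "__main__":
    print(set_keys(PROFILE, ["SAPDBHOST", "j2ee/dbhost"], "saplxvm09"))
    print(set_keys(CLI_INI, ["Hostname"], "saplxvm09"))
```

The same function can be called with saplxvmsap as the new value when the original configuration is restored later.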

Note: The host saplxvm09 is now the primary and all applications are connected directly to this host. The SA MP cluster and automatic failover are not in effect. Because the auxiliary standby is for DR purposes only and is forced to use the HADR SUPERASYNC synchronization mode, the failover may come at the cost of losing in-flight transactions during this kind of widespread disaster. The takeover operation may take longer depending on the amount of log data to be replayed from the buffer, as well as from disk if log spooling is used.

Once the old primary, host saplxvm07, and the old principal standby, host saplxvm08, are brought back up, HADR is not active on those hosts:

saplxvm07:db2aha 102> db2pd -db aha -hadr

Database AHA not activated on database member 0 or this database name cannot be found

in the local database directory.

Option -hadr requires -db <database> or -alldbs option and active database.

saplxvm08:db2aha 65> db2pd -db aha -hadr

Database AHA not activated on database member 0 or this database name cannot be found

in the local database directory.

Option -hadr requires -db <database> or -alldbs option and active database.

The following steps can be performed to bring the old principal standby and the old primary back into the HADR cluster.

1. At the moment, the old primary on host saplxvm07 cannot be activated because the auxiliary standby has become the new primary through a forced takeover. The database on host saplxvm07 is still configured as a primary, so any attempt to activate it results in an error because a primary is already running.

saplxvm07:db2aha 108> db2 activate db aha

SQL1776N The command cannot be issued on an HADR database. Reason code = "6".

saplxvm07:db2aha 109> db2 "? SQL1776N"

6

This database is an old primary database. It cannot be started

because the standby has become the new primary through forced

takeover.

Therefore, HADR must be started on the old primary as a standby:

saplxvm07:db2aha 111> db2 start hadr on db aha as standby

DB20000I The START HADR ON DATABASE command completed successfully.

2. The principal standby on host saplxvm08 can be activated since it is still a standby:

saplxvm08:db2aha 67> db2 activate db aha

DB20000I The ACTIVATE DATABASE command completed successfully.
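The two steps above differ only in the role that each database held before the forced takeover. The Python sketch below captures that decision as a small helper that builds the command string. The function name rejoin_command is hypothetical, and the helper only returns text: it does not run anything and does not verify the state of the database.

```python
def rejoin_command(previous_role, db_name):
    """Return the command used above to bring a database back into HADR.

    previous_role is the role the database held before the forced
    takeover: an old primary must be restarted as a standby, while an
    old standby can simply be activated.
    """
    role = previous_role.upper()
    if role == "PRIMARY":
        return "db2 start hadr on db %s as standby" % db_name
    if role == "STANDBY":
        return "db2 activate db %s" % db_name
    raise ValueError("unknown HADR role: %r" % previous_role)


if __name__ == "__main__":
    print("saplxvm07:", rejoin_command("PRIMARY", "aha"))
    print("saplxvm08:", rejoin_command("STANDBY", "aha"))
```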

The changes are reflected in the new primary, on host saplxvm09, as shown in the db2pd output below:

saplxvm09:db2aha 18> db2pd -db aha -hadr

Database Member 0 -- Database AHA -- Active -- Up 0 days 02:53:57 -- Date 2014-07-23-13.38.18.130220

HADR_ROLE = PRIMARY

REPLAY_TYPE = PHYSICAL

HADR_SYNCMODE = NEARSYNC

STANDBY_ID = 1

LOG_STREAM_ID = 0

HADR_STATE = PEER

HADR_FLAGS =

PRIMARY_MEMBER_HOST = saplxvm09

PRIMARY_INSTANCE = db2aha

PRIMARY_MEMBER = 0

STANDBY_MEMBER_HOST = saplxvm07

STANDBY_INSTANCE = db2aha

STANDBY_MEMBER = 0

HADR_CONNECT_STATUS = CONNECTED

HADR_CONNECT_STATUS_TIME = 07/23/2014 13:36:31.238696 (1406136991)

HEARTBEAT_INTERVAL(seconds) = 30

HEARTBEAT_MISSED = 0

HEARTBEAT_EXPECTED = 239

HADR_TIMEOUT(seconds) = 120

TIME_SINCE_LAST_RECV(seconds) = 17

PEER_WAIT_LIMIT(seconds) = 0

LOG_HADR_WAIT_CUR(seconds) = 0.000

LOG_HADR_WAIT_RECENT_AVG(seconds) = 0.000049

LOG_HADR_WAIT_ACCUMULATED(seconds) = 0.000

LOG_HADR_WAIT_COUNT = 1

SOCK_SEND_BUF_REQUESTED,ACTUAL(bytes) = 0, 16384

SOCK_RECV_BUF_REQUESTED,ACTUAL(bytes) = 0, 87380

PRIMARY_LOG_FILE,PAGE,POS = S0000525.LOG, 2233, 37242077376

STANDBY_LOG_FILE,PAGE,POS = S0000525.LOG, 2233, 37242077376

HADR_LOG_GAP(bytes) = 0

STANDBY_REPLAY_LOG_FILE,PAGE,POS = S0000525.LOG, 2233, 37242077376

STANDBY_RECV_REPLAY_GAP(bytes) = 0

PRIMARY_LOG_TIME = 07/23/2014 13:37:23.000000 (1406137043)

STANDBY_LOG_TIME = 07/23/2014 13:37:23.000000 (1406137043)

STANDBY_REPLAY_LOG_TIME = 07/23/2014 13:37:23.000000 (1406137043)

STANDBY_RECV_BUF_SIZE(pages) = 2048

STANDBY_RECV_BUF_PERCENT = 0

STANDBY_SPOOL_LIMIT(pages) = 1000

STANDBY_SPOOL_PERCENT = 0

STANDBY_ERROR_TIME = NULL

PEER_WINDOW(seconds) = 240

PEER_WINDOW_END = 07/23/2014 13:41:54.000000 (1406137314)

READS_ON_STANDBY_ENABLED = N

HADR_ROLE = PRIMARY

REPLAY_TYPE = PHYSICAL

HADR_SYNCMODE = SUPERASYNC

STANDBY_ID = 2

LOG_STREAM_ID = 0

HADR_STATE = REMOTE_CATCHUP

HADR_FLAGS =

PRIMARY_MEMBER_HOST = saplxvm09

PRIMARY_INSTANCE = db2aha

PRIMARY_MEMBER = 0

STANDBY_MEMBER_HOST = saplxvm08

STANDBY_INSTANCE = db2aha

STANDBY_MEMBER = 0

HADR_CONNECT_STATUS = CONNECTED

HADR_CONNECT_STATUS_TIME = 07/23/2014 13:19:45.058717 (1406135985)

HEARTBEAT_INTERVAL(seconds) = 30

HEARTBEAT_MISSED = 0

HEARTBEAT_EXPECTED = 37

HADR_TIMEOUT(seconds) = 120

TIME_SINCE_LAST_RECV(seconds) = 3

PEER_WAIT_LIMIT(seconds) = 0

LOG_HADR_WAIT_CUR(seconds) = 0.000

LOG_HADR_WAIT_RECENT_AVG(seconds) = 0.000049

LOG_HADR_WAIT_ACCUMULATED(seconds) = 0.000

LOG_HADR_WAIT_COUNT = 1

SOCK_SEND_BUF_REQUESTED,ACTUAL(bytes) = 0, 16384

SOCK_RECV_BUF_REQUESTED,ACTUAL(bytes) = 0, 87380

PRIMARY_LOG_FILE,PAGE,POS = S0000525.LOG, 2233, 37242077376

STANDBY_LOG_FILE,PAGE,POS = S0000525.LOG, 2233, 37242077376

HADR_LOG_GAP(bytes) = 0

STANDBY_REPLAY_LOG_FILE,PAGE,POS = S0000525.LOG, 2233, 37242077376

STANDBY_RECV_REPLAY_GAP(bytes) = 0

PRIMARY_LOG_TIME = 07/23/2014 13:37:23.000000 (1406137043)

STANDBY_LOG_TIME = 07/23/2014 13:37:23.000000 (1406137043)

STANDBY_REPLAY_LOG_TIME = 07/23/2014 13:37:23.000000 (1406137043)

STANDBY_RECV_BUF_SIZE(pages) = 2048

STANDBY_RECV_BUF_PERCENT = 0

STANDBY_SPOOL_LIMIT(pages) = 1000

STANDBY_SPOOL_PERCENT = 0

STANDBY_ERROR_TIME = NULL

PEER_WINDOW(seconds) = 0

READS_ON_STANDBY_ENABLED = N

At the moment, host saplxvm09 is the primary, host saplxvm07 is the principal standby, and host saplxvm08 is the auxiliary standby. The cluster and automatic failover are still not active. The following steps can be performed to go back to the original setup with the primary on host saplxvm07, the principal standby on host saplxvm08, and the auxiliary standby on host saplxvm09.

1. The SAP central instance and application servers must be stopped to make changes to the SAP profile variables.

2. The cluster configuration must be deleted using option 4 - Delete Database Cluster of sapdb2cluster.sh to avoid interference (SQL1770N).

3. The TAKEOVER HADR command is executed on the principal standby host saplxvm07.

saplxvm07:db2aha 125> db2pd -db aha -hadr | grep HADR_ROLE

HADR_ROLE = STANDBY

saplxvm07:db2aha 126> db2 TAKEOVER HADR ON DATABASE AHA

DB20000I The TAKEOVER HADR ON DATABASE command completed successfully.

saplxvm07:db2aha 127> db2pd -db aha -hadr | grep HADR_ROLE

HADR_ROLE = PRIMARY

4. The parameter SAPDBHOST in the SAP profile /sapmnt/AHA/profile/DEFAULT.PFL needs to be updated to point to the virtual host saplxvmsap instead of saplxvm09. The new values should look as follows:

saplxvm06:~ # grep -i dbhost /sapmnt/AHA/profile/DEFAULT.PFL

SAPDBHOST = saplxvmsap

j2ee/dbhost = saplxvmsap

5. The parameter Hostname in the CLI driver file needs to be updated to point to the virtual host saplxvmsap instead of saplxvm09:

saplxvm06:ahaadm 92> cat /sapmnt/AHA/global/db6/db2cli.ini

; Comment lines start with a semi-colon.

[AHA]

Database=AHA

Protocol=tcpip

Hostname=saplxvmsap

Servicename=5912

[COMMON]

Diagpath=/usr/sap/AHA/SYS/global/db6/db2dump

6. Now that the original HADR configuration has been restored, option 2 - Create Database Cluster of sapdb2cluster.sh can be used to create the cluster and enable automatic failover.

Note: The cluster configuration script generates the cluster_config.xml file in the /tmp directory and executes the db2haicu -f /tmp/cluster_config.xml command on the principal standby and the primary hosts. Error messages are logged in the sapdb2cluster.log, /tmp/cluster_config.log, and db2diag.log files.

For errors related to the configuration, it is recommended to check the format and values of the /tmp/cluster_config.xml file. More details can be found in the IBM Knowledge Center at http://www-01.ibm.com/support/knowledgecenter/SSEPGG_10.5.0/com.ibm.db2.luw.admin.ha.doc/doc/t0052800.html?lang=en

7. Now the SAP central instance and application servers can be started.

7.3 Failover scenario #3: The principal standby is down

If the principal standby is down, it simply drops out of the HADR cluster, and the applications are not affected. Once the standby is activated again, it rejoins the cluster.

Example:

saplxvm08:db2aha 68> db2pd -db aha -hadr

Database AHA not activated on database member 0 or this database name cannot be found

in the local database directory.

Option -hadr requires -db <database> or -alldbs option and active database.

saplxvm08:db2aha 70> db2 activate db aha

DB20000I The ACTIVATE DATABASE command completed successfully.

saplxvm08:db2aha 71> db2pd -db aha -hadr

Database Member 0 -- Database AHA -- Standby -- Up 0 days 00:00:08 -- Date 2014-07-23-15.18.54.335704

HADR_ROLE = STANDBY

REPLAY_TYPE = PHYSICAL

HADR_SYNCMODE = NEARSYNC

STANDBY_ID = 0

LOG_STREAM_ID = 0

HADR_STATE = PEER

HADR_FLAGS =

PRIMARY_MEMBER_HOST = saplxvm07

PRIMARY_INSTANCE = db2aha

PRIMARY_MEMBER = 0

STANDBY_MEMBER_HOST = saplxvm08

STANDBY_INSTANCE = db2aha

STANDBY_MEMBER = 0

HADR_CONNECT_STATUS = CONNECTED

HADR_CONNECT_STATUS_TIME = 07/23/2014 15:18:49.837590 (1406143129)

HEARTBEAT_INTERVAL(seconds) = 30

HEARTBEAT_MISSED = 0

HEARTBEAT_EXPECTED = 0

HADR_TIMEOUT(seconds) = 120

TIME_SINCE_LAST_RECV(seconds) = 0

PEER_WAIT_LIMIT(seconds) = 0

LOG_HADR_WAIT_CUR(seconds) = 0.000

LOG_HADR_WAIT_RECENT_AVG(seconds) = 1.446530

LOG_HADR_WAIT_ACCUMULATED(seconds) = 230.194

LOG_HADR_WAIT_COUNT = 1093

SOCK_SEND_BUF_REQUESTED,ACTUAL(bytes) = 0, 16384

SOCK_RECV_BUF_REQUESTED,ACTUAL(bytes) = 0, 87380

PRIMARY_LOG_FILE,PAGE,POS = S0000525.LOG, 10870, 37277278904

STANDBY_LOG_FILE,PAGE,POS = S0000525.LOG, 10870, 37277278904

HADR_LOG_GAP(bytes) = 0

STANDBY_REPLAY_LOG_FILE,PAGE,POS = S0000525.LOG, 10870, 37277278904

STANDBY_RECV_REPLAY_GAP(bytes) = 0

PRIMARY_LOG_TIME = 07/23/2014 15:18:06.000000 (1406143086)

STANDBY_LOG_TIME = 07/23/2014 15:18:06.000000 (1406143086)

STANDBY_REPLAY_LOG_TIME = 07/23/2014 15:18:06.000000 (1406143086)

STANDBY_RECV_BUF_SIZE(pages) = 2048

STANDBY_RECV_BUF_PERCENT = 0

STANDBY_SPOOL_LIMIT(pages) = 1000

STANDBY_SPOOL_PERCENT = 0

STANDBY_ERROR_TIME = NULL

PEER_WINDOW(seconds) = 240

PEER_WINDOW_END = 07/23/2014 15:16:10.000000 (1406142970)

READS_ON_STANDBY_ENABLED = N

Note: Once HADR has been set up properly with multiple standbys, it is recommended to exercise different failover scenarios and develop documented procedures to follow during an actual disaster.

8 Miscellaneous troubleshooting in an SA MP environment

This section introduces some useful commands for operations within an SA MP environment.

8.1 HADR congestion

DB2 HADR works by sending database logs via TCP/IP from the primary to the standby. On the standby, the logs are stored in a buffer controlled by the DB2 registry variable DB2_HADR_BUF_SIZE. If the standby cannot keep up with the amount of logs being shipped by the primary, the buffer might fill up, in which case the primary can no longer ship logs. The HADR connection status then shows CONGESTED.

Example:

saplxvm07:db2aha 59> db2pd -db aha -hadr | grep -i HADR_CONNECT_STATUS

HADR_CONNECT_STATUS = CONGESTED

HADR_CONNECT_STATUS_TIME = 07/23/2014 15:18:49.835990 (1406143129)

Note: In this case, the HADR_CONNECT_STATUS_TIME shows the congestion start time.

While the connection is in a CONGESTED state and the synchronization mode is SYNC, NEARSYNC, or even ASYNC, transactions on the primary are blocked. With SUPERASYNC, transactions continue even while the connection is in a CONGESTED state.
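The rule in the preceding paragraph can be written down as a small lookup, which is convenient for alerting scripts. The Python sketch below is a hypothetical helper that encodes only what is stated above. The name primary_blocked is invented for illustration, and the two arguments are the HADR_CONNECT_STATUS and HADR_SYNCMODE values reported by db2pd.

```python
# Synchronization modes in which congestion blocks the primary,
# according to the description above.
BLOCKING_MODES = {"SYNC", "NEARSYNC", "ASYNC"}


def primary_blocked(connect_status, syncmode):
    """Return True if transactions on the primary are expected to block."""
    return (
        connect_status.strip().upper() == "CONGESTED"
        and syncmode.strip().upper() in BLOCKING_MODES
    )


if __name__ == "__main__":
    for mode in ("SYNC", "NEARSYNC", "ASYNC", "SUPERASYNC"):
        print(mode, primary_blocked("CONGESTED", mode))
```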

Note: To resolve this issue, it is recommended to increase DB2_HADR_BUF_SIZE on the standby (and on the primary as well, in case it becomes the standby after a takeover). By default, DB2_HADR_BUF_SIZE is twice the primary's LOGBUFSZ, which is also the minimum value. As mentioned earlier, HADR_SPOOL_LIMIT can also be used to avoid HADR congestion: it allows the standby to spool logs to disk when the buffer is full, so the primary can continue with transactions without waiting for the standby to drain the buffer. This is especially effective if congestion occurs during peak times.

Example:

saplxvm07:db2aha 61> db2 get db cfg for aha | grep -i LOGBUFSZ

Log buffer size (4KB) (LOGBUFSZ) = 1024

saplxvm07:db2aha 63> db2 get db cfg for aha | grep -i HADR_SPOOL_LIMIT

HADR spool log data limit (4KB) (HADR_SPOOL_LIMIT) = 1000

Note: DB2 HADR-related registry variables can be checked using the db2set command. Unless the registry variables are set to a user-defined value, the default values are used for each variable.

Example:

saplxvm07:db2aha 58> db2set -lr | grep -i hadr

DB2_HADR_BUF_SIZE

DB2_HADR_NO_IP_CHECK

DB2_HADR_PEER_WAIT_LIMIT

DB2_HADR_SOSNDBUF

DB2_HADR_SORCVBUF

DB2_HADR_ROS

saplxvm07:db2aha 50> db2 get db cfg for aha | grep LOGBUFSZ

Log buffer size (4KB) (LOGBUFSZ) = 1024

saplxvm07:db2aha 51> db2set DB2_HADR_BUF_SIZE=3072

saplxvm07:db2aha 52> db2set | grep HADR

DB2_HADR_BUF_SIZE=3072
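The numbers in this example can be checked with a short calculation. All three values are expressed in 4 KB pages, so LOGBUFSZ = 1024 gives a default receive buffer of 2048 pages, which matches the STANDBY_RECV_BUF_SIZE(pages) value reported by db2pd earlier in this paper. The Python sketch below only reproduces this arithmetic. The function names are hypothetical.

```python
PAGE_BYTES = 4096  # DB2 log buffer sizes are given in 4 KB pages


def default_hadr_buf_pages(logbufsz_pages):
    """Default standby receive buffer: twice the primary's LOGBUFSZ."""
    return 2 * logbufsz_pages


def pages_to_mib(pages):
    """Convert a number of 4 KB pages to MiB."""
    return pages * PAGE_BYTES / (1024 * 1024)


if __name__ == "__main__":
    default_pages = default_hadr_buf_pages(1024)
    print("default buffer:", default_pages, "pages =", pages_to_mib(default_pages), "MiB")
    print("DB2_HADR_BUF_SIZE=3072:", pages_to_mib(3072), "MiB")
    print("HADR_SPOOL_LIMIT=1000:", pages_to_mib(1000), "MiB")
```

With these settings the receive buffer grows from 8 MiB to 12 MiB, and the spool limit of 1000 pages allows roughly a further 3.9 MiB of log data to be spooled to disk on the standby.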

More on the DB2 HADR log shipping method can be found under the following link:

https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/DB2HADR/page/HADR%20log%20shipping

For DB2 Version 10.5 and higher, db2fodc also provides an option to collect congestion-related traces automatically. The automatic congestion trace can be turned on and off using the following commands:

Example:

saplxvm07:db2aha 54> db2fodc -hadr -db AHA -detect

"db2fodc": Starting detection ...

db2fodc HADR congestion detect rules:

iteration=1 sleeptime=0(sec) triggercount=10 interval=30(sec) duration=-1(hour)

db2fodc:

Hostname: saplxvm07 HADR congestion detect iteration: 1

saplxvm07:db2aha 50> db2fodc -detect off

"db2fodc": Stopping all FODC detections. Note that it can take up to 60

seconds to stop all detections.

More on the DB2 HADR automatic congestion detection tool can be found at http://www-01.ibm.com/support/knowledgecenter/SSEPGG_10.5.0/com.ibm.db2.luw.admin.trb.doc/doc/r0060632.html?lang=en

8.2 Manual creation and deletion of an SA MP cluster

In case of an error during cluster creation with SAP’s cluster setup tool sapdb2cluster.sh, the DB2 tool db2haicu can be used instead. The XML configuration file cluster_config.xml can be used with db2haicu to create the cluster manually. An example of the file was provided earlier in section 5.2.2. The cluster must be created as user db2<sid> on the standby first, followed by the primary, as shown in the example below. The example ends with db2haicu -delete, which removes the cluster configuration again.

Example:

saplxvm08:db2aha 57> db2haicu -f /tmp/cluster_config.xml

Welcome to the DB2 High Availability Instance Configuration Utility (db2haicu).

You can find detailed diagnostic information in the DB2 server diagnostic log file

called db2diag.log. Also, you can use the utility called db2pd to query the status of

the cluster domains you create.

For more information about configuring your clustered environment using db2haicu, see

the topic called 'DB2 High Availability Instance Configuration Utility (db2haicu)' in

the DB2 Information Center.

db2haicu determined the current DB2 database manager instance is 'db2aha'. The

cluster configuration that follows will apply to this instance.

db2haicu is collecting information on your current setup. This step may take some

time as db2haicu will need to activate all databases for the instance to discover all

paths ...

Creating domain 'sap_ahadb2' in the cluster ...

Creating domain 'sap_ahadb2' in the cluster was successful.

Configuring quorum device for domain 'sap_ahadb2' ...

Configuring quorum device for domain 'sap_ahadb2' was successful.

Adding network interface card 'eth0' on cluster node 'saplxvm07' to the network

'db2network' ...

Adding network interface card 'eth0' on cluster node 'saplxvm07' to the network

'db2network' was successful.

Adding network interface card 'eth0' on cluster node 'saplxvm08' to the network

'db2network' ...

Adding network interface card 'eth0' on cluster node 'saplxvm08' to the network

'db2network' was successful.

Adding DB2 database partition '0' to the cluster ...

Adding DB2 database partition '0' to the cluster was successful.

HADR database 'AHA' has been determined to be valid for high availability. However,

the database cannot be added to the cluster from this node because db2haicu detected

this node is the standby for HADR database 'AHA'. Run db2haicu on the primary for

HADR database 'AHA' to configure the database for automated failover.

All cluster configurations have been completed successfully. db2haicu exiting ...

saplxvm07:db2aha 57> db2haicu -f /tmp/cluster_config.xml

Welcome to the DB2 High Availability Instance Configuration Utility (db2haicu).

You can find detailed diagnostic information in the DB2 server diagnostic log file

called db2diag.log. Also, you can use the utility called db2pd to query the status of

the cluster domains you create.

For more information about configuring your clustered environment using db2haicu, see

the topic called 'DB2 High Availability Instance Configuration Utility (db2haicu)' in

the DB2 Information Center.

db2haicu determined the current DB2 database manager instance is 'db2aha'. The

cluster configuration that follows will apply to this instance.

db2haicu is collecting information on your current setup. This step may take some

time as db2haicu will need to activate all databases for the instance to discover all

paths ...

Configuring quorum device for domain 'sap_ahadb2' ...

Configuring quorum device for domain 'sap_ahadb2' was successful.

Network adapter 'eth0' on node 'saplxvm07' is already defined in network 'db2network'

and cannot be added to another network until it is removed from its current network.

Network adapter 'eth0' on node 'saplxvm08' is already defined in network 'db2network'

and cannot be added to another network until it is removed from its current network.

Adding DB2 database partition '0' to the cluster ...

Adding DB2 database partition '0' to the cluster was successful.

Adding HADR database 'AHA' to the domain ...

Adding HADR database 'AHA' to the domain was successful.

All cluster configurations have been completed successfully. db2haicu exiting ...

saplxvm07:db2aha 49> db2haicu -delete

Welcome to the DB2 High Availability Instance Configuration Utility (db2haicu).

Removing HADR database 'AHA' from the domain ...

Removing HADR database 'AHA' from the domain was successful.

Removing DB2 database partition '0' from the cluster ...

Removing DB2 database partition '0' from the cluster was successful.

All cluster configurations have been completed successfully. db2haicu exiting ...

saplxvm08:db2aha 52> db2haicu -delete > /tmp/db2haicu_delete.txt
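The order of the db2haicu calls matters, so it can help to generate the run list rather than type it from memory. The Python sketch below is hypothetical and only builds a list of (host, command) pairs. Creation follows the order described in this section (standby first, then primary). For deletion it mirrors the order used in the example above (primary first, then standby), and whether that order is strictly required is an assumption. The function name db2haicu_plan is invented for illustration.

```python
def db2haicu_plan(primary_host, standby_host, action,
                  config="/tmp/cluster_config.xml"):
    """Return the ordered (host, command) pairs for a db2haicu run.

    The commands are meant to be run as the instance owner db2<sid>
    on the named host; this function does not execute anything.
    """
    if action == "create":
        command = "db2haicu -f %s" % config
        order = [standby_host, primary_host]
    elif action == "delete":
        command = "db2haicu -delete"
        order = [primary_host, standby_host]
    else:
        raise ValueError("action must be 'create' or 'delete'")
    return [(host, command) for host in order]


if __name__ == "__main__":
    for host, command in db2haicu_plan("saplxvm07", "saplxvm08", "create"):
        print(host, "->", command)
```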

8.3 SA MP cluster resource group

The rgreq command can be used to start, stop, cancel, lock, unlock, or move an SA MP resource group.

Example:

The following command is used to unlock the resource group db2_db2aha_db2aha_AHA-rg:

saplxvm08:~ # rgreq -o unlock db2_db2aha_db2aha_AHA-rg

Note: Because of APAR IC98315: VIRTUAL IP RESOURCE (IBM.SERVICEIP) CANNOT BE FOUND, one of the resource groups in the DB2 HADR cluster configuration may remain locked after a graceful cluster switch in DB2 10.5 GA. The issue has been resolved in DB2 10.5 FP1.

8.4 Collection of traces

The lssam command with the -T option can be used to write a trace to the screen.

saplxvm07:~ # lssam -T

Traces can also be collected for a particular resource manager (RM) using the lssrc tool.

lssrc -ls IBM.RecoveryRM

lssrc -ls IBM.GblResRM

lssrc -ls IBM.StorageRM

To collect a trace, first find out where the trace files are located using the following command:

saplxvm07:~ # lssrc -ls IBM.RecoveryRM | grep trace_summary

/var/ct/sap_ahadb2/log/mc/IBM.RecoveryRM/trace_summary -> spooling not enabled

saplxvm07:~ # lssrc -ls IBM.GblResRM | grep trace_summary

/var/ct/sap_ahadb2/log/mc/IBM.GblResRM/trace_summary -> spooling not enabled

The following commands can be used to format the traces into more readable text and store them in the /tmp directory:

saplxvm07:~ # rpttr -odtic /var/ct/sap_ahadb2/log/mc/IBM.RecoveryRM/trace_summary > /tmp/RecoveryRM_trace.out

saplxvm07:~ # rpttr -odtic /var/ct/sap_ahadb2/log/mc/IBM.GblResRM/trace_summary > /tmp/GblResRM_trace.out
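The two steps above (locating the trace file, then formatting it) can be chained in a script. The Python sketch below is illustrative: the function names trace_summary_path and rpttr_command are hypothetical, and the sketch assumes that the lssrc -ls output contains a line that starts with the trace_summary path, as in the example above. It only builds the rpttr command string and does not execute it.

```python
def trace_summary_path(lssrc_output):
    """Return the trace_summary path from `lssrc -ls <RM>` output, or None."""
    for line in lssrc_output.splitlines():
        parts = line.split()
        if parts and parts[0].endswith("/trace_summary"):
            return parts[0]
    return None


def rpttr_command(trace_path, out_dir="/tmp"):
    """Build the rpttr call that formats one trace file into out_dir."""
    rm_name = trace_path.rstrip("/").split("/")[-2]  # e.g. IBM.RecoveryRM
    short_name = rm_name.split(".", 1)[-1]           # e.g. RecoveryRM
    return "rpttr -odtic %s > %s/%s_trace.out" % (trace_path, out_dir, short_name)


# Sample line copied from the lssrc output above.
LSSRC_SAMPLE = (
    "/var/ct/sap_ahadb2/log/mc/IBM.RecoveryRM/trace_summary"
    " -> spooling not enabled\n"
)

if __name__ == "__main__":
    path = trace_summary_path(LSSRC_SAMPLE)
    print(rpttr_command(path))
```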

The samlog command is a handy tool that can be used to collect, format, merge, and display SA MP-related logs.

Example:

saplxvm07:~ # samlog -t 15m | more

samlog called at 2014-09-25 15:30:35 on saplxvm07 with options

System time offset between local host and saplxvm08 is +5.29 seconds. You may adjust system times in cluster.

saplxvm07 0.00 IBM.RecoveryRM trace_summary, IBM.GblResRM trace_summary

saplxvm08 +5.29 IBM.RecoveryRM trace_summary, IBM.GblResRM trace_summary

-------------------------------------------------------------------------

A list of IBM Tivoli System Automation command references can be found using the following link:

http://www-01.ibm.com/support/knowledgecenter/SSRM2X_4.1.0/com.ibm.samp.doc_4.1/samprgcharmcmds.html?lang=en

8.5 HADR simulator

The DB2 HADR simulator can help plan, measure, and diagnose an HADR environment quickly and efficiently. The tool can be downloaded for free from the IBM developerWorks wiki page: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/DB2HADR/page/HADR%20simulator?section=Introduction. The wiki page also provides a detailed description of how to use the simulator.

8.6 Split-brain condition

If both databases in an HADR pair act as primaries independently of each other, the HADR connection between them is lost and applications can connect to both databases. The databases then become inconsistent; this condition is referred to as an HADR split-brain. The following situations can lead to a split-brain condition:

If the standby becomes the primary and the original primary is brought back up using the START HADR command with the AS PRIMARY BY FORCE option.

The TAKEOVER HADR command is issued on the standby with the PEER WINDOW ONLY option and the primary is not brought down before the peer window expires.

After a forced takeover, the HADR-related configuration parameters (hadr_remote_host, hadr_remote_inst, and hadr_remote_svc) are automatically updated on the new primary and its standbys including the old primary. If the primary is not shut down before a forced takeover from the standby, it might result in a split-brain condition as the automatic reconfiguration does not take place until the old primary is shut down and restarted as standby.

To avoid a split-brain condition during a forced takeover, the standby sends a disabling message, also called a poison pill, to the primary. The primary is shut down and cannot be reactivated unless the poison pill is cleared by a START HADR command. More information on proper takeover scenarios in a multiple standby HADR configuration can be found at the following link: http://www-01.ibm.com/support/knowledgecenter/SSEPGG_10.5.0/com.ibm.db2.luw.admin.ha.doc/doc/c0059999.html?lang=en

Note: Once a split-brain condition is encountered, HADR must be configured from scratch using a database backup from the host that has the most up-to-date logs. In such a situation, it is recommended to contact SAP support.
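A quick way to check for a suspected split-brain is to compare the HADR role that each host reports. The following is a minimal sketch, assuming the DB2 10.1 or higher db2pd -hadr output format, in which the role appears as a line such as HADR_ROLE = PRIMARY; the function names hadr_role and is_split_brain are illustrative, and how the db2pd output is collected from each host is left to the environment.

```shell
# Sketch: compare the HADR roles reported by two hosts.
# hadr_role and is_split_brain are hypothetical helper names.

# Print the HADR role found in one host's "db2pd -db <dbname> -hadr" output (stdin).
# The role line may be repeated (for example once per standby), hence sort -u.
hadr_role() {
    sed -n 's/^ *HADR_ROLE *= *\([A-Z]*\).*/\1/p' | sort -u | head -n 1
}

# Succeed only when both roles passed as arguments are PRIMARY.
is_split_brain() {
    [ "$1" = "PRIMARY" ] && [ "$2" = "PRIMARY" ]
}

# Assumed usage, run as the DB2 instance owner on each host:
#   role_a=$(db2pd -db "$DBNAME" -hadr | hadr_role)   # on the first host
#   role_b=$(db2pd -db "$DBNAME" -hadr | hadr_role)   # on the second host
#   is_split_brain "$role_a" "$role_b" && echo "possible split-brain"
```

The check only indicates that both databases currently claim the primary role; it does not repair anything, and the recovery procedure in the note above still applies.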

9 Conclusion

The DB2 HADR feature delivers a complete Disaster Recovery (DR), High Availability (HA) and Continuous Availability solution, providing customers with greater data protection with minimum performance impact. HADR also comes with a variety of configuration options to satisfy different business needs.

For example, SYNC mode can be used for guaranteed database log shipping, as opposed to SUPERASYNC mode, which has the least performance impact on the primary because the primary does not wait for an ACK after logs are shipped. Log spooling helps achieve higher performance, reduced chance of congestion, and greater data protection.

Automatic failover using SA MP, along with SAP’s Cluster Setup Tool and Graceful Maintenance Tool, makes it easy for customers to monitor and maintain the HADR cluster.

With HADR support for IBM DB2 BLU Acceleration, DB2 HADR can now be used in SAP BW environments with DB2 column-organized tables providing essential high performance as well as crucial DR and HA capabilities. Moreover, DB2 HADR and BLU Acceleration are provided as a DB2 feature and can be enabled out of the box.

With improvements to the DB2 LOAD command, customers can now take advantage of the faster LOAD operation without compromising data replication.

Under ideal conditions, the tools and processes described in this paper can be used to implement an SAP Business Suite system that is continuously available, is protected from widespread disasters, and facilitates micro-outages to help perform database maintenance with zero downtime.

10 Related Content

Note 1555903 - DB6: Supported DB2 Database Features

Note 1612105 - DB6: FAQ for DB2 High Availability Disaster Recovery (HADR)

Note 1746101 - DB6: High Availability with SAP on DB2 using SA MP

Note 1443426 - DB6: Graceful Cluster Switch

Note 960843 - DB6: Cluster Setup Tool

https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/DB2HADR

Copyright

© 2014 SAP SE or an SAP SE affiliate company. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP SE. The information contained herein may be changed without prior notice.

Some software products marketed by SAP SE and its distributors contain proprietary software components of other software vendors. National product specifications may vary.

These materials are provided by SAP SE and its affiliated companies (“SAP SE Group”) for informational purposes only, without representation or warranty of any kind, and SAP SE Group shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP SE Group products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty.

SAP SE and other SAP SE products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP SE in Germany and other countries.

Please see http://www.sap.com/corporate-en/legal/copyright/index.epx#trademark for additional trademark information and notices.

