Cold Failover with Oracle Clusterware.

Introduction:
This part explains how to set up an Oracle cold failover environment with Oracle Clusterware. Some knowledge of cluster environments is required.

With the introduction of Oracle Clusterware 10g, Oracle provided its own cluster software. Other vendors' clusterware was no longer needed, which can reduce the complexity of an environment where Real Application Clusters (RAC) is used.

The Clusterware API was already present in Oracle 10g Release 1, but no documentation about its functionality was available. In Oracle 10g Release 2 this information was added to the Oracle manuals, and using the API is supported.

Looking at Oracle, it is clear that they are positioning Oracle Clusterware more and more as a standalone product that must be used in RAC environments but can also be used to protect other applications.

Why use Oracle Clusterware, and when should you use RAC, Data Guard or other HA solutions? This is a question I often hear, and there are people who are pro Data Guard or pro RAC. My personal point of view is: what are the business requirements? Implementing RAC requires new skills, so in an environment where scaling is not an issue and where 15 minutes of downtime is not a problem, why use RAC? On the other hand, you often see environments where Data Guard is in place but no switchover is performed when problems appear, because the application cannot work with the other location or the Data Guard system cannot handle the workload.

Oracle Clusterware for cold failover sits between RAC and Data Guard. The main purpose of Data Guard is to protect against site failures. RAC is needed to prevent downtime and/or to provide scalability. Oracle Clusterware can be used to fail over when a problem occurs or when maintenance is needed on an environment for which RAC is a step too far. If you are familiar with Oracle Fail Safe on Windows, this cold failover functionality is easy to compare with it, but it is now available for all platforms.

First we will focus on how to protect a database instance using Oracle Clusterware. If you want to protect other applications, the options are endless, because the basics are the same and after that you can script any requirement. Oracle also provides several white papers with examples, and these documents were also used during the investigation of Clusterware for cold failover.

Pros and cons:
In no particular order, some of the advantages and disadvantages of using cold failover are:

- No full utilization of the available hardware.
- Less expensive than RAC.
- Starting point to go to RAC in the future.
- Provides technical solutions based on HA demands.
- Possible to use RAC and non-RAC in one single grid.
- There is no usage of racgimon, as there is no RAC.
- Oracle Clusterware can be used for all kinds of other HA solutions, such as HTTP servers and databases.
- Enterprise Manager DBConsole can be used to manage it all.

Starting point:
To better understand the rest of this document, some knowledge of how to install Oracle Clusterware and Oracle Automatic Storage Management is required.

Installing Oracle Clusterware:
To be able to use cold failover, you must install Oracle Clusterware and meet the requirements mentioned in the installation manual (the same preparation as for a Real Application Clusters environment).

In detail, this means you need to interconnect your nodes and you must have shared storage available for the OCR (Oracle Cluster Registry) and the voting disk. A VIP (virtual IP) is also required, including the GSD, ONS and listener. Later on you can decide to stop resources that you are not using (in this case the GSD and ONS).

Installing Clusterfile System:
For the cold failover part related to the database instances, having a clustered file system in place is very useful and reduces the failover time. It is not a must, but my advice is to use it!

Why? In a cold failover situation, a LUN that is connected to node 1 must be connectable to node 2 during a failover. You have the option to arrange this during the failover process, but that can introduce mistakes and increases the time needed for the failover. It is preferable to prepare everything in advance and make use of shared disks. Automatic Storage Management (ASM) can help in this case. OCFS2 on Linux can also be used and is available for free.

When you don't use a clustered file system, you must make sure that in a cold failover case the LUNs are attached to the failover node.

For now I will assume that ASM will be used. Follow the general installation manual for the RDBMS. Don't create a database; select the option to create an ASM instance, or create an ASM instance after installation.

When the above steps are performed, you are ready to install the RDBMS.

Installing Oracle RDBMS:
Once again I assume you know how to install the RDBMS software on each node. When runInstaller starts, it detects that Oracle Clusterware is active, and you can select either a local install or a Real Application Clusters install.

Make sure you select the local install option. Define the ORACLE_HOME name and location. Make your life as a DBA easier by defining the same path/location on each node in the coldfailover environment.

Once you have installed the RDBMS software on node 1, you need to run the installer again on the other node, define the same location, and select local install again.

Create Database:
Create a database instance on one of the nodes, just as you normally would; no additional steps are required yet. If we look at the status after installation, you will have an environment like the picture below.

Now that all the required components are installed, we can start to configure our own resources, which will be managed by Oracle Clusterware.

The examples and the technical part are based on the following environment. Note that the same approach also works on the 10g Release 2 stack; using Release 1 is not supported.

- 2 nodes with Enterprise Linux 4 update 5.
- Oracle Clusterware 11g (11.1.0.6).
- Oracle ASM 11g (11.1.0.6), installed with the cluster_database=true parameter.
- Oracle RDBMS 11g (11.1.0.6), dbname and instance name siprod.

First, an explanation of why it is not possible to use the default resource names. When a RAC database is created using the DBCA, the OCR (Oracle Cluster Registry) is also updated with resources named ora.<dbname>.db and ora.<dbname>.<instancename>.inst.

One of the parameters of the resource is hosting_members, which defines where the instance will be active. This is the main reason why you cannot use srvctl commands to add the resource. Modifying ora.<names>.<type> resources is not supported by Oracle.

See below the errors you will receive when trying to use the default srvctl commands.

First add the resource manually to the OCR.
[oracle@racworkshop1 ~]$ srvctl add database -d siprod -o /u01/app/oracle/product/11.1.0/si_1/
[oracle@racworkshop1 ~]$ srvctl config
siprod
[oracle@racworkshop1 ~]$ srvctl add instance -d siprod -i siprod -n racworkshop1,racworkshop2
PRKO-2003 : Invalid command line option value: racworkshop1,racworkshop2
[oracle@racworkshop1 ~]$ srvctl add instance -d siprod -i siprod -n racworkshop1
[oracle@racworkshop1 ~]$ srvctl start database -d siprod

So far so good, the siprod instance is running on the first node.

[oracle@racworkshop1 ~]$ srvctl status database -d siprod
Instance siprod is running on node racworkshop1
[oracle@racworkshop1 ~]$ ps -ef | grep pmon
oracle 6663 1 0 08:22 ? 00:00:00 asm_pmon_+ASM1
oracle 2830 1 0 10:39 ? 00:00:00 ora_pmon_siprod
oracle 3558 12714 0 10:40 pts/0 00:00:00 grep pmon

Now the failover part.
[oracle@racworkshop1 ~]$ crs_relocate ora.siprod.siprod.inst -c racworkshop2
CRS-1019: Resource ora.siprod.siprod.inst (application) cannot run on racworkshop2
CRS-0223: Resource 'ora.siprod.siprod.inst' has placement error.

The normal relocate doesn't work. With the force option it still does not work:

[oracle@racworkshop1 ~]$ crs_relocate ora.siprod.siprod.inst -c racworkshop2 -f
CRS-1019: Resource ora.siprod.siprod.inst (application) cannot run on racworkshop2
CRS-0223: Resource 'ora.siprod.siprod.inst' has placement error.

OK, let's try to add a new resource for the passive node.

[oracle@racworkshop1 ~]$ srvctl add instance -d siprod -i siprod -n racworkshop2
PRKO-2010 : Error in adding instance to node: racworkshop2
PRKR-1008 : adding of instance siprod on node racworkshop2 to cluster database siprod failed.
CRS-0211: Resource 'ora.siprod.siprod.inst' has already been registered.

This is also not possible, because the resource name already exists.
Conclusion: Don't use srvctl to add the default resource names.

What is needed to have Cold failover working properly and automatically?

1) A default script performing a start/stop/check.
2) Have this script located on each node in the $CRS_HOME/crs/public location.
3) Create a resource profile using the script from point 1.
4) Copy pfile and password file.
5) A placeholder to fail over all required resources. (optional)
6) A script for the placeholder resource. (optional)
7) Test the start/stop/relocate of the resource on each node.
8) Virtual IP configuration. (optional)
9) TNSNAMES configuration for client connections.
10) Failure tests.
11) Application Listener. (optional)
12) Remaining.

1) Default scripts to perform start/stop/check.

You need a script that responds to the start, stop and check commands. The script in the appendix also validates whether ASM is up and running; if that is not the case, it starts the ASM instance as well (this assumes the diskgroup is mounted automatically).

Test results of start, stop and check of the script on the command line.

First the usage validation:
[oracle@racworkshop1 public]$ ./grid_dbfailover.sh
Usage: grid_dbfailover.sh {start|stop|check}

Now the start validation:
[oracle@racworkshop1 public]$ ./grid_dbfailover.sh start
SQL*Plus: Release 11.1.0.6.0 - Production on Tue May 13 14:46:00 2008
Copyright (c) 1982, 2007, Oracle. All rights reserved.
SQL> Connected to an idle instance.
SQL> ORACLE instance started.
Total System Global Area 334794752 bytes
Fixed Size 1299736 bytes
Variable Size 100666088 bytes
Database Buffers 226492416 bytes
Redo Buffers 6336512 bytes
Database mounted.
Database opened.
SQL> Disconnected from Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

Now the stop validation:
[oracle@racworkshop1 public]$ ./grid_dbfailover.sh stop
SQL*Plus: Release 11.1.0.6.0 - Production on Tue May 13 14:46:54 2008
Copyright (c) 1982, 2007, Oracle. All rights reserved.
SQL> Connected.
SQL> Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> Disconnected from Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

Now the check validation:
To be able to test this, we need to validate the return code both when the instance is stopped and when the instance is started.

[oracle@racworkshop1 public]$ ./grid_dbfailover.sh stop
SQL*Plus: Release 11.1.0.6.0 - Production on Tue May 13 14:57:15 2008
Copyright (c) 1982, 2007, Oracle. All rights reserved.
SQL> Connected.
SQL> Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> Disconnected from Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
[oracle@racworkshop1 public]$ ./grid_dbfailover.sh check
[oracle@racworkshop1 public]$ echo $?
1

[oracle@racworkshop1 public]$ ./grid_dbfailover.sh start
SQL*Plus: Release 11.1.0.6.0 - Production on Tue May 13 14:58:16 2008
Copyright (c) 1982, 2007, Oracle. All rights reserved.
SQL> Connected to an idle instance.
SQL> ORACLE instance started.
Total System Global Area 334794752 bytes
Fixed Size 1299736 bytes
Variable Size 100666088 bytes
Database Buffers 226492416 bytes
Redo Buffers 6336512 bytes
Database mounted.
Database opened.
SQL> Disconnected from Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

[oracle@racworkshop1 public]$ ./grid_dbfailover.sh check
[oracle@racworkshop1 public]$ echo $?
0

To use the script on the command line, set _USR_ORA_LANG, _USR_ORA_SRV and _USR_ORA_FLAGS to the correct values. When a resource profile is created, these parameter values are passed from the resource profile.
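The start/stop/check contract tested above can be sketched as a small skeleton. This is a hypothetical illustration, not the appendix script: the function names (list_processes, instance_running, action_main) are assumptions, the SID is passed as an argument instead of through _USR_ORA_SRV, and start/stop only print what a real script would do.

```shell
#!/bin/sh
# Hypothetical skeleton of an action script. It only illustrates the
# start/stop/check contract and the exit codes that the check returns.

# Assumption: wrapped in a function so a test can substitute a fixed process list.
list_processes() {
    ps -ef
}

# Exit status 0 when a pmon process for the given SID is in the process list.
instance_running() {
    list_processes | grep -v grep | grep "ora_pmon_$1\$" >/dev/null
}

action_main() {
    action=$1
    sid=$2
    case "$action" in
        start) echo "would start instance $sid" ;;   # real script: sqlplus startup
        stop)  echo "would stop instance $sid" ;;    # real script: sqlplus shutdown
        check) instance_running "$sid" ;;            # 0 = running, non-zero = not running
        *)     echo "Usage: action_main {start|stop|check} <sid>"; return 1 ;;
    esac
}
```

As in the check validation above, the only thing that matters for check is the exit status: 0 when the instance is up, non-zero when it is down.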

2) Make the scripts available on each node in the $CRS_HOME/crs/public location.

Each action script must be located in the crs/public subfolder of the Oracle Clusterware home. This is the default location where the Clusterware stack looks for the scripts. It is possible to define another location for the action script when creating the resource, although I advise using the default location. Make sure all the required scripts are copied to every node in the cluster where you want the single instance to be able to run.
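Copying the scripts to every node can be sketched as a dry run that only prints the scp commands. The helper name print_copy_commands is hypothetical, and the node list and the use of scp -p (to keep the execute permission) are assumptions; the script names and the directory are the ones used in this document.

```shell
#!/bin/sh
# Dry run: prints the scp commands instead of executing them.
# Usage: print_copy_commands <script_dir> <node> [<node> ...]
print_copy_commands() {
    script_dir=$1
    shift
    for node in "$@"; do
        for script in grid_dbfailover.sh grid_resgroup.sh; do
            echo "scp -p ${script_dir}/${script} oracle@${node}:${script_dir}/"
        done
    done
}

print_copy_commands /u01/app/crs/crs/public racworkshop2
```

Review the printed commands first, then run them by hand or pipe the output to sh.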

3) Make the database single instance profile.
Now that the script is created, tested and copied, it is time to create a resource. First we create the resource .cap file, and after that we register the resource in the OCR (Oracle Cluster Registry).

Be aware that resource names starting with ora. are not allowed.

First create the profile:
[oracle@racworkshop1 ~]$ crs_profile -create grid.siprod.db -t application -r grid.prod_grp1 -a /u01/app/crs/crs/public/grid_dbfailover.sh -o ci=20,ra=5,osrv=siprod,ol=/u01/app/oracle/product/11.1.0/si_1/,oflags=1,rt=600

-create => resource name
-r => required resource
-a => action script to execute
-o => other options:
ci => check interval, every 20 seconds
ra => restart attempts
osrv => oracle_sid (_USR_ORA_SRV)
ol => oracle_home (_USR_ORA_LANG)
oflags => is ASM used? (_USR_ORA_FLAGS)
rt => retry ?

Now register the profile:
[oracle@racworkshop1 ~]$ crs_register grid.siprod.db

Remark:
- The placeholder described in points 5 and 6 is optional. If you use it as a required resource, make sure you complete steps 5 and 6 before step 3.
- When a profile is created, a file is created in the default location, $CRS_HOME/crs/public. The file name is the profile name followed by the extension .cap.
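When more than one database has to be protected, the profile command from this step can be assembled from a few values. The sketch below only builds and prints the command text; build_profile_cmd is a hypothetical helper, and the fixed oflags=1 and rt=600 values are simply copied from the example in this step.

```shell
#!/bin/sh
# Prints a crs_profile command line; nothing is executed.
# Arguments: resource name, required resource, action script, SID, ORACLE_HOME,
#            optional check interval (default 20), optional restart attempts (default 5).
build_profile_cmd() {
    name=$1
    required=$2
    script=$3
    sid=$4
    home=$5
    ci=${6:-20}
    ra=${7:-5}
    opts="ci=${ci},ra=${ra},osrv=${sid},ol=${home},oflags=1,rt=600"
    printf '%s\n' "crs_profile -create ${name} -t application -r ${required} -a ${script} -o ${opts}"
}

build_profile_cmd grid.siprod.db grid.prod_grp1 \
    /u01/app/crs/crs/public/grid_dbfailover.sh \
    siprod /u01/app/oracle/product/11.1.0/si_1/
```

With the arguments shown, the printed line matches the crs_profile command used in this step; register the resource afterwards with crs_register as usual.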

4) Copy pfile and passwordfile.

Copy the init file and the password file:
[oracle@racworkshop1 dbs]$ scp orapwsiprod oracle@racworkshop2:/u01/app/oracle/product/11.1.0/si_1/dbs/
orapwsiprod 100% 1536 1.5KB/s 00:00
[oracle@racworkshop1 dbs]$ scp initsiprod.ora oracle@racworkshop2:/u01/app/oracle/product/11.1.0/si_1/dbs/
initsiprod.ora 100% 40 0.0KB/s 00:00

Now copy the bdump/udump locations etc. Because we are using ASM as shared storage, the spfile location is also available to the single instance environment, which makes parameter management very easy.

5) Placeholder to failover all required resources
Now that the basic part is created, it can be very useful to create a placeholder for the database resource and all related resources. This is optional, but it helps when more dependencies are involved, such as an HTTP server restart, VIP relocation or a dedicated listener start.

In the example I am using the placeholder to show the possible added value. Use the script in point 6 as action script.

First create the profile.
[oracle@racworkshop1 ~]$ crs_profile -create grid.prod_grp1 -t application -a /u01/app/crs/crs/public/grid_resgroup.sh -o ci=600

Now register the resource in the OCR.
[oracle@racworkshop1 ~]$ crs_register grid.prod_grp1

6) Script for the placeholder resource.
A script needs to be created that simply exits successfully. You must define an action script during the profile creation above, because this parameter is required when you create a resource.

#!/bin/ksh
# Script to start/stop and check a single instance database resource for coldfailover environment.
#
# Description:
# - This script will used in coldfailover Oracle Clusterware environments
#   This is a dummy script required to build an own application resource.
#
#
# Requirements:
# - Oracle Clusterware is active
# - Scripts is placed in $ORA_CRS_HOME/crs/public location and has execute rights for the oracle user
# -
#
# Version: active version 1.0
# - version 1.0 created 24 april 2007.
# -> 1.1
#
# - Bernhard de Cock Buning
# - Grid IT / www.grid-it.nl / www.rachelp.nl
#

#set -x -v
#exec > /tmp/dbfailover.log 2>&1
exit 0;

7) Test the start/stop/relocate of the resource on each node.

Now that the resources are created and the scripts are all in place we can validate if the single instance failover will work as designed. In this part we will not look at the client connections part yet, although this can be a requirement as well. In step 7 we just test if we can use the resources and simulate a failure which will result in a failover.

Start the resource on node 1
[oracle@racworkshop1 ~]$ crs_stat -t | grep grid
grid.prod_grp1 application OFFLINE OFFLINE
grid.siprod.db application OFFLINE OFFLINE

[oracle@racworkshop1 ~]$ crs_start grid.prod_grp1
Attempting to start `grid.prod_grp1` on member `racworkshop1`
Start of `grid.prod_grp1` on member `racworkshop1` succeeded.

[oracle@racworkshop1 ~]$ crs_stat -t | grep grid
grid.prod_grp1 application ONLINE ONLINE racw...hop1
grid.siprod.db application OFFLINE OFFLINE

[oracle@racworkshop1 ~]$ crs_start grid.siprod.db
Attempting to start `grid.siprod.db` on member `racworkshop1`
Start of `grid.siprod.db` on member `racworkshop1` succeeded.

[oracle@racworkshop1 ~]$ ps -ef | grep pmon
oracle 7583 1 0 08:05 ? 00:00:00 asm_pmon_+ASM1
oracle 17011 1 0 09:26 ? 00:00:00 ora_pmon_siprod
oracle 17316 12262 0 09:26 pts/0 00:00:00 grep pmon

[oracle@racworkshop1 ~]$ . oraenv
ORACLE_SID = [CRS] ? siprod
The Oracle base for ORACLE_HOME=/u01/app/oracle/product/11.1.0/si_1 is /u01/app/oracle
[oracle@racworkshop1 ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.1.0.6.0 - Production on Wed May 14 09:26:59 2008
Copyright (c) 1982, 2007, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> select open_mode,name from v$database;
OPEN_MODE NAME
---------- ---------
READ WRITE SIPROD

Stop the resources again and validate the status
[oracle@racworkshop1 ~]$ crs_stop grid.siprod.db
Attempting to stop `grid.siprod.db` on member `racworkshop1`
Stop of `grid.siprod.db` on member `racworkshop1` succeeded.

[oracle@racworkshop1 ~]$ crs_stop grid.prod_grp1
Attempting to stop `grid.prod_grp1` on member `racworkshop1`
Stop of `grid.prod_grp1` on member `racworkshop1` succeeded.

[oracle@racworkshop1 ~]$ crs_stat -t | grep grid
grid.prod_grp1 application OFFLINE OFFLINE
grid.siprod.db application OFFLINE OFFLINE

[oracle@racworkshop1 ~]$ ps -ef | grep pmon
oracle 7583 1 0 08:05 ? 00:00:00 asm_pmon_+ASM1
oracle 20143 12262 0 09:34 pts/0 00:00:00 grep pmon

Using the resource names, it is possible to start the resource on node 1

Start resources on node 2
[oracle@racworkshop1 ~]$ crs_start grid.siprod.db -c racworkshop2
Attempting to start `grid.prod_grp1` on member `racworkshop2`
Start of `grid.prod_grp1` on member `racworkshop2` succeeded.
Attempting to start `grid.siprod.db` on member `racworkshop2`
Start of `grid.siprod.db` on member `racworkshop2` succeeded.

Stop resources on node 2
[oracle@racworkshop1 ~]$ crs_stop grid.siprod.db
Attempting to stop `grid.siprod.db` on member `racworkshop2`
Stop of `grid.siprod.db` on member `racworkshop2` succeeded.

[oracle@racworkshop1 ~]$ crs_stop grid.prod_grp1
Attempting to stop `grid.prod_grp1` on member `racworkshop2`
Stop of `grid.prod_grp1` on member `racworkshop2` succeeded.

Now that we are able to start the resources on each node it is time to see if we can relocate the resources. Resources active on node 1 must relocate to node 2. If successful relocate back to node 1.

Relocate the resources from node 1 to node 2
[oracle@racworkshop1 ~]$ crs_stat -t | grep grid
grid.prod_grp1 application ONLINE ONLINE racw...hop1
grid.siprod.db application ONLINE ONLINE racw...hop1

[oracle@racworkshop1 ~]$ crs_relocate -f grid.prod_grp1 -c racworkshop2
Attempting to stop `grid.siprod.db` on member `racworkshop1`
Stop of `grid.siprod.db` on member `racworkshop1` succeeded.
Attempting to stop `grid.prod_grp1` on member `racworkshop1`
Stop of `grid.prod_grp1` on member `racworkshop1` succeeded.
Attempting to start `grid.prod_grp1` on member `racworkshop2`
Start of `grid.prod_grp1` on member `racworkshop2` succeeded.
Attempting to start `grid.siprod.db` on member `racworkshop2`
Start of `grid.siprod.db` on member `racworkshop2` succeeded.

[oracle@racworkshop1 ~]$ ps -ef | grep pmon
oracle 7583 1 0 08:05 ? 00:00:00 asm_pmon_+ASM1

[oracle@racworkshop1 ~]$ crs_stat -t | grep grid
grid.prod_grp1 application ONLINE ONLINE racw...hop2
grid.siprod.db application ONLINE ONLINE racw...hop2

Relocate the resources back from node 2 to node 1
[oracle@racworkshop1 ~]$ crs_relocate -f grid.prod_grp1 -c racworkshop1
Attempting to stop `grid.siprod.db` on member `racworkshop2`
Stop of `grid.siprod.db` on member `racworkshop2` succeeded.
Attempting to stop `grid.prod_grp1` on member `racworkshop2`
Stop of `grid.prod_grp1` on member `racworkshop2` succeeded.
Attempting to start `grid.prod_grp1` on member `racworkshop1`
Start of `grid.prod_grp1` on member `racworkshop1` succeeded.
Attempting to start `grid.siprod.db` on member `racworkshop1`
Start of `grid.siprod.db` on member `racworkshop1` succeeded.

[oracle@racworkshop1 ~]$ crs_stat -t | grep grid
grid.prod_grp1 application ONLINE ONLINE racw...hop1
grid.siprod.db application ONLINE ONLINE racw...hop1

[oracle@racworkshop1 ~]$ ps -ef | grep pmon
oracle 7583 1 0 08:05 ? 00:00:00 asm_pmon_+ASM1
oracle 31591 1 0 10:03 ? 00:00:00 ora_pmon_siprod
oracle 31842 12262 0 10:03 pts/0 00:00:00 grep pmon

Remark: -f is needed, otherwise a relocate is not possible using the placeholder.
[oracle@racworkshop1 ~]$ crs_relocate grid.prod_grp1 -c racworkshop2
CRS-1022: Resource grid.siprod.db (application) is running on racworkshop1
CRS-0223: Resource 'grid.prod_grp1' has placement error.
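When these start/stop/relocate tests are repeated often, reading the state and host of one resource out of the crs_stat -t listing can be scripted. The sketch below is an assumption-laden helper, not part of the original procedure: resource_location and sample_crs_stat are hypothetical names, and the column positions (name, type, target, state, host) are taken from the listings shown in this step.

```shell
#!/bin/sh
# Reads `crs_stat -t` style lines on stdin and prints "<state> <host>" for one resource.
# A "-" is printed as host when the resource is not running anywhere.
resource_location() {
    awk -v res="$1" '$1 == res { host = ($5 == "") ? "-" : $5; print $4, host }'
}

# Sample input copied from the listings in this step, so the sketch runs without a cluster.
sample_crs_stat() {
    printf '%s\n' \
        'grid.prod_grp1 application ONLINE ONLINE racw...hop1' \
        'grid.siprod.db application OFFLINE OFFLINE'
}

sample_crs_stat | resource_location grid.prod_grp1
sample_crs_stat | resource_location grid.siprod.db
```

On a real cluster the input would come from crs_stat -t | resource_location <name>. Note that crs_stat -t shortens long resource and host names, so the exact-name match only works for names that are printed in full.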

8) Virtual IP configuration
It is possible to create an Application VIP. The decision to use an Application VIP depends on the size of the grid. If you are using just 2 nodes, my advice would be to use the standard VIP. If the grid contains several nodes, it is easier to use an Application VIP, as this increases flexibility.

A disadvantage can be that you will have a lot of virtual IPs active in your grid. Here a decision needs to be made about which option is most beneficial for the organization.

No script is needed for the Application VIP, as we will use the default usrvip script provided by Oracle.

Modify the hosts file / DNS to add the Application VIP:
192.168.1.175 coldfailover.rachelp.nl coldfailover

Create the Application VIP profile and register it:
[oracle@racworkshop1 ~]$ crs_profile -create grid.prod_grp1.vip -t application -r grid.prod_grp1 -a /u01/app/crs/bin/usrvip -o oi=eth0,ov=192.168.1.175,on=255.255.255.0

[oracle@racworkshop1 ~]$ crs_register grid.prod_grp1.vip

Modify the VIP profile. As root, correct the privileges on the VIP:
[root@racworkshop1 ~]# /u01/app/crs/bin/crs_setperm grid.prod_grp1.vip -o root
[root@racworkshop1 ~]# /u01/app/crs/bin/crs_setperm grid.prod_grp1.vip -u user:oracle:r-x

The VIP needs to be owned by root, but the oracle user must be able to start and use it.

Validate if the VIP can be used:
[oracle@racworkshop1 public]$ crs_start -c racworkshop1 grid.prod_grp1.vip
Attempting to start `grid.prod_grp1.vip` on member `racworkshop1`
Start of `grid.prod_grp1.vip` on member `racworkshop1` succeeded.

[oracle@racworkshop1 public]$ ping -c 1 coldfailover
PING coldfailover.rachelp.nl (192.168.1.175) 56(84) bytes of data.
64 bytes from coldfailover.rachelp.nl (192.168.1.175): icmp_seq=0 ttl=64 time=0.036 ms

Now the VIP is part of prod_grp1 and will also fail over during a relocate or failure.

[oracle@racworkshop1 ~]$ crs_stat -t | grep grid
grid.prod_grp1 application ONLINE ONLINE racw...hop1
grid....rp1.vip application ONLINE ONLINE racw...hop1
grid.siprod.db application ONLINE ONLINE racw...hop1

[oracle@racworkshop1 ~]$ crs_relocate -f grid.prod_grp1 -c racworkshop2
Attempting to stop `grid.siprod.db` on member `racworkshop1`
Stop of `grid.siprod.db` on member `racworkshop1` succeeded.
Attempting to stop `grid.prod_grp1.vip` on member `racworkshop1`
Stop of `grid.prod_grp1.vip` on member `racworkshop1` succeeded.
Attempting to stop `grid.prod_grp1` on member `racworkshop1`
Stop of `grid.prod_grp1` on member `racworkshop1` succeeded.
Attempting to start `grid.prod_grp1` on member `racworkshop2`
Start of `grid.prod_grp1` on member `racworkshop2` succeeded.
Attempting to start `grid.prod_grp1.vip` on member `racworkshop2`
Start of `grid.prod_grp1.vip` on member `racworkshop2` succeeded.
Attempting to start `grid.siprod.db` on member `racworkshop2`
Start of `grid.siprod.db` on member `racworkshop2` succeeded.

[oracle@racworkshop1 ~]$ crs_stat -t | grep grid
grid.prod_grp1 application ONLINE ONLINE racw...hop2
grid....rp1.vip application ONLINE ONLINE racw...hop2
grid.siprod.db application ONLINE ONLINE racw...hop2

Remark:
When using an Application VIP, make sure you also add the VIP to the placeholder group, as this makes relocation easier.

9) TNSNAMES configuration for client connections.

The tnsnames configuration depends entirely on the failover capabilities of the application. I will not go into detail about options like client-side connect-time failover, load balancing etc. How to configure TNSNAMES also depends on whether an Application VIP and an Application Listener are used.

Example 1:
Here we make use of the default listener and the default VIP. The listener configuration has 2 IP addresses in this case. The configuration looks like a normal RAC tnsnames setup.

siprod=(description=(address_list=(address=(protocol=tcp)(host=racworkshop1-vip)(port=1521))(address=(protocol=tcp)(host=racworkshop2-vip)(port=1521))(failover=on))(connect_data=(service_name=siprod)))

In this case, if the listener on racworkshop1 is not active or the service is not registered with it, the client will check whether the listener on racworkshop2 handles the service siprod.

Example 2:
Now we connect to the Application VIP. Here the tnsnames entry is very straightforward, as it doesn't matter where the Application VIP is active, as long as it is active and used by a listener.

siprod=(description=(address_list=(address=(protocol=tcp)(host=coldfailover)(port=1521)))(connect_data=(service_name=siprod)))

10) Failure test.

One of the main reasons why the Oracle Clusterware layer is used, is to be able to have automatic failover in case of problems.

Using the relocate command can be very useful during maintenance windows, but the most important part is to minimize downtime in case of a failure, and this needs to be tested.

Test relocate in case of a failure.
The resource is active on node 1. We want the resource to be active on node 2 after a problem with the instance. Please note the value of the restart attempts, which is 5/5.

For the test we kill the pmon process of the instance.

[oracle@racworkshop1 ~]$ crs_stat -t -v | grep grid
grid.prod_grp1 application 0/1 0/0 ONLINE ONLINE racw...hop1
grid.siprod.db application 5/5 0/0 ONLINE ONLINE racw...hop1

[oracle@racworkshop1 ~]$ kill -9 21908

[oracle@racworkshop1 ~]$ crs_stat -t -v | grep grid
grid.prod_grp1 application 0/1 0/0 ONLINE ONLINE racw...hop2
grid.siprod.db application 0/5 0/0 ONLINE OFFLINE

As you can see, the failover due to a failure is successful. The restart attempt counter is reset to 0, and Clusterware will now first try to start the instance 5 times on node 2 before a failover to node 1 occurs. Below is the proof of this, as we kill pmon on node 2.

[oracle@racworkshop1 ~]$ crs_stat -t -v | grep grid
grid.prod_grp1 application 0/1 0/0 ONLINE ONLINE racw...hop2
grid.siprod.db application 0/5 0/0 ONLINE ONLINE racw...hop2

[oracle@racworkshop2 ~]$ kill -9 13707
[oracle@racworkshop2 ~]$ ps -ef | grep pmon
oracle 17281 1 0 09:40 ? 00:00:00 asm_pmon_+ASM2
oracle 15043 14120 0 10:58 pts/0 00:00:00 grep pmon

[oracle@racworkshop1 ~]$ crs_stat -t -v | grep grid
grid.prod_grp1 application 0/1 0/0 ONLINE ONLINE racw...hop2
grid.siprod.db application 0/5 0/0 ONLINE OFFLINE

Remark:
In case of a failure, 5 restart attempts are executed before the resource fails over to the other node. If this value is too high, change the ra=5 value during the profile creation.

[oracle@racworkshop1 ~]$ crs_profile -create grid.siprod.db -t application -r grid.prod_grp1 -a /u01/app/crs/crs/public/grid_dbfailover.sh -o ci=20,ra=5,osrv=siprod,ol=/u01/app/oracle/product/11.1.0/si_1/,oflags=1,rt=600

Using the evmwatch -A command will also show the event messages reported in the Clusterware layer.
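The restart counter from the crs_stat -t -v listings in this step can also be read by a script, for example to predict what the next failure will do. The sketch below is an interpretation, not documented Clusterware behaviour: it assumes the third column is "restarts used / restart attempts allowed", and that a failure relocates the resource once the used count has reached the allowed count, which is what the 5/5 and 0/5 tests above suggest. next_failure_action is a hypothetical helper name.

```shell
#!/bin/sh
# Reads `crs_stat -t -v` style lines on stdin. Column 3 is assumed to be R/RA:
# restarts used so far / restart attempts allowed on the current node.
# Prints "relocate" when the counter is exhausted, otherwise "restart".
next_failure_action() {
    awk -v res="$1" '$1 == res {
        split($3, r, "/")
        if (r[1] + 0 >= r[2] + 0) print "relocate"; else print "restart"
    }'
}

# Lines copied from the listings in this step.
printf '%s\n' 'grid.siprod.db application 5/5 0/0 ONLINE ONLINE racw...hop1' |
    next_failure_action grid.siprod.db
printf '%s\n' 'grid.siprod.db application 0/5 0/0 ONLINE ONLINE racw...hop2' |
    next_failure_action grid.siprod.db
```

On a cluster the input would come from crs_stat -t -v. Treat the result as a hint for planning failure tests; the transcripts above remain the reference for the actual behaviour.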

11) Application Listener creation.

It is possible to create an Application Listener for each instance/application group. If required, you need to create a start/stop/check script for the listener. When that is done, create a profile and register it in the OCR. Below is an example of an application listener.

Define an application listener in the listener.ora
[oracle@racworkshop1 ~]$ crs_profile -create grid.prod_grp1 -t application -a /u01/app/crs/crs/public/grid_resgroup.sh -o ci=600

Remark: Make sure each listener.ora in the cold failover environment has the listener configured.

Create the profile for the listener.
[oracle@racworkshop1 public]$ crs_profile -create grid.siprod.lsnr -t application -r grid.prod_grp1.vip -a /u01/app/crs/crs/public/grid_lsnrfailover.sh -o ci=20,ra=5,osrv=listener_siprod,ol=/u01/app/oracle/product/11.1.0/si_1/

Register the resource in the OCR.
[oracle@racworkshop1 ~]$ crs_register grid.siprod.lsnr

See the appendix for an example listener start/stop/check script.

Issue: while testing by killing the listener, it was possible to get the resource into an unknown state. The check script has no code to detect and correct this. When a resource is in an unknown state, you first need to run crs_stop -f <resourcename> before it can be started again.

$ crs_stop -f grid.siprod.lsnr
$ crs_start grid.siprod.lsnr
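The recovery order described above can be written down as a small dry-run helper that prints the commands to run for a given state. recovery_commands is a hypothetical name, and the state string UNKNOWN is an assumption about how crs_stat reports this condition; nothing is executed.

```shell
#!/bin/sh
# Prints the commands needed to bring a resource back, given its current state.
# Assumption: the state is passed in as reported by crs_stat (e.g. OFFLINE, UNKNOWN).
recovery_commands() {
    res=$1
    state=$2
    if [ "$state" = "UNKNOWN" ]; then
        # A resource in unknown state must be force-stopped before it can be started.
        echo "crs_stop -f $res"
    fi
    echo "crs_start $res"
}

recovery_commands grid.siprod.lsnr UNKNOWN
```

For a resource that is simply OFFLINE, only the crs_start line is printed.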

12) Remaining.

Overview Clusterware resources.

Based on the decisions made, you will have several resource profiles or just one for the database. When a placeholder is used, the DBAs will use it for administration actions.

The table shows which profiles are needed based on business requirements:

Solution 1:
- Placeholder: use own developed script.
- Application VIP: use $CRS_HOME/bin/usrvip as action script in the profile.
- Application Listener: use own developed script.
- Single Database: use own developed script.

Solution 2:
- Clusterware VIP: the default configured VIP is used (part of Clusterware). Configure TNS properly to fail over.
- Default Listener: configure TNS properly to have listener failover.
- Single Database: use own developed script.

Conclusion:
Building a cold failover environment is not hard. Oracle Clusterware takes care of the required functionality. A prerequisite is setting up a cluster environment, for which cluster knowledge is needed.

The ability to combine RAC and non-RAC makes this very useful, and it works as designed. The options are also endless, as long as you use a script to start/stop/check.

For each organization the following questions need to be answered:
- Does each single instance get its own VIP?
- Does each single instance get its own listener?
- Is a placeholder going to be used?
- ASM diskgroup naming, to separate RAC and single instance databases into different diskgroups (if ASM is used).
- Restart attempt value before failover to the other node.
- TNSNAMES configuration / client failover configuration.

Appendix:

Please always test the scripts below before using them in your environment.

Script to start/stop/check single instance database.

#!/bin/ksh
# Script to start/stop and check a single instance database resource for coldfailover environment.
#
# Description:
# - This script will be used in coldfailover Oracle Clusterware environments
#   where the basic Oracle Clusterware Framework/API is in place.
#   Script is tested on 10g and 11g Clusterware environments on Linux.
#
#
# Requirements:
# - Oracle Clusterware is active
# - Script is placed in $ORA_CRS_HOME/crs/public location and has execute rights for the oracle user
# - Script makes use of the passed crs_profile parameter settings: _USR_ORA_LANG,_USR_ORA_SRV,_USR_ORA_FLAGS
# - value for _USR_ORA_FLAGS indicates if ASM is used (1=> ASM involved, 0=> No ASM is used)
# - create a clusterware profile to use the script by executing:
#   $ crs_profile -create grid.siprod.db -t application -r grid.prod_grp1 \
#     -a /u01/app/crs/crs/public/grid_dbfailover.sh \
#     -o ci=20,ra=5,osrv=siprod,ol=/u01/app/oracle/product/11.1.0/si_1/,oflags=1,rt=600
#   $ crs_register grid.siprod.db
#
#
# Version: active version 1.0
# - version 1.0 created 24 April 2007.
# -> 1.1
#
# - Bernhard de Cock Buning
# - Grid IT / www.grid-it.nl / www.rachelp.nl
#

# Trace every command and send all output to a log file.
set -x -v
exec > /tmp/dbfailover.log 2>&1
#
#_USR_ORA_LANG=/u01/app/oracle/product/11.1.0/si_1
#_USR_ORA_SRV=siprod
#_USR_ORA_FLAGS=1

export ORACLE_SID=$_USR_ORA_SRV
export ORACLE_HOME=$_USR_ORA_LANG
export ASM=$_USR_ORA_FLAGS

# PID1 is non-empty when the database PMON process runs, PID2 when ASM PMON runs.
PID1=`ps -ef | grep -v grep | grep pmon_$ORACLE_SID | awk '{print $8}'`
PID2=`ps -ef | grep -v grep | grep asm_pmon | awk '{print $8}'`

case $1 in
'start')
    export ASM=$ASM
    export ORACLE_SID=$ORACLE_SID
    export ORACLE_HOME=$ORACLE_HOME
    export LD_LIBRARY_PATH=$ORACLE_HOME/lib:$LD_LIBRARY_PATH
    # export TNS_ADMIN=$ORACLE_HOME/network/admin # optionally set TNS_ADMIN here

    # Start ASM first when it is used (oflags=1) and not yet running.
    if [ "$ASM" = "1" ]
    then
        if [ "$PID2" = "" ]
        then
            $ORACLE_HOME/bin/srvctl start asm -n `hostname -s`
        fi
    fi
    $ORACLE_HOME/bin/sqlplus /nolog <<EOF
connect / as sysdba
startup
quit
EOF
    ;;
'stop')
    export ORACLE_SID=$ORACLE_SID
    export ORACLE_HOME=$ORACLE_HOME
    export LD_LIBRARY_PATH=$ORACLE_HOME/lib:$LD_LIBRARY_PATH
    # export TNS_ADMIN=$ORACLE_HOME/network/admin # optionally set TNS_ADMIN here
    $ORACLE_HOME/bin/sqlplus /nolog <<EOF
connect / as sysdba
shutdown immediate
quit
EOF
    ;;
'check')
    # Exit 0 when the instance is running, 1 when it is not.
    if [ "$PID1" != "" ]
    then
        exit 0
    else
        exit 1
    fi
    ;;
*)
    echo "Usage: "`basename $0`" {start|stop|check}"
    ;;
esac
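The check action in the script above matches any process whose name contains pmon_$ORACLE_SID, so with SIDs such as siprod and siprod2 on the same node it could report the wrong instance as running. The following is a minimal sketch of a stricter match, not part of the original script: the function names are hypothetical, and it assumes the instance background process is named ora_pmon_<SID>.

```sh
#!/bin/sh
# Sketch only: function names are hypothetical.

# Read process names (one per line) on stdin and succeed only if one of
# them is exactly ora_pmon_<SID>.
# grep -x matches whole lines only, -F treats the pattern as a fixed string.
pmon_in_list() {
    grep -qxF "ora_pmon_$1"
}

# Succeed if the PMON process of the given instance runs on this node.
# "ps -e -o args=" prints the command line of every process without a header;
# awk keeps only the first word, which is the process name.
instance_running() {
    ps -e -o args= | awk '{print $1}' | pmon_in_list "$1"
}

# Possible use in the 'check' branch of the action script:
#   if instance_running "$ORACLE_SID"; then exit 0; else exit 1; fi
```

Separating the matching from the ps call keeps the logic testable with a canned process list.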

Script to start/stop/check Application Listener.

#!/bin/ksh
# Script to start/stop and check an application listener resource for coldfailover environment.
#
# Description:
# - This script will be used in coldfailover Oracle Clusterware environments
#   where the basic Oracle Clusterware Framework/API is in place.
#   Script is tested on 10g and 11g Clusterware environments on Linux.
#
#
# Requirements:
# - Oracle Clusterware is active
# - Script is placed in $ORA_CRS_HOME/crs/public location and has execute rights for the oracle user
# - Script makes use of the passed crs_profile parameter settings: _USR_ORA_LANG,_USR_ORA_SRV
# - _USR_ORA_LANG = $ORACLE_HOME location and _USR_ORA_SRV = listener name
# - create a clusterware profile to use the script by executing:
#   $ crs_profile -create grid.siprod.lsnr -t application -r grid.prod_grp1.vip \
#     -a /u01/app/crs/crs/public/grid_lsnrfailover.sh \
#     -o ci=20,ra=5,osrv=listener_siprod,ol=/u01/app/oracle/product/11.1.0/si_1/
#   $ crs_register grid.siprod.lsnr
#
#
# Version: active version 1.0
# - version 1.0 created 21 May 2008 for GRID.
# -> 1.1
#
# - Bernhard de Cock Buning
# - Grid IT / www.grid-it.nl / www.rachelp.nl
#

# Trace every command and send all output to a log file.
set -x -v
exec > /tmp/lsnrfailover.log 2>&1
#
#_USR_ORA_LANG=/u01/app/oracle/product/11.1.0/si_1
#_USR_ORA_SRV=listener_siprod

export ORACLE_LISTENER=$_USR_ORA_SRV
export ORACLE_HOME=$_USR_ORA_LANG

# PID1 is non-empty when a process for this listener is running.
PID1=`ps -ef | grep -v grep | grep $ORACLE_LISTENER | awk '{print $9}'`

case $1 in
'start')
    export ORACLE_LISTENER=$ORACLE_LISTENER
    export ORACLE_HOME=$ORACLE_HOME
    export LD_LIBRARY_PATH=$ORACLE_HOME/lib:$LD_LIBRARY_PATH
    # export TNS_ADMIN=$ORACLE_HOME/network/admin # optionally set TNS_ADMIN here
    $ORACLE_HOME/bin/lsnrctl start $ORACLE_LISTENER
    ;;
'stop')
    export ORACLE_LISTENER=$ORACLE_LISTENER
    export ORACLE_HOME=$ORACLE_HOME
    export LD_LIBRARY_PATH=$ORACLE_HOME/lib:$LD_LIBRARY_PATH
    # export TNS_ADMIN=$ORACLE_HOME/network/admin # optionally set TNS_ADMIN here
    $ORACLE_HOME/bin/lsnrctl stop $ORACLE_LISTENER
    ;;
'check')
    # Exit 0 when the listener is running, 1 when it is not.
    if [ "$PID1" != "" ]
    then
        exit 0
    else
        exit 1
    fi
    ;;
*)
    echo "Usage: "`basename $0`" {start|stop|check}"
    ;;
esac

This document is based on our own implementation. GRID-IT is not responsible for any errors related to the above implementation. Always test scripts and functionality in your own environment.

Have fun with the document.

RACHELP and GRID-IT team.

WWW.GRID-IT.nl
WWW.RACHELP.NL

