Con8780 nair rac_best_practices_final_without_12_2content

Date post: 27-Jan-2017
Upload: anil-nair

Copyright © 2015, Oracle and/or its affiliates. All rights reserved.

Oracle Real Application Clusters (RAC) Best Practices

Anil Nair
Oracle Real Application Clusters (RAC) Product Management, ST Development
October 27th, 2015

@OracleRACpm, @AmNair
http://www.linkedin.com/in/anil-nair-01960b6
http://www.slideshare.net/AnilNair27


Safe Harbor Statement

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.


Program Agenda

1. Why and How to Upgrade
2. Features You Don’t Want to Miss

Appendixes
A. Grid Infrastructure Upgrade
B. Enhanced OJVM Patching Steps
C. Hang Manager in Action


The original slide deck presented at Oracle OpenWorld 2015 includes information about a future release. That information is not included in this slide deck.


The Question is not Whether or Not to Upgrade!
The questions are why, when and how?

• The upgrade is the result of a long plan:
  – OOW 2013 we presented: “Oracle RAC 12c [Install] Best Practices” (**)
  – OOW 2014 we presented: “Oracle RAC 12c Operational Best Practices” (**)
  – In both years we presented “Oracle RAC Internals” with a different focus:
    • Oracle Grid Infrastructure and Configuration Internals (**)
    • Oracle RAC Internals – The Cache Fusion Edition (**)
• Bottom line: You are all prepared for using Oracle RAC 12c (after this session)!

(**) See: http://www.slideshare.net/MarkusMichalewicz for previous years’ slides


Why Upgrade?
Why upgrade now?

• Oracle 12c has been out for more than 2 years – current 12.1.0.2
• Adoption rate is exemplary
  – Oracle 12c options such as Oracle Multitenant and Oracle In-Memory Database facilitate upgrade
• Benefit from the latest features and avoid running a de-supported version of your software stack

[Figure: release timeline. Labels: 11.2.0.2/3 with PSUs 11.2.0.2/3.X; Patch Set 11.2.0.4 with PSUs 11.2.0.4.X; Upgrade to 12.1.0.2 with PSUs 12.1.0.2.X; status markers “De-Supported” and “Almost De-Supported”]


Why Upgrade?
You have a choice to upgrade step-by-step

• Using Oracle RAC, you have a choice
  – You can upgrade both the database and Grid Infrastructure (GI) OR only GI.
    • More information in MOS note 756671.1
• Upgrading Oracle GI is always recommended:
  – Apply PSUs, Patch Sets & upgrades rolling!
  – Oracle GI must be at the same or a higher version than the highest-version database you want to operate
  – Most applications only certify the database, not the Oracle GI version
  – Benefit from all new features in Oracle GI
    • Some features may require Oracle 12c databases

[Figure: Oracle Flex Cluster]
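
The version rule on this slide (Oracle GI at the same or a higher version than the highest database version) can be checked mechanically before planning a GI-only upgrade. A minimal sketch in Python, assuming dotted version strings such as 12.1.0.2; `parse_version` and `gi_supports` are hypothetical helper names and not part of any Oracle tool.

```python
def parse_version(version, width=5):
    """Turn '12.1.0.2' into a fixed-width tuple such as (12, 1, 0, 2, 0)."""
    parts = [int(p) for p in version.split(".")]
    # Pad with zeros so '12.1.0.2' and '12.1.0.2.0' compare as equal.
    parts += [0] * (width - len(parts))
    return tuple(parts)


def gi_supports(gi_version, db_versions):
    """True if GI is at least as high as every database version listed."""
    gi = parse_version(gi_version)
    return all(parse_version(db) <= gi for db in db_versions)


# Upgrading only GI to 12.1.0.2 still covers existing 11.2.0.4 databases.
print(gi_supports("12.1.0.2", ["11.2.0.4", "12.1.0.2"]))  # True
# A 12.1.0.2 database is not covered by an 11.2.0.4 GI under this rule.
print(gi_supports("11.2.0.4", ["11.2.0.4", "12.1.0.2"]))  # False
```

The sketch only encodes the rule as stated on the slide; MOS note 756671.1 remains the reference for what is actually supported.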


How to Upgrade?
With a plan and based on testing

• Use “rolling” as much as possible!
  – Oracle RAC and Oracle RAC One Node provide rolling upgrade features
  – Drain workload prior to upgrading
    • $ srvctl stop service
      – Stops sending new work to the node
      – Active sessions continue to work
    • Wait for sessions to drain
    • Execute dbms_service.disconnect_session

[Figure: connection pool in front of four instances; Gold_Svc on Instance 1, Instance 2 and Instance 3, Singleton on Instance 4; numbered callouts 1 to 3]
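
The drain sequence on this slide (stop the service, wait, then disconnect what is left) can be written down as an ordered plan. A minimal dry-run sketch in Python: it only builds the command strings and never executes them. The names `orcl`, `gold_svc` and `orcl1` are hypothetical, and `drain_plan` is an illustrative helper. The `-db` and `-service` options and the `DBMS_SERVICE.POST_TRANSACTION` constant also appear in Appendix B of this deck; restricting the stop to one instance with `-instance` is an assumption about how the drain would be scoped.

```python
def drain_plan(db, service, instance):
    """Return the ordered drain steps for one instance as (where, action) pairs."""
    return [
        # 1. Stop the service on this instance: no new work is routed here,
        #    sessions that are already active keep running.
        ("shell", f"srvctl stop service -db {db} -service {service} -instance {instance}"),
        # 2. Give the remaining sessions time to finish on their own.
        ("operator", "wait for sessions to drain"),
        # 3. Disconnect sessions that are still attached, at transaction end.
        ("sqlplus", f"exec dbms_service.disconnect_session('{service}', DBMS_SERVICE.POST_TRANSACTION)"),
    ]


for where, action in drain_plan("orcl", "gold_svc", "orcl1"):
    print(f"{where}: {action}")
```

Keeping the plan as data makes it easy to review the order with the application team before anything is run against a production cluster.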


Oracle GI 12.1 – Node Failure Handling During Upgrade 1

• Before Oracle RAC 12.1:
  – All nodes of the cluster had to be available during upgrade
  – If any node crashed (e.g. HW/OS issues) and could not be restarted in time, the upgrade could not proceed
  – Result: downtime required to downgrade, deleteNode and then attempt an upgrade
• With Oracle RAC 12.1:
  – Completion of the upgrade can be enforced despite unavailable nodes

[Figure: MyCluster with Node 1, Node 2, Node 3 and Node 4]

1. Execute the following as root on all the available nodes except one: # rootupgrade.sh
2. Execute the following as root on the last node: # rootupgrade.sh -force
3. Click Continue in the install GUI screen, which will run cluster verification
4. Once node3 is fixed, execute the following as root: # rootcrs.sh -join -existingNode node3


Oracle GI 12.1 – Node Failure Handling During Upgrade 2

• Case: the first node on which the upgrade was initiated crashes and remains unavailable
  – Before Oracle 12.1, the first node needed to be re-activated prior to proceeding
• Starting with Oracle RAC 12.1 you can:
  – Run “rootupgrade.sh -first -force” as root on any other node
  – After successful execution of the upgrade on all remaining nodes, execute “$GRID_HOME/cfgtoollogs/configToolAllCommands” as the grid owner on one node
  – If another node in the cluster is down at the time of upgrade, refer to the previous solution

[Figure: MyCluster with Node 1, Node 2, Node 3 and Node 4]

1. On any other node, execute as root: # rootupgrade.sh -first -force
2. Execute as root on the rest of the available nodes: # rootupgrade.sh
3. Execute as grid owner from the new first node: $GRID_HOME/cfgtoollogs/configToolAllCommands
4. Once node1 is fixed, execute as root: # rootcrs.sh -join -existingNode node1

More Information in Appendix A


Oracle GI 12.1 – BATCH Upgrade

• Oracle GI 12.1 BATCH upgrade is a new feature that allows nodes to be upgraded in batches
• BATCH Upgrade improves upgrade performance in large clusters, as nodes can be upgraded in groups
• Downtime can be reduced by grouping nodes with different service availability requirements

[Figure: MyCluster with Node 1 to Node 4 hosting Silver_Svc, Gold_Svc (on two nodes) and Bronze_Svc]

1. Nodes with different service availability requirements (silver, gold and bronze)
2. Gold_Svc availability can be improved by ensuring rootupgrade execution on Node2 & Node3 is batched separately.
3. For example, Node1 can be batch 1, Node2 and Node4 can be second and Node3 can be last.

More Information in Appendix A
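
The batching idea on this slide can be sketched as a small planner: never place every node that hosts a multi-node service into the same batch. A minimal sketch in Python under stated assumptions: the service placement (Silver on node1, Gold on node2 and node3, Bronze on node4) is a guess at the slide's diagram, keeping the local node alone in the first batch is a choice made here for illustration, and `plan_batches` is a hypothetical helper. The installer itself asks the operator to assign the batches.

```python
def takes_service_down(batch, services):
    """True if the batch contains every host of some multi-node service."""
    members = set(batch)
    return any(len(hosts) > 1 and hosts <= members for hosts in services.values())


def plan_batches(nodes, local_node, services):
    """Greedy first-fit assignment of nodes to upgrade batches.

    services maps a service name to the set of nodes it runs on.
    """
    batches = [[local_node]]  # the local node is always in batch 1
    for node in nodes:
        if node == local_node:
            continue
        for batch in batches[1:]:
            if not takes_service_down(batch + [node], services):
                batch.append(node)
                break
        else:
            # No existing batch can take this node safely: open a new one.
            batches.append([node])
    return batches


services = {
    "Silver_Svc": {"node1"},
    "Gold_Svc": {"node2", "node3"},
    "Bronze_Svc": {"node4"},
}
print(plan_batches(["node1", "node2", "node3", "node4"], "node1", services))
# [['node1'], ['node2', 'node4'], ['node3']]
```

With this placement the result matches the example in item 3: node2 and node3 end up in different batches, so Gold_Svc always has one host running. Single-node services cannot be protected by batching and need to be relocated between batch runs instead.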


Oracle RAC 12c Rel. 1 – The Standard Going Forward

• Flex Cluster
• Flex ASM
• Hang Manager
• Full Oracle Multitenant Support

[Figure: Hang Manager cycle with the labels Detect, Heuristics, Analyze and Verify, Decide and Resolve]


Overlooked and Underestimated – Hang Manager
Why is a Hang Manager required?

• Customers experience database hangs for a variety of reasons
  – High system load, workload contention, network congestion or errors
• Before Hang Manager was introduced with Oracle RAC 11.2.0.2
  – Oracle required information to troubleshoot a hang, e.g.:
    • System state dumps
    • For RAC: global system state dumps
  – Customer usually had to reproduce with additional parameters
• With Oracle RAC 11.2.0.2:
  – Mechanism to automatically detect and resolve hangs
  – Note that Deadlock detection has been available in the database for years


How does it work? - Detection
The main problem to solve: When does a wait represent a hang?

• Hang(s) in the context of Hang Manager are database process(es) that are not progressing
  – Cross-layer hangs are also managed
  – E.g.: Resolving a hang that is caused by a blocked ASM resource
• Hang Manager only considers DB sessions holding local/global resources on which sessions are waiting
  – Local resource: Local to the instance / Global resource: Global to the database
• Session(s) may hang due to underlying OS resource issues
• Hung session(s) could be progressing but may be extremely slow
• Deadlocks and User Locks are not managed by Hang Manager
• Deadlock detection has been available in the database for years


How does it work? - Resolution
After confirming hang, eliminate the holder

• Once the holder is found
  – Confirm whether the stack is progressing.
  – If stack progresses, it’s not deemed a hang.
• If Quality of Service Management (QoS) is used and service level definitions have been set:
  – The algorithm considers service level definition as part of the hang resolution.
• Everything considered, eliminate holder
  – Logs containing ORA-32701 (“Possible hangs up to hang ID=%s detected”) can be found in the Alert log & diag trace
• This functionality is available in Oracle RAC, RAC One Node & Single Instance(**)

[Figure: One holder (H) and multiple waiters (W1 to W7) waiting on a resource (Res)]

More Information in Appendix C
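
The holder/waiter picture on this slide can be modelled as a wait-for map: follow each waiter's chain until a session is reached that is not itself waiting. A minimal sketch in Python, intended only to illustrate the relationship; it is not Hang Manager's algorithm, and the function names, the session labels and the `progressing` set are hypothetical. Cycles are skipped because, as the Detection slide states, deadlocks are not managed by Hang Manager.

```python
def root_holders(blocked_by):
    """Map each final holder to the set of sessions queued behind it.

    blocked_by maps a waiting session to the session it is waiting on.
    """
    roots = {}
    for waiter in blocked_by:
        seen = {waiter}
        current = waiter
        while current in blocked_by:
            current = blocked_by[current]
            if current in seen:
                # A cycle is a deadlock, which is handled elsewhere.
                current = None
                break
            seen.add(current)
        if current is not None:
            roots.setdefault(current, set()).add(waiter)
    return roots


def hang_candidates(blocked_by, progressing):
    """Keep only holders whose stack is not progressing."""
    return {holder: waiters
            for holder, waiters in root_holders(blocked_by).items()
            if holder not in progressing}


# One holder H with seven waiters, as in the slide's diagram.
waits = {f"W{i}": "H" for i in range(1, 8)}
print(len(hang_candidates(waits, progressing=set())["H"]))  # 7
print(hang_candidates(waits, progressing={"H"}))  # {}
```

The second call mirrors the slide's first bullet: if the holder's stack progresses, the wait is not deemed a hang and nothing is eliminated.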


How Flex Cluster Optimizes an 8-year-old Use Case
Kill node running DWH (Swingbench) workload on standard cluster (Hub node)

[Figure: Swingbench operations, impact running DWH and OLTP at the same time. Annotations: Kill node, Starting Instance, Impact on transactions, Reconfig complete]


How Flex Cluster Optimizes an 8-year-old Use Case
Kill node running DWH (Swingbench) workload on Flex Cluster (Leaf node)

[Figure: Swingbench operations, impact running DWH and OLTP at the same time. Annotations: Kill node, Starting Instance, Impact on transactions: unnoticeable, Reconfig complete]


How Flex Cluster Optimizes an 8-year-old Use Case
Better performance with DWH workload on Leaf nodes with BIG SGA and IMPQ

Decrease in response time of more than 80% depending on query and using In-Memory PQ (IMPQ) on Leafs with SGA_TARGET of 40GB compared to 16GB on HUBs only.

[Figure: DB A instances on Hub Nodes; read-only workload on DB A instances on Leaf Nodes]


Recommended Oracle RAC Sessions during OOW '15

• Introducing the Next Generation of Oracle Real Application Clusters
  – Monday, Oct 26th, 4:00pm, Moscone South - 102
  – Presenters: Markus Michalewicz, Anil Nair
• Oracle Real Application Clusters Best Practices
  – Tuesday, Oct 27th, 12:15pm, Moscone South - 104
  – Presenters: Markus Michalewicz, Anil Nair
• Manage Patching and Upgrades of Large-Scale Oracle Database Deployments
  – Tuesday, Oct 27th, 4:00pm, Moscone South - 102
  – Presenter: Raj Kammend; Customer: Dell
• Making the Right Choice for Managing Storage in Oracle Database: Oracle ASM and Oracle ACFS
  – Wednesday, Oct 28th, 12:15pm, Moscone South - 103
  – Presenters: Jim Williams, Ara Shakian
• Hide the Impact of Scheduled Maintenance from Your Applications
  – Wednesday, Oct 28th, 3:00pm, Moscone South - 102
  – Presenters: Carol Colrain, Troy Anthony; Customer: Epsilon
• How to Efficiently Monitor Your Database Systems with Oracle’s Integrated Tools
  – Thursday, Oct 29th, 9:30am, Moscone South - 307
  – Presenters: Mark Scardina, Ankita Khandelwal


Appendix A - Grid Infrastructure Upgrade


1. Upgrade to 12.1.0.2

Ensure “Upgrade” is automatically selected, as this indicates that the system has detected the current installation.


2. Ensure Node List is complete

• All cluster nodes should be listed
  – Hint: The information comes from inventory
• New features allow upgrade even if some nodes are down


Space requirements

3. Do not attempt to change groups during upgrade
4. Ensure space requirements are met in /u01


5. Repeat Steps on all nodes of Cluster

• On each node of the cluster, run as root:
  – # mkdir -p /u01/app/12.1.0/grid
  – # mkdir -p /u01/app/crsusr
  – # chown -R grid:oinstall /u01/app/12.1.0/grid
  – # chown -R grid:oinstall /u01/app/crsusr
• The installer will call Cluster Verification Utility (CVU) to check and confirm


6. Use Batching (BATCH Upgrade) to Improve Service Uptime

• Use BATCH Upgrade to speed up the upgrade
• “Batching” can help provide improved availability of services
• The feature prompts between batch runs, which gives an opportunity to relocate services
• Local node will always be in “Batch 1”


7. Review final Summary

• Double check available disk space
• Ensure Install option is Upgrade, not “fresh install”
• Optionally configure root script execution via sudo
• Ensure “Upgrade ASM” is true


8. Confirm Upgrade is complete

• Ensure “The upgrade of Oracle Grid Infrastructure for a cluster was successful” message is displayed
• Success is reported after the installer executes CVU for checks to ensure the newly upgraded Grid Infrastructure stack is healthy
• The checks include inventory checks


9. De-Install the Old Home

• Execute deinstall
• This de-configures and de-installs the old Grid Infrastructure Home
• Log files in: /tmp/deinstall`date`.log
• Check for complete removal
  – Remove left over files using rm -rf <old_grid_home> directory


Appendix B - Enhanced OJVM Patching Steps


Enhanced steps to Patch OJVM Security Vulnerability

• The process described hereafter is an optimization of the current OJVM patching process described in MOS Note:1929745.1 to further reduce downtime
• This enhanced procedure is NOT fully RAC rolling and NOT applicable to Stand-By First but increases the availability of non-Java services during the patching process
• This procedure still requires stopping Java usage for a period of time and restarting of instances to use the new oracle executables but allows increased availability of facilities other than Java by doing the restarting in a standard RAC rolling manner


When should I use these steps?

• These steps have not been tested enough so we are not 100% sure there won't be side effects impacting customer's system.
• Limited testing has shown that it does help reduce the overall downtime
• The OJVM patch only affects RDBMS homes. It is not applicable to Grid Home
• Downtime is reduced only for Applications not using OJVM or OJVM dependent options like XDB, Text, Spatial etc
• You can consider using the mitigation patch if you do not use OJVM. The steps are documented in MOS Note 1929745.1


Step by Step instructions

1) Stop OJVM services:
  – $ srvctl stop service -db <dbname> -service <java_srv>
  ** Do not omit -service, else all services for that DB will be stopped
2) Ensure no sessions are using OJVM using the following query
  – select s.sid, s.username u_name, n.name p_name, st.value from v$session s, v$sesstat st, v$statname n where s.sid=st.sid and n.statistic# = st.statistic# and n.name like 'java call heap total size' and st.value > 0 order by s.sid;
    0 row(s) returned
3) $ srvctl disable service -db <dbname> -service <java_srv>
4) Terminate sessions that have not stopped, using:
  – dbms_service.disconnect_session('java_srv', DBMS_SERVICE.POST_TRANSACTION)
5) Kill JIT background process using SQL> alter system kill session 'XX,XXXX';


Continuation of Instructions – I

6) Disable JIT using “sqlplus / as sysdba”
   SQL> alter system set events '29560 trace name context forever, level 2';
7) Install OJVM patch using steps documented in Note 1929745.1 but do not stop the RAC instances
8) Relink the binaries ($ORACLE_HOME/bin/relink all) on each node and restart the instances, one at a time, as in:
  a) Shut down an instance (“Only one instance”)
  b) Relink only that instance while the other instance(s) remain up
  c) Restart the instance and all NON-JAVA services.
  d) Proceed to the next instance and repeat steps (a), (b) & (c) for all instances
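
The rolling loop in step 8 (sub-steps a to d) can be laid out as an ordered plan so that only one instance is ever down. A minimal dry-run sketch in Python that builds the step list without executing anything. The database, instance and service names are hypothetical, `rolling_relink_plan` is an illustrative helper, and the `srvctl stop instance` / `srvctl start instance` commands are an assumption about how the shutdown and restart in (a) and (c) would be performed; the relink command is the one quoted in step 8.

```python
def rolling_relink_plan(db, instances, non_java_services):
    """Return the ordered steps for relinking one instance at a time."""
    steps = []
    for inst in instances:
        # (a) Shut down this instance only; the others keep serving work.
        steps.append(f"srvctl stop instance -db {db} -instance {inst}")
        # (b) Relink the binaries on the node that hosts this instance.
        steps.append(f"[node hosting {inst}] $ORACLE_HOME/bin/relink all")
        # (c) Restart the instance, then the non-Java services on it.
        steps.append(f"srvctl start instance -db {db} -instance {inst}")
        for svc in non_java_services:
            steps.append(f"srvctl start service -db {db} -service {svc} -instance {inst}")
        # (d) The loop then moves on to the next instance.
    return steps


for step in rolling_relink_plan("orcl", ["orcl1", "orcl2"], ["oltp_svc"]):
    print(step)
```

The Java services stay stopped and disabled throughout; they are only enabled again in step 12.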


Continuation of Instructions – II

9) Once steps (a), (b), (c) & (d) are executed on all the instances,
10) Replace Java classes
  – Perform post install steps of the OJVM patching process without stopping the instance
11) Re-Enable JIT using “sqlplus / as sysdba”
   SQL> alter system set events '29560 trace name context forever, level 2';
12) Enable Java services
  – $ srvctl enable service -db <db_name> -service Javasvc
  – $ srvctl start service -db <db_name> -service Javasvc


Appendix C - Hang Manager in Action


Early Warning entries
Sample dia0 entries ** (Subject to change)

*** 2015-04-03T15:54:27.919500+17:00
HM: Early Warning - Session ID 32 serial# 39797 OS PID 15825 is in an involuntary wait 'enq: TX - contention' for 32 seconds blocking 5 sessions p1=0x54580006, p2=0x5, p3=0x0 Blocked by Session ID 117 serial# 65210 on instance 1 which is waiting on 'latch free' for 33 seconds p1=0x69535c98, p2=0xf0, p3=0x0

HM: Dumping Short Stack of pid[39.15825] (sid:32, ser#:39797)
...
-- Short Stack – EXCEPT on WINDOWS
...

*** 2015-04-03T15:54:27.919500+17:00
HM: Current SQL: BEGIN simulate_hang(7, 14, 6, 1, 0); END;


Hang Manager In Action – Part 1
Session information


Hang Manager In Action – Part 2
Continuation of dia0 trace


Hang Manager Initiates Resolution
dia0 trace


Hang Manager Initiated Resolution
Sample dia0 and Alert log entries on victim


Hang Manager Initiated Resolution
Sample dia0 and Alert log entries at Master
