Date post: | 27-Jan-2017 |
Upload: | anil-nair |
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |
Oracle Real Application Clusters (RAC) Best Practices
Anil Nair
Oracle Real Application Clusters (RAC) Product Management, ST Development
October 27th, 2015
@OracleRACpm, @AmNair
http://www.linkedin.com/in/anil-nair-01960b6
http://www.slideshare.net/AnilNair27
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
Program Agenda
1 Why and How to Upgrade
2 Features You Don’t Want to Miss
Appendixes
A Grid Infrastructure Upgrade
B Enhanced OJVM Patching Steps
C Hang Manager in Action
The original slide deck presented at Oracle OpenWorld 2015 includes information about a future release. That information is not included in this slide deck.
The Question is not Whether or Not to Upgrade!
The questions are why, when and how?
• The upgrade is the result of a long plan:
  – OOW 2013 we presented: “Oracle RAC 12c [Install] Best Practices” (**)
  – OOW 2014 we presented: “Oracle RAC 12c Operational Best Practices” (**)
  – In both years we presented “Oracle RAC Internals” with a different focus:
    • Oracle Grid Infrastructure and Configuration Internals (**)
    • Oracle RAC Internals – The Cache Fusion Edition (**)
• Bottom line: You are all prepared for using Oracle RAC 12c (after this session)!
(**) See: http://www.slideshare.net/MarkusMichalewicz for previous years’ slides
Why Upgrade?
Why upgrade now?
• Oracle 12c has been out for more than 2 years; the current version is 12.1.0.2
• Adoption rate is exemplary
  – Oracle 12c options such as Oracle Multitenant and Oracle In-Memory Database facilitate upgrade
• Benefit from the latest features and avoid running a de-supported version of your software stack
[Figure: upgrade path 11.2.0.2/3 (PSU 11.2.0.2/3.X) → 11.2.0.4 Patch Set (PSU 11.2.0.4.X) → Upgrade to 12.1.0.2 (PSU 12.1.0.2.X); status labels: De-Supported, Almost De-Supported]
Why Upgrade?
You have a choice to upgrade step-by-step
• Using Oracle RAC, you have a choice
  – You can upgrade both the database and Grid Infrastructure (GI) OR only GI.
    • More information in MOS note 756671.1
• Upgrading Oracle GI is always recommended:
  – Apply PSUs, Patch Sets & upgrades rolling!
  – Oracle GI needs to be of at least the same version as, if not a higher version than, the highest version database you want to operate
  – Most applications only certify the database, not the Oracle GI version
  – Benefit from all new features in Oracle GI
    • Some features may require Oracle 12c databases
[Figure: Oracle Flex Cluster]
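The version rule on this slide (GI at least as high as the highest database version) can be sketched as a small check. This is an illustration only: the function names are hypothetical, and comparing dotted versions field by field is an assumption about how the rule would be applied.

```python
def parse_version(text, width=5):
    """Parse a dotted version such as '12.1.0.2' into a fixed-width tuple of ints."""
    parts = [int(p) for p in text.strip().split(".")]
    if len(parts) > width:
        raise ValueError(f"too many version fields in {text!r}")
    # Pad with zeros so '12.1' and '12.1.0.0' compare as equal.
    return tuple(parts + [0] * (width - len(parts)))


def gi_can_host(gi_version, db_versions):
    """Return True if the GI version is at least as high as every database version."""
    gi = parse_version(gi_version)
    return all(gi >= parse_version(db) for db in db_versions)


if __name__ == "__main__":
    # GI already on 12.1.0.2, databases still mixed: allowed by the rule.
    print(gi_can_host("12.1.0.2", ["11.2.0.4", "12.1.0.2"]))
    # GI older than a database it should operate: not allowed by the rule.
    print(gi_can_host("11.2.0.4", ["12.1.0.2"]))
```

Running such a check over an inventory of database homes before a step-by-step upgrade shows whether upgrading GI first is enough to keep every existing database operable.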
How to Upgrade?
With a plan and based on testing
• Use “rolling” as much as possible!
  – Oracle RAC and Oracle RAC One Node provide rolling upgrade features
  – Drain workload prior to upgrading
    • $ srvctl stop service
      – Stops sending new work to the node
      – Active sessions continue to work
    • Wait for sessions to drain
    • Execute dbms_service.disconnect_session
[Figure: connection pool distributing work to Gold_Svc on Instance 1, Instance 2 and Instance 3 and to a Singleton service on Instance 4]
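The drain sequence above (stop the service, wait, then disconnect what is left) can be sketched as a small control loop. This is a minimal sketch under stated assumptions: the three callbacks are hypothetical placeholders that a real script would implement with srvctl stop service, a session count query, and dbms_service.disconnect_session; the timeout and poll interval are arbitrary example values.

```python
import time


def drain_service(stop_service, count_sessions, disconnect_sessions,
                  timeout_s=300.0, poll_s=5.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Stop new work, wait for sessions to finish, then disconnect stragglers.

    Returns the number of sessions that were still connected when the
    forced disconnect was issued (0 if the service drained on its own).
    """
    stop_service()                      # no new work is sent to the node
    deadline = clock() + timeout_s
    while True:
        remaining = count_sessions()    # active sessions continue to work
        if remaining == 0:
            return 0                    # drained cleanly, nothing to disconnect
        if clock() >= deadline:
            break                       # waited long enough
        sleep(poll_s)
    disconnect_sessions()               # last resort for sessions that did not drain
    return remaining
```

Keeping the three actions as injected callbacks lets the waiting logic be tested without a cluster and keeps the choice of timeout visible in one place.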
Oracle GI 12.1 – Node Failure Handling During Upgrade 1
• Before Oracle RAC 12.1:
  – All nodes of the cluster had to be available during upgrade
  – If any node crashed (e.g. HW/OS issues) and could not be restarted in time, the upgrade could not proceed
  – Result: downtime required to downgrade, deleteNode and then attempt an upgrade
• With Oracle RAC 12.1:
  – Completion of the upgrade can be enforced despite unavailable nodes
[Figure: MyCluster with Node 1, Node 2, Node 3 and Node 4]
1. Execute the following as root on all the available nodes except one: # rootupgrade.sh
2. Execute the following as root on the last node: # rootupgrade.sh -force
3. Click Continue in the install GUI screen, which will run cluster verification
4. Once node3 is fixed, execute the following as root: # rootcrs.sh -join -existingNode node3
Oracle GI 12.1 – Node Failure Handling During Upgrade 2
• Case: the first node on which the upgrade was initiated crashes and remains unavailable
  – Before Oracle 12.1, the first node needed to be re-activated prior to proceeding
• Starting with Oracle RAC 12.1 you can:
  – Run “rootupgrade.sh -first -force” as root on any other node
  – After successful execution of the upgrade on all remaining nodes, execute “$GRID_HOME/cfgtoollogs/configToolAllCommands” as the grid owner on one node
  – If another node in the cluster is down at the time of upgrade, refer to the previous solution
[Figure: MyCluster with Node 1, Node 2, Node 3 and Node 4]
1. On any other node, execute as root: # rootupgrade.sh -first -force
2. Execute as root on the rest of the available nodes: # rootupgrade.sh
3. Execute as grid owner from the new first node: $GRID_HOME/cfgtoollogs/configToolAllCommands
4. Once node1 is fixed, execute as root: # rootcrs.sh -join -existingNode node1
More Information in Appendix A
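The two node failure cases can be summarized as a planning function that only lists the steps in order and executes nothing. This is a sketch under assumptions: the function name and the tuple layout are hypothetical, the flags are written with plain ASCII hyphens, and the combination of a failed first node with further failed nodes is deliberately left out because the slides only point back to the first case for it.

```python
def upgrade_plan(nodes, down, first_node):
    """Return ordered (node, run_as, action) steps for a GI upgrade with failed nodes.

    nodes:      all cluster nodes in the order they should be processed
    down:       nodes that are unavailable during the upgrade
    first_node: node on which the upgrade was initiated
    """
    down = set(down)
    available = [n for n in nodes if n not in down]
    if not available:
        raise ValueError("no node is available to run the upgrade")
    steps = []
    if first_node in down:
        if len(down) > 1:
            raise ValueError("first node plus other nodes down: not covered by this sketch")
        # Case 2: another node takes over the role of the first node.
        new_first = available[0]
        steps.append((new_first, "root", "rootupgrade.sh -first -force"))
        steps += [(n, "root", "rootupgrade.sh") for n in available[1:]]
        steps.append((new_first, "grid owner",
                      "$GRID_HOME/cfgtoollogs/configToolAllCommands"))
    elif down:
        # Case 1: force completion on the last available node.
        steps += [(n, "root", "rootupgrade.sh") for n in available[:-1]]
        steps.append((available[-1], "root", "rootupgrade.sh -force"))
        steps.append((None, "installer GUI", "click Continue (runs cluster verification)"))
    else:
        # No failure: plain rolling upgrade.
        steps += [(n, "root", "rootupgrade.sh") for n in available]
    # Repaired nodes rejoin later; the node field names the node the step refers to.
    for n in nodes:
        if n in down:
            steps.append((n, "root", f"rootcrs.sh -join -existingNode {n}"))
    return steps


if __name__ == "__main__":
    for step in upgrade_plan(["node1", "node2", "node3", "node4"],
                             down={"node3"}, first_node="node1"):
        print(step)
```

Writing the plan down before the upgrade window makes it easier to agree in advance on which node receives the forced run if a node drops out.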
Oracle GI 12.1 – BATCH Upgrade
• Oracle GI 12.1 BATCH upgrade is a new feature that allows nodes to be upgraded in batches
• BATCH upgrade improves upgrade performance in large clusters, as the nodes can be patched in groups
• Downtime can be reduced by grouping nodes with different service availability requirements
[Figure: MyCluster with Node 1, Node 2, Node 3 and Node 4 running Silver_Svc, Gold_Svc (on two nodes) and Bronce_Svc]
1. Nodes have different service availability requirements (silver, gold and bronze)
2. Gold_Svc availability can be improved by ensuring that rootupgrade execution on Node 2 and Node 3 is batched separately
3. For example, Node 1 can be batch 1, Node 2 and Node 4 can be batch 2, and Node 3 can be last
More Information in Appendix A
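The batching rule from this slide (never put all nodes of a multi-node service into the same batch) can be checked before the installer is started. This is a minimal sketch: the function name is hypothetical, and the placement of Silver_Svc on node1 and Bronce_Svc on node4 is an assumption made for the example; the slide only states that Gold_Svc runs on Node 2 and Node 3.

```python
def services_at_risk(batches, placement):
    """Return {batch_number: [services]} for services that lose all nodes in one batch.

    batches:   list of batches, each a list of node names
    placement: mapping of service name to the set of nodes it runs on
    Services running on a single node are ignored, since they need to be
    relocated regardless of how the batches are formed.
    """
    risky = {}
    for number, batch in enumerate(batches, start=1):
        batch_nodes = set(batch)
        hit = sorted(
            svc for svc, svc_nodes in placement.items()
            if len(svc_nodes) > 1 and set(svc_nodes) <= batch_nodes
        )
        if hit:
            risky[number] = hit
    return risky


if __name__ == "__main__":
    placement = {
        "Gold_Svc": {"node2", "node3"},
        "Silver_Svc": {"node1"},     # assumed placement
        "Bronce_Svc": {"node4"},     # assumed placement
    }
    # Batching from the slide: Gold_Svc keeps one node in every batch.
    print(services_at_risk([["node1"], ["node2", "node4"], ["node3"]], placement))
    # Batching both Gold_Svc nodes together takes the service down completely.
    print(services_at_risk([["node1"], ["node2", "node3"], ["node4"]], placement))
```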
Oracle RAC 12c Rel. 1 – The Standard Going Forward
• Flex Cluster
• Flex ASM
• Hang Manager
• Full Oracle Multitenant Support
[Figure: Hang Manager cycle with the labels Detect, Heuristics, Analyze and Verify, Decide and Resolve]
Overlooked and Underestimated – Hang Manager
Why is a Hang Manager required?
• Customers experience database hangs for a variety of reasons
  – High system load, workload contention, network congestion or errors
• Before Hang Manager was introduced with Oracle RAC 11.2.0.2
  – Oracle required information to troubleshoot a hang, e.g.:
    • System state dumps
    • For RAC: global system state dumps
  – Customers usually had to reproduce the hang with additional parameters
• With Oracle RAC 11.2.0.2:
  – Mechanism to automatically detect and resolve hangs
  – Note that deadlock detection has been available in the database for years
[Figure: Hang Manager cycle with the labels Detect, Heuristics, Analyze and Verify, Decide and Resolve]
How does it work? - Detection
The main problem to solve: When does a wait represent a hang?
• Hang(s) in the context of Hang Manager are database process(es) that are not progressing
  – Cross-layer hangs are also managed
  – E.g.: resolving a hang that is caused by a blocked ASM resource
• Hang Manager only considers DB sessions holding local/global resources on which sessions are waiting
  – Local resource: local to the instance / Global resource: global to the database
• Session(s) may hang due to underlying OS resource issues
• Hung session(s) could be progressing but may be extremely slow
• Deadlocks and user locks are not managed by Hang Manager
• Deadlock detection has been available in the database for years
How does it work? - Resolution
After confirming hang, eliminate the holder
• Once the holder is found
  – Confirm whether the stack is progressing.
  – If the stack progresses, it’s not deemed a hang.
• If Quality of Service Management (QoS) is used and service level definitions have been set:
  – The algorithm considers the service level definition as part of the hang resolution.
• Everything considered, eliminate the holder
  – Logs containing ORA-32701 (“Possible hangs up to hang ID=%s detected”) can be found in the alert log and diag trace
• This functionality is available in Oracle RAC, RAC One Node & Single Instance (**)
[Figure: one holder (H) and multiple waiters (W1 to W7) waiting on a resource (Res)]
More Information in Appendix C
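The idea of one holder with many waiters can be illustrated with a toy wait chain. The sketch below is not Oracle's algorithm: it only shows how following "blocked by" links leads to the final holder, and how comparing stack samples could express "the stack is not progressing". All names, and the rule of treating identical samples as no progress, are assumptions made for the illustration.

```python
def find_root_holders(blocked_by):
    """Map each final holder to the sessions waiting on it, directly or indirectly.

    blocked_by: mapping of waiting session to the session that blocks it.
    Chains that loop back on themselves are deadlocks; they are skipped,
    in line with the note that Hang Manager does not manage deadlocks.
    """
    roots = {}
    for waiter in blocked_by:
        seen = {waiter}
        current = waiter
        while current in blocked_by:
            current = blocked_by[current]
            if current in seen:         # cycle: deadlock, not a hang chain
                current = None
                break
            seen.add(current)
        if current is not None:
            roots.setdefault(current, []).append(waiter)
    return roots


def looks_hung(stack_samples):
    """Treat a holder as hung only if several stack samples are all identical."""
    return len(stack_samples) >= 2 and len(set(stack_samples)) == 1


if __name__ == "__main__":
    chain = {"W1": "H", "W2": "H", "W3": "W1"}
    print(find_root_holders(chain))
    print(looks_hung(["frameA<-frameB", "frameA<-frameB", "frameA<-frameB"]))
```

The second function mirrors the verification step on the slide: a holder whose stack changes between samples is slow but progressing, so it would not be a candidate for elimination.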
How Flex Cluster Optimizes an 8-year-old Use Case
Kill node running DWH (Swingbench) workload on standard cluster (Hub node)
[Figure: Swingbench operations, impact of running DWH and OLTP at the same time; chart markers: Kill node, Starting Instance, Reconfig complete; annotation: Impact on transactions]
How Flex Cluster Optimizes an 8-year-old Use Case
Kill node running DWH (Swingbench) workload on Flex Cluster (Leaf node)
[Figure: Swingbench operations, impact of running DWH and OLTP at the same time; chart markers: Kill node, Starting Instance, Reconfig complete; annotation: Impact on transactions: unnoticeable]
How Flex Cluster Optimizes an 8-year-old Use Case
Better performance with DWH workload on Leaf nodes with BIG SGA and IMPQ
• Response time decreases by more than 80%, depending on the query, when using In-Memory PQ (IMPQ) on Leaf nodes with an SGA_TARGET of 40GB, compared to 16GB on Hub nodes only.
[Figure: Hub Nodes running DB A; read-only workload on Leaf Nodes running DB A]
Recommended Oracle RAC Sessions during OOW ‘15
Session | Day | Time | Room | Presenter
Introducing the Next Generation of Oracle Real Application Clusters | Monday, Oct 26th | 4:00pm | Moscone South - 102 | Markus Michalewicz, Anil Nair
Oracle Real Application Clusters Best Practices | Tuesday, Oct 27th | 12:15pm | Moscone South - 104 | Markus Michalewicz, Anil Nair
Manage Patching and Upgrades of Large-Scale Oracle Database Deployments | Tuesday, Oct 27th | 4:00pm | Moscone South - 102 | Raj Kammend, Customer: Dell
Making the Right Choice for Managing Storage in Oracle Database: Oracle ASM and Oracle ACFS | Wednesday, Oct 28th | 12:15pm | Moscone South - 103 | Jim Williams, Ara Shakian
Hide the Impact of Scheduled Maintenance from Your Applications | Wednesday, Oct 28th | 3:00pm | Moscone South - 102 | Carol Colrain, Troy Anthony, Customer: Epsilon
How to Efficiently Monitor Your Database Systems with Oracle’s Integrated Tools | Thursday, Oct 29th | 9:30am | Moscone South - 307 | Mark Scardina, Ankita Khandelwal
Appendix A - Grid Infrastructure Upgrade
Step 1: Upgrade to 12.1.0.2
Ensure “Upgrade” is automatically selected, as this indicates that the system has detected the current installation.
Step 2: Ensure Node List is complete
• All cluster nodes should be listed. Hint: the information comes from the inventory
• A new feature allows the upgrade even if some nodes are down
Steps 3 and 4: Space requirements
3. Do not attempt to change groups during upgrade
4. Ensure space requirements are met in /u01
Step 5: Repeat Steps on all nodes of Cluster
• On each node of the cluster (node 1 to max nodes), run as root:
  – mkdir -p /u01/app/12.1.0/grid
  – mkdir -p /u01/app/crsusr
  – chown -R grid:oinstall /u01/app/12.1.0/grid
  – chown -R grid:oinstall /u01/app/crsusr
• The installer will call Cluster Verification Utility (CVU) to check and confirm
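The per-node directory preparation from this step can be written out as a generated command list, which is convenient for review before anything is executed. This is a dry-run sketch: the paths, owner and group come from the slide, while the function name and the node names in the example are hypothetical.

```python
GRID_HOME = "/u01/app/12.1.0/grid"   # new Grid Infrastructure home from the slide
GRID_BASE = "/u01/app/crsusr"        # second directory created on the slide


def prep_commands(nodes, owner="grid", group="oinstall"):
    """Return (node, command) pairs to run as root on every cluster node."""
    commands = []
    for node in nodes:
        # Create both directories first, then hand them to the grid owner.
        for path in (GRID_HOME, GRID_BASE):
            commands.append((node, f"mkdir -p {path}"))
        for path in (GRID_HOME, GRID_BASE):
            commands.append((node, f"chown -R {owner}:{group} {path}"))
    return commands


if __name__ == "__main__":
    for node, command in prep_commands(["node1", "node2"]):
        print(f"{node}: # {command}")
```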
Step 6: Use Batching (BATCH Upgrade) to Improve Service Uptime
• Use BATCH Upgrade to speed up the upgrade
• “Batching” can help provide improved availability of services
• The feature prompts between batch runs, which gives the opportunity to relocate services
• The local node will always be in “Batch 1”
Step 7: Review final Summary
• Double check available disk space
• Ensure the install option is Upgrade, not “fresh install”
• Optionally configure root script execution via sudo
• Ensure “Upgrade ASM” is true
Step 8: Confirm Upgrade is complete
• Ensure the message “The upgrade of Oracle Grid Infrastructure for a cluster was successful” is displayed
• Success is reported after the installer executes CVU checks to ensure the newly upgraded Grid Infrastructure stack is healthy
• The checks include inventory checks
Step 9: De-Install the Old Home
• Execute deinstall
• This de-configures and de-installs the old Grid Infrastructure Home
• Log files in: /tmp/deinstall`date`.log
• Check for complete removal
  – Remove leftover files using rm -rf <old_grid_home>
Appendix B - Enhanced OJVM Patching Steps
Enhanced steps to Patch OJVM Security Vulnerability
• The process described hereafter is an optimization of the current OJVM patching process described in MOS Note 1929745.1 to further reduce downtime
• This enhanced procedure is NOT fully RAC rolling and NOT applicable to Standby-First, but it increases the availability of non-Java services during the patching process
• This procedure still requires stopping Java usage for a period of time and restarting instances to use the new oracle executables, but it allows increased availability of facilities other than Java by doing the restarts in a standard RAC rolling manner
When should I use these steps?
• These steps have had only limited testing, so side effects on customer systems cannot be ruled out
• Limited testing has shown that the procedure does help reduce the overall downtime
• The OJVM patch only affects RDBMS homes. It is not applicable to the Grid Home
• Downtime is reduced only for applications not using OJVM or OJVM-dependent options like XDB, Text, Spatial etc.
• You can consider using the mitigation patch if you do not use OJVM. The steps are documented in MOS Note 1929745.1
Step by Step instructions
1) Stop OJVM services:
  – $ srvctl stop service -db <dbname> -service <java_srv>
  ** Do not omit -service, else all services for that DB will be stopped
2) Ensure no sessions are using OJVM using the following query
  – select s.sid, s.username u_name, n.name p_name, st.value from v$session s, v$sesstat st, v$statname n where s.sid = st.sid and n.statistic# = st.statistic# and n.name like 'java call heap total size' and st.value > 0 order by s.sid;
  – Expected result: 0 row(s) returned
3) $ srvctl disable service -db <dbname> -service <java_srv>
4) Terminate sessions that have not stopped, using
  – dbms_service.disconnect_session('java_srv', DBMS_SERVICE.POST_TRANSACTION)
5) Kill the JIT background process using SQL> alter system kill session 'XX,XXXX';
Continuation of Instructions – I
6) Disable JIT using “sqlplus / as sysdba”
  SQL> alter system set events '29560 trace name context forever, level 2';
7) Install the OJVM patch using the steps documented in Note 1929745.1, but do not stop the RAC instances
8) Relink the binaries ($ORACLE_HOME/bin/relink all) on each node and restart the instances, one at a time, as follows:
  a) Shut down an instance (only one instance)
  b) Relink only that instance while the other instance(s) remain up
  c) Restart the instance and all NON-JAVA services
  d) Proceed to the next instance and repeat steps (a), (b) & (c) for all instances
Continuation of Instructions – II
9) Once steps (a), (b), (c) & (d) are executed on all the instances, continue with step 10
10) Replace Java classes
  – Perform the post-install steps of the OJVM patching process without stopping the instance
11) Re-enable JIT using “sqlplus / as sysdba”
  SQL> alter system set events '29560 trace name context forever, level 2';
12) Enable Java services
  – $ srvctl enable service -db <db_name> -service Javasvc
  – $ srvctl start service -db <db_name> -service Javasvc
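The order of the twelve steps matters more than any single command: Java usage stops once for the whole database, while the relink and restart rolls through the instances one at a time. The sketch below only produces that ordering as text. The function name and the step wording are illustrative, and the instance names in the example are hypothetical.

```python
def ojvm_rolling_plan(instances, java_service):
    """Return the enhanced OJVM patching steps as an ordered list of descriptions."""
    plan = [
        f"stop service {java_service}",                   # step 1
        "verify that no session is using OJVM",           # step 2
        f"disable service {java_service}",                # step 3
        "disconnect sessions that have not stopped",      # step 4
        "kill the JIT background process",                # step 5
        "disable JIT",                                     # step 6
        "install the OJVM patch, instances stay up",      # step 7
    ]
    # Step 8: rolling part, exactly one instance is down at any time.
    for instance in instances:
        plan.append(f"shut down {instance}")
        plan.append(f"relink binaries for {instance}")
        plan.append(f"restart {instance} and its non-Java services")
    plan += [
        "run post-install steps to replace Java classes",  # steps 9 and 10
        "re-enable JIT",                                    # step 11
        f"enable service {java_service}",                   # step 12
        f"start service {java_service}",
    ]
    return plan


if __name__ == "__main__":
    for number, step in enumerate(ojvm_rolling_plan(["inst1", "inst2"], "java_srv"), 1):
        print(number, step)
```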
Appendix C - Hang Manager in Action
Early Warning entries
Sample dia0 entries ** (Subject to change)
*** 2015-04-03T15:54:27.919500+17:00
HM: Early Warning - Session ID 32 serial# 39797 OS PID 15825 is in an involuntary wait 'enq: TX - contention' for 32 seconds blocking 5 sessions p1=0x54580006, p2=0x5, p3=0x0 Blocked by Session ID 117 serial# 65210 on instance 1 which is waiting on 'latch free' for 33 seconds p1=0x69535c98, p2=0xf0, p3=0x0
HM: Dumping Short Stack of pid[39.15825] (sid:32, ser#:39797)...-- Short Stack – EXCEPT on WINDOWS...
*** 2015-04-03T15:54:27.919500+17:00
HM: Current SQL: BEGIN simulate_hang(7, 14, 6, 1, 0); END;
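Early Warning entries like the sample above can be picked out of a trace file with a regular expression, for example to list which sessions block the most waiters. This is a sketch that assumes the entry layout shown on this slide; since the slide marks the format as subject to change, the pattern is an illustration rather than a stable parser, and the field names are hypothetical.

```python
import re

EARLY_WARNING = re.compile(
    r"HM: Early Warning - Session ID (?P<sid>\d+) serial# (?P<serial>\d+) "
    r"OS PID (?P<os_pid>\d+) is in an involuntary wait '(?P<wait_event>[^']+)' "
    r"for (?P<wait_seconds>\d+) seconds blocking (?P<blocked>\d+) sessions"
    r".*?Blocked by Session ID (?P<blocker_sid>\d+) serial# (?P<blocker_serial>\d+) "
    r"on instance (?P<blocker_instance>\d+)",
    re.DOTALL,
)

# The entry from the slide, joined into one string.
SAMPLE = (
    "HM: Early Warning - Session ID 32 serial# 39797 OS PID 15825 is in an "
    "involuntary wait 'enq: TX - contention' for 32 seconds blocking 5 sessions "
    "p1=0x54580006, p2=0x5, p3=0x0 Blocked by Session ID 117 serial# 65210 on "
    "instance 1 which is waiting on 'latch free' for 33 seconds "
    "p1=0x69535c98, p2=0xf0, p3=0x0"
)


def parse_early_warning(text):
    """Return the fields of the first Early Warning entry in text, or None."""
    match = EARLY_WARNING.search(text)
    if match is None:
        return None
    # Numeric fields become ints; the wait event name stays a string.
    return {key: int(value) if value.isdigit() else value
            for key, value in match.groupdict().items()}


if __name__ == "__main__":
    print(parse_early_warning(SAMPLE))
```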
Hang Manager In Action – Part 1
Session information
Hang Manager In Action – Part 2
Continuation of dia0 trace
Hang Manager Initiates Resolution
dia0 trace
Hang Manager Initiated Resolution
Sample dia0 and Alert log entries on victim
Hang Manager Initiated Resolution
Sample dia0 and Alert log entries at Master