Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |
Oracle Real Application Clusters (RAC) 12c Release 2 – For Continuous Availability
Markus Michalewicz, Senior Director of Product Management, Oracle RAC Development
[email protected] | @OracleRACpm | http://www.linkedin.com/in/markusmichalewicz | http://www.slideshare.net/MarkusMichalewicz
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |
Safe Harbor Statement: The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.
3
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |
Oracle Maximum Availability Architecture (MAA)
Production Site and Active Replica components:
• RAC – scalability, server HA
• ASM – ASM mirroring
• Flashback – human error correction
• RMAN, Oracle Secure Backup – backup to disk, tape or cloud
• Active Data Guard – data protection, DR, query offload
• GoldenGate – active-active replication, heterogeneous
• Edition-based Redefinition, Online Redefinition, Data Guard, GoldenGate – minimal downtime maintenance, upgrades, migrations
• Enterprise Manager Cloud Control – coordinated site failover
• Application Continuity – application HA
• Global Data Services – service failover / load balancing
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |
Program Agenda
1. High Availability Improvements
2. Continuous Availability Features
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |
RAC High Availability Improvements
• Reduced failure detection time for an increased number of monitored components
• Reduced time to recover from local failures due to reduced reconfiguration times
• Prevention of system or database failures using ML-based real-time analysis of diagnostic data
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |
More Components Checked More Frequently – to detect failures sooner and to recover faster
• Oracle Clusterware checks
  – more components: multiple public networks are checked using Ping Targets
  – more frequently: VIPs are checked every second; the 30-second CSS misscount default with zero brownout allows for less
  – more efficiently: agent changes allow for more checks using fewer resources; data from auxiliary systems is taken into account; Engineered System-optimized failure detection and fencing
  – and offline: offline monitoring of failed components for faster recovery
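For illustration, the ping targets and the CSS misscount mentioned above can be inspected or adjusted with standard Clusterware tools. A minimal sketch, assuming network number 1 is the public network and using a placeholder IP address:

$ crsctl get css misscount                                    # show the current CSS misscount (30 seconds by default)
$ srvctl modify network -netnum 1 -pingtarget "192.168.10.1"  # define a ping target used to verify the public network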
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |
Smart Fencing
12
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. | 13
Node Eviction Basics
http://www.slideshare.net/MarkusMichalewicz/oracle-clusterware-node-management-and-voting-disks
• Pre-12.2, node eviction follows a rather "ignorant" pattern
  – Example in a 2-node cluster: the node with the lowest node number survives.
• Customers must not base their application logic on which node survives the split brain
  – as this may(!) change in future releases.
[Diagram: two-node cluster; node 1, the node with the lowest node number, survives]
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. | 14
Node Weighting in Oracle RAC 12c Release 2
Idea: everything else being equal, let the majority of work survive
• Node Weighting is a new feature that considers the workload hosted in the cluster during fencing
• The idea is to let the majority of work survive, if everything else is equal
  – Example: in a 2-node cluster, the node hosting the majority of services (at fencing time) is meant to survive
[Diagram: two-node cluster; the node hosting the majority of services survives]
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |
Let's Define "Equal"
• A three-node cluster will benefit from Node Weighting if three equally sized sub-clusters are built as a result of the failure, since two differently sized sub-clusters are not equal.
• Secondary failure consideration can influence which node survives; secondary failure consideration will be enhanced successively.
• A fallback scheme is applied if these considerations do not lead to an actionable outcome.
[Diagram: a public network card failure as an example of a secondary failure causing a "conflict"]
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |
CSS_CRITICAL – Fencing with Manual Override
• CSS_CRITICAL can be set on various levels / components to mark them as "critical" so that the cluster will try to preserve them in case of a failure.
• CSS_CRITICAL will be honored if no other technical reason prohibits survival of the node that has at least one critical component at the time of failure.
• A fallback scheme is applied if CSS_CRITICAL settings do not lead to an actionable outcome.

crsctl set server css_critical {YES|NO}   (+ server restart)

srvctl modify database -help | grep critical
  … -css_critical {YES | NO}
  Define whether the database or service is CSS critical
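As a simple illustration of the commands above (the database name is a placeholder; as noted, the server-level setting takes effect after the stack on that node is restarted):

$ crsctl set server css_critical YES                    # run on the server that should be preserved during fencing
$ srvctl modify database -db orcl -css_critical YES     # mark the database "orcl" as CSS critical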
[Diagram: node eviction despite a critical workload in a "conflict" case; the workload fails over]
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |
Recovery Buddies
17
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. | 18
Near Zero Reconfiguration Time with Recovery Buddies (a.k.a. Buddy Instances)
• Recovery Buddies track block changes on the buddy instance
• Quickly identify blocks requiring recovery during reconfiguration
• Allow rapid processing of transactions after failures
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. | 19
Near Zero Reconfiguration Time with Recovery Buddies – How it works under the hood
• Buddy instance mapping is simple (random), e.g. I1 → I2, I2 → I3, I3 → I4, I4 → I1
• Recovery buddies are assigned during startup
• RMS0 on each recovery buddy instance maintains an in-memory area for redo log changes
• The in-memory area is used during recovery, which eliminates the need to physically read the redo
[Diagram: cluster "MyCluster" with instances I1–I4, each instance paired with its recovery buddy]
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |
How Recovery Buddies Help Reduce Recovery Time
[Diagram: recovery phases Detect → Evict → Elect Recovery → Read Redo → Apply Recovery, shown without and with Recovery Buddies; with Recovery Buddies, recovery is up to 4x faster]
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |
Database Hang Manager
21
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |
Overlooked and Underestimated – Hang Manager
Why having a Hang Manager is useful
• Customers experience database hangs for a variety of reasons
  – High system load, workload contention, network congestion, general errors, etc.
• Before Hang Manager was introduced with Oracle RAC 11.2.0.2, Oracle required quite some information to troubleshoot a hang, e.g.:
  – System state dumps
  – For RAC: global system state dumps
  – Customers usually had to reproduce "the" hang with additional events to analyze it
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. | 23
Introduction to Hang Manager – How it works
• Always on, as it is enabled by default
• Reliably detects database hangs
• Autonomically resolves hangs
• Considers QoS policies for hang resolution
• Logs all detected hangs and their resolutions
[Diagram: the DIAG0 process detects, analyzes, evaluates and verifies hung sessions, considering the QoS policy before resolving the victim]
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. | 24
Hang Manager Optimizations with Oracle RAC 12c – Tuning under the hood
• Hang Manager auto-tunes itself by periodically collecting instance- and cluster-wide hang statistics
• Metrics like cluster health and instance health are tracked over a moving average
• This moving average is considered during resolution
• Holders waiting on SQL*Net break/reset are fast-tracked
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. | 25
DBMS_HANG_MANAGER.Sensitivity – A new SQL interface to set Hang Manager sensitivity
• Early warning exposed via a V$ view
• Sensitivity can be set higher if the default level is too conservative
• Hang Manager considers QoS policies and data during the validation process

Hang sensitivity levels:
• NORMAL (default) – Hang Manager uses its default internal operating parameters to try to meet typical requirements for any environment.
• HIGH – Hang Manager is more alert to sessions waiting in a chain than at the NORMAL level.
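A minimal sketch of raising the sensitivity from SQL*Plus, assuming the DBMS_HANG_MANAGER package constants documented for 12.2; verify the exact signature for your release:

SQL> EXEC DBMS_HANG_MANAGER.set(DBMS_HANG_MANAGER.sensitivity, DBMS_HANG_MANAGER.sensitivity_high);    -- switch from NORMAL to HIGH
SQL> EXEC DBMS_HANG_MANAGER.set(DBMS_HANG_MANAGER.sensitivity, DBMS_HANG_MANAGER.sensitivity_normal);  -- revert to the default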
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |
Oracle Autonomous Health Framework (AHF)
Working for You Continuously
• Integrates next-generation tools running as components, 24/7
• Discovers potential issues and notifies or takes corrective actions
• Speeds up issue diagnosis and recovery
• Preserves database and server availability and performance
• Autonomously monitors and manages resources to maintain SLAs
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |
AHF – Availability by Platform
(Platforms: Linux x86-64, zLinux, Solaris SPARC, HP-UX Itanium, IBM AIX, Windows x86-64)
• Cluster Verification Utility (CVU): all platforms (zLinux since March 2015; HP-UX, AIX and Windows since August 2015)
• ORAchk: all platforms
• Cluster Health Monitor (CHM): Linux, Solaris, AIX, Windows (not planned for zLinux and HP-UX)
• Cluster Health Advisor (CHA): Linux since 12.2.0.1; future release for Solaris and AIX; not planned for zLinux, HP-UX and Windows
• Trace File Analyzer (TFA): all platforms (no TFA web on HP-UX and Windows)
• Hang Manager: all platforms
• Memory Guard: Linux, Solaris, AIX, Windows (not planned for zLinux and HP-UX)
• Quality of Service Management (QoS): Linux, Solaris, AIX, Windows (not planned for zLinux and HP-UX)
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. | 29
Cluster Health Monitor (CHM) – Generates a Diagnostic Metrics View of Cluster and Databases
• Always on – enabled by default
• Provides detailed OS resource metrics
• Assists node eviction analysis
• Locally logs all process data
• User can define pinned processes
• Listens to CSS and GIPC events
• Categorizes processes by type
• Supports plug-in collectors (e.g. traceroute, netstat, ping, etc.)
• New CSV output for ease of analysis
[Diagram: an osysmond process on every node collects OS data and feeds ologgerd (master), which stores it in the 12c Grid Infrastructure Management Repository (GIMR)]
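CHM data can be queried with the oclumon utility; a minimal sketch (node names are placeholders and options may differ slightly between versions):

$ oclumon dumpnodeview -allnodes -last "00:05:00"   # dump the last five minutes of node views for all nodes
$ oclumon dumpnodeview -n node1 -last "01:00:00"    # node view for a single node over the last hour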
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |
Introducing Oracle 12c Cluster Health Advisor (CHA) – Proactive Health Prognostics System
• Real-time monitoring of Oracle RAC database systems and their hosts
• Early detection of impending as well as ongoing system faults
• Diagnoses and identifies the most likely root causes
• Provides corrective actions for targeted triage
• Generates alerts and notifications for rapid recovery
Full presentation: http://www.oracle.com/technetwork/database/options/clustering/ahf/learnmore/oracle-12cr2-cha-3623186.pdf
Recorded web seminar: https://www.youtube.com/watch?v=TbdkGsmSgcQ
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |
Cluster Health Advisor (CHA) Architecture Overview
31
• cha – cluster node resource; a single Java ochad daemon per node
• Reads Cluster Health Monitor data directly from memory
• Reads DB ASH data from SMR without a DB connection
• Uses OS and DB models and data to perform prognostics
• Stores analysis and evidence in the GI Management Repository (GIMR)
• Sends alerts to EMCC Incident Manager per target
[Diagram: ochad feeds OS data (via CHM) into the Node Health Prognostics Engine and DB data into the Database Health Prognostics Engine, each using an OS or DB model; results are stored in the GIMR and alerts are sent to EMCC]
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |
Cluster Health Advisor – Scope of Problem Detection
Best-effort immediate guided diagnosis
• Over 30 node and database problems have been modeled
• Over 150 OS and DB metric predictors identified
• Problem detection in 12.2.0.1 includes:
  – Interconnect, global cache and cluster problems
  – Host CPU and memory, PGA memory stress
  – IO and storage performance issues
  – Reconfiguration and recovery issues
  – Abnormal workload and session variations
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. | 33
Cluster Health Advisor – Data Sources and Data Points
• A CHA data point contains more than 150 signals (statistics and events) from multiple sources: OS, ASM, network, DB (ASH, AWR session, system and PDB statistics)
• Statistics are collected at a 1-second internal sampling rate, then synchronized, smoothed and aggregated into a data point every 5 seconds
• Example data point (time 15:16:00): CPU 0.90, ASM IOPS 4100, network utilization 13%, network packets dropped 0, log file sync 2 ms, log file parallel write 600 us, GC CR request 0, GC current request 0, GC current block 2-way 300 us, GC current block busy 1.5 ms, enq: CF - contention 0, …
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. | 34
Models Capture the Dynamic Behavior of All Normal Operation – Models Capture All Normal Operating Modes
[Chart: IOPS, user commits (/sec), log file parallel write (usec) and log file sync (usec) plotted over time (10:00, 2:00, 6:00), showing distinct normal load phases]
• The release ships with conservative models to minimize false warnings
• A model captures the normal load phases and their statistics over time, and thus the characteristics for all load intensities and profiles. During monitoring, any data point similar to one of the vectors is NORMAL.
• One could say that the model REMEMBERS the normal operational dynamics over time

In-Memory Reference Matrix (part of the "normality" model):
  IOPS                    …  2500   4900   800    …
  User Commits            …  10000  21000  4400   …
  Log File Parallel Write …  2350   4100   22050  …
  Log File Sync           …  5100   9025   4024   …
  …
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. | 35
Cluster Health Advisor – CHA Model: Find Similarity with Normal Values
• CHA estimator/predictor: "based on my normality model, the value of IOPS should be in the vicinity of ~4900, but it is reported as 10500; this is causing a residual of ~5600 in magnitude."
• CHA fault detector: "such a high magnitude of residuals should be tracked carefully! I'll keep an eye on the incoming sequence of this signal (IOPS) and, if it remains deviant, I'll generate a fault on it."

Observed values vs. the in-memory reference matrix of the previous slide (residual = observed - predicted):
  IOPS                    observed 10500, residual  5600
  User Commits            observed 20000, residual -1000
  Log File Parallel Write observed  4050, residual   -50
  Log File Sync           observed 10250, residual   325
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |
Cluster Health Advisor (CHA) Operation Overview
36
• SRVCTL lifecycle daemon management – enabled by default, activates when the first RAC instance starts
• New CHACTL command-line tool for all local operations
• Java GUI tool available on OTN soon
• Integrated into EMCC Incident Manager and notifications
• Monitoring has no impact on DB performance or availability
[Diagram: the CHACTL client, the CHA Java GUI client and SRVCTL operate locally against the cluster (CHADDriver, CHM, OS and DB data, prognostics engines, OS/DB models, GIMR); EM Cloud Control integrates remotely]
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |
CHA Command Line Operations
37
Checking for Health Issues and Corrective Actions with CHACTL QUERY DIAGNOSIS

$ chactl query diagnosis -db oltpacdb -start "2016-10-28 01:52:50" -end "2016-10-28 03:19:15"
2016-10-28 01:47:10.0 Database oltpacdb DB Control File IO Performance (oltpacdb_1) [detected]
2016-10-28 01:47:10.0 Database oltpacdb DB Control File IO Performance (oltpacdb_2) [detected]
2016-10-28 02:59:35.0 Database oltpacdb DB Log File Switch (oltpacdb_1) [detected]
2016-10-28 02:59:45.0 Database oltpacdb DB Log File Switch (oltpacdb_2) [detected]

Problem: DB Control File IO Performance
Description: CHA has detected that reads or writes to the control files are slower than expected.
Cause: The Cluster Health Advisor (CHA) detected that reads or writes to the control files were slow because of an increase in disk IO. The slow control file reads and writes may have an impact on checkpoint and Log Writer (LGWR) performance.
Action: Separate the control files from other database files and move them to faster disks or Solid State Devices.

Problem: DB Log File Switch
Description: CHA detected that database sessions are waiting longer than expected for log switch completions.
Cause: The Cluster Health Advisor (CHA) detected high contention during log switches because the redo log files were small and the redo logs switched frequently.
Action: Increase the size of the redo logs.
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |
Cluster Health Advisor – Command Line Operations
HTML diagnostic health output available (-html <file_name>)
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |
Using EMCC for Alerts and Corrective Actions
39
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. | 40
Using the CHA GUI to Perform Root-Cause Analysis – Overview
• Standalone Java GUI client
• Must be run on a local cluster node
• Can be run against the live GIMR or against an MDB (dump) file:
  chactl export repository -format mdb -start '2017-05-01 00:00:00' -end '2017-05-10 00:00:00'
• Used internally for development
• Will be available and maintained on Oracle Technology Network soon
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |
Calibrating CHA to Your RAC Deployment – Overview
• Calibration goal: increase sensitivity and accuracy with sufficient warning
• The release ships with conservative models to minimize false warnings
  – DEFAULT_CLUSTER for each cluster node
  – DEFAULT_DB for each database instance
• Use your own data from periods of "normal operations" to increase sensitivity
  – Recommended minimum 6-hour period
  – Should include all normal workload phases for that model
• Models may be changed dynamically online using CHACTL
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |
Calibrating CHA to Your RAC Deployment
Choosing a data set for calibration – defining "normal"

$ chactl query calibration -cluster -timeranges 'start=2016-10-28 07:00:00,end=2016-10-28 13:00:00'
Cluster name : mycluster
Start time : 2016-10-28 07:00:00
End time : 2016-10-28 13:00:00
Total Samples : 11524
Percentage of filtered data : 100%

1) Disk read (ASM) (Mbyte/sec)
   MEAN   MEDIAN  STDDEV  MIN   MAX
   0.11   0.00    2.62    0.00  114.66
   <25     <50    <75    <100   >=100
   99.87%  0.08%  0.00%  0.02%  0.03%

2) Disk write (ASM) (Mbyte/sec)
   MEAN   MEDIAN  STDDEV  MIN   MAX
   0.01   0.00    0.15    0.00  6.77
   <50      <100   <150   <200   >=200
   100.00%  0.00%  0.00%  0.00%  0.00%

3) Disk throughput (ASM) (IO/sec)
   MEAN   MEDIAN  STDDEV  MIN   MAX
   2.20   0.00    31.17   0.00  1100.00
   <5000    <10000  <15000  <20000  >=20000
   100.00%  0.00%   0.00%   0.00%   0.00%

4) CPU utilization (total) (%)
   MEAN   MEDIAN  STDDEV  MIN   MAX
   9.62   9.30    7.95    1.80  77.90
   <20     <40    <60    <80    >=80
   92.67%  6.17%  1.11%  0.05%  0.00%
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |
Calibrating CHA to Your RAC Deployment
Creating a new CHA model with CHACTL
• Create and store the new model
  $ chactl calibrate cluster -model daytime -timeranges 'start=2016-10-28 07:00:00,end=2016-10-28 13:00:00'
• Begin using the new model
  $ chactl monitor cluster -model daytime
• Confirm the new model is being used
  $ chactl status -verbose
  monitoring nodes svr01, svr02 using model daytime
  monitoring database qoltpacdb, instances oltpacdb_1, oltpacdb_2 using model DEFAULT_DB
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |
Program Agenda
1. High Availability Improvements
2. Continuous Availability Features
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |
Continuous Availability
• Availability for applications – Application Continuity
• Availability during planned maintenance
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |
Oracle Real Application Clusters 12c Release 2 – Continuous Service Availability
Real Application Service Levels – "Always Running"
• Scales PDBs and services
• 2-second detection on Exadata
• Recovery in low seconds
• Drains work gradually
• Recovers in-flight work with Application Continuity (AC)
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |
Oracle Active Data Guard 12c Release 2 – Continuous Service Availability
• Recovers in-flight work with Application Continuity
• ADG sessions survive a standby role change
• Drain, then switch over; AC recovers stragglers
  Switchover to <db_resource_name> [wait]
[Diagram: the Data Guard Observer manages failover between a RAC primary (Site A) and a RAC standby (Site B)]
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |
Application Continuity – In-flight work continues
• Replays in-flight work on recoverable errors
• Masks hardware, software, network, and storage errors and timeouts
• 12.1: JDBC-Thin, UCP, WebLogic Server, 3rd-party Java application servers
• 12.2: OCI, ODP.NET unmanaged, JDBC Thin on XA, Tuxedo, SQL*Plus
• Supported with RAC, RAC One Node, and Active Data Guard
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |
Under the Covers
1 – Normal Operation
• Client marks database requests
• Server decides which calls can and cannot be replayed
• As directed, the client holds the original calls, their inputs, and validation data
2 – Outage Phase 1: Reconnect
• Checks that replay is enabled
• Verifies timeliness
• Creates a new connection
• Checks that the target database is valid for replay
• Uses Transaction Guard to guarantee the last outcome
3 – Outage Phase 2: Replay
• Replays captured calls
• Ensures results returned to the application match the original
• On success, returns control to the application
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. | 51
Steps to Use Application Continuity (check → what to do)
• Identify requests: return connections to the pool – UCP, WebLogic Active GridLink, 3rd-party containers using UCP, OCI Session Pool, ODP.NET Unmanaged, Tuxedo
• JDBC deprecated classes: replace non-standard classes (MOS 1364193.1); use the AC orachk check to know
• Side effects: use disable or another connection if a request should not be replayed
• Callbacks: with UCP and WLS labels, do nothing; in 12.2 set FAILOVER_RESTORE=LEVEL1; otherwise register a callback for applications that change state outside requests
• Mutable functions: grant keeping mutable values, e.g. sequence.nextval
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |
Run the AC Assessments – Available in ORAchk
• How effective is Application Continuity for your application?
• Where Application Continuity is not in effect, what steps need to be taken?
Steps:
1. Analyze and report coverage
2. Report usage of deprecated Java classes
[Diagram: application traces are read as input by the assessment tool (orachk), which produces coverage output for the user]
https://blogs.oracle.com/WebLogicServer/entry/using_orachk_for_coverage_analysis
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |
Grant Mutables – Keep original function results at replay
For owned sequences:
  ALTER SEQUENCE … [sequence] [KEEP|NOKEEP];
  CREATE SEQUENCE … [sequence] [KEEP|NOKEEP];
Grant and revoke for other users:
  GRANT [KEEP DATE TIME | KEEP SYSGUID] [TO user];
  REVOKE [KEEP DATE TIME | KEEP SYSGUID] [FROM user];
  GRANT KEEP SEQUENCE ON [sequence] [TO user];
  REVOKE KEEP SEQUENCE ON [sequence] [FROM user];
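For example, to keep the original sequence and date/time values at replay for an application user (the schema, sequence, and user names are placeholders):

SQL> ALTER SEQUENCE hr.emp_seq KEEP;                    -- owned sequence: keep the original NEXTVAL at replay
SQL> GRANT KEEP SEQUENCE ON hr.emp_seq TO app_user;     -- another user's sessions keep this sequence's values
SQL> GRANT KEEP DATE TIME TO app_user;                  -- keep original date/time values at replay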
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |
Don't Want to Replay – Disable replay for requests that should not be replayed
• Decide if any requests should not be replayed, e.g. autonomous transactions, UTL_HTTP, UTL_URL, UTL_FILE, UTL_FILE_TRANSFER, UTL_SMTP, UTL_TCP, UTL_MAIL, DBMS_JAVA callouts, EXTPROC
• Use another connection or the disable API
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |
Configuration
Set service attributes:
• FAILOVER_TYPE = TRANSACTION for Application Continuity
• FAILOVER_RESTORE = LEVEL1 for common states restored at failover
• AQ_HA_NOTIFICATIONS = TRUE for FAN with the OCI driver, ODP.NET, Tuxedo, SQL*Plus
For Java:
• Use a replay data source (local or XA): oracle.jdbc.replay.OracleDataSourceImpl
For OCI, ODP.NET, Tuxedo, SQL*Plus:
• On when enabled on the service
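As an illustration, these attributes are typically set on the service with srvctl; a sketch with placeholder database and service names (available options differ slightly between 12.1 and 12.2):

$ srvctl modify service -db orcl -service oltp \
      -failovertype TRANSACTION -commit_outcome TRUE \
      -failover_restore LEVEL1 -notification TRUE      # enable Application Continuity and FAN notifications on the service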
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |
Killing Sessions – Extended DBA Command Replays
• alter system kill session … noreplay – work is not replayed (best method)
• dbms_service.disconnect_session([service], dbms_service.noreplay) – work is not replayed (best method)
• srvctl stop service -db orcl -instance orcl2 -force – YES, replays
• srvctl stop service -db orcl -node rws3 -force – YES, replays
• srvctl stop service -db orcl -instance orcl2 -noreplay -force – work is not replayed
• srvctl stop service -db orcl -node rws3 -noreplay -force – work is not replayed
• alter system kill session … immediate – YES, replays
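For instance, to terminate sessions without Application Continuity replaying their in-flight work (the SID, serial#, and service name below are placeholders):

SQL> ALTER SYSTEM KILL SESSION '123,45678' NOREPLAY;                          -- kill one session, no replay
SQL> EXEC DBMS_SERVICE.DISCONNECT_SESSION('oltp', DBMS_SERVICE.NOREPLAY);    -- disconnect a service's sessions, no replay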
56
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |
Continuous Availability
• Availability for applications – Application Continuity
• Availability during planned maintenance
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |
What is the best way to apply maintenance?
Update in Place:
• Complex build process repeated for each node
• Error prone
• Longest downtime and maintenance window
• Have to create a backup (no built-in fallback plan)
• How do you enforce standardization?
Clone, Update and Switch:
• Complex build process repeated for each node
• Error prone
• Shorter downtime and maintenance window
• Built-in fallback
• How do you enforce standardization?
Deploy Gold Image, Switch:
• Build the gold image once, use it everywhere
• Fewest steps, simplest process
• Shortest downtime and maintenance window
• Built-in fallback
• Built-in standardization
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |
What is the best approach to handling software drift?
Scan:
• Drift is not seen until a scan takes place
• Scanning unchanged targets is unnecessary work
• Does not prevent drift
Trigger Alert:
• No time lag between drift and alert
• No extra work
• Does not prevent drift
Prevent:
• Locked configurations cannot drift
• Can trigger an alert if unauthorized changes are attempted
• Can trigger an alert if authorized changes are made
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |
Streamline the Distribution Process
• Ship only once – to a customer, to a site, to a pool
• Ship to interested parties only – subscribers
• Ship only what is necessary – updated modules, updated files, updated blocks
• Deploy non-disruptively – ship any time, choose when to use it
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. | 61
Rapid Home Provisioning and Maintenance
• Simple
• Prevents errors, enables easy corrections
• Uses gold images for all scenarios
• Enables mass operations on 1000s of nodes
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |
Build Inventory of Gold Images – Create once on the RHP Server
• Uptake the current estate by promoting existing installed homes to gold images
• Create new homes and promote them to gold images after validation
• Assign states to images for lifecycle management
• Oracle internal users: import images from GIaaS
[Diagram: installed homes (e.g. DB 11.2.0.4.1, DB 12.1.0.2 Custom) promoted to gold images on the RHP Server alongside GRID 11.2.0.4.3 and WLS 12.2.1 images]
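A rough sketch of promoting an existing installed home to a gold image with rhpctl (the image name and path are placeholders; check the rhpctl reference for your release):

$ rhpctl import image -image DB12102_GOLD -path /u01/app/oracle/product/12.1.0.2/dbhome_1   # create a gold image from an existing Oracle home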
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. | 63
Supported Targets and Environments – Manage existing and create new Pools, Homes, and Databases
• Patch and upgrade existing deployments
  – No prerequisites (config, agent, daemon, …) for targets
  – Database and Grid Infrastructure 11.2.0.3, 11.2.0.4, 12.1.0.2, 12.2.0.1
• Provision, scale, patch and upgrade new clusters and databases
  – 11.2.0.4, 12.1.0.2, 12.2.0.1
• Bare metal, VMs, CDBs, non-CDBs
• Single instance (standalone, Restart, Grid Infrastructure), RAC One Node, RAC
• Linux, Solaris, AIX
• Generic software homes
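Patching an existing deployment then amounts to moving databases from their current working copy to a patched one; a hedged sketch with placeholder working-copy names (exact parameters depend on the release):

$ rhpctl move database -sourcewc wc_db_12102 -patchedwc wc_db_12102_bp   # switch databases from the source home to the patched home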
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |
Easy to Create the Server, Start Managing the Current Estate
• The RHP Server is fully self-contained
  – Commodity hardware or engineered systems; can be clustered for HA
  – Enabled with a single srvctl command
  – Lightweight; can co-exist with other functions
• No new software needed on targets
• No run-time dependency between the Server and targets
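For illustration, the "single srvctl command" that enables the RHP Server could look like this (a sketch with placeholder storage locations; verify the options for your version):

$ srvctl add rhpserver -storage /u01/rhp_storage -diskgroup RHPDATA   # create the Rapid Home Provisioning Server resource
$ srvctl start rhpserver                                              # start the RHP Server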