www.informatik-aktuell.de
IT-Tage Frankfurt 2015
Infrastructure at your Service.
Oracle Grid Infrastructure Cold failover cluster
Jérôme Witt Senior Consultant Mobile +41 79 961 27 73 [email protected] www.dbi-services.com
16/12/15
About me
dbi services: who we are
Experts At Your Service
> 40 specialists in IT infrastructure
> Certified, experienced, passionate
Based In Switzerland
> 100% self-financed Swiss company
> Over CHF 6 million turnover
Leading In Infrastructure Services
> More than 100 customers in CH, D, & F
> Over 40 dbi FlexService SLAs contracted
dbi services is hiring in Basel & Zürich ([email protected])
Agenda
1. Introduction
2. Oracle Grid Infrastructure main components
3. Oracle Clusterware resource management
4. Cold failover cluster vs RAC One Node
5. Cold failover cluster @ dbi-services
6. User experience
7. Core Message
8. Homework
Introduction
> Products
> Oracle cluster software stack
> Licensing
Introduction - products
Cold failover cluster
> Very popular high availability solution
> Database active on only one node at a time
> Large choice of vendors
  > Serviceguard
  > Veritas Cluster
  > Open source solutions
Introduction – Oracle cluster software stack
Oracle Grid Infrastructure vs Clusterware: do we speak the same language?
> Oracle Grid Infrastructure is a software suite which includes many components such as ASM, ACFS and Database Quality of Service
> Oracle Clusterware is a generic, general-purpose clustering solution for all applications
[Figure: Oracle Clusterware and Oracle ASM/ACFS underneath RAC and the HA applications HA_App1 and HA_App2. Caption: Consolidated pool of storage with Oracle Automatic Storage Management (ASM)]
Introduction – Licensing
Oracle Clusterware is free of charge
> Reference: “Database Licensing Information 12.1”
> For any kind of application :)
> Support: only if the server is running an Oracle product
Database licensing
> Reference: “Licensing Data Recovery Environments”
> Method “Failover data recovery”
  > Multiple nodes have access to one single storage (SAN)
  > Includes the right to run the licensed program on an unlicensed spare computer for up to a total of 10 separate days
  > Downtime for maintenance purposes counts towards this limit!
  > If the failover period exceeds 10 days, the failover node must be licensed
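Keeping track of the 10-day rule is easier with a small record of the dates on which a database ran on the unlicensed failover node. The sketch below assumes such a record exists as one YYYY-MM-DD date per line (both the record and the helper name failover_days_per_year are hypothetical) and that the limit is counted per calendar year; check the current policy text before relying on it. It counts separate days, so several failovers on the same day count once.

```shell
# Hypothetical helper: count separate failover days per calendar year.
# stdin : one date per line, formatted YYYY-MM-DD (duplicates allowed).
# stdout: "YEAR DAYS STATUS", STATUS being OK or LICENSE_REQUIRED.
failover_days_per_year() {  # $1 = allowed days (default 10, per the slide)
  limit="${1:-10}"
  sort -u | awk -F- -v limit="$limit" '
    NF >= 3 { days[$1]++ }            # one entry per distinct date
    END {
      for (y in days) {
        status = (days[y] > limit) ? "LICENSE_REQUIRED" : "OK"
        print y, days[y], status
      }
    }' | sort
}
```

Usage: `failover_days_per_year < failover_dates.txt`, where failover_dates.txt is your own (hypothetical) record.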
Oracle Grid Infrastructure main components
> Network
> Shared storage
> Single Client Access Name
> The big picture
Oracle Grid Infrastructure main components - 1
Network: each node must have at least two interfaces
> Private network (interconnect)
  > Used for communication between all cluster nodes to maintain the integrity of the cluster (CSS daemon) and by database instances for global cache management (Oracle RAC)
  > Critical component which must be fault tolerant
  > Oracle Clusterware supports one to four interfaces (HAIP)
> Public network: node IPs, node VIPs ...
  > Required on each node by Oracle Clusterware for node applications or for applications managed by Oracle Clusterware
Shared storage: the “cluster quorum”
> The OCR is a file that contains the configuration information and status of the cluster (e.g. DBCA uses the OCR to store configuration)
> Voting files are used to monitor cluster node status
Oracle Grid Infrastructure main components - 2
Shared storage: Oracle ASM/ACFS
> ASM is the preferred storage manager for all database files
  > Volume manager and file system
  > 3 levels of redundancy (EXTERNAL, NORMAL, HIGH)
> ACFS is a file system on top of ASM for all non-database files
Single Client Access Name (SCAN)
> Single name for clients to access any database running on the cluster
> By default composed of 3 IP addresses and 3 SCAN listeners
> Node assignment automatically controlled by Oracle Clusterware
> Database(s) register with the SCAN listeners remotely
> The init parameter local_listener is automatically updated by the agent process “oraagent.bin”
jdbc:oracle:thin:@//ora-de-ch-scan.jew.local:1521/CDBUTF8
remote_listener=ora-de-ch-scan.jew.local:1521
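A quick sanity check for the SCAN setup: by default the SCAN name should resolve to 3 addresses, and clients put the SCAN name into an EZConnect-style JDBC thin URL. Both helpers are hypothetical names; scan_ip_count assumes it is fed the output of `getent ahostsv4` (glibc), whose first column is the IP address.

```shell
# Count the distinct IPv4 addresses a SCAN name resolves to.
# stdin: output of "getent ahostsv4 <scan-name>" (first column = IP).
scan_ip_count() {
  awk '{ print $1 }' | sort -u | grep -c .
}

# Build the EZConnect-style JDBC thin URL used by clients of the SCAN.
scan_jdbc_url() {  # $1 = SCAN name, $2 = port, $3 = service name
  printf 'jdbc:oracle:thin:@//%s:%s/%s\n' "$1" "$2" "$3"
}
```

Usage: `getent ahostsv4 ora-de-ch-scan.jew.local | scan_ip_count` should print 3 for a default SCAN configuration.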
The big picture
[Figure: two-node cluster. Each node has two HBAs (hba1, hba2) to a bunch of shared disks holding the data and the OCR/voting files, an ASM instance (+ASM1, +ASM2), a public interface on NIC1/NIC2 carrying the node VIP, and a private interconnect on NIC3/NIC4 using HAIP (UDP/InfiniBand/RDS). Three SCAN VIPs with SCAN listeners 1 to 3 are spread over the nodes; clients connect over TCP via JDBC/SQL*Net/OCI to the SCAN address. Database instance DB11 runs on the first node.]
Oracle Clusterware resource management
> Types
> Resources
> Attributes
Oracle Clusterware resource management - 1
Oracle Clusterware uses types to organize “similar” resources
> Every resource must have a type
> A type lets a resource manage (inherit) only the necessary resource attributes
[Figure: the resource base types local_resource, cluster_resource and generic_application, each with N derived resource types (e.g. ora.listener.type, ora.database.type ...). Resources managed by Oracle Clusterware (CRSD), such as local listener(s), ASM, database(s), VIP, SCAN ..., carry resource attributes such as RESTART_ATTEMPTS, CHECK_INTERVAL, STOP_TIMEOUT, START_DEPENDENCIES ...]
Oracle Clusterware resource management - 2
Resource base types
> All base types are supplied by Oracle
Local resource
> Runs on each server of the cluster
> When a server is added, CRS automatically extends the local resources to it
> Local resources do not fail over from one server to another
Cluster resource
> Subject to switchover and failover
> Subject to the resource attribute CARDINALITY
Generic application
> Deprecated
> Pre-Oracle CRS 11.2 resources
Oracle Clusterware resource management - 3
Attributes
> Default values can be used for some of them
> Others must be specified depending on your needs :)
Resource attributes
> Configuration (read-write): ACTION_SCRIPT, ACL, AUTO_START, PLACEMENT, HOSTING_MEMBERS, SERVER_POOLS, USR_ORA_ENV, START_DEPENDENCIES, CARDINALITY, etc.
> Monitoring (read-write): CHECK_INTERVAL, RESTART_ATTEMPTS, FAILURE_INTERVAL, STOP_TIMEOUT, START_TIMEOUT, UPTIME_THRESHOLD, etc.
> Read-only: STATE, LAST_STATE_CHANGE, TARGET, RESTART_COUNT, etc.
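When several attributes are set on the crsctl command line, they go into one comma-separated `-attr` list, and a value that itself contains commas or spaces (dependency lists, ACLs, HOSTING_MEMBERS) is enclosed in single quotes. A small hypothetical helper that assembles such a list, as a sketch of that quoting rule:

```shell
# Hypothetical helper: join NAME=VALUE pairs into one crsctl -attr string.
# Values containing a comma or a space are wrapped in single quotes.
crs_attr_list() {
  out=""
  for pair in "$@"; do
    name=${pair%%=*}     # text before the first "="
    value=${pair#*=}     # text after the first "="
    case "$value" in
      *,*|*' '*) value="'$value'" ;;
    esac
    out="${out:+$out,}$name=$value"
  done
  printf '%s\n' "$out"
}
```

Usage: `crsctl modify res NCDB.db -attr "$(crs_attr_list 'HOSTING_MEMBERS=orabs orazh' ENABLED=1)"`.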
Cold failover cluster vs RAC One Node
> Technology comparison
> RAC One Node
> Cold failover cluster
Cold failover cluster vs Oracle RAC One Node - 1
Cold failover cluster
> Single instance database
> Officially “replaced” by Oracle RAC One Node starting with 11.2! :( ?
> Available without restriction for Oracle SE, SE1 and EE!
> Service disruption during service relocation
Oracle RAC One Node
> RAC-enabled Oracle Home
> Easy conversion from RAC One Node to RAC and vice versa
> Online database relocation
> Option for Oracle EE only (cost ± 20%)
Cold failover cluster vs Oracle RAC One Node – 2
Oracle RAC One Node
> Based on an Oracle supplied type “ora.database.type” (base type: cluster_resource)
> Convenient resource registration (instance name suffixed: <OracleSID>_1)
> CAUTION! DBCA creates the necessary “RAC aware database settings”
  > Undo and redo threads, depending on the database management type
Administrator managed database (as the Oracle RDBMS software owner)
# srvctl add database -db $ORACLE_SID \
    -oraclehome $ORACLE_HOME -dbtype RACONENODE -server orabs,orazh
Policy managed database (as the Oracle RDBMS software owner)
# srvctl add database -db $ORACLE_SID \
    -oraclehome $ORACLE_HOME -dbtype SINGLE -serverpool my_pool
Manage RAC One Node
# srvctl start database -db $ORACLE_SID
# srvctl relocate database -db $ORACLE_SID
# srvctl add service -db $ORACLE_SID -service <ServiceName>
Cold failover cluster vs Oracle RAC One Node – 3
Addendum “Oracle RAC One Node”
> The SINGLE instance database type is restricted to Oracle Restart
> Oracle RAC One Node home on ACFS = BUG :( :(
  > Oracle Grid Infrastructure + RDBMS 12.1.0.2, GI PSU Jan 2015
  > Failover capabilities are not affected!
# Administrator managed database
PRCD-1146 : Database CDB1 is not a RAC One Node database
# Policy managed database
PRKO-2067 : The size of server pool should be one for single instance database
# srvctl relocate database -db $ORACLE_SID
...
PRKH-1001 : HASContext Internal Error
PRKH-3005 : Entity ncdb_1 was not registered with CSS CLSS return code=12
Cold failover cluster vs Oracle RAC One Node – 4
Cold failover cluster: 2 variants are possible
> Using the deprecated resource (base) type “generic_application”
> Using a user-defined resource and type with an action script
Example attributes for the generic_application variant:
Attribute       Value
START_PROGRAM   /u01/app/oracle/admin/DB1/dbstart.sh
STOP_PROGRAM    /u01/app/oracle/admin/DB1/dbstop.sh
CLEAN_PROGRAM   /u01/app/oracle/admin/DB1/dbclean.sh
PID_FILES       /u01/app/oracle/admin/DB1/pid.file
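The attribute table for the generic_application variant maps onto a single crsctl `-attr` list. A sketch, assuming the four files sit in one admin directory as in the DB1 example; generic_app_attrs is a hypothetical helper and the resource name in the usage line is only illustrative.

```shell
# Hypothetical helper: build the -attr list for a generic_application
# resource whose start/stop/clean scripts and PID file share one directory.
generic_app_attrs() {  # $1 = admin directory, e.g. /u01/app/oracle/admin/DB1
  d="$1"
  printf 'START_PROGRAM=%s/dbstart.sh,STOP_PROGRAM=%s/dbstop.sh,CLEAN_PROGRAM=%s/dbclean.sh,PID_FILES=%s/pid.file\n' \
    "$d" "$d" "$d" "$d"
}
```

Usage: `crsctl add resource DB1.db -type generic_application -attr "$(generic_app_attrs /u01/app/oracle/admin/DB1)"`. The paths contain no commas or spaces, so no single quotes are needed around the values.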
1. Introduction
2. Oracle Grid Infrastructure main components
3. Oracle clusterware resource management
4. Cold failover cluster vs RAC One Node
5. Cold failover cluster @ dbi services
6. User experience
7. Core Message
8. Homework
16/12/15
Agenda
Page 25
IT-Tage Frankfurt 2015 16/12/15
Cold failover cluster @ dbi services
> Setup
> Resource dependencies
> Monitoring
> Placement
Cold failover cluster @ dbi services - 1
Oracle Clusterware manages the resources using resource attributes
[Figure: the user-defined type clu_db.type derives from cluster_resource and defines new attribute(s), e.g. HOSTING_MEMBERS=node1 node2, STOP_TIMEOUT=600, RESTART_ATTEMPTS=5, ACTION_SCRIPT= ...; the user-defined resource inherits these attributes or modifies them, e.g. STOP_TIMEOUT=600, RESTART_ATTEMPTS=2 ...]
Cold failover cluster @ dbi services - 2
The resource is managed by an Oracle Clusterware supplied agent (scriptagent), which calls the action script (start/stop/clean/check)
> 12.1.0.1: $GRID_HOME/log/<Node>/agent/crsd/scriptagent_<Usr>
> 12.1.0.2: $GRID_BASE/diag/crs/<Node>/crs/trace/crsd_scriptagent<Usr>
2015-04-09 21:28:21.302: [DB1.db][1749710592] {1:62596:2} [check] Executing action script: /u01/app/oracle/local/dmk/bin/db_crs.ksh[check]
[Figure: CRSD delegates resource management to agents: scriptagent (cluster_resource) runs the ACTION_SCRIPT of the user-defined resource/type DB1 (START/STOP/CHECK/CLEAN); orarootagent (ora.cluster_vip.type) handles the VIPs; oraagent (ora.database.type) handles RAC One Node/RAC databases ...]
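The script agent calls the action script with the action name (start, stop, check, clean) as its first argument and evaluates the exit code: 0 means success (for check: ONLINE), 1 means failure (for check: OFFLINE). The real db_crs.ksh is not shown on the slide, so this is only a minimal sketch under stated assumptions: ORACLE_SID and ORACLE_HOME are set in the resource environment, OS authentication as SYSDBA works, and a running PMON process means the instance is up. A production script needs more care, e.g. killing leftover processes in clean.

```shell
#!/bin/sh
# Hypothetical action script skeleton for a user-defined cluster_resource
# such as DB1.db. It is NOT the db_crs.ksh referenced in the agent log.
# Assumptions: ORACLE_SID/ORACLE_HOME are set, "/ as sysdba" works.

db_is_up() {
  # Running = the instance's PMON background process exists.
  ps -eo args | grep -q "^ora_pmon_${ORACLE_SID}\$"
}

db_sql() {
  # Run a single SQL*Plus command as SYSDBA, then leave SQL*Plus.
  printf '%s\nexit\n' "$1" | "$ORACLE_HOME/bin/sqlplus" -s "/ as sysdba"
}

wait_down() {
  # Give the background processes up to 30 s to disappear.
  i=0
  while db_is_up && [ "$i" -lt 30 ]; do
    sleep 1
    i=$((i + 1))
  done
  ! db_is_up
}

action_script() {
  case "$1" in
    start) db_sql "startup" >/dev/null; db_is_up ;;
    stop)  db_sql "shutdown immediate" >/dev/null; wait_down ;;
    clean) db_sql "shutdown abort" >/dev/null; wait_down ;;
    check) db_is_up ;;  # 0 = ONLINE, 1 = OFFLINE
    *)     echo "usage: action_script {start|stop|check|clean}" >&2; return 2 ;;
  esac
}

# In the real script file, dispatch on the agent's argument:
# action_script "$1"; exit $?
```

Each action reports the resulting state rather than the SQL*Plus exit code, because SQL*Plus can exit with 0 even when STARTUP fails.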
Cold failover cluster @ dbi services - 3
Resource start/stop dependencies
[Figure: start dependencies of the database resource: hard() on the disk groups ora.DATA.dg and ora.FRA.dg, weak() on ora.scan_listener.type, pullup() on ora.listener.lsnr. Stop dependencies: hard() with intermediate on ora.asm and ACFS, and shutdown on ora.DATA.dg and ora.FRA.dg]
Cold failover cluster @ dbi services - 4
Step 1: Create the user-defined type “clu_db.type”
Run as the Grid Infrastructure software owner (usually OS user grid)
# crsctl add type clu_db.type -basetype cluster_resource \
    -file [path to attributes file]
Step 2: Create the database resource
> Beware! The resource name can't be prefixed by “ora.” (reserved for Oracle supplied resources)
Run as the Oracle database OS user (or add the ACL explicitly)
# crsctl add resource <OraSID>.db -type clu_db.type -attr \
    "HOSTING_MEMBERS='orabs orazh',START_DEPENDENCIES='hard(ora.DATA.dg,ora.FRA.dg) weak(type:ora.scan_listener.type) pullup(ora.LISTENER.lsnr)',STOP_DEPENDENCIES='hard(intermediate:ora.asm,shutdown:ora.DATA.dg,shutdown:ora.FRA.dg,intermediate:ora.acfs.vol0_u01.acfs)',ACL='owner:oracle:rwx,pgrp:oinstall:rwx,other::r--'"
That's it :)
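Step 1 reads the type's attribute definitions from a file. A hypothetical helper that writes such a file, using the ACTION_SCRIPT path from the agent log and the STOP_TIMEOUT/RESTART_ATTEMPTS values from the setup diagram. The ATTRIBUTE/TYPE/DEFAULT_VALUE line layout and the TYPE keywords `string` and `int` are assumptions; verify them against the `crsctl add type` reference of your release.

```shell
# Hypothetical helper: write an attribute file for
#   crsctl add type clu_db.type -basetype cluster_resource -file <file>
# Line layout and TYPE keywords are assumptions (see the crsctl docs).
write_clu_db_type_file() {  # $1 = output file
  cat > "$1" <<'EOF'
ATTRIBUTE=ACTION_SCRIPT
TYPE=string
DEFAULT_VALUE=/u01/app/oracle/local/dmk/bin/db_crs.ksh
ATTRIBUTE=RESTART_ATTEMPTS
TYPE=int
DEFAULT_VALUE=2
ATTRIBUTE=STOP_TIMEOUT
TYPE=int
DEFAULT_VALUE=600
EOF
}
```

Usage: `write_clu_db_type_file /tmp/clu_db.type.txt`, then pass that path to `-file` in step 1.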
Cold failover cluster @ dbi services - 5
Monitoring
> Do not shut down the database from SQL*Plus (the check action would report the instance as offline and Clusterware would restart it)
Run as the user referenced by the ACL type/resource attribute (usually OS user oracle or grid)
# crsctl status res NCDB.db -f | grep ENABLED
ENABLED=1
# crsctl modify res NCDB.db -attr "ENABLED=0"
# crsctl status res NCDB.db -t
----------------------------------------------------------------------
Name           Target  State        Server           State details
----------------------------------------------------------------------
Cluster Resources
----------------------------------------------------------------------
NCDB.db
      1        ONLINE  ONLINE       orabs            STABLE
----------------------------------------------------------------------
# crsctl status res NCDB.db -f | grep AUTO_START
AUTO_START=restore
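`grep ENABLED` matches any line that merely contains the string. A slightly stricter, hypothetical helper prints the value of exactly one attribute from the `crsctl status res ... -f` output, which consists of NAME=VALUE lines:

```shell
# Hypothetical helper: print the value of one attribute from
# "crsctl status res <name> -f" output (NAME=VALUE lines on stdin).
res_attr() {  # $1 = attribute name, e.g. ENABLED or AUTO_START
  awk -v key="$1" 'index($0, key "=") == 1 { print substr($0, length(key) + 2); exit }'
}
```

Usage: `crsctl status res NCDB.db -f | res_attr ENABLED`.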
Cold failover cluster @ dbi services - 6
Placement policies for cold failover clusters
> Administrator managed (previous example)
> Policy managed (“server pools”)
Cluster “partitioning”: two (2) or N nodes, all active, protecting single instance databases
> Spread the databases (evenly) across all nodes
> Remain protected against node failures
> Shut down all non-critical databases in case of a node failure
[Figure: server pool Critical.pool hosting ERP and ECOM; server pool Low.pool hosting DWH and UAT]
Cold failover cluster @ dbi services - 7
Partitioned “(cold) failover cluster”
Modify the type to spread the resources evenly (reset HOSTING_MEMBERS)
crsctl modify type clu_db.type \
    -attr "... dispersion(type:clu_db.type)"
Add the server pools for “partitioning”
crsctl add srvpool HCRITICITY.sp -attr "IMPORTANCE=1, MIN_SIZE=1, MAX_SIZE=1"
crsctl add srvpool LCRITICITY.sp -attr "IMPORTANCE=0, MIN_SIZE=1, MAX_SIZE=1"
Adapt the resources
crsctl modify res ECOM.db -attr "SERVER_POOLS=HCRITICITY.sp, PLACEMENT=favored, ACTIVE_PLACEMENT=1 ..."
crsctl modify res UAT.db -attr "SERVER_POOLS=LCRITICITY.sp, PLACEMENT=favored, ACTIVE_PLACEMENT=1 ..."
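Why do the LCRITICITY.sp databases stop when a node fails? Clusterware fills server pools to MIN_SIZE in order of IMPORTANCE, so with a single surviving server the more important pool wins it. The toy model below (a hypothetical helper, NOT the real CRSD algorithm, which also handles the Free and Generic pools and moves servers between running pools) reproduces that outcome for the two pools on this slide.

```shell
# Toy model of server pool sizing (hypothetical; NOT the CRSD algorithm):
# fill pools to MIN_SIZE in descending IMPORTANCE, then up to MAX_SIZE.
# stdin : one pool per line as "name importance min_size max_size".
# stdout: "name servers", most important pool first.
srvpool_model() {  # $1 = number of servers still alive
  sort -k2,2nr | awk -v free="$1" '
    BEGIN { free += 0 }
    { name[NR] = $1; min[NR] = $3 + 0; max[NR] = $4 + 0; got[NR] = 0 }
    END {
      # Pass 1: satisfy MIN_SIZE, most important pool first.
      for (i = 1; i <= NR; i++) {
        n = (min[i] < free) ? min[i] : free
        got[i] += n; free -= n
      }
      # Pass 2: grow pools toward MAX_SIZE with whatever is left.
      for (i = 1; i <= NR; i++) {
        room = max[i] - got[i]
        if (room < 0) room = 0
        n = (room < free) ? room : free
        got[i] += n; free -= n
      }
      for (i = 1; i <= NR; i++) print name[i], got[i]
    }'
}
```

Usage: `printf 'HCRITICITY.sp 1 1 1\nLCRITICITY.sp 0 1 1\n' | srvpool_model 1` leaves LCRITICITY.sp with 0 servers, i.e. UAT is stopped while ECOM keeps running.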
User experience
> DNS flooding
> ASYNC I/O
> srvctl vs crsctl
User experience - 1
DNS flooding? (process mDNSd.bin)
> Enable a DNS cache on each node
> Allows Oracle Clusterware to better tolerate network failures
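The slide does not say which DNS cache is used. Assuming nscd, the hosts cache is switched on by an `enable-cache hosts yes` line in /etc/nscd.conf; a hypothetical check for that setting, to run on each node:

```shell
# Hypothetical check, assuming nscd is the DNS cache in use: succeed only
# if nscd.conf enables the "hosts" cache (the last matching line wins).
nscd_hosts_cache_enabled() {  # $1 = path to nscd.conf (default /etc/nscd.conf)
  awk '$1 == "enable-cache" && $2 == "hosts" { v = $3 }
       END { if (v == "yes") exit 0; exit 1 }' "${1:-/etc/nscd.conf}"
}
```

Usage: `nscd_hosts_cache_enabled || echo "hosts cache disabled on $(hostname)"`.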
User experience - 2
ASYNC I/O operations
> The Grid Infrastructure Installation Guide recommends limiting the maximum number of allowable concurrent AIO requests: fs.aio-max-nr = 1048576
> These kernel parameters are set automatically by installing the RPMs
  > oracle-rdbms-server-12cR1-preinstall
  > oracle-rdbms-server-11gR2-preinstall
> UNABLE TO RESERVE KERNEL RESOURCES FOR ASYNCHRONOUS DISK I/O (Doc ID 579108.1)
  > Oracle's MOS recommendation: fs.aio-max-nr = 3145728
Fri Feb 27 10:50:21 2015
Error 27090 requesting async resources falling back to sync, expect performance degradation
Error 27090 requesting async resources falling back to sync, expect performance degradation
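On Linux the kernel exposes the current number of AIO requests and the limit in /proc/sys/fs/aio-nr and /proc/sys/fs/aio-max-nr. A hypothetical check that compares the limit with the MOS value; PROC_FS is an override added only so the function can be tried on copies of those files.

```shell
# Hypothetical check: report AIO usage and warn if fs.aio-max-nr is below
# the recommended value. PROC_FS defaults to /proc/sys/fs.
aio_report() {  # $1 = recommended minimum (default 3145728, MOS value)
  dir="${PROC_FS:-/proc/sys/fs}"
  nr=$(cat "$dir/aio-nr") || return 2
  max=$(cat "$dir/aio-max-nr") || return 2
  want="${1:-3145728}"
  echo "aio-nr=$nr aio-max-nr=$max"
  if [ "$max" -lt "$want" ]; then
    echo "fs.aio-max-nr below $want: consider 'fs.aio-max-nr = $want' in /etc/sysctl.conf"
    return 1
  fi
}
```

Usage: run `aio_report` on each node; after changing /etc/sysctl.conf, `sysctl -p` loads the new value.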
User experience - 3
SRVCTL vs CRSCTL: Oracle discourages using CRSCTL to modify Oracle supplied resources
> Use SRVCTL for Oracle supplied resources (listener, network, SCAN IPs, VIPs, SCAN listeners, and so on)
> Use CRSCTL to manage Oracle Clusterware
  > Version, CSS, CRSD, cluster maintenance, etc.
  > User-defined resources and types
EXCEPTION
> User-defined types/resources, user-defined server pools, etc.
Run as the user referenced by the ACL type/resource attribute (usually OS user oracle or grid)
# crsctl start resource <OraSID>.db
# crsctl stop resource <OraSID>.db
# crsctl relocate resource <OraSID>.db
# crsctl modify resource <OraSID>.db -attr "ENABLED=0"
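The rule of thumb for resources can be encoded in a tiny hypothetical helper: names starting with `ora.` are Oracle supplied (srvctl), everything else is user defined (crsctl). It only looks at the resource name and does not cover the other cases listed on the slide, such as server pools.

```shell
# Hypothetical helper: pick the tool for a resource based on its name.
ctl_for_resource() {  # $1 = resource name
  case "$1" in
    ora.*) echo srvctl ;;   # Oracle supplied resource
    *)     echo crsctl ;;   # user-defined resource
  esac
}
```

Usage: `ctl_for_resource NCDB.db` prints `crsctl`.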
User experience - 4
Never trust any kind of hardware
> Cisco UCS: everything is redundant
  > Are you sure that the system engineers are gurus?
  > Of course it's redundant, but it must be correctly configured
  > Split-brain because of a Fabric Interconnect failure
> IBM SAN Storwize
  > Automatic LUN failover from one controller to another
  > Great! The voting files are constantly accessed by all cluster nodes
  > Rebootless fencing (disktimeout 200s) because of a hanging failover
Last but not least, the interruptions were transparent for the end users! :)
Core Message
> Knowledge
  > Complexity (less than RAC)
  > EM12c integration
  > ACFS OS integration (e.g. tail)
> Features
  > Not only reserved for databases
  > Flexibility
  > Command-line tools :) :) :)
  > No additional license fees (Data Recovery environment 10-day rule)
  > Single point of support
dbi services “Cold failover cluster” works out of the box :)
Homework – build your own Grid Infra. cluster
[Figure: VirtualBox lab with an Oracle ZFS Storage Appliance providing iSCSI storage (vboxnet1), a DNS server, the private interconnect (vboxnet0) and the public network (vibr0)]
Any questions? Please do ask.
We look forward to working with you!