Data Guard – Fast-Start Failover
DOAG Regio Stuttgart – 18.05.2006
Torsten Rosenwald
Oracle 10g High Availability - Data Guard and Maximum Availability 2 © 2006
Agenda
> Introduction
> Concept / Architecture
> Flashback & Reinstate
> Fast-Start Failover
> Core Messages
Data Guard and Maximum Availability
The Data Guard Manager
> Data Guard is part of Enterprise Edition (no separate option); it includes all functionality needed to manage standby databases
> The Data Guard Broker framework (applications and language) facilitates the following Data Guard tasks
» Setup and configuration
» Monitoring and control of redo log transport and log apply services
» Core operating tasks (Switchover, Failover, Reinstate, Fast-Start Failover, etc.)
> Standard Edition does not contain Data Guard features such as
» Automated log transport
» Managed recovery mode
> There is a Trivadis package implementing the basic Data Guard functionality ☺
Concept / Architecture
Data Guard Broker Framework
[Figure: primary site and standby site. The primary database writes online log files, which are archived locally. Log transport ships redo to the standby log files on the standby site, from which real-time log apply feeds the standby database. If no standby redo log files are configured, redo arrives through remote archiving and log apply reads the archived log files. A Data Guard monitor runs on each site and is controlled through the Oracle Data Guard GUI or command line interface.]
Data Loss Protection Modes
> Maximum Performance: least performance impact on the primary database; asynchronous redo transfer
> Maximum Availability: highest possible level of data protection without compromising the availability of the primary database; synchronous redo transfer while the standby database is up
> Maximum Protection: ensures that the primary database and at least one standby database are always synchronized
DGMGRL> EDIT CONFIGURATION SET PROTECTION MODE AS MAXAVAILABILITY;
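A minimal sketch of how the mode change above could be scripted. It assumes that the standby is registered in the broker as THEDB_BERLIN (the name used later in this deck), that standby redo log files already exist, and that synchronous transport is selected through the broker property LogXptMode. The function name maxavail_script is hypothetical; it only prints the commands so they can be reviewed before being passed to dgmgrl.

```shell
#!/bin/sh
# Print the DGMGRL commands that raise the configuration to Maximum Availability.
# $1 = standby database name as known to the broker (assumption: THEDB_BERLIN).
maxavail_script() {
  standby=$1
  cat <<EOF
EDIT DATABASE '$standby' SET PROPERTY LogXptMode = 'SYNC';
EDIT CONFIGURATION SET PROTECTION MODE AS MAXAVAILABILITY;
SHOW CONFIGURATION;
EOF
}

# Dry run: show the commands instead of executing them.
maxavail_script THEDB_BERLIN
```

Printing first and executing second keeps the change reviewable; how the output is handed to dgmgrl (interactive paste, input redirection) depends on how credentials are handled on site.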
Flashback & Reinstate
Why Flashback?
> ASM is nice…
» but the real value is leveraged together with RAC
> Flashback Database is nice…
» but the real value is leveraged in a Data Guard environment
» Almost all the really cool features provided with 10g Data Guard need Flashback Database!
Concept: Flashback Database
[Figure: SCN timeline (33557, 34382, 37121, 41389, 52891, …). Restore of an incremental level 0 backup followed by recovery moves forward along the timeline; Flashback uses the flashback logs to move backward.]
Concept of Database Reinstate (1)
What about the former primary database after a failover?
> Prior to 10g
» Back up the new primary database
» Duplicate from the backup
» Recreate the standby database
> From 10g
» Reinstate database
[Figure: after the failover, the standby database has become the primary database; the former primary database is left over.]
Concept of Database Reinstate (2)
[Figure: host LION with the primary database and host TIGER with the standby database. FAILOVER TO 'THEDB_BERLIN' turns the standby database on TIGER into the primary database; REINSTATE DATABASE 'THEDB_BONN' turns the former primary database on LION into the new standby database.]
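The two broker commands from the reinstate slides can be put into sequence as follows. This is a sketch only: the function name failover_reinstate_script is hypothetical, the database names are the ones used in this deck, and it is assumed that Flashback Database was enabled on the former primary before the failover and that the former primary has been restarted and mounted before the reinstate is issued.

```shell
#!/bin/sh
# Print the DGMGRL commands for a manual failover followed by a reinstate.
# $1 = standby database to fail over to
# $2 = former primary database to reinstate as the new standby
# Assumption: the commands are run in a dgmgrl session connected to the
# failover target, and $2 is mounted again before REINSTATE is executed.
failover_reinstate_script() {
  cat <<EOF
FAILOVER TO '$1';
REINSTATE DATABASE '$2';
SHOW CONFIGURATION;
EOF
}

# Dry run with the names from the slides.
failover_reinstate_script THEDB_BERLIN THEDB_BONN
```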
Fast-Start Failover
Physical Standby: Startup Behavior 10g versus 9i
[Figure: the startup sequence of a physical standby (startup nomount; alter database mount; alter database open; recover managed standby database disconnect), with markers showing the point at which the broker process dmon takes over in 10g and in 9i.]
Physical Standby – Startup Issue (1)
> What is the biggest problem in a cluster? Split brain!
> What is the biggest problem in a Data Guard environment? More than one primary!
> How can this happen? The former primary becomes available again after the standby has been activated
[Figure: two databases, both labelled PRIM DB.]
Physical Standby – Startup Issue (2)
> Automatic database startup as part of system startup
» Two primary databases are possible after the standby has been activated
> Manual database startup after system boot
» No issue after activation of the standby, because manual intervention is necessary anyway
» Requires additional attention after every system startup
> Is there a better solution?
Physical Standby – Activation Issue
> Main criticism of standby databases: too much manual action
> Manual intervention is required for a failover
» Administrative checks are needed beforehand to validate the status of the standby database, e.g. whether all redo has been applied
» More downtime
> Manual intervention is required to recreate a new standby database
» No HA until the setup of the new standby is finished
> How can this be addressed? Fast-Start Failover
Concept
1. Observed Data Guard environment: primary and standby
2. Fast-Start Failover (automatic): the standby becomes the new primary
3. Reinstate (automatic): the former primary becomes the new standby
When is a Fast-Start Failover triggered?
> Primary site failure
» Server crash or server shutdown (without database shutdown)
> Primary database failure
» Instance failure (the last running instance in the case of RAC)
» Shutdown abort (but not shutdown normal or immediate)
» Data file is taken offline
> Network failure (special case)
» The documentation of when automatic activation will and will not happen is extensive. Read and test carefully. We will show one case.
Network Failure (1)
[Figure: primary DB and standby DB connected by log transport, together with the observer.]
-- on primary site
select fs_failover_status, fs_failover_observer_present from v$database;
FS_FAILOVER_STATUS    FS_FAILOVER_OBSERVER_PRESENT
--------------------  ----------------------------
SYNCHRONIZED          NO
Network Failure (2)
[Figure: log transport to the standby DB is interrupted; the primary database is STALLED and the observer starts the failover.]
-- on primary site
select fs_failover_status, fs_failover_observer_present from v$database;
FS_FAILOVER_STATUS    FS_FAILOVER_OBSERVER_PRESENT
--------------------  ----------------------------
STALLED               NO
Network Failure (3)
[Figure: the observer sees the new primary DB; the former primary database is still STALLED.]
-- on new primary site
select fs_failover_status, fs_failover_observer_present from v$database;
FS_FAILOVER_STATUS    FS_FAILOVER_OBSERVER_PRESENT
--------------------  ----------------------------
REINSTATE REQUIRED    YES
Network Failure (4)
[Figure: after the database reinstate, log transport runs from the new primary DB to the new standby DB, watched by the observer.]
-- on primary site and standby site
select fs_failover_status, fs_failover_observer_present from v$database;
FS_FAILOVER_STATUS    FS_FAILOVER_OBSERVER_PRESENT
--------------------  ----------------------------
SYNCHRONIZED          YES
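The status values seen in the four network failure steps can be turned into operator hints by a small helper. This is a sketch: the function name fsfo_hint and the wording of the hints are assumptions, and only the three FS_FAILOVER_STATUS values shown in this walkthrough are handled; any other value falls through to the default branch.

```shell
#!/bin/sh
# Map an FS_FAILOVER_STATUS value from v$database to a short operator hint.
# Only the values shown in the network failure walkthrough are covered.
fsfo_hint() {
  case "$1" in
    SYNCHRONIZED)
      echo "primary and standby are synchronized" ;;
    STALLED)
      echo "primary is stalled, failover may be in progress" ;;
    "REINSTATE REQUIRED")
      echo "failover done, former primary must be reinstated" ;;
    *)
      echo "status not covered here: $1" ;;
  esac
}

# Example: the value seen on the new primary in step (3).
fsfo_hint "REINSTATE REQUIRED"
```

The status string would typically come from the query shown on the slides; how that query is run (sqlplus session, monitoring tool) is left open here.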
Observer location?
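[Figure: main computing center with the primary DB and standby computing center with the standby DB, connected by log transport and the public LAN. Five candidate observer locations are marked: 1 and 2 on the database servers themselves, 3 close to the standby database, 4 close to the primary database, 5 at a third location within the public LAN.]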
Observer location … (options 1 and 2: on the primary or standby database server)
> Not really an option
> Reason
» No protection against a system crash!
> Consequence
» An additional observer machine is necessary!
Observer location … (option 3)
> Close to the standby database (same fire protection zone)
> Advantages
» The scenario of a complete failure of the main computing center is covered
> Disadvantages
» The primary database depends heavily on the network
» Unnecessary failover events are therefore possible
Observer location … (option 4)
> Close to the primary database (same fire protection zone)
> Advantages
» Fast-Start Failover works in the most important error situations
- Instance crash, media failure, database file offline …
» The primary database depends much less on the network
» No unnecessary activations due to networking issues
> Disadvantages
» Loss of the whole main computing center is not covered
Observer location … (option 5)
> Third computing center / somewhere within the public LAN
> Advantages
» Basically all error scenarios are covered
> Disadvantages
» The observer is separated from the databases from a network point of view
- The observer itself is therefore more dependent on the network
» Most companies do not operate three computing centers
» Running the observer on some PC in the public LAN means reduced availability
Observer location …
> Consequences
» Fast-Start Failover is not an appropriate solution for overcoming the loss of a whole computing center. It is not a failover cluster!
» After a switchover, setup 4 turns into setup 3 and vice versa. Exception: the observer is moved along as well
» In many real-life situations (no third computing center), option 4 will be the best choice (trade-off)
Observer location – The compromise
[Figure: main computing center with the primary DB and standby computing center with the standby DB, connected by log transport and the public LAN, together with the observer.]
Observer - Requirements
> Observer machine and configuration
> Special entry in the Data Guard Broker configuration
> Maximum Availability mode (mandatory)
» but: special startup behaviour
» but: primary stalls in certain situations
> Flashback Database must be enabled
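Two of the requirements above, Flashback Database and Maximum Availability mode, can be read from v$database. The sketch below assumes the columns FLASHBACK_ON and PROTECTION_MODE and the values YES and MAXIMUM AVAILABILITY; the function names prereq_query and fsfo_prereqs_ok are hypothetical.

```shell
#!/bin/sh
# Print the SQL that reads the two settings from v$database.
# The quoted heredoc delimiter keeps the shell from expanding $database.
prereq_query() {
  cat <<'EOF'
select flashback_on, protection_mode from v$database;
EOF
}

# Return 0 only if both observed values meet the requirements.
# $1 = value of FLASHBACK_ON, $2 = value of PROTECTION_MODE
fsfo_prereqs_ok() {
  [ "$1" = "YES" ] && [ "$2" = "MAXIMUM AVAILABILITY" ]
}

# Dry run: show the query, then evaluate a pair of sample values.
prereq_query
if fsfo_prereqs_ok "YES" "MAXIMUM AVAILABILITY"; then
  echo "prerequisites met"
fi
```

The check would have to pass on both databases, since either one can end up as the reinstated standby.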
Observer - Data Guard additional Configuration
> Not much to configure, but much to describe (see manual)
> Fast-Start Failover is a feature of Oracle Data Guard, and can't run without a Data Guard Broker configuration!
edit database 'THEDB_BONN' set property FastStartFailoverTarget = 'THEDB_BERLIN';
edit database 'THEDB_BERLIN' set property FastStartFailoverTarget = 'THEDB_BONN';
edit configuration set property FastStartFailoverThreshold = 15;
enable fast_start failover;
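The same configuration can be generated for any pair of databases. This is a sketch under assumptions: the function name fsfo_config_script is hypothetical, and the final SHOW CONFIGURATION VERBOSE is added here as a way to check the result, on the assumption that the installed broker release reports the Fast-Start Failover state there.

```shell
#!/bin/sh
# Print the DGMGRL commands that configure and enable Fast-Start Failover.
# $1 = primary database, $2 = standby database, $3 = threshold in seconds
fsfo_config_script() {
  primary=$1
  standby=$2
  threshold=$3
  cat <<EOF
EDIT DATABASE '$primary' SET PROPERTY FastStartFailoverTarget = '$standby';
EDIT DATABASE '$standby' SET PROPERTY FastStartFailoverTarget = '$primary';
EDIT CONFIGURATION SET PROPERTY FastStartFailoverThreshold = $threshold;
ENABLE FAST_START FAILOVER;
SHOW CONFIGURATION VERBOSE;
EOF
}

# Dry run with the values from the slide.
fsfo_config_script THEDB_BONN THEDB_BERLIN 15
```

Setting the target on both databases matters because the roles are swapped after a failover and reinstate.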
Observer – Configuration
» Start of the observer
connect sys@THEDB_BONN
start observer
» Better: write a shell script with background execution, because "start observer" does not return; use the -logfile option
dgmgrl -logfile /u00/app/oracle/local/dba/log/observer.log sys@THEDB_BONN "start observer"
» The observer creates a binary file, fsfo.dat, in the working directory from which it is started. The FILE parameter changes the file name, but not the location
start observer file=fsfo_<DG_configuration_name>.dat
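The recommended shell script with background execution could look roughly like this. It is a sketch, not a tested production script: the function name start_observer and the DGMGRL_BIN override are hypothetical, and it is assumed that the connect string works without a password prompt, since a background process cannot answer one.

```shell
#!/bin/sh
# Start the Data Guard observer in the background and print its process ID.
# $1 = connect string (e.g. sys@THEDB_BONN)
# $2 = log file passed to dgmgrl -logfile
# $3 = name of the observer data file (FILE parameter of "start observer")
# DGMGRL_BIN is a hypothetical override for the dgmgrl executable.
start_observer() {
  # nohup keeps the observer alive after the calling shell exits;
  # its own output is discarded because dgmgrl writes to the log file.
  nohup "${DGMGRL_BIN:-dgmgrl}" -logfile "$2" "$1" "start observer file=$3" \
    >/dev/null 2>&1 &
  echo $!
}
```

A call might look like start_observer sys@THEDB_BONN /u00/app/oracle/local/dba/log/observer.log fsfo_dgconf.dat, where fsfo_dgconf.dat is a made-up file name. Because the data file is created in the working directory, the script should change to a fixed directory before calling the function.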
Demo: Fast Start Failover
1. Configure Fast-Start Failover
2. Start the observer, connected to the primary
3. Shutdown abort on the primary database THEDB_BONN
4. Wait until the Fast-Start Failover to THEDB_BERLIN occurs
5. Restart the old primary THEDB_BONN
6. Verify that the observer reinstates database THEDB_BONN
Conclusion (1)
+ Prevention of "split brain" caused by accidental startup of the former primary database
+ Reduced downtime through automatic activation of the standby database
+ It is a small step for the DBA, but a giant leap from an availability point of view
+ It is easy to configure
+ The necessary checks are done automatically before a failover is started
Conclusion (2)
+ A failover solution without a shared disk system
+ with additional advantages (enhanced data availability)
+ and even reduced failover time compared to an HA cluster
− Many technical prerequisites (Flashback Database, special Maximum Availability mode)
− No automatic failover to a second standby database possible
Data Guard and Maximum Availability - Core messages …
> Data Guard 10g
» Flashback makes the difference
> Fast-Start Failover
» Protection from two primary databases caused by an inadvertent restart of the failed primary database
» Rather easy implementation / configuration
» Reinstate database, even automatically
» Very short failover times
At the core it's about data.
> by Trivadis