Hitachi-Oracle's BCM Platform Solution
Verification Report on Oracle Active Data Guard
Date: March 2008
Version: 1.0
Copyright 2008 Hitachi, Ltd. All Rights Reserved.
Copyright 2008 Oracle Corporation Japan. All Rights Reserved.
1. Introduction
Oracle Database 11g Release 1 was released in October 2007 as the latest major version of Oracle Database.
In this version, Oracle Data Guard offers a number of new and innovative features to help ensure business
continuity by protecting important corporate data, including a feature that initiates a failover to a remote
standby system in the event the production system fails due to a disaster or emergency.
Oracle Corporation Japan and Hitachi Ltd. performed verification tests of Oracle Data Guard at the Oracle
GRID Center, building a large-scale transaction environment for a simulated production system combining
Hitachi BladeSymphony high-reliability blade servers and Oracle Database 11g Release 1.
This white paper introduces the BCM (Business Continuity Management) platform solution realized by
combining Hitachi's hardware and Oracle Database 11g Release 1, and presents the results of verification with respect to
the effectiveness of features provided by Oracle Active Data Guard, a new option in Oracle Database
11g Release 1.
Acknowledgements
Oracle Corporation Japan established a partnership with Hitachi Ltd. and other grid strategy partner
companies in November 2006, opening the Oracle GRID Center
(http://www.oracle.co.jp/solutions/grid_center/index.html), a facility that incorporates the most advanced
technologies, with the goal of constructing next-generation business solutions capable of optimizing
enterprise system infrastructures. Publication of this white paper was made possible by hardware and
software provided to the Oracle GRID Center by Intel Corporation and Cisco Systems G.K., which support
the purpose of the Oracle GRID Center, as well as support and aid provided by engineers from these
companies. We wish to express our sincere gratitude to the companies and engineers for their support.
*All rights reserved.
Disclaimer
This document is provided for informational purposes only. The contents hereof are subject to change
without prior notice. Neither Oracle Corporation Japan nor Hitachi, Ltd. warrants that this document is
error-free, nor do they provide any other warranties or conditions, whether expressed or implied, including
implied warranties and conditions of merchantability or fitness for a particular purpose. Oracle Corporation
Japan and Hitachi Ltd. specifically disclaim any liability with respect to this document. No contractual
obligations are formed by this document, either directly or indirectly. This document may not be
reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without
prior written permission from Oracle Corporation Japan and Hitachi Ltd.
Trademarks
BladeSymphony is a registered trademark of Hitachi Ltd.
ORACLE is a registered trademark of Oracle Corporation.
Intel and Xeon are trademarks of Intel Corporation in the United States and other countries.
Red Hat is a trademark or a registered trademark of Red Hat Inc. in the United States and other countries.
Linux is a registered trademark of Linus Torvalds.
Cisco is a registered trademark of Cisco Systems, Inc. in the United States and other countries.
Other names of companies and products used herein are trademarks or registered trademarks of their
respective owners.
2. Contents
1. Introduction
2. Contents
3. Criticality of Business Continuity Management (BCM)
4. Oracle Data Guard
5. Examples of BCM Platform Solutions Realized by Hitachi and Oracle
6. Verifying Oracle Active Data Guard
6-1 Purpose and specifics of verification tests
6-2 Verification environment
6-2-1 System configuration
6-2-2 Hardware used
6-2-3 Software used
6-2-4 About workloads
7. Verification Results
7-1 Creating a standby database using RMAN network duplicate
7-2 Effective use of standby site via Oracle Active Data Guard and reductions in system downtime based on effective use of standby site
7-3 Measuring REDO apply performance for standby database
7-4 Fast-Start Failover
7-5 Failover under high-load transaction condition
8. Summary
Figures
Figure 4-1: Schematics of Oracle Data Guard operation
Figure 4-2: Effective use of standby database via Real-time Query
Figure 4-3: Effective use of standby database with Snapshot Standby
Figure 4-4: Fast-Start Failover operation
Figure 5-1: Online system maintenance based on Hitachi hardware and Oracle Data Guard
Figure 5-2: Data protection with rapid application of server resources at reduced standby cost
Figure 6-1: Configuration of the system used in verification tests
Figure 7-1: Conventional standby database production method
Figure 7-2: Creating a standby database using RMAN network duplicate
Figure 7-3: Previous drawbacks: Relationship between standby site use time and system downtimes
Figure 7-4: Effective use of standby site via Oracle Active Data Guard
Figure 7-5: Simulated business scenario used in verification tests
Figure 7-6: Process of failover to physical standby database
Figure 7-7: Low REDO apply performance
Figure 7-8: Adequate REDO apply performance
Figure 7-9: Fast-Start Failover operation
Figure 7-10: Verifying failover under high-load transaction conditions
Table
Table 7-1: Apply performance comparison patterns
Table 7-2: Verification configuration patterns
Table 7-3: Verified failure patterns
Table 7-4: Verified failure patterns and verification results
Graphs
Graph 6-1: CPU usage of primary database servers during load generation
Graph 7-1: Comparison of standby data production times (via conventional method and using RMAN network duplicate)
Graph 7-2: CPU usage and network transfer volume in creation of standby database via conventional method (top: primary database server, bottom: standby database server)
Graph 7-3: CPU usage and network transfer volume in production of standby database using RMAN network duplicate (top: primary database server, bottom: standby database server)
Graph 7-4: Business transaction throughput, CPU usage of primary database server, and network transfer volumes during creation of standby database using RMAN network duplicate
Graph 7-5: Effective use of CPU resources of standby site with Oracle Active Data Guard
Graph 7-6: Reductions in system downtime via Oracle Active Data Guard during use of physical standby site
Graph 7-7: Comparison of volume of generated REDO against REDO apply performance
Graph 7-8: Apply performance comparison
Graph 7-9: Transactions during failure of all instances for the primary database and patterns in CPU usage for individual database servers
3. Criticality of Business Continuity Management (BCM)
IT systems have grown increasingly important for corporations. Even in the event of an
earthquake-induced site failure or system failure caused by hardware malfunction, corporations must
continue to safeguard critical business data such as customer information and rapidly restore system
functionality to ensure continuing services. In particular, corporations must meet the following
requirements:
Business continuity
Interruptions or outages affecting important services pose serious threats to the entire business,
in certain cases resulting not just in lost income, but serious damage to the confidence of
customers and associated companies.
Data protection
Data remains a critical asset for any company. Corporate data (for example, payroll or
employee information, client records, valuable research results, financial records, or history
information) can require significant money and effort to reconstruct or regenerate once lost,
if this is even possible, and in some cases such data loss may impair a company's capacity to
continue operating.
System flexibility to adapt to changes
IT systems must ensure business continuity even in the event of unplanned system downtimes,
including system failure. These systems must also minimize the duration of planned downtimes,
including downtimes for software updates and hardware maintenance, to reduce any negative
effects on business operations. Particularly in the case of open systems, the rapid pace of
software development requires that procedures for updating software and applying software
patches be kept as short as possible in order to keep systems up to date and maintain systems in
a robust condition. With respect to hardware, rapid developments in multi-core CPU technology
in recent years now make it possible in certain cases to improve performance and reduce TCO
simply by replacing existing equipment with the latest hardware. In general, agility and
flexibility have become enterprise system requirements.
Cost efficiency: effective use of standby sites
Also important for ensuring high cost efficiency is effective use of the server resources at
standby sites set aside for disasters and other emergency situations. High cost
efficiency makes it easier to justify acquiring countermeasures against system failure. Low resource
efficiency at established standby sites during ordinary operations, on the other hand, will
generally make it more difficult to acquire adequate funding for such systems.
Combining Hitachi BladeSymphony or Hitachi Storage hardware with Oracle Real Application Clusters
(Oracle RAC) and Oracle Data Guard makes it possible to deliver a solution that resolves such issues.
4. Oracle Data Guard
Oracle Data Guard creates a standby database as a copy of the production database (called the primary
database) and provides features that perform a series of comprehensive services for that database, including
maintenance, management, and monitoring. A standby database is created as a copy that maintains
transactional consistency with the primary database. Following the creation of the standby database, REDO
data sent from the primary database is applied to reflect changes made in the primary database. If the primary
database becomes unavailable due to downtime, whether planned or unplanned, the standby database takes over the
primary database role to minimize the downtime. Oracle Data Guard is provided with Oracle
Database Enterprise Edition.
[Diagram: in normal operation, clients connect to the primary database, which is copied to the standby database; in emergencies, the connection switches to the standby database.]
Figure 4-1: Schematics of Oracle Data Guard operation
Standby databases generally come in one of two configurations. One, a physical standby database, is
identical to the primary database at the physical block level. The other, a logical standby database, is
identical to the primary database at the logical row data level.
The version of Oracle Data Guard in Oracle Database 11g Release 1 features various enhancements.
Introduced below are some of the new features examined in our verification testing.
Oracle Active Data Guard
In previous release versions, application of REDO had to be suspended when accessing data in a
physical standby database. The Oracle Active Data Guard option of Oracle Database 11g
Release 1 enables access to data in a physical standby database without suspending the
application of REDO. This feature is called Real-time Query. This enhancement allows
normal use of a physical standby database for reporting and other tasks.
[Diagram: during normal operation of the primary database, reporting processes and backup acquisition are off-loaded to the physical standby database through Oracle Data Guard.]
Figure 4-2: Effective use of standby database via Real-time Query
Oracle Active Data Guard features a high-speed incremental backup feature based on a change-tracking file
when obtaining backups from a standby database, thereby offering both high availability and convenient
data protection against failures in the event of planned downtimes or unplanned outages at the production
site.
Snapshot Standby
The Snapshot Standby feature enables temporary use of a physical standby database as an
easy-to-use read-write test database. Even while being used as a test database, the physical
standby database can receive REDO from the primary database, allowing it to continue
providing the data protection feature. A snapshot standby database is also easily returned to
physical standby database status.
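The behavior described above can be pictured with a toy model. The sketch below is illustrative only: the class `ToyStandby`, its method names, and the dictionary used as "database contents" are all hypothetical and are not Oracle APIs. It assumes that a snapshot standby keeps receiving REDO without applying it, and that converting back to a physical standby discards the test changes and then applies the queued REDO.

```python
class ToyStandby:
    """Toy model of a standby that can temporarily become a snapshot standby."""

    def __init__(self):
        self.data = {}             # state reached by applying REDO
        self.pending_redo = []     # REDO received but not yet applied
        self.snapshot_mode = False
        self._restore_point = None

    def receive_redo(self, key, value):
        """REDO always arrives; it is applied at once only in physical standby mode."""
        self.pending_redo.append((key, value))
        if not self.snapshot_mode:
            self._apply_pending()

    def convert_to_snapshot(self):
        self._restore_point = dict(self.data)  # remember the state to return to
        self.snapshot_mode = True

    def test_write(self, key, value):
        if not self.snapshot_mode:
            raise RuntimeError("a physical standby is not open for writes")
        self.data[key] = value

    def convert_to_physical(self):
        self.data = self._restore_point        # discard the test changes
        self._restore_point = None
        self.snapshot_mode = False
        self._apply_pending()                  # catch up with the primary

    def _apply_pending(self):
        for key, value in self.pending_redo:
            self.data[key] = value
        self.pending_redo.clear()
```

In this model, data protection continues during testing because `pending_redo` keeps growing while `snapshot_mode` is set; the cost is that the standby falls behind the primary until it is converted back.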
[Diagram: the snapshot standby is opened as a temporary read-write test database for a test client; REDO transfers from the primary database continue through Oracle Data Guard while the database is open.]
Figure 4-3: Effective use of standby database with Snapshot Standby
Creating a standby database using RMAN network duplicate
Previous release versions required acquiring a full backup of the primary database at the
local site, transferring the backup to the standby site, and restoring the backup to create a standby
database. With Oracle Database 11g Release 1, the enhanced Recovery Manager (RMAN) network
duplicate feature, used for database duplication, backs up the primary database while at the
same time restoring it over the network to the standby. Network duplicate saves time and storage.
Fast-Start Failover
Fast-Start Failover automatically detects failures in the primary
database and initiates failover after failure detection. Detection of failure and initiation of
failover are performed by the observer, which is set up separately from the primary database and standby
database. The observer is a component of Data Guard Broker. Fast-Start Failover enables
automatic failover in the event of a primary database failure without administrator intervention.
[Diagram: the observer monitors both the primary database and the standby database; REDO is transferred from the primary to the standby, and failover to the standby is automatic.]
Figure 4-4: Fast-Start Failover operation
In previous release versions, Fast-Start Failover could be used only in Maximum Availability
mode, which required synchronous transfers of REDO. Oracle Database 11g Release 1 now also
supports Maximum Performance mode, which allows asynchronous REDO transfer settings, permitting
use in a wider range of operating environments. The new version also provides greater flexibility
in determining whether or not to initiate a failover at the time of failure detection, thereby
meeting various failover requirements.
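The observer's decision can be pictured with a small model. The sketch below is illustrative only and is not the Data Guard Broker's actual algorithm: the names `ObserverView` and `should_fail_over`, the parameters, and the 30-second default values are assumptions made for this example. It assumes that a failover starts only after the primary has been unreachable for a threshold period, and that with asynchronous REDO transfer a failover is additionally allowed only while the standby's lag stays within a configured limit.

```python
from dataclasses import dataclass


@dataclass
class ObserverView:
    """What a hypothetical observer knows at one polling instant."""
    seconds_primary_unreachable: float  # time since the primary last responded
    standby_reachable: bool             # the observer can still reach the standby
    synchronous_redo: bool              # True: Maximum Availability, False: Maximum Performance
    standby_lag_seconds: float          # how far the standby trails the primary


def should_fail_over(view, threshold_seconds=30.0, lag_limit_seconds=30.0):
    """Return True when this simplified model would start an automatic failover."""
    if not view.standby_reachable:
        return False  # there is no failover target
    if view.seconds_primary_unreachable < threshold_seconds:
        return False  # possibly a transient problem; keep waiting
    if view.synchronous_redo:
        return True   # synchronous transfer: the standby already holds the REDO
    # Asynchronous transfer: accept the failover only if the lag is tolerable.
    return view.standby_lag_seconds <= lag_limit_seconds
```

For example, `should_fail_over(ObserverView(45.0, True, False, 120.0))` is rejected in this model because the standby trails the primary by more than the assumed limit, while the same outage with synchronous transfer is accepted.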
5. Examples of BCM Platform Solutions Realized by Hitachi
and Oracle
Described below are some examples of the BCM solution realized through the combination of Hitachi
hardware and Oracle Database 11g Release 1.
Online system maintenance
Figure 5-1 shows an example of a Data Guard system configuration consisting of a production
business environment and a test environment. The test environment is used for reporting tasks using
Oracle Active Data Guard features or as a development environment using the Snapshot Standby
feature. This sample configuration permits not only the application of patch sets to Oracle
software and version updates, but also BladeSymphony server blade replacements and additions
in combination with the Oracle Data Guard switchover feature, and seamless online disk
addition to production environments via Hitachi Storage virtualization. The combination of
Hitachi hardware and Oracle Database 11g Release 1 enables online maintenance of both
software and hardware with minimal impact on production operations.
[Diagram: a production environment and a test environment in an Oracle Data Guard configuration. (1) Switchover to the test environment, (2) replacement with a new blade server. Labels: Oracle rolling upgrades; online blade server replacement; online hard disk addition to storage pool (no need to set LVM, ASM, or other OS; no need to reboot for disk recognition); switchover of production environment to minimize impact on business operations.]
Figure 5-1: Online system maintenance based on Hitachi hardware and Oracle Data Guard
Data protection at reduced standby costs and rapid addition of server resources
Figure 5-2 shows an example of a configuration with minimum allocation of standby database
server resources. It provides data protection using Oracle Data Guard while minimizing standby
database costs. If the primary database fails due to a disaster or other reason, a failover to the
standby database is initiated to enable continuing business operations. However, restoring the
service levels of the primary database generally requires the allocation of additional resources to
ensure the same level of processing capacity as the primary database, a requirement that
generally costs a great deal of time and money. Combining the provisioning features of
BladeSymphony and Oracle Real Application Clusters can significantly reduce the cost of
adding server resources while enabling immediate response.
[Diagram: in normal operations, a 4-node RAC primary database and a 1-node RAC standby database form a Data Guard configuration, maintaining data protection at low initial cost by allocating minimum server resources to the standby database. After a primary database failure due to disaster, 3 nodes are added to the standby database by provisioning to form a 4-node RAC. Additional server resources are required if the standby database is used to continue business operations; combining BladeSymphony's and Oracle's provisioning functions significantly simplifies the additional tasks and enables immediate response.]
Figure 5-2: Data protection with rapid application of server resources at reduced standby cost
6. Verifying Oracle Active Data Guard
6-1 Purpose and specifics of verification tests
We performed verification testing at the Oracle GRID Center with the following three main goals:
Confirming the effectiveness of new Oracle Data Guard features
We performed verification tests to confirm the effectiveness and usability of the new Oracle
Data Guard features and to check for any important considerations when using the features. In
the verification testing, we focused mainly on the following features:
Creating a standby database using RMAN network duplicate
Benefits of creating a standby database using the RMAN network duplicate feature
Oracle Active Data Guard
Benefits of effectively using the standby database with the Real-time Query feature of Oracle
Active Data Guard and reductions in system downtimes based on effective use of the
standby database
Snapshot Standby
Fast-Start Failover
Performance and failover under large-scale, high-volume transactions
We performed the verification tests to check for fast, effective failover to the standby database in
the event of a failure while the primary database was under heavy loads and with the CPU and
network resources at maximum capacity. Another goal was to identify any potential issues
associated with use in large-scale, high-volume transaction environments.
These represent critical performance aspects, since the primary purpose of introducing Oracle
Data Guard is to achieve switchover to the standby site in the event of a primary site failure.
Establishing best practices
We performed verification testing to establish procedures for creating a standby database and
managing an Oracle Data Guard environment.
* For a list of the procedures that proved effective in our verification tests, please refer to the
separate document titled "Oracle Database 11g Release 1 Physical Standby Setting
Guide" (Japanese only).
6-2 Verification environment
6-2-1 System configuration
Figure 6-1 shows the configuration of the system used in our verification tests. The same public network
was used to connect client machines to the database server and to transmit REDO from the primary site to
the standby site. The network bandwidth was 1 Gbps.
[Diagram: client machines connect through Cisco Catalyst 6504 and Cisco Catalyst 3750 switches to the primary site and the standby site. Database servers: Hitachi BladeSymphony BS320 (primary site: 2-node RAC, standby site: 2-node RAC). Storage: Hitachi Adaptable Modular Storage.]
Figure 6-1: Configuration of the system used in verification tests
6-2-2 Hardware used
Database server
  Model                     Hitachi BladeSymphony BS320 × 4 blades
  CPU                       Dual-Core Intel Xeon processor 3 GHz, 2 sockets/blade
  Memory                    8 GB
Client machine
  Model                     Intel White Box, 4 units
  CPU                       Quad-Core Intel Xeon processor 2.66 GHz, 1 socket/server
  Memory                    4 GB
Storage
  Model                     Hitachi Adaptable Modular Storage (AMS)
  Hard disk                 144 GB × 28 HDD (+ 2 HDD as spare)
  RAID group configuration  2D+1P × 8 (for Oracle database)
6-2-3 Software used
Database server
  OS      Red Hat Enterprise Linux 4.5
  Oracle  Oracle Database 11g Release 1 (11.1.0.6) Enterprise Edition
          Oracle Real Application Clusters
          Oracle Active Data Guard
          Oracle Partitioning
Client machine
  OS      Red Hat Enterprise Linux 4 Update 3
  Oracle  Oracle Client 10g Release 2 (10.2)
6-2-4 About workloads
In our verification tests, we used an online transaction processing (OLTP) system for a simulated online
Web shopping site as the workload model. SQL statements generated by JPetStore, which is provided as a sample
application for the Spring Framework (http://www.springframework.org), an open-source J2EE framework,
were executed concurrently by a custom application. The process flow is described below.
(1) User sign-on
A user ID was randomly selected and a search performed for user information.
select ... from account, profile, signon
where account.userid = ? and signon.password = ? and ...;
(2) Product search
A keyword for product search was randomly generated and a search performed for the product.
Adjustments were made so that the search results totaled approximately 100 on average.
select ... from category where catid = ?;
select ... from product where lower(name) like ?;
(3) Product selection
One item was selected from the search results (hits).
select ... from item, product
where i.itemid = ? and ...
(4) Stock quantity check
The quantity of the selected item in stock was checked.
select ... from inventory where itemid = ?
(5) Order placement
Order data for the specified product was issued.
insert into orders ...;
insert into orderstatus ...;
insert into lineitem ...;
The quantity of ordered products was subtracted from the inventory quantity in the stock
management list.
update inventory set qty = qty - 1 where itemid = ?;
(6) Order finalization
commit
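The six steps above can be sketched as a single transaction. The example below is a minimal illustration, not the custom load application used in the tests: it runs against an in-memory SQLite database from the Python standard library as a stand-in for Oracle, and the table columns, seed rows, and the function names `run_order_flow` and `demo` are hypothetical simplifications of the JPetStore schema.

```python
import sqlite3

# Hypothetical, heavily simplified schema; the real JPetStore tables have more columns.
SCHEMA = """
create table account    (userid text primary key, email text);
create table signon     (userid text primary key, password text);
create table category   (catid text primary key, name text);
create table product    (productid text primary key, catid text, name text);
create table item       (itemid text primary key, productid text, listprice real);
create table inventory  (itemid text primary key, qty integer);
create table orders     (orderid integer primary key, userid text);
create table orderstatus(orderid integer, status text);
create table lineitem   (orderid integer, itemid text, quantity integer);
"""

SEED = """
insert into account   values ('u1', 'u1@example.com');
insert into signon    values ('u1', 'secret');
insert into category  values ('FISH', 'Fish');
insert into product   values ('P1', 'FISH', 'Angelfish');
insert into item      values ('I1', 'P1', 16.5);
insert into inventory values ('I1', 100);
"""


def run_order_flow(conn, userid, password, catid, keyword):
    """Run steps (1) to (6) once and return the new order ID."""
    cur = conn.cursor()
    # (1) User sign-on
    cur.execute(
        "select account.userid from account, signon "
        "where account.userid = ? and signon.password = ? "
        "and account.userid = signon.userid",
        (userid, password),
    )
    if cur.fetchone() is None:
        raise ValueError("sign-on failed")
    # (2) Product search
    cur.execute("select name from category where catid = ?", (catid,))
    cur.execute(
        "select productid from product where lower(name) like ?",
        ("%" + keyword.lower() + "%",),
    )
    hits = [row[0] for row in cur.fetchall()]
    if not hits:
        raise ValueError("no product matched the keyword")
    # (3) Product selection (simplified: first item of the first hit)
    cur.execute(
        "select item.itemid from item, product "
        "where product.productid = ? and item.productid = product.productid",
        (hits[0],),
    )
    itemid = cur.fetchone()[0]
    # (4) Stock quantity check
    cur.execute("select qty from inventory where itemid = ?", (itemid,))
    if cur.fetchone()[0] < 1:
        raise ValueError("out of stock")
    # (5) Order placement
    cur.execute("insert into orders (userid) values (?)", (userid,))
    orderid = cur.lastrowid
    cur.execute("insert into orderstatus values (?, 'P')", (orderid,))
    cur.execute("insert into lineitem values (?, ?, 1)", (orderid, itemid))
    cur.execute("update inventory set qty = qty - 1 where itemid = ?", (itemid,))
    # (6) Order finalization
    conn.commit()
    return orderid


def demo():
    """Create the toy database, place one order, and report the remaining stock."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA + SEED)
    orderid = run_order_flow(conn, "u1", "secret", "FISH", "angel")
    qty = conn.execute("select qty from inventory where itemid = 'I1'").fetchone()[0]
    return orderid, qty
```

Each call mixes several reads with four writes and one commit, which is why a workload of this shape generates a steady stream of REDO on the primary database.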
Creating a standby database from an active database (Figure 7-2)
(1) The online primary database file was copied directly to the standby database.
[Diagram: conventional standby database construction method. (1) A backup file is created at the primary site (online backup by RMAN), (2) the backup file is transferred to the standby site by scp, (3) the database is restored from the backup file using RMAN.]
Figure 7-1: Conventional standby database production method
[Diagram: creating a standby database using RMAN network duplicate. (1) The online database file is copied directly from the primary site to the standby site.]
Figure 7-2: Creating a standby database using RMAN network duplicate
Graph 7-1 compares the time required to create a standby database by the conventional method
and directly from the active database. Creating a standby database from the active database does
not require the creation of a backup at the primary site and the restoration of the database at the
standby site, enabling creation of the standby database in about 1/3 the time required by the
conventional method.
[Bar chart: time in seconds (axis from 0 to 12000) for the conventional method and for creating a standby database using RMAN network duplicate.]
Graph 7-1: Comparison of standby data production times
(via conventional method and using RMAN network duplicate)
Graph 7-2 shows the CPU usage of the primary database server and standby database server and
network transfer volumes during the creation of the standby database by the conventional
method. Approximately 30% of the CPU resources were used to create a backup file at the
primary site and to restore the database at the standby site.
Graph 7-3 shows the CPU usage of the primary database server and standby database server and
network transfer volumes during the creation of the standby database from the active database.
Compared to the conventional method, creating a standby database from the active database kept
CPU usage at low levels and achieved efficient network transfer/copying of online data files.
Network transfer volumes per unit time were also high, resulting in higher speeds than copying by
scp.
[Four line charts over time (sec): CPU usage (%) of the primary database server and of the standby database server, split into user, system, and iowait; network transfer volume (Kbytes/s) of the primary database server and of the standby database server, split into receiving and transmitting volume. Annotated phases: online backup by RMAN, backup file transfer by scp, backup file reception by scp, database restoration by RMAN.]
Graph 7-2: CPU usage and network transfer volume in creation of standby database via
conventional method (top: primary database server, bottom: standby database server)
[Four line charts over time (sec): CPU usage (%) of the primary database server and of the standby database server, split into user, system, and iowait; network transfer volume (Kbytes/s) of the primary database server and of the standby database server, split into receiving and transmitting volume. Annotated phase: direct copying of online database file.]
Graph 7-3: CPU usage and network transfer volume in production of standby database
using RMAN network duplicate (top: primary database server, bottom: standby database
server)
Effect on business transactions during creation of standby database using RMAN network
duplicate
To examine the effects on business transactions of creating a standby database while transactions
are being processed, we created a standby database from the active database while generating a
business transaction load on the primary database. Graph 7-4 shows the measured business
transaction throughput, CPU usage of the primary database server, and network transfer
volumes. In this case, contention between business transaction processing and database file
transfer processing reduced business transaction throughput by approximately 20%. Transfer
volumes of nearly 80 MB/s were recorded during the transfer of the database files. Since business
transactions under ordinary operating conditions used approximately 20 MB/s, the database file
transfer to the standby site is estimated to have run at about 60 MB/s. Because this transfer rate
was lower than the rate achieved with no load, it took longer to create the standby
database in this test case.
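The estimate above is a simple subtraction of business traffic from total traffic. A minimal Python sketch of that arithmetic follows; the function names are illustrative assumptions, and the copy-duration helper ignores compression, ramp-up, and protocol overhead.

```python
def estimate_file_transfer_rate(total_mb_per_s: float, business_mb_per_s: float) -> float:
    """Estimate the bandwidth used by database file transfer.

    Assumes, as the text does, that total network traffic is the sum of
    business-transaction traffic and database file transfer traffic.
    """
    if business_mb_per_s > total_mb_per_s:
        raise ValueError("business traffic cannot exceed total traffic")
    return total_mb_per_s - business_mb_per_s


def estimate_copy_seconds(db_size_mb: float, transfer_mb_per_s: float) -> float:
    """Rough copy duration for a database of db_size_mb at a constant rate."""
    if transfer_mb_per_s <= 0:
        raise ValueError("transfer rate must be positive")
    return db_size_mb / transfer_mb_per_s


# Figures reported in the text: about 80 MB/s total, about 20 MB/s business traffic.
file_rate = estimate_file_transfer_rate(80.0, 20.0)
print(file_rate)
```

Passing a database size (a value the report does not state) to `estimate_copy_seconds` together with `file_rate` gives a rough idea of how much longer creation takes under load than at the no-load transfer rate.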
The effect on business transaction performance is expected to vary depending on the
characteristics of the transactions being processed. In actual use, we recommend that users
consider creating a standby database during periods of low business load to minimize effects on
business operations, as well as configuring a separate network for REDO transfer.
In high-latency network environments such as a WAN, the throughput of network duplicate might be
improved by tuning the network I/O buffer size. Please refer to '14.2 Configuring I/O buffer space'
of the Net Services Administrator's Guide 11g Release 1 (11.1).
[Graph 7-4 panels, each plotted against Time (sec): network transfer volume (Kbytes/s) of the primary database server (receiving and transmitting volume); CPU usage (%) of the primary database server (user, system, iowait); transaction throughput. Annotations: standby database creation in process; the effect of creating the standby database using RMAN network duplicate on transaction throughput was about 20% in our verification tests; total transfer volume was about 80 MB/s, and subtracting the about 20 MB/s used by business transactions gives an estimated database file transfer volume of about 60 MB/s.]
Graph 7-4: Business transaction throughput, CPU usage of primary database server, and network
transfer volumes during creation of standby database using RMAN network duplicate
7-2 Effective use of standby site via Oracle Active Data Guard and
reductions in system downtime based on effective use of standby site
Oracle Data Guard versions up to Oracle Database 10g had the following issues related to effective use of
the standby site.
- Application of REDO had to be stopped while the standby site was used on a read-only basis
  through the physical standby feature.
- A periodic data synchronization process was required to reduce downtime caused by
  primary site failure. This meant the standby site had to be set to managed recovery
  mode at regular intervals, making operations more complicated.
- A logical standby is accessible during application of REDO, but there are limitations related
  to data types and other factors.
These restrictions meant that using the standby site previously required complex procedures. The longer the
standby site was in use, the longer it took to recover the database in case of failure, impairing availability
(Figure 7-3).
[Figure 7-3 diagram labels: standby site use time (log data application downtime); volume of log data required in case of primary site failure; proportional to system downtime caused by failure.]
Figure 7-3: Previous drawbacks: relationship between standby site use time
and system downtimes
Real-time Query of Oracle Active Data Guard, a new feature provided with Oracle Database 11g Release 1,
resolves these issues and enables effective use of the standby site while ensuring system availability. The
following two points were verified to confirm the effectiveness of Oracle Active Data Guard.
(1) Effective use of standby site with Oracle Active Data Guard
We confirmed that the standby site could be used for read-only access at all times while the physical
standby continued to apply REDO.
(2) Reducing system downtimes during effective use of physical standby site
We confirmed that (1) removes the need to perform periodic synchronization, allowing
downtime attributable to a primary site failure to be held to a bounded duration.
Effective use of standby site with Oracle Active Data Guard
In the simulated situation shown in Figure 7-4, we examined the behavior resulting from
applying additional loads to the standby site, such as daily processing and report batch applications,
while the primary site was under online transaction loads associated with online shopping
operations. The Real-time Query feature of Oracle Active Data Guard enabled the transfer and
application of REDO to continue while additional tasks were performed at the standby site.
[Figure 7-4 diagram: the primary database handles OLTP transactions for the online shopping business; REDO is transferred to and applied on the standby database, which serves a SELECT/query load through Real-time Query for additional operations (date/time processing, report batch).]
Figure 7-4: Effective use of standby site via Oracle Active Data Guard
Graph 7-5 compares CPU usage of the standby database server when a SELECT load is applied to
the standby site through Real-time Query against CPU usage with no load applied. When no
SELECT load is applied, the standby database server performs only the
REDO apply process, and CPU usage is less than 20%. Applying an
additional load through Real-time Query raises CPU usage above 90%, confirming that CPU
resources previously left idle can be fully used.
[Graph 7-5 plot: CPU usage (%) of the standby database server against Time (sec), with and without SELECT load. Without SELECT load, only REDO log apply is performed and CPU use is low. With SELECT load, the load was applied even as REDO log data was being applied, resulting in effective resource use.]
Graph 7-5: Effective use of CPU resources of standby site with Oracle Active Data Guard
Reduced system downtimes during effective use of physical standby site
The primary site was assumed to run a 24-hour online shopping business as shown in Figure 7-5,
and the standby site was assumed to operate in Read Only mode for report batch applications
and daily processing in the period from nighttime to daytime.
[Figure 7-5 timeline (6:00, 12:00, 18:00, 24:00): the primary site runs the online shopping service; the standby site runs report batch and daily processing. A primary site failure occurs during use of the standby site, followed by online shopping service downtime, failover, and resumption of the online shopping service on the standby site.]
Figure 7-5: Simulated business scenario used in verification tests
If a failure occurs in the primary site while the physical standby database is in use, the
online shopping service fails over to the standby site, but all REDO
transferred from the primary database must first be applied. Under the conventional method, a large volume
of transferred REDO may remain unapplied because REDO cannot be applied while
the physical standby is open for read-only use. If the Real-time Query feature of Oracle Active Data Guard is used,
REDO is applied continuously while the standby site is in use, thereby reducing
failover time. Graph 7-6 gives the results of the verification test performed based on this
assumption. The graph shows that transaction throughput remained at 0 from the time of failure until
load was regenerated on the new primary database, after the standby database had
changed its role to primary to resume services. This duration is defined as the
failover time. We compared one case based on the conventional method against another based on
Oracle Active Data Guard. The failover time with Oracle Active Data Guard was greatly reduced
compared to the failover time with the conventional method. With the conventional method, the
volume of REDO not applied at the time of the failover was approximately 20 GB. Volumes of
unapplied REDO exceeding this amount will lengthen failover times accordingly.
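The failover time defined above (the interval during which transaction throughput stays at 0) can be read off a series of throughput samples. The following is a minimal Python sketch under that definition; the function name, the sample format, and the sample values are illustrative assumptions, not data from the verification test.

```python
from typing import Optional, Sequence, Tuple


def measure_failover_seconds(samples: Sequence[Tuple[float, float]]) -> Optional[float]:
    """Return the length of the first zero-throughput window.

    samples: (time_sec, throughput) pairs in ascending time order.
    The window starts at the first sample whose throughput is 0 and ends at
    the next sample whose throughput is above 0. Returns None if throughput
    never drops to 0 or never recovers within the samples.
    """
    start = None
    for t, throughput in samples:
        if start is None:
            if throughput == 0:
                start = t  # failure observed: throughput dropped to 0
        elif throughput > 0:
            return t - start  # service resumed on the new primary
    return None


# Hypothetical samples taken every 10 seconds.
samples = [(0, 100), (10, 120), (20, 0), (30, 0), (40, 0), (50, 90)]
print(measure_failover_seconds(samples))
```

The resolution of the result is limited by the sampling interval, so a measurement of this kind is only as precise as the spacing of the throughput samples.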
[Graph 7-6 plots: transaction throughput against time for two cases. Use of standby site with conventional method: extended downtime for online shopping functions. Use of standby site with Oracle Active Data Guard: short failover time resulting from continuous application of log data even during use of the physical standby database.]
Graph 7-6: Reductions in system downtime via Oracle Active Data Guard during use of physical
standby site
7-3 Measuring REDO apply performance for standby database
The following two objectives generally need to be considered when examining system availability:
Recovery Point Objective (RPO) and Recovery Time Objective (RTO). In Oracle Data Guard, the RPO is
related to the settings made for REDO transfer from the primary database to the standby database and
to transfer performance. This is because REDO not transferred to the standby database at the time of failover
is lost. REDO apply performance for the standby database affects the RTO because failover time in
Oracle Data Guard includes the time required to process unapplied REDO.(*) Figure 7-6 illustrates the
general process of failover to a physical standby database.
[Figure 7-6 diagram: timeline from generation of failure, through the time until the failure is detected, to the start and completion of the failover operation. The Data Guard failover operation consists of application of unapplied REDO, role change, and opening of the instance. The whole interval is downtime from an application perspective.]
Figure 7-6: Process of failover to physical standby database
(*) Although Oracle Data Guard can resume service immediately after a failure, without application of
unapplied REDO, we recommend processing all applicable REDO before resuming services for
maximum data security.
One way to assess the adequacy of REDO apply performance is to compare the REDO apply performance
for the standby database against the volume of REDO generated by the primary database. If the REDO
apply performance falls short of the volume of generated REDO, the standby database falls progressively
further behind the most recent data in the primary database, and the volume of unapplied REDO grows.
This can extend failover times in the event of a failure.
[Figure 7-7 diagram: REDO is transferred from the primary database to the standby database; after time n, low apply performance expands the difference between transferred/received REDO and applied REDO.]
Figure 7-7: Low REDO apply performance
REDO apply performance that exceeds the volume of generated REDO minimizes the volume of unapplied
REDO and reduces failover times.
[Figure 7-8 diagram: REDO is transferred from the primary database to the standby database; after time n, adequate apply performance minimizes the difference between transferred/received REDO and applied REDO.]
Figure 7-8: Adequate REDO apply performance
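The relationship illustrated in Figures 7-7 and 7-8 can be expressed as a simple backlog model. The Python sketch below assumes constant generation and apply rates and immediate REDO transfer; these simplifications, the function names, and all rate values are assumptions for illustration, not measurements from the report.

```python
def unapplied_redo_mb(generate_mb_per_s: float, apply_mb_per_s: float,
                      elapsed_s: float, initial_backlog_mb: float = 0.0) -> float:
    """Backlog of received but unapplied REDO after elapsed_s seconds.

    Assumes constant rates and that all generated REDO reaches the standby
    without delay. The backlog can never be negative.
    """
    growth = (generate_mb_per_s - apply_mb_per_s) * elapsed_s
    return max(0.0, initial_backlog_mb + growth)


def apply_seconds_at_failover(backlog_mb: float, apply_mb_per_s: float) -> float:
    """Time needed to apply the remaining backlog once the primary has failed
    (no new REDO arrives, so only the existing backlog must be applied)."""
    if apply_mb_per_s <= 0:
        raise ValueError("apply rate must be positive")
    return backlog_mb / apply_mb_per_s


# Low apply performance (Figure 7-7 case): the backlog grows over one hour.
low = unapplied_redo_mb(generate_mb_per_s=10.0, apply_mb_per_s=8.0, elapsed_s=3600.0)
# Adequate apply performance (Figure 7-8 case): the backlog stays at zero.
ok = unapplied_redo_mb(generate_mb_per_s=10.0, apply_mb_per_s=40.0, elapsed_s=3600.0)
print(low, ok, apply_seconds_at_failover(low, 8.0))
```

In this model the extra failover time is the backlog divided by the apply rate, which is why an apply rate above the generation rate keeps the REDO apply portion of failover short.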
We compared the volume of REDO generated when the primary database is under large transaction loads
against the REDO apply performance of the standby database to assess REDO apply performance.
Oracle statistical information was obtained before and after load generation for the primary database, and
the difference between the two values was used to calculate the volume of REDO generated per second. We
measured REDO apply performance by applying a group of archived REDO log files totaling about 3 GB.
Oracle instances for the standby database were restarted before the start of measurement, and the
V$RECOVERY_PROGRESS view was used to obtain the REDO apply size per second as the measure of
apply performance. Since Oracle Active Data Guard was used during measurement, Oracle instances for
the standby database were open read-only.
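The before/after calculation described above can be sketched in Python as follows. The report does not name the statistic that was sampled; the sketch assumes a cumulative per-instance byte counter (for example, something like the `redo size` statistic), and the instance names and values are hypothetical.

```python
from typing import Mapping


def redo_generated_per_second(before: Mapping[str, int], after: Mapping[str, int],
                              elapsed_s: float) -> float:
    """Total REDO bytes generated per second across all primary instances.

    before/after map an instance name to a cumulative REDO byte counter
    sampled before and after load generation.
    """
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    if before.keys() != after.keys():
        raise ValueError("both samples must cover the same instances")
    total_bytes = sum(after[name] - before[name] for name in before)
    return total_bytes / elapsed_s


# Hypothetical counters for a two-instance primary, sampled 10 seconds apart.
before = {"inst1": 1_000, "inst2": 2_000}
after = {"inst1": 7_000, "inst2": 5_000}
print(redo_generated_per_second(before, after, 10.0))
```

Summing over instances matters here because a single standby instance applies the REDO generated by every primary instance.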
Graph 7-7 shows the results of a comparison of the volume of generated REDO against REDO apply
performance.
[Graph 7-7 bar chart: volume of generated REDO and REDO apply performance, shown as a ratio of generation amount to apply performance on a scale of 0 to 10.]
Graph 7-7: Comparison of volume of generated REDO against REDO apply performance
The graph indicates that the REDO apply performance far surpassed the total volume of REDO generated
by primary database instances. In Oracle Database 11g Release 1, one instance handles REDO apply
for a standby database in an Oracle RAC configuration. Although the configuration of the disks on which
online REDO log files and archived REDO log files are located affects REDO apply performance, the
measurements indicate that performance in the verification test environment is sufficient to apply the REDO
generated by multiple nodes without delays.
We then compared REDO apply performance in a case in which the physical standby database was set to
READ ONLY OPEN against performance in a case in which the physical standby database was set to
MOUNT status. The comparison sought to determine whether Oracle Active Data Guard affects REDO
apply performance. The measurement method was the same as the method previously described. We used
the following three patterns to compare measurements.
Pattern No. | Standby instance 1 | Standby instance 2
1 | MOUNT | MOUNT
2 | READ ONLY OPEN | MOUNT
3 | READ ONLY OPEN | READ ONLY OPEN
Table 7-1: Apply performance comparison patterns
Graph 7-8 shows the results of the performance comparison (the apply performance for pattern 1 is
assigned a value of 1).
[Graph 7-8 bar chart: apply performance ratio (scale 0 to 1.2) for pattern Nos. 1, 2, and 3.]
Graph 7-8: Apply performance comparison
The apply performance was consistent regardless of whether the instances of the physical standby database were
in MOUNT or READ ONLY OPEN status. This indicates that Oracle Active Data Guard had no impact on
REDO apply performance in this test.
7-4 Fast-Start Failover
The Fast-Start Failover feature automatically detects failures in the primary database and starts failover
after failure detection. In Oracle Database 10g Release 2, the protection mode had to be set to Maximum Availability
to use the Fast-Start Failover feature, which required synchronous REDO transfer. Synchronous
transmission of REDO guarantees commit-level protection of updates made to the primary database, but its
effects on performance, including slower response times for the primary database due to network
performance limitations, must be considered when business functions require high response performance.
In Oracle Database 11g Release 1, Fast-Start Failover can also be used in Maximum Performance protection
mode, which permits asynchronous REDO transfer, making the feature applicable to a wider range
of cases.
When asynchronous REDO transfer is set, a lag may arise between the most recent data for the primary
database and the standby database, which would result in data loss in a failover. The Fast-Start Failover
feature in Oracle Database 11g Release 1 allows the administrator to preset the allowed time lag for
failover and determines whether or not to start failover based on that value in the event of failure. In our
verification testing, we set the time lag value to 60 seconds, then halted all instances of the primary
database using the abort option to check Fast-Start Failover operations. Figure 7-9 shows the behavior
after failure generation.
[Figure 7-9 diagram: primary database, standby database, and observer, with steps (1) and (2) as described below.]
Figure 7-9: Fast-Start Failover operation
(1) When the primary database connection remains unavailable for a certain duration, the observer
concludes a failure has occurred. Any value can be set for the time period used to determine a
failure.
(2) The observer checks the time lag in the latest update information for the primary database and
standby database. If the value of the time lag is less than the preset value, a failover is initiated.
The value of the time lag can be checked with v$dataguard_stats view on the standby database.
In our verification testing, the time lag was 0 seconds, as shown below. Thus, a failover was
executed.
SQL> select name, value from v$dataguard_stats where
name='transport lag';
NAME             VALUE
---------------- ----------------
transport lag    +00 00:00:00
If the lag exceeds the preset threshold value, a failover will not be initiated. This is because a time lag value
greater than the threshold value means the volume of lost data is unacceptable. In this case, the Fast-Start
Failover status is shown to be TARGET OVER LAG LIMIT when checked in v$database view of the
standby database.
SQL> select fs_failover_status from v$database;
FS_FAILOVER_STATUS
----------------------
TARGET OVER LAG LIMIT
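The lag check described in step (2) amounts to comparing the reported lag against the preset limit. The Python sketch below illustrates that comparison only; it is not the observer's implementation. The function names are hypothetical, the interval format is inferred from the query output shown above, and treating a lag exactly equal to the limit as acceptable is an assumption.

```python
import re

# Matches day-to-second interval strings such as "+00 00:00:00".
_INTERVAL = re.compile(r"^([+-])(\d+) (\d{2}):(\d{2}):(\d{2})(?:\.\d+)?$")


def interval_to_seconds(value: str) -> int:
    """Convert a '+DD HH:MM:SS' interval string to whole seconds."""
    match = _INTERVAL.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognized interval: {value!r}")
    sign, days, hours, minutes, seconds = match.groups()
    total = int(days) * 86400 + int(hours) * 3600 + int(minutes) * 60 + int(seconds)
    return -total if sign == "-" else total


def failover_allowed(lag_value: str, lag_limit_s: int) -> bool:
    """True if the lag is within the preset limit (equality allowed by assumption)."""
    return interval_to_seconds(lag_value) <= lag_limit_s


# The verification test used a 60-second limit and observed a lag of 0 seconds.
print(failover_allowed("+00 00:00:00", 60))
# A hypothetical 2-minute lag would exceed the limit.
print(failover_allowed("+00 00:02:00", 60))
```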
As above, we confirmed that the Fast-Start Failover feature of Oracle Database 11g Release 1 was capable
of achieving automatic failover that meets the data protection requirements of each system, even with
asynchronous REDO transfer in Maximum Performance mode.
Oracle Database 11g Release 1 allows the setting of various conditions in addition to the time lag value to
allow detailed control of automatic failover behavior. These extended features should reduce the time and
work required for failover management.
7-5 Failover under high-load transaction condition
While Oracle RAC provides features to ensure business continuity in the event of local failures within
sites (for example, single-node failures in the primary database), Oracle Data Guard helps ensure business
continuity even against site failures on a scale involving all nodes of the primary database. In our
verification testing, we simulated a number of possible failure types while generating high loads on the
primary database, executing failovers to the standby database when necessary to confirm transaction
processing continuity. Figure 7-10 shows the failure cases used in the verification tests. In each of the three
Oracle Data Guard configurations (A, B, and C shown in Table 7-2), failures 1 through 5 (Table 7-3) were
simulated.
[Figure 7-10 diagram: failure verification patterns across the primary site and standby site. (1) Failure of all instances of the primary database; (2) total primary database server failure; (3) network communication failure between primary and standby databases; (4) failure of all instances of the standby database; (5) listener failure of the standby database.]
Figure 7-10: Verifying failover under high-load transaction conditions
Configuration | Oracle Data Guard protection mode | Status of standby site
A | Maximum Performance mode | Oracle Active Data Guard
B | Maximum Availability mode | Oracle Active Data Guard
C | Maximum Performance mode | Snapshot Standby
Table 7-2: Verification configuration patterns
# | Simulated failure | Failure-reproducing method
1 | Failure of all Oracle instances for the primary database | Execution of srvctl stop database -o abort command for primary node 1
2 | Failure of all primary database servers | Execution of halt -n -f command for primary node 1 and node 2
3 | Network communication failure between primary and standby databases | Network cable disconnection
4 | Failure of all Oracle instances for the standby database | Execution of srvctl stop database -o abort command for standby node 1
5 | Listener failure for the standby database | Simultaneous kill of listener process for standby node 1 and node 2
Table 7-3: Verified failure patterns
We used the following verification procedure:
(1) Began generating load to primary database.
(2) Simulated primary database failure.
(3) Stopped load generation.
(4) Initiated failover to standby database.
(5) Resumed load generation.
In all configurations, the verification results showed the expected behavior (Table 7-4). We confirmed that
failover to the standby database enabled continuous processing of transactions in the cases involving
failure of all Oracle instances for the primary database and failure of all primary database servers.
# | Simulated failure | Behavior after failure
1 | Failure of all Oracle instances for the primary database | For each configuration, we confirmed continuous processing of transactions following the execution of failover to the standby database.
2 | Failure of all primary database servers | For each configuration, we confirmed continuous processing of transactions following the execution of failover to the standby database.
3 | Network communication failure between primary and standby databases | For each configuration, we confirmed that continuous processing of transactions was possible using the primary database. For configuration B, transaction processing halted for the duration set with the NET_TIMEOUT attribute (30 seconds in the verification test), after which continuous processing was possible.
4 | Failure of all Oracle instances for standby database | For each configuration, we confirmed that continuous processing of transactions was possible using the primary database. For configuration B, we also confirmed continuous processing of transactions was possible.
5 | Listener failure for standby database | For each configuration, we confirmed that continuous processing of transactions was possible using the primary database.
Table 7-4: Verified failure patterns and verification results
The following describes one characteristic behavior exhibited by the failover operation
under high-load transaction conditions.
Graph 7-9 shows transaction throughput during the all-instances failure of the primary database in
configuration A, together with CPU usage of the individual primary and standby servers. After the failure in
(1), the failover was completed and transactions resumed in (2). Transaction throughput remained reduced until (3)
due to contention between the disk I/O for clearing the standby REDO log files, performed by the
database server following the failover, and the disk I/O for the online REDO log files generated by the
resumed transactions. The time required to clear standby REDO log files depends on total file size and disk
I/O performance. This behavior can be avoided by providing enough I/O bandwidth to handle the normal
workload plus the additional I/O caused by clearing the standby REDO log files, or by placing online
REDO log files and standby REDO log files on separate disks to avoid disk I/O contention.
[Graph 7-9 panels against time: transaction throughput; CPU usage (%) of primary instance 1, primary instance 2, standby instance 1, and standby instance 2. Markers: (1) generation of failure of all instances for the primary database; (1) to (2) failover to standby database; (2) to (3) clear REDO processing.]
Graph 7-9: Transactions during failure of all instances for the primary database
and patterns in CPU usage for individual database servers
8. Summary
Verification tests at the Oracle GRID Center confirmed the effectiveness of Oracle Data Guard in Oracle
Database 11g Release 1 with a Hitachi platform. Specifically, we confirmed the capabilities of
Oracle Active Data Guard, a new option introduced in Oracle Database 11g Release 1, to make effective use of
resources at the standby site and to reduce failover times in the event of failures while the
standby database is in use. We believe that Oracle Database 11g Release 1 with this new option can dramatically
improve the cost efficiency of disaster recovery systems over previous versions.
We also examined patterns resulting from failures under a large-scale transaction load environment,
confirming transaction continuity. We are confident that a disaster recovery solution based on a
combination of Hitachi hardware and Oracle Database 11g Release 1/Oracle Data Guard will provide the
support needed to ensure high levels of BCM for corporate infrastructures.
Precautions concerning use of this document
The contents of this white paper are based on the results of verification tests performed at the Oracle GRID
Center. We make no guarantees that the same results will be achieved under all conditions. Actual results
will depend on various factors, including the specific conditions of the client's environment.