+ All Categories
Home > Documents > H10571: ORACLE DATA WAREHOUSE ON EMC … Data Warehouse on EMC Symmetrix VMAX 40K 4 EMC PowerPath,...

H10571: ORACLE DATA WAREHOUSE ON EMC … Data Warehouse on EMC Symmetrix VMAX 40K 4 EMC PowerPath,...

Date post: 25-Mar-2018
Category:
Upload: vuphuc
View: 224 times
Download: 1 times
Share this document with a friend
43
White Paper EMC Solutions Group Abstract This white paper describes how EMC ® Symmetrix ® VMAX ® 40K series with Enginuity™, used with EMC PowerPath ® , EMC TimeFinder ® Snap, and EMC Solutions Enabler, can support an Oracle data warehouse environment for very large database (VLDB) and enterprise data warehouse (EDW) scenarios. The solution provides the scalability, performance, security, and ease of use required for mission-critical business demands. This solution can be scaled many magnitudes bigger in terms of users, performance, and storage volumes. May 2012 ORACLE DATA WAREHOUSE ON EMC SYMMETRIX VMAX 40K EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler Scalable query and ETL performance for very large database (VLDB) Reduced backup time and impact enabled by TimeFinder VP Snap Unisphere for VMAX enables efficient array management and provisioning
Transcript
Page 1: H10571: ORACLE DATA WAREHOUSE ON EMC … Data Warehouse on EMC Symmetrix VMAX 40K 4 EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler Storage provisioning on a VMAX …

White Paper

EMC Solutions Group

Abstract

This white paper describes how EMC® Symmetrix® VMAX® 40K series with Enginuity™, used with EMC PowerPath®, EMC TimeFinder® Snap, and EMC Solutions Enabler, can support an Oracle data warehouse environment for very large database (VLDB) and enterprise data warehouse (EDW) scenarios. The solution provides the scalability, performance, security, and ease of use required for mission-critical business demands. This solution can be scaled many magnitudes bigger in terms of users, performance, and storage volumes.

May 2012

ORACLE DATA WAREHOUSE ON EMC SYMMETRIX VMAX 40K EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler

• Scalable query and ETL performance for very large database (VLDB) • Reduced backup time and impact enabled by TimeFinder VP Snap

• Unisphere for VMAX enables efficient array management and provisioning

Page 2: H10571: ORACLE DATA WAREHOUSE ON EMC … Data Warehouse on EMC Symmetrix VMAX 40K 4 EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler Storage provisioning on a VMAX …

2 Oracle Data Warehouse on EMC Symmetrix VMAX 40K EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler

Copyright © 2012 EMC Corporation. All Rights Reserved.

EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

The information in this publication is provided “as is.” EMC Corporation makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose.

Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.

For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com.

All trademarks used herein are the property of their respective owners.

Part Number H10571

Page 3: H10571: ORACLE DATA WAREHOUSE ON EMC … Data Warehouse on EMC Symmetrix VMAX 40K 4 EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler Storage provisioning on a VMAX …

3 Oracle Data Warehouse on EMC Symmetrix VMAX 40K EMC PowerPath, EMC TimeFinder, and EMC Solutions Enabler

Table of contents

Executive summary ............................................................................................................................. 5 Business case .................................................................................................................................. 5 Solution overview ............................................................................................................................ 5 Key results ....................................................................................................................................... 6

Introduction ....................................................................................................................................... 7 Purpose ........................................................................................................................................... 7 Scope .............................................................................................................................................. 7 Audience.......................................................................................................................................... 7 Terminology ..................................................................................................................................... 7

Key technology components ............................................................................................................... 9 Overview .......................................................................................................................................... 9 EMC Symmetrix VMAX 40K ............................................................................................................... 9 EMC Unisphere for VMAX ............................................................................................................... 10 EMC Virtual Provisioning ................................................................................................................ 10 EMC TimeFinder Virtual Provisioning Snap ..................................................................................... 10 EMC PowerPath .............................................................................................................................. 11 Oracle Database 11g R2 Enterprise Edition .................................................................................... 12

Solution architecture and design ...................................................................................................... 13 Configuration overview .................................................................................................................. 13 Solution architecture ...................................................................................................................... 14 Hardware environment ................................................................................................................... 14 Software environment .................................................................................................................... 15 Storage connectivity overview ........................................................................................................ 16 Brocade DCX 8510 Backbone ......................................................................................................... 17 Storage Virtual Provisioning design ................................................................................................ 17 Drive type ....................................................................................................................................... 17 Oracle ASM disk group configuration ............................................................................................. 18 Database and workload profile ...................................................................................................... 18

Preliminary throughput testing using Orion on a single node ............................................................ 19 Overview ........................................................................................................................................ 19 Test objective ................................................................................................................................. 19 Test scenario and methodology ..................................................................................................... 19 Test results .................................................................................................................................... 20

Page 4: H10571: ORACLE DATA WAREHOUSE ON EMC … Data Warehouse on EMC Symmetrix VMAX 40K 4 EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler Storage provisioning on a VMAX …

4 Oracle Data Warehouse on EMC Symmetrix VMAX 40K EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler

Storage provisioning on a VMAX 40K array ....................................................................................... 21 Overview ........................................................................................................................................ 21 Step 1: Create devices and storage groups..................................................................................... 21 Step 2: Label the devices to identify their purposes ....................................................................... 22 Step 3: Provision the storage to the hosts ...................................................................................... 23

Query workload test ......................................................................................................................... 24 Overview ........................................................................................................................................ 24 Test objective ................................................................................................................................. 24 Test scenarios and methodology .................................................................................................... 24

Test scenario 1 .......................................................................................................................... 24 Test scenario 2 .......................................................................................................................... 24

Test results .................................................................................................................................... 25

ETL workload test ............................................................................................................................. 31 Overview ........................................................................................................................................ 31 Test objective ................................................................................................................................. 31 Test scenarios and methodology .................................................................................................... 31 Test results .................................................................................................................................... 32

Rapid deployment of test/dev database environments ...................................................................... 35 Overview ........................................................................................................................................ 35 Test objective ................................................................................................................................. 35 Test scenario .................................................................................................................................. 35 Test procedure ............................................................................................................................... 35 Test results .................................................................................................................................... 38

Offload RMAN backup ....................................................................................................................... 40 Overview ........................................................................................................................................ 40 Test scenario and methodology ..................................................................................................... 40 Test results .................................................................................................................................... 41

Conclusion ....................................................................................................................................... 42 Summary ....................................................................................................................................... 42 Findings ......................................................................................................................................... 42

References ....................................................................................................................................... 43 White papers ................................................................................................................................. 43 Product documentation .................................................................................................................. 43 Other documentation ..................................................................................................................... 43

Page 5: H10571: ORACLE DATA WAREHOUSE ON EMC … Data Warehouse on EMC Symmetrix VMAX 40K 4 EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler Storage provisioning on a VMAX …

5 Oracle Data Warehouse on EMC Symmetrix VMAX 40K EMC PowerPath, EMC TimeFinder, and EMC Solutions Enabler

Executive summary

Today’s enterprise data warehouses (EDWs) are mission-critical resources for supporting business operations systems. Data warehouses must address an increasingly broad range of demands:

• Enterprise data is growing exponentially because businesses incorporate more detailed data from very large databases (VLDBs).

• Business intelligence (BI) tools provide deeper insight into the data, which can lead to new revenue streams. The EDW capacity must be able to scale and provide predictable performance to keep pace with these new workloads.

• Online analytical processing (OLAP) requires realtime or near-realtime analytics that provide timely insight into large or complex data sets. High volume and frequent data loads are essential to provide the OLAP analytics required for decision making in the fast-moving businesses.

• Enterprise-class protection of the data in the data warehouse has become an implicit business requirement. IT departments must employ techniques to back up larger data sets during limited backup windows, with minimum impact on system performance.

The EMC® Symmetrix® VMAX® 40K series with Enginuity™ and EMC Unisphere® for VMAX can meet the mission-critical requirements of VLDBs and EDWs with exceptional scalability, enterprise-class data integrity and security, and performance optimized for today’s environments.

This solution uses the EMC Symmetrix VMAX 40K series to support a realistic data warehouse environment that can meet enterprise demands for scalability, performance, and security:

• This solution implements a 30 TB data warehouse environment on Symmetrix VMAX 40K and demonstrates how customers can implement key processes, such as high-speed data loading and efficient backup of the database, with little or no disruption to the performance, scalability, or availability of the database system and associated applications.

• The solution describes a realworld-like lifecycle of an enterprise data warehouse by simulating the query and ETL (extract, transform, and load) workloads separately, and validating the performance scalability.

• The solution demonstrates how to offload backup of the database by leveraging EMC TimeFinder® Virtual Provisioning™ (VP) Snap. During the backup against the Virtual Provisioning Snap copy, the query workload is running on the production, and we validate the impact on the performance caused by the database backup.

Overall, this solution provides a recommended practice that can scale bandwidth and users on the data warehouse database deployed on VMAX 40K. In this paper, the configurations can be extended nearly linearly to support larger and more scalable configurations.

Business case

Solution overview

Page 6: H10571: ORACLE DATA WAREHOUSE ON EMC … Data Warehouse on EMC Symmetrix VMAX 40K 4 EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler Storage provisioning on a VMAX …

6 Oracle Data Warehouse on EMC Symmetrix VMAX 40K EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler

The solution tests demonstrated the following key results:

• Performance and scalability

The VMAX 40K environment demonstrated consistent, optimal performance scalability in query and ETL workloads when more users and more database servers were added into the 30 TB data warehouse. VMAX 40K is a flexible platform that grows with a business’s need to provide higher performance by adding additional database nodes, HBA ports, front-end ports, and disk drives into the workload.

• Minimal impact was achieved by offloading backup from the production server to the backup host by using TimeFinder VP Snap.

When the TimeFinder VP Snap copy was created and mounted on a backup host, the database was backed up on the backup host. Comparing to when no backup was running, the average query throughput was minimally affected.

• Easy to configure and monitor

Unisphere for VMAX provides easier storage management, including logical unit number (LUN) provisioning, performance monitoring, and other management tasks.

Key results

Page 7: H10571: ORACLE DATA WAREHOUSE ON EMC … Data Warehouse on EMC Symmetrix VMAX 40K 4 EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler Storage provisioning on a VMAX …

7 Oracle Data Warehouse on EMC Symmetrix VMAX 40K EMC PowerPath, EMC TimeFinder, and EMC Solutions Enabler

Introduction

The solution shows an overall approach to data warehousing on EMC VMAX 40K, and focuses on the baseline performance and functionality needed to manage a modern data warehouse, such as query, data load, and other routing data warehouse management tasks.

Throughout this white paper, we assume that you have some familiarity with the concepts and operations related to storage and backup technologies and their use in data warehouse infrastructures.

This white paper discusses multiple EMC products as well as products from other vendors. Some general configuration and operational procedures are outlined. However, for detailed product installation information, refer to the user documentation for those products.

The scope of this paper is to describe:

• The scalability of decision support system (DSS) query and data loading performance on VMAX 40K

• New environment provisioning based on TimeFinder VP Snap

• Offloading data warehouse backup to a TimeFinder VP Snap backup host while under query workload

This solution is intended for chief information officers (CIOs), data center directors, Oracle database administrators (DBAs), technical managers, and any others involved in evaluating, acquiring, managing, operating, or designing data warehouse environments in large and medium-sized enterprises.

Table 1 lists the terminologies used in this solution paper.

Table 1. Terminology

Term Definition

Decision support system (DSS) Decision support systems are applications that use the summarized data in data warehouses and turn this into information on which users and managers can make key business decisions.

Data warehouse A data warehouse is a database that is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data, and it can include and consolidate data from other sources. An enterprise can use a data warehouse to monitor and report on operations and to deploy analytical tools for forecasting and modeling.

Purpose

Scope

Audience

Terminology

Page 8: H10571: ORACLE DATA WAREHOUSE ON EMC … Data Warehouse on EMC Symmetrix VMAX 40K 4 EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler Storage provisioning on a VMAX …

8 Oracle Data Warehouse on EMC Symmetrix VMAX 40K EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler

Term Definition

ETL The data warehouse is required to load data regularly to facilitate business analysis. To do this, data from one or more systems needs to be extracted and copied into the data warehouse. The process of extracting data from source systems and bringing it into the data warehouse is commonly called ETL, which stands for extract, transform, and load.

Oracle ASM Oracle Automatic Storage Management (ASM) is a volume manager and a file system for Oracle database files that supports single-instance Oracle Database and Oracle Real Application Clusters (RAC) configurations.

Oracle ACFS The ASM Cluster File System (ACFS) extends ASM by providing a robust, general-purpose, extent-based, and journaling file system. ACFS provides support for files such as Oracle binaries, report files, trace files, alert logs, and other application data files.

Oracle 11g RAC database RAC is software that enables customers to use clustered hardware by running multiple instances against the same database.

SYMCLI Symmetrix Solutions Enabler command line interface.

Page 9: H10571: ORACLE DATA WAREHOUSE ON EMC … Data Warehouse on EMC Symmetrix VMAX 40K 4 EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler Storage provisioning on a VMAX …

9 Oracle Data Warehouse on EMC Symmetrix VMAX 40K EMC PowerPath, EMC TimeFinder, and EMC Solutions Enabler

Key technology components

This solution used the following key hardware and software components:

• EMC Symmetrix VMAX 40K

• EMC Unisphere for VMAX

• EMC Virtual Provisioning

• EMC TimeFinder Virtual Provisioning Snap

• EMC PowerPath®

• Oracle Database 11g R2 Enterprise Edition

The EMC Symmetrix VMAX 40K builds on EMC’s industry-leading Symmetrix storage array platform for powerful, trusted, smart storage that provides higher levels of performance, availability, and intelligence in the virtual data center.

Built on the strategy of simple, intelligent, and modular storage, VMAX 40K incorporates a highly scalable virtual matrix architecture that enables Symmetrix VMAX arrays to grow seamlessly and cost-effectively from an entry-level configuration into the world’s largest storage system. Symmetrix VMAX supports Flash drives, Fibre Channel (FC) drives, and Serial Attached SCSI (SAS) drives within a single array, as well as an extensive range of RAID types. EMC Symmetrix VMAX 40K provides the performance and mission-critical availability to address the challenges of enterprise data warehouses:

• Powerful: Symmetrix VMAX 40K delivers the highest levels of performance to demanding EDW and analytic applications. Line of business users and BI analysts can experience the fastest response and gain insights more quickly. The scalable data loading capability of Symmetrix VMAX 40K enables optimization of business processes by using more frequent data refreshes.

• Trusted: Enterprise-class protection of the data in the Symmetrix VMAX 40K data warehouse can be achieved without significant impact on performance.

• Smart: Enhanced management tools enable efficient deployment of mixed workloads on the same array.

The EMC Symmetrix VMAX 40K with Enginuity 5876 software simplifies and automates the management and protection of information:

• Federated Tiered Storage

• Fully Automated Storage Tiering for Virtual Pools (FAST VP) enhancements

• EMC RecoverPoint integration

• Unisphere for VMAX

• Dynamic back-end configuration

• EMC TimeFinder enhancements

Overview

EMC Symmetrix VMAX 40K

Page 10: H10571: ORACLE DATA WAREHOUSE ON EMC … Data Warehouse on EMC Symmetrix VMAX 40K 4 EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler Storage provisioning on a VMAX …

10 Oracle Data Warehouse on EMC Symmetrix VMAX 40K EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler

Unisphere for VMAX is an advanced GUI for managing Symmetrix VMAX arrays. Unisphere for VMAX enables you to provision, manage, and monitor any Symmetrix VMAX array from one screen to significantly reduce storage administration time.

Unisphere for VMAX uses the same GUI framework as the unified EMC VNX® family platform. For customers who use Symmetrix VMAX and VNX in the same data center, Unisphere provides a consistent look and feel that simplifies management operations.

Unisphere provides a web interface that includes the following Symmetrix Solutions Enabler CLI operations:

• Access management

• Configuration management

• Replication management

• Monitoring and alerts

In addition, Unisphere includes a performance option that collects and stores historical performance data to analyze and report on workload and resource usage trends.

EMC Virtual Provisioning simplifies storage management, improves capacity utilization, and enhances performance. Virtual Provisioning provides the separation of physical storage devices from the storage devices as perceived by the host system.

Virtual Provisioning introduces the following concepts and components:

• Thin devices (TDEVs) are devices that do not have storage allocated to them when they are created. Thin devices can be created with an inflated capacity, because data devices provide the actual storage space for data written to them. To a host operating system, thin devices look like regular devices with their configured capacity, and the host interacts with them in the same way as with regular devices.

• Data devices are special devices (not mapped to the host) that provide physical storage for thin devices. Data devices must be enabled in a virtual pool before they can be used.

• A virtual pool is a collection of data devices that provides storage capacity for the thin devices that are bound to the pool. All data devices in a given virtual pool share the same RAID protection type and are of the same drive technology.

EMC TimeFinder VP Snap is a replication technology that creates a point-in-time copy of the source device when a target device is virtually provisioned. The copied data resides on allocations in a virtual pool, but not a save pool. If more VP Snap sessions are added to the same source device, data is copied to the targets based on whether the source data is new with respect to the point in time of each copy. When data is copied to more than one target, only a single, shared copy resides in the VP pool.

EMC Unisphere for VMAX

EMC Virtual Provisioning

EMC TimeFinder Virtual Provisioning Snap

Page 11: H10571: ORACLE DATA WAREHOUSE ON EMC … Data Warehouse on EMC Symmetrix VMAX 40K 4 EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler Storage provisioning on a VMAX …

11 Oracle Data Warehouse on EMC Symmetrix VMAX 40K EMC PowerPath, EMC TimeFinder, and EMC Solutions Enabler

For example, as shown in Figure 1, multiple VP Snap copies (SNAP1, 2…n) are created from the same source devices and are bound to the same virtual pool. When a second VP Snap (SNAP2) session is created and activated, allocation sharing begins. Write I/Os (target to source devices) that are new to multiple VP Snap copies are saved in a single set of allocations that is shared by multiple target devices.

Figure 1. VP Snap extent sharing within a virtual pool

Multiple copies of the data warehouse production environment can be created by VP Snap, and then mounted on other servers for special purposes; for example, test or development.

EMC PowerPath is host-based software that provides automated data path management and load-balancing capabilities for heterogeneous servers, networks, and storage deployed in physical and virtual environments. PowerPath provides I/O multipath functionality. With PowerPath, a server node can access the same SAN volume via multiple paths (HBA ports), which enables both load balancing across the multiple paths and transparent failover between the paths. PowerPath offers significant benefits for the high-throughput requirements of an Oracle data warehouse:

• Standardize data path management across physical and virtual environments, and grow without purchasing more infrastructure components.

• Automate multipathing policies and load balancing to provide predictable and consistent database availability.

• Improve service-level agreements by eliminating application impact from I/O failures.

EMC PowerPath

Page 12: H10571: ORACLE DATA WAREHOUSE ON EMC … Data Warehouse on EMC Symmetrix VMAX 40K 4 EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler Storage provisioning on a VMAX …

12 Oracle Data Warehouse on EMC Symmetrix VMAX 40K EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler

This paper presents a storage solution for Oracle Database 11g R2 data warehouse environments. This solution implements many Oracle Database11g R2 features including RAC, ASM, and ACFS.

In Oracle 11g R2, Oracle ASM and Oracle Clusterware have been integrated into the Oracle Grid Infrastructure. This provides the cluster and storage services required to run Oracle RAC databases. Oracle ASM is also extended to include support for Oracle Cluster Registry (OCR) and voting files to be placed within ASM disk groups.

Oracle ACFS extends ASM functionality to act as a general-purpose cluster file system. Oracle database binaries can be placed on ACFS together with other files, such as trace and alert logs. In this solution, we use Oracle ACFS to store the comma separated value (CSV) files for data loading.

Oracle Database 11g R2 Enterprise Edition

Page 13: H10571: ORACLE DATA WAREHOUSE ON EMC … Data Warehouse on EMC Symmetrix VMAX 40K 4 EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler Storage provisioning on a VMAX …

13 Oracle Data Warehouse on EMC Symmetrix VMAX 40K EMC PowerPath, EMC TimeFinder, and EMC Solutions Enabler

Solution architecture and design

The data warehouse environment in the solution consists of a four-node Oracle 11g R2 RAC database. We use 10 gigabit Ethernet (GbE) for the cluster interconnect network and we allocate storage to Oracle ASM from virtual pools. The underlying storage is provided by a Symmetrix VMAX 40K array running Enginuity 5876.

In this solution, we populate a 30 TB data warehouse database by using a DSS-like toolkit. We run two types of workload to validate the performance scalability on the VMAX 40K array when the number of concurrent users is increased:

• Query workload for reporting is generated by multiple concurrent users against the database. We gradually increase the number of multiple concurrent users, on different numbers of database nodes, to run queries against the database.

• ETL data loading workload is simulated by multiple concurrent sessions to load data by external tables. In the ETL workload, multiple sessions are generated on different database nodes to load data simultaneously.

The configuration used in this solution provides a good performance for this 30 TB data warehouse environment, and it is the foundation for significant performance increase in a near-linear manner by adding additional database nodes, HBA and front-end ports, and disk drives to the workload.

On the VMAX 40K, EMC TimeFinder VP Snap provides the benefits of space saving by sharing allocation between different VP Snap sessions. In addition, EMC TimeFinder VP Snap enables offloading of the backup for the data warehouse onto another dedicated server. It provides minimal impact on the query workload and system throughput.

Configuration overview

Page 14: H10571: ORACLE DATA WAREHOUSE ON EMC … Data Warehouse on EMC Symmetrix VMAX 40K 4 EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler Storage provisioning on a VMAX …

14 Oracle Data Warehouse on EMC Symmetrix VMAX 40K EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler

Figure 2 depicts the architecture of the solution environment.

Figure 2. Solution architecture

Table 2 details the hardware environment for the solution.

Table 2. Hardware environment

Equipment Quantity Configuration

Production server

4 4-node Oracle RAC configuration; each node contains:

• 4 x 8-core central processing units (CPUs)

• 256 GB random access memory (RAM)

• Dual-port 1 GbE network interface cards (NICs)

• Dual-port 10 GbE Converged Network Adapters (CNAs)

• 2 x dual-port 8 Gb host bus adapters (HBAs)

Solution architecture

Hardware environment

Page 15: H10571: ORACLE DATA WAREHOUSE ON EMC … Data Warehouse on EMC Symmetrix VMAX 40K 4 EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler Storage provisioning on a VMAX …

15 Oracle Data Warehouse on EMC Symmetrix VMAX 40K EMC PowerPath, EMC TimeFinder, and EMC Solutions Enabler

Equipment Quantity Configuration

Backup host 1 One backup host configuration:

• 4 x 4-core CPUs

• 32 GB RAM

• Dual-port 1 GbE NICs

• Dual-port 8 Gb HBAs

Test/dev server 1 One test/dev server configuration:

• 4 x 4-core CPUs

• 32 GB RAM

• Dual-port 1 GbE NICs

• Dual-port 8 Gb HBAs

Storage 1 EMC Symmetrix VMAX 40K with:

• Enginuity 5876

• 4 engines, 128 GB cache each

• 368 x 600 GB 10K 3.5” FC drives

• 39 x 2 TB 7.2K 3.5” SAS drives

Storage area network (SAN) switch

2 8 Gb/s FC switches

Ethernet switch 2 10 GbE switches

Table 3 details the software environment for the solution.

Table 3. Software environment

Software Version Description

VMAX Enginuity code 5876 VMAX micro code

EMC Solutions Enabler 7.4 Host CLI storage management software

EMC PowerPath 5.6 Multipathing and load balancing software

Oracle Grid Infrastructure Enterprise Edition 11.2.0.3

Clusterware and ASM software

Oracle Database 11g R2 Enterprise Edition 11.2.0.3

Oracle database software

Red Hat Enterprise Linux 5.5 Production and backup host operating system

Microsoft Windows 2008 R2 Operating system of the storage management server

Software environment

Page 16: H10571: ORACLE DATA WAREHOUSE ON EMC … Data Warehouse on EMC Symmetrix VMAX 40K 4 EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler Storage provisioning on a VMAX …

16 Oracle Data Warehouse on EMC Symmetrix VMAX 40K EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler

The VMAX 40K array and the servers of the four-node RAC cluster are connected as shown in Figure 3. The SAN switches are Brocade DCX 8510 Backbones that use 16 FC ports for the connection between the servers and the VMAX 40K array. The Brocade DCX 8510 Backbones provide redundant SAN infrastructure, including fabric extension. The colored scheme indicates the SAN zoning configuration on each switch.

The VMAX 40K array has four engines with sixteen 8 Gb front-end FC ports that are cabled and split across two SAN switches. Each server contains two dual HBA ports. Each server HBA port is zoned to each VMAX 40K engine. This configuration offers the highest level of protection with high performance and uses the full features of PowerPath.

In this configuration, we connect multiple HBAs to the host. Therefore, there are redundant paths to each VMAX engine. There is no single point of failure. Data availability is ensured in the event of a HBA, cable, switch, or engine failure. Because there are multiple paths per engine, this configuration benefits from PowerPath’s load-balancing feature and thus provides additional performance. With this configuration, the bandwidth can be scaled by adding additional HBA ports, front-end ports, and SAN switch ports.

Figure 3. Storage connectivity

Storage connectivity overview

Page 17: H10571: ORACLE DATA WAREHOUSE ON EMC … Data Warehouse on EMC Symmetrix VMAX 40K 4 EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler Storage provisioning on a VMAX …

17 Oracle Data Warehouse on EMC Symmetrix VMAX 40K EMC PowerPath, EMC TimeFinder, and EMC Solutions Enabler

Networks are essential to support the demands for data growth. Brocade DCX 8510 Backbones provide a powerful FC switching infrastructure, and offer a reliable, scalable, high-performance foundation for IT infrastructure. Brocade DCX 8510 Backbones are designed to increase business agility while providing non-stop access to information and reducing infrastructure and administrative costs.

The Brocade DCX 8510 offers high throughput to meet the requirements of an Oracle data warehouse:

• Enable simpler, flatter, low-latency chassis connectivity to reduce network complexity, management, and costs.

• Optimize data center connectivity over distance with integrated high-performance metro and global connectivity.

• Simplify and centralize end-to-end SAN management with comprehensive diagnostics, monitoring, and automation.

• Maximize performance for I/O- and bandwidth-intensive applications with more than seven times the performance of competitive offerings.

• Protect investments in existing SAN fabrics and automation tools while reducing operational costs and minimizing business disruption.

EMC Virtual Provisioning automatically stripes data across all data devices in a virtual pool, and balances the workload across storage devices. To ensure even striping of data, it is recommended that all data devices in a virtual pool are the same size.

Table 4 shows the RAID selections and number of spindles for each virtual pool. In this solution, Oracle data files, redo log files, temporary files, and CSV files are located on thin devices using RAID 1 protection and 360 physical spindles for the best performance and capacity. Use 7.2K SAS drives with RAID 6 protection for Flash Recovery Area (FRA) files, which are less sensitive to performance.

Table 4. Virtual pool design on VMAX 40K

Virtual Pool RAID protection

Drive type Physical spindles size

Number of active spindles

Item

10K_FC_pool RAID 1 FC 10K 600 GB 360 DATA, REDO, TEMP, CSV, ETL

SAS_pool RAID 6 (6+2)

SAS 7.2K 2 TB 32 FRA

Brocade DCX 8510 Backbone

Storage Virtual Provisioning design

Page 18: H10571: ORACLE DATA WAREHOUSE ON EMC … Data Warehouse on EMC Symmetrix VMAX 40K 4 EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler Storage provisioning on a VMAX …

18 Oracle Data Warehouse on EMC Symmetrix VMAX 40K EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler

Table 5 details the ASM disk group design based on EMC storage best practices. We used six ASM disk groups to store the relevant database files, including data files, control files, online redo log files, temporary files, and CSV files (used for ETL).

Table 5. ASM disk group design

Item LUN size (GB)

Meta members

Number of LUNs

ASM disk group name

Virtual pool

DATA 512 8 128 +DATA 10K_FC_pool

TEMP 136 1 4 +TEMP 10K_FC_pool

REDO 16 1 16 +REDO 10K_FC_pool

FRA 512 8 16 +FRA SAS_pool

CSV 512 8 16 +CSV 10K_FC_pool

ETL 512 8 12 +ETL 10k_FC_pool

In this solution, we use:

• 10K_FC_pool to store the database relevant files in the DATA, REDO, TEMP, and ETL disk groups, because there are 360 FC drives in the pool to provide high performance and capacity for data read and writes. The ACFS file system is created on the CSV disk group and is used to store the flat files for data loading.

• SAS_pool to store the archive log files in the FRA disk group.

Table 6 details the database and workload profile for the solution.

Table 6. Database and workload profile

Profile characteristic Details

Database type Data warehouse

Database size 30 TB

Oracle database 11g R2 4-node RAC on ASM

Oracle RAC (physical environment) 4 nodes

Workload profile DSS workload

Data load source External flat files on Oracle ACFS used for external tables

Network connectivity 8 Gb FC for SAN; 10 GbE for IP

We used a DSS-like toolkit to generate a data warehouse database and deliver the DSS workloads, including the query and ETL workloads required for the solution.

Oracle ASM disk group configuration

Database and workload profile

Page 19: H10571: ORACLE DATA WAREHOUSE ON EMC … Data Warehouse on EMC Symmetrix VMAX 40K 4 EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler Storage provisioning on a VMAX …

19 Oracle Data Warehouse on EMC Symmetrix VMAX 40K EMC PowerPath, EMC TimeFinder, and EMC Solutions Enabler

Preliminary throughput testing using Orion on a single node

We conducted a preliminary test to measure the pure I/O capabilities of the architecture and validate the solution configuration. To do this, we simulated I/O using an Oracle toolset called Orion. After the I/O profile of the architecture was established by using these simulation tools, real-world queries were generated through the DSS-like toolkit and the real database I/O operations, including queries and data loads, were generated by this toolkit.

Orion is an Oracle I/O Numbers Calibration Tool designed to simulate Oracle I/O workloads without creating and running an Oracle database. It can also simulate the effect of the striping performed by ASM. Orion uses the Oracle database I/O libraries and can simulate online transaction processing (OLTP) workloads (small I/Os), data warehouses (large I/Os), and mixed workloads.

Orion is useful to measure the performance capabilities of a storage system: either to uncover performance issues or to size a new database installation.

Note Orion overwrites the existing data so you should only run it against raw devices before installing any database or application.

The objective of our testing was as follows:

• Use Orion to generate random large I/O workloads to determine the throughput and to evaluate storage for the Oracle data warehouse.

• Use Orion to validate the throughput scalability of HBA ports.

• Use the test results as the baseline to validate the throughput generated by the DSS-like toolkit.

The test scenario included the following steps:

1. Enable one HBA port and disable the other ports by using PowerPath software on a single node.

2. Run Orion to generate the workload for one HBA port.

3. Measure the throughput.

4. Enable the second, third, and fourth HBA ports in turn and repeat the previous three steps after enabling each port.

Overview

Test objective

Test scenario and methodology

Page 20: H10571: ORACLE DATA WAREHOUSE ON EMC … Data Warehouse on EMC Symmetrix VMAX 40K 4 EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler Storage provisioning on a VMAX …

20 Oracle Data Warehouse on EMC Symmetrix VMAX 40K EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler

Figure 4 demonstrates the observed throughput scaling on a single node.

Figure 4. 8Gb HBA ports scalability on a single node using Orion

The results clearly show that the throughput scaled in a linear manner when adding HBAs and front-end ports to scale an environment. For example, the average throughput was 725 MB/s when the workload was running on one HBA port and one front-end port. It increased to 1,444 MB/s after enabling the second port. Linear scaling was observed when we enabled the third and the fourth ports. The average throughput was 2,892 MB/s (2.82 GB/s) when the workload was running on four 8 Gb HBA ports, which is double the average throughput achieved when the workload was running on two ports.

We can scale the Orion throughput on one node based on this fundamental configuration by introducing additional HBA ports and front-end ports.

Test results

Page 21: H10571: ORACLE DATA WAREHOUSE ON EMC … Data Warehouse on EMC Symmetrix VMAX 40K 4 EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler Storage provisioning on a VMAX …

21 Oracle Data Warehouse on EMC Symmetrix VMAX 40K EMC PowerPath, EMC TimeFinder, and EMC Solutions Enabler

Storage provisioning on a VMAX 40K array

Unisphere for VMAX, the GUI tool, provides a simple method for storage administrators to configure VMAX 40K arrays, including simplified navigation, display, and other storage management tasks.

This section demonstrates how to provision storage through a central console and outlines the main steps for Virtual Provisioning on a Symmetrix VMAX 40K array:

1. Create devices and storage groups.

2. Label the devices to identify their purpose (optional but recommended).

3. Provision the storage to the hosts.

Figure 5 shows the interface for creating devices through the Unisphere GUI. Unisphere provides an easy way to create devices based on the Oracle database requirements.

The example shown here uses the Create a storage group wizard from the Common Tasks view of Unisphere, which is visible at all times on the upper right of the Unisphere GUI. The operation in the example creates Oracle redo log devices and puts those devices in the storage group called ORACLE_DW.

Figure 5. Creating devices in the storage group ORACLE_DW

Overview

Step 1: Create devices and storage groups

Page 22: H10571: ORACLE DATA WAREHOUSE ON EMC … Data Warehouse on EMC Symmetrix VMAX 40K 4 EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler Storage provisioning on a VMAX …

22 Oracle Data Warehouse on EMC Symmetrix VMAX 40K EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler

After the devices and groups are created, we label the thin devices to identify their purpose. Figure 6 shows how to set the volume identifier name through Unisphere. To access this menu, right-click the volume and select Set Volume Identifier.

Figure 6. Setting the volume identifier name

These identifiers are visible using the EMC inq tool when the LUNs are presented to the host. The identifiers can also make the devices easier to distinguish by having a meaningful name as well as the hexadecimal system id of the volume as shown in Figure 7.

Figure 7. EMC inq command output

Note This step is optional but it makes the configuration process easier for storage administrators.

When labeling multiple volumes, you can generate and run a script using the CLI to speed up the process if you have many devices. For example:

#symconfigure -sid xxx -cmd "set dev 0BEF:0BFE device_name='REDO_ASM';" commit

Step 2: Label the devices to identify their purposes

Page 23: H10571: ORACLE DATA WAREHOUSE ON EMC … Data Warehouse on EMC Symmetrix VMAX 40K 4 EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler Storage provisioning on a VMAX …

23 Oracle Data Warehouse on EMC Symmetrix VMAX 40K EMC PowerPath, EMC TimeFinder, and EMC Solutions Enabler

VMAX arrays use the concept of Masking Views for presenting storage to a host. The Masking View groups three components to form the structure:

• A storage group is a grouping of devices that are presented to the host.

• A port group is a grouping of front-end ports that make devices accessible.

• An initiator group is a container of one or more host initiators that accesses the storage.

Figure 8 shows the Provision Storage wizard from Unisphere.

Figure 8. Unisphere Provision Storage wizard

The Provision Storage wizard is accessed through the Common Tasks view. In this operation, we select the hosts that are created by the Create a new host wizard on the Common Tasks view, and we use the existing storage group ORACLE_DW created in Step 1: Create devices and storage groups. The host and storage group are associated in the Unisphere Provision Storage wizard. The devices in the storage group become visible to the host.

Step 3: Provision the storage to the hosts

Page 24: H10571: ORACLE DATA WAREHOUSE ON EMC … Data Warehouse on EMC Symmetrix VMAX 40K 4 EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler Storage provisioning on a VMAX …

24 Oracle Data Warehouse on EMC Symmetrix VMAX 40K EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler

Query workload test

The DSS-like toolkit provides an Oracle data warehouse test workload to test and validate the performance of typical Oracle data warehouse workloads on the EMC VMAX 40K storage platform.

The schema in the kit has 12 tables including two fact tables: sales and returns. The remaining 10 tables act as dimension tables. The two main fact tables are range-partitioned by date and sub-partitioned by hash on their join key. Multiple concurrent users run a series of typical queries against the data. The throughput is measured during the workload.

• Benchmark results are highly dependent upon workload, specific application requirements, and system design and implementation. Relative system performance will vary as a result of these and other factors. Therefore, the solution test workloads should not be used as a substitute for a specific customer application benchmark when critical capacity planning and/or product evaluation decisions are contemplated.

• All performance data contained in this report was obtained in a rigorously controlled environment. Results obtained in other operating environments may vary significantly.

• EMC Corporation does not warrant or represent that a user can or will achieve similar performance expressed in transactions per minute.

The objective of the test is to measure the performance scalability when the concurrent users are scaling. The concurrent users are generated by the DSS-like workload toolkit with each user running similar queries.

All the test results were captured from Automatic Workload Repository (AWR) reports and Unisphere for VMAX.

Test scenario 1

Description: gradually increase the number of concurrent users on a single data warehouse node.

Each user runs a series of queries where the next query cannot start until the previous one is completed. Multiple users are running simultaneously. However, the query order changes from one user to another.

The test process was running the workload with 2, 4, 8, 16, 32, and 64 concurrent users separately.

Test scenario 2

Description: measure the performance scalability by adding nodes into the test.

When a database node is added (doubling the CPU power and HBA ports), additional front-end connectivity is added. With the addition of each new node, we tested the system again by gradually increasing the number of concurrent query users. For this

Overview

Test objective

Test scenarios and methodology

Notes

Page 25: H10571: ORACLE DATA WAREHOUSE ON EMC … Data Warehouse on EMC Symmetrix VMAX 40K 4 EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler Storage provisioning on a VMAX …

25 Oracle Data Warehouse on EMC Symmetrix VMAX 40K EMC PowerPath, EMC TimeFinder, and EMC Solutions Enabler

test, queries are running simultaneously from all the nodes added against the database.

The test process included the following steps:

1. Run the workload with two concurrent users on each node of the two RAC nodes – that is, with four concurrent users running simultaneously.

2. Run the workload with 4, 8, 16, 32, and 64 concurrent users on each of the two RAC nodes separately – that is, with a total of 8, 16, 32, 64, and 128 concurrent users running simultaneously.

3. Repeat the previous two steps after adding the third and fourth nodes separately.

Performance statistics were captured using AWR reports and Unisphere for VMAX:

• AWR reports

Read the throughput (GB/s) data from the “physical read total bytes” row in the AWR report. For example, the highlighted statistics in Figure 9 show the throughput of 64 concurrent users running on one node.

Figure 9. Query workload test (AWR report): throughput on one node when 64 concurrent users running

• VMAX management console: Unisphere

Enable realtime storage monitoring using Unisphere, and open the monitoring window for the metric (Host MBs/sec) to get the throughput number from the storage perspective. Figure 10 shows an example.

Test results

Page 26: H10571: ORACLE DATA WAREHOUSE ON EMC … Data Warehouse on EMC Symmetrix VMAX 40K 4 EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler Storage provisioning on a VMAX …

26 Oracle Data Warehouse on EMC Symmetrix VMAX 40K EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler

Figure 10. Query workload test: Unisphere for VMAX monitoring example

Figure 11 shows the result of Test Scenario 1.

Figure 11. Query workload test: concurrent user scaling and throughput of one Oracle RAC node

Page 27: H10571: ORACLE DATA WAREHOUSE ON EMC … Data Warehouse on EMC Symmetrix VMAX 40K 4 EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler Storage provisioning on a VMAX …

27 Oracle Data Warehouse on EMC Symmetrix VMAX 40K EMC PowerPath, EMC TimeFinder, and EMC Solutions Enabler

In Figure 11:

• The x-axis shows the number of concurrent users.

• The y-axis shows the average throughput during the workload.

Figure 11 shows that the average throughput (GB/s) increased to a certain extent with the scale of the users, because the higher workload increased storage utilization and cache sharing when additional users were running. For example, Figure 12 shows the throughput monitored from Unisphere while two, four, and eight users were running.

Figure 12. Query workload test: realtime throughput when Unisphere for 2, 4, and 8 users running

Although the throughputs at the peak time were similar for two, four, and eight users, at around 2,830 MB/s (which was the limit of one server with four 8 Gb HBA ports, as validated by the Orion test), the throughput cycling decreased when additional users were added. Scaling the number of users caused higher workload, which increased storage utilization and cache sharing to some degree. Therefore, the average throughput calculated in the AWR reports increased to some extent when additional users were running simultaneously.

The average throughput reached 2.77 GB/s when 64 concurrent users were running simultaneously, when the throughput of the HBA ports on the server reached its limit. From the performance of the HBA port scalability testing, the HBA throughput limit was verified in the Preliminary throughput testing using Orion on a single node section: the average throughput of four 8 Gb HBA ports was 2.82 GB/s.

Higher throughput can be achieved if additional HBA ports are added into the database server because the throughput was limited by the number of HBA ports in the configuration used in this solution.

2 users with average throughput: 2.25 GB/s

4 users with average throughput: 2.37 GB/s

8 users with average throughput: 2.53 GB/s

Page 28: H10571: ORACLE DATA WAREHOUSE ON EMC … Data Warehouse on EMC Symmetrix VMAX 40K 4 EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler Storage provisioning on a VMAX …

28 Oracle Data Warehouse on EMC Symmetrix VMAX 40K EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler

Figure 13 shows the Test Scenario 2 results, including the performance statistics concluded from the AWR reports.

Figure 13. Query workload test: average throughput comparison between one, two, three, and four RAC nodes

In Figure 13:

• The x-axis shows the number of concurrent users on each node. For example, for “2 users on each node” in the figure:

The red bar describes two users running on a single node.

The blue bar describes two users running on each of two RAC nodes; in total four users are running.

The purple bar describes two users running on each of three RAC nodes; in total six users are running.

The green bar describes two users running on each of four RAC nodes; in total eight users are running.

• The y-axis shows the average throughput during the workload.

The results confirmed that the average throughput scaled nearly linearly along with the scale of nodes and concurrent users. For example, when 64 concurrent users were running on a single node, the average throughput was 2.77 GB/s. The throughput almost doubled to 5.48 GB/s when the second node was added into the environment; it tripled to 8.23 GB/s when the third node was added into the workload; it almost quadrupled to 10.68 GB/s when four RAC nodes were running for the workload.

When the query workload was running, the CPU and memory resources were not bottlenecked, according to the AWR reports.

Page 29: H10571: ORACLE DATA WAREHOUSE ON EMC … Data Warehouse on EMC Symmetrix VMAX 40K 4 EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler Storage provisioning on a VMAX …

29 Oracle Data Warehouse on EMC Symmetrix VMAX 40K EMC PowerPath, EMC TimeFinder, and EMC Solutions Enabler

Figure 14 shows the operating system statistics when the throughput reached 10.68 GB/s, which was generated by 256 concurrent users running on four RAC nodes.

Figure 14. Query workload test: foreground wait classes percentage

Figure 14 shows the host CPU usage for the duration of the tests. CPU utilization across all nodes was under 30%. 95.37% of all the CPU busy time was spent on user I/O operations, and the remaining 3.76% was spent on database running.

Figure 15 describes the throughput with 64, 128, 192, and 256 concurrent users running on one, two, three, and four RAC nodes respectively. Figure 16 shows the corresponding realtime throughput received from Unisphere.

Figure 15. Query workload test: Oracle RAC nodes scaling and throughput

Page 30: H10571: ORACLE DATA WAREHOUSE ON EMC … Data Warehouse on EMC Symmetrix VMAX 40K 4 EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler Storage provisioning on a VMAX …

30 Oracle Data Warehouse on EMC Symmetrix VMAX 40K EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler

One node (64 concurrent users)

Two nodes (128 concurrent users)

Three nodes (192 concurrent users)

Four nodes (256 concurrent users)

Figure 16. Query workload test: realtime throughput from Unisphere

The throughput increased nearly linearly with each additional node added into the test environment. The throughput of four RAC nodes reached 10.68 GB/s (37.55 TB/hour), which could be higher if additional nodes, front-end ports, and disk drives were added into the environment.

Page 31: H10571: ORACLE DATA WAREHOUSE ON EMC … Data Warehouse on EMC Symmetrix VMAX 40K 4 EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler Storage provisioning on a VMAX …

31 Oracle Data Warehouse on EMC Symmetrix VMAX 40K EMC PowerPath, EMC TimeFinder, and EMC Solutions Enabler

ETL workload test

Modern data warehouses require frequent and large data loads periodically throughout the day. The 24x7 nature of the EDW no longer allows a long window of data loading for DBAs. Therefore, it is important to simulate the impact of ETL data load on the performance of the database.

This test scenario demonstrates the ETL data loading on the production database and records the performance data, especially the throughput (physical write total megabytes per second), during the ETL load.

We used the function of Oracle external table for the data loading. This test scenario shows the throughput scalability when data is loading from external tables from Oracle ACFS file system into the Oracle 11g R2 database.

The testing procedure started with loading the data on one of the four RAC nodes, and then scaled out to two, three, and four RAC nodes.

The objective of the test is to show the throughput scalability of the VMAX 40K, with the disk configuration used in this solution, while the session running the data loading are scaling out on four RAC nodes. Each session runs a similar ETL workload by loading CSV files into the database.

This solution test demonstrates performance scalability on the VMAX 40K by loading data from external tables. Testing includes the following test scenarios:

• The first scenario uses one RAC node and runs one session to load data from one external table. The session loads one CSV file with a size of 120 GB for ETL loading. The CSV file is located on the Oracle ACFS file system. The external table is created as follows:

create table sales_ext ( id integer, …) organization external( type oracle_loader default directory EXT_DIR access parameters (fields terminated by "|") location ('sales.csv')) parallel reject limit unlimited;

The data is loaded from the external table as follows:

alter session enable parallel dml; alter table sales parallel; alter table sales_ext parallel; insert into /*+ append */ sales select * from sales_ext;

Note The table “sales” has the same structure as the table “sales_ext”. The data is loaded directly with the “append” hint, and multiple parallel slaves are used for data loading.

Overview

Test objective

Test scenarios and methodology

Page 32: H10571: ORACLE DATA WAREHOUSE ON EMC … Data Warehouse on EMC Symmetrix VMAX 40K 4 EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler Storage provisioning on a VMAX …

32 Oracle Data Warehouse on EMC Symmetrix VMAX 40K EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler

• The second test scenario scales data loading sessions with external tables on additional RAC nodes. Based on the performance data from the first test scenario, the second, third, and fourth nodes were added into the environment. With one load session on each node, there are four sessions on four RAC nodes to load data concurrently from four 120 GB CSV files. In this configuration, the data loading process is the same as in the first scenario, but the load data is scaled linearly by running one to four RAC nodes. This resulted in improved bulk-load functionality and enhanced overall VMAX 40K performance.

We read the throughput (MB/s) from the physical write total bytes row in the AWR report. For example, the highlighted statistics in Figure 17 show the average throughput of one session running on one node to load data from an external table.

Figure 17. ETL workload test (AWR report): average throughput on one node when one

session running to load data from an external table

As shown in Figure 17, the average throughput for data loading from the external table was 893 MB/s (936064378.09/1024/1024). Figure 18 describes the realtime throughput of the first test scenario from Oracle Enterprise Manager.

Figure 18. ETL workload test: the realtime throughput on one node from Oracle Enterprise Manager

Test results

Page 33: H10571: ORACLE DATA WAREHOUSE ON EMC … Data Warehouse on EMC Symmetrix VMAX 40K 4 EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler Storage provisioning on a VMAX …

33 Oracle Data Warehouse on EMC Symmetrix VMAX 40K EMC PowerPath, EMC TimeFinder, and EMC Solutions Enabler

Note The throughput data shown in Figure 17 and Figure 18 differ because Figure 17 shows the average statistics in the AWR report while Figure 18 shows the realtime number in Oracle Enterprise Manager.

Figure 19 shows the average throughput of the second scenario from the AWR report and Figure 20 shows the realtime throughput from Unisphere.

Figure 19. ETL test: average throughput in multiple nodes scenario

One node

(one session for data loading)

Two nodes

(two sessions for data loading)

Three nodes

(three sessions for data loading)

Four nodes

(four sessions for data loading)

Figure 20. ETL test: realtime throughput from Unisphere

Page 34: H10571: ORACLE DATA WAREHOUSE ON EMC … Data Warehouse on EMC Symmetrix VMAX 40K 4 EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler Storage provisioning on a VMAX …

34 Oracle Data Warehouse on EMC Symmetrix VMAX 40K EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler

From the results, the average throughput increased nearly linearly when the second node was added into the workload and the number of data loading sessions doubled. For example, running one session to load data from an external table against one node measured throughput of 893 MB/s, and this increased to 1,790 MB/s when two sessions were running in two nodes with one session on each node. The throughput increased similarly when the third and fourth node was added into the environment.

Note The throughput data shown in Figure 19 and Figure 20 differ because the average statistics were used in the AWR report and the realtime number was captured from Unisphere.

Overall, the throughput of data loading on the VMAX 40K array was nearly linear scaling when more nodes and sessions were added. Higher throughput can be achieved if additional HBA ports, front-end ports, and RAC database nodes are added into the environment.

Page 35: H10571: ORACLE DATA WAREHOUSE ON EMC … Data Warehouse on EMC Symmetrix VMAX 40K 4 EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler Storage provisioning on a VMAX …

35 Oracle Data Warehouse on EMC Symmetrix VMAX 40K EMC PowerPath, EMC TimeFinder, and EMC Solutions Enabler

Rapid deployment of test/dev database environments

We tested the ability of the Symmetrix VMAX 40K solution infrastructure to support the rapid deployment of a test or development database environment. With EMC TimeFinder VP Snap technology, customers can easily create multiple copies of the production database and repurpose those copies for testing, development, offloading RMAN backup, and other purposes.

VP Snap enables storage administrators to create a point-in-time copy of a source thin device where the virtually provisioned target devices bind to a virtual pool. Up to 16 VP Snap sessions can be taken from a single source. When all target devices are bound to the same virtual pool allocations, VP Snap sessions can be shared between copies.

Source updates that are new to all point-in-time copies are saved in a single set of allocations that is shared by all target devices, which brings snap space efficiencies to thin devices. (See EMC TimeFinder Virtual Provisioning Snap in the Key technology components section).

The objective of the test was to demonstrate the use of multiple EMC TimeFinder VP Snap copies to maximize space savings on the array by sharing virtual pool allocations, and to demonstrate the deployment of test or development environments by EMC TimeFinder VP Snap technology.

This solution test demonstrates the process of deploying test and development environments and validating the benefits of using VP Snap. It included the following test scenarios:

• Measure VMAX 40K virtual pool space utilization and compare the space usage of four VP Snap sessions to one VP Snap session.

• Mount the VP Snap replica on the test/dev server for test or development purposes.

In the first scenario, we created one VP Snap session and examined the virtual pool usage, as follows:

1. Create TimeFinder VP Snap session target devices for all of the ASM disk groups, including DATA, ETL, REDO and others, and then bind them to a new virtual pool (see Step 1: Create devices and storage groups in the Storage provisioning on a VMAX 40K array section).

2. Define a device group to associate the source devices to the target devices created in step 1, as shown in Figure 21.

Overview

Test objective

Test scenario

Test procedure

Page 36: H10571: ORACLE DATA WAREHOUSE ON EMC … Data Warehouse on EMC Symmetrix VMAX 40K 4 EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler Storage provisioning on a VMAX …

36 Oracle Data Warehouse on EMC Symmetrix VMAX 40K EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler

Figure 21. Define device group through Unisphere for VMAX

3. Create one VP Snap session with the Use VSE option, as shown in Figure 22.

Figure 22. Create TimeFinder clone session with the Use VSE option

Page 37: H10571: ORACLE DATA WAREHOUSE ON EMC … Data Warehouse on EMC Symmetrix VMAX 40K 4 EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler Storage provisioning on a VMAX …

37 Oracle Data Warehouse on EMC Symmetrix VMAX 40K EMC PowerPath, EMC TimeFinder, and EMC Solutions Enabler

4. Activate one VP Snap session with the Use Consistent option, as shown in Figure 23.

Figure 23. Activate one VP Snap session with the Use Consistent option

Note Use the VSE option to invoke VP Snap functionality. After activating VP Snap sessions, the copied data for each track resided in a unique allocation within the virtual pool if I/O tracks were written to the source devices. The pair state was CopyOnWrite, as shown in Figure 24.

Figure 24. Detail of the pairs of VP Snap session

5. Run the ETL data loading workload on the production environment.

6. Terminate the VP Snap session to reset the pool allocations.

Similarly, we created and activated another four TimeFinder VP Snap sessions based on step 1, 2, 3, and 4 and ran the same ETL data loading as in step 5. The test results compared space usage by one VP Snap session and four VP Snap sessions.

Page 38: H10571: ORACLE DATA WAREHOUSE ON EMC … Data Warehouse on EMC Symmetrix VMAX 40K 4 EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler Storage provisioning on a VMAX …

38 Oracle Data Warehouse on EMC Symmetrix VMAX 40K EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler

After that, we tested the second scenario, which was to mount VP Snap copies on the test/dev server for rapid deployment of test or development database environments, per the following steps:

1. Present the TimeFinder VP Snap copies to the test/dev server (see Step 3: Provision the storage to the hosts in the Storage provisioning on a VMAX 40K array section).

2. Scan and list the ASM disks:

# oracleasm scandisks Reloading disk partitions: done Cleaning any stale ASM disks... Scanning system for ASM disks... # oracleasm listdisks

3. Mount all of the ASM disk groups, for example:

# $GRID_HOME/bin/crsctl start resource ora.REDO.dg CRS-2672: Attempting to start 'ora.REDO.dg' on ' oradwvps' CRS-2676: Start of 'ora.REDO.dg' on ' oradwvps' succeeded

4. Start up the database. It can then be used for test or development purposes:

$ sqlplus / as sysdba SQL> startup

After activating one VP Snap session, the tracks that were written to the source thin device resided on allocations in the virtual pool. The allocation that was used by one VP Snap session in the virtual pool was 435.22 GB, as shown in Figure 25.

Figure 25. Virtual pool allocations used by one VP Snap session

Test results

Page 39: H10571: ORACLE DATA WAREHOUSE ON EMC … Data Warehouse on EMC Symmetrix VMAX 40K 4 EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler Storage provisioning on a VMAX …

39 Oracle Data Warehouse on EMC Symmetrix VMAX 40K EMC PowerPath, EMC TimeFinder, and EMC Solutions Enabler

After four VP Snap sessions were created and activated from the source device, the allocations were shared. Source updates that were new to multiple VP Snap copies were saved in a single set of allocations that was shared by multiple target devices. As shown in Figure 26, the allocation that was used by four VP Snap sessions in the virtual pool was 435.25 GB.

Figure 26. Virtual pool allocations used by four VP Snap sessions

As shown in Figure 27, the allocations used by four sessions were almost the same as those used by one session. The result showed that only one single shared copy resided in the virtual pool when data was copied to four targets. Multiple test or development environments can be deployed by using VP Snap copies with efficient space saving benefit.

Figure 27. Comparison of space usage by one VP Snap and four VP Snaps

Page 40: H10571: ORACLE DATA WAREHOUSE ON EMC … Data Warehouse on EMC Symmetrix VMAX 40K 4 EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler Storage provisioning on a VMAX …

40 Oracle Data Warehouse on EMC Symmetrix VMAX 40K EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler

Offload RMAN backup

An Oracle database is a critical component in many organizations. IT managers regard data corruption as a huge threat to their Oracle database environment. Therefore, backup is very important to protect the database from data loss. On the other hand, backup is a resource intensive activity that can cause contention between the resources and the workload on the production environment, especially for the CPU and I/O resources. The challenge that backup administrators have is how to offload the backup process from a production server to a backup host, to free up the resources of the production server.

In this solution, VMAX 40K VP Snap technology addresses this challenge by enabling offloaded, non-disruptive RMAN backup. VP Snap not only offers offloaded backup, but also dramatically improves the mean time to recovery (MTTR) by reducing the time required for the restore operation for certain types of corruptions (for example, the logical errors resulting from dropping important tables by mistake). Furthermore, as the backup operation has minimal impact on the database server performance, the backup can be run more frequently. The recovery operation is also optimized since fewer archived logs need to be applied.

This solution test consisted of offloading backup of the database using the TimeFinder VP Snap while the database was under intensive query load and monitoring the impact on performance:

1. Create and activate a TimeFinder VP Snap session (with the User Consistent option) against the database while the database is online, to make sure the Oracle database is consistent. (See the detailed steps in the Rapid deployment of test/dev database environments section for how to enable a VP Snap session with the User Consistent option).

Note Oracle support the use of third-party snapshot technologies for backup, restore, and recovery operations without putting the database in backup mode. (Refer to My Oracle Support Note: ID 604683.1)

2. Mount the TimeFinder VP Snap to a dedicated backup host.

After creating the TimeFinder VP Snap replicas against the Oracle database devices, the replicas were mounted on a dedicated backup server (see the detailed steps in the Rapid deployment of test/dev database environments section).

3. Start up the database on the backup host:

$ sqlplus / as sysdba SQL> startup mount

4. Run RMAN backup on the backup host.

The RMAN backup started on the VP Snap backup host while the DSS query workload was running. The DSS query workload was generated with 256 concurrent users running simultaneously against the four-node RAC production database. (See the Query workload test section)

Overview

Test scenario and methodology

Page 41: H10571: ORACLE DATA WAREHOUSE ON EMC … Data Warehouse on EMC Symmetrix VMAX 40K 4 EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler Storage provisioning on a VMAX …

41 Oracle Data Warehouse on EMC Symmetrix VMAX 40K EMC PowerPath, EMC TimeFinder, and EMC Solutions Enabler

5. Measure the throughput (GB/s) and compare the data to that of the query workload result.

6. Restore the data files from the VP Snap copy.

When performing the offloaded RMAN backup with EMC TimeFinder VP Snap technology, the average throughput for the workload generated by 256 concurrent users was 10.67 GB/s, compared to the baseline of 10.68 GB/s. There was minimal impact (0.1%) on the workload.

Test results

Page 42: H10571: ORACLE DATA WAREHOUSE ON EMC … Data Warehouse on EMC Symmetrix VMAX 40K 4 EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler Storage provisioning on a VMAX …

42 Oracle Data Warehouse on EMC Symmetrix VMAX 40K EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler

Conclusion

IT departments are seeking ways to cope with the exponential growth of data and the new business demands that are being placed on data warehouses. The growing data is being refreshed more frequently, allowing for timely alerts, detailed reporting, predictive modeling, and business optimization. As a result, more enterprises are using business intelligence analysis tools to gain competitive advantage and to create new revenue streams.

Using the methodology and test processes in this solution as a guide for the server and storage configuration, customers can satisfy new query and loading performance requirements while maintaining the availability and security of their data.

TimeFinder VP Snap technology enables fast deployment of test or development environments. The space used by multiple VP Snap replicas is reduced dramatically. By leveraging TimeFinder VP Snap technology for data warehouse backup offloading, the system throughput on the production environment is minimally affected.

The key findings of the solution include:

• In the 30 TB database, the average query throughput increased when additional RAC nodes were added. For the query workload, it increased from 2.77 GB/s to 5.48 GB/s when the database servers were scaled out from one to two, and increased to 10.68 GB/s when four RAC nodes were used.

• The average throughput of the ETL workload increased linearly along with the addition of the nodes. The throughput was 893 MB/s for one node, and it increased to 1,790 MB/s when the second node was added. The throughput increased to 3,540 MB/s by running four RAC nodes.

• This solution can be taken as a baseline or foundation that can be scaled in a flexible, predicable, and near-linear way, by adding additional HBA ports, front-end ports, and RAC nodes, to provide higher throughput based on the configuration in this solution.

• When offloading nondisruptive RMAN backup by using EMC TimeFinder VP Snap technology, the performance for the query workload on four nodes was minimally affected.

• With EMC TimeFinder VP Snap technology, multiple copies of the production database were created easily for test/dev purposes, with the added benefit of reduced space use on the array.

Summary

Findings

Page 43: H10571: ORACLE DATA WAREHOUSE ON EMC … Data Warehouse on EMC Symmetrix VMAX 40K 4 EMC PowerPath, EMC TimeFinder Snap, and EMC Solutions Enabler Storage provisioning on a VMAX …

43 Oracle Data Warehouse on EMC Symmetrix VMAX 40K EMC PowerPath, EMC TimeFinder, and EMC Solutions Enabler

References

For additional information, see the white papers listed below.

• Oracle Data Warehouse Sizing with EMC Symmetrix DMX-4 and Dell R900—Best Practices Planning

• Introduction to EMC Oracle Optimized Warehouse Reference Configurations

• Deploying EMC VNX Unified Storage Systems for Data Warehouse Applications—Best Practices for Adoption and Deployment

• Maximize operational efficiency for Oracle RAC Environments with EMC Symmetrix FAST VP (Automated Tiering)—An Architectural Overview

• EMC Symmetrix VMAX Best Practices—Technical Notes

For additional information, see the product documents listed below.

• EMC Symmetrix VMAX Virtual Provisioning Space Reclamation and Application Considerations—Applied Technology

For additional information, see the documents listed below.

• Oracle Real Application Clusters Installation Guide 11g Release 2 (11.2) for Linux and UNIX

• Oracle Real Application Clusters Administration and Deployment Guide 11g Release 2 (11.2)

• Oracle Grid Infrastructure Installation Guide 11g Release 2 (11.2) for Linux

• Oracle Clusterware Administration and Deployment Guide 11g Release 2 (11.2)

• My Oracle Support Note: ID 604683.1

White papers

Product documentation

Other documentation


Recommended