NetApp OnCommand 5.0 Sizing Guide

Technical Report

OnCommand 5.0 Sizing Guide, Formerly Known as Operations, Provisioning, and Protection Manager Sizing Guide

Adaikkappan Arumugam, Shiva Raja, NetApp August 2011 | TR-3440


TABLE OF CONTENTS

PURPOSE
TERMINOLOGY
INTRODUCTION
PERFORMANCE EXPECTATIONS
SIZING FACTORS
SIZING GUIDE METHODOLOGY
SENSITIVITY ANALYSIS AND RESULTS
TEST SUMMARY
6.1 DATAFABRIC MANAGER SERVER 5.0
6.1.1 SETUP WITH OPERATIONS MANAGER ALONG WITH PERFORMANCE ADVISOR, PROTECTION MANAGER, AND PROVISIONING MANAGER ENABLED
6.1.2 SETUP WITH OPERATIONS MANAGER AND PERFORMANCE ADVISOR ENABLED (PROTECTION MANAGER AND PROVISIONING MANAGER DISABLED)
6.1.3 SETUP WITH PROTECTION MANAGER ENABLED (PROVISIONING MANAGER AND OPERATIONS MANAGER ENABLED. PERFORMANCE ADVISOR DISABLED)
6.1.4 SETUP WITH SELECTIVE LICENSES ENABLED
6.1.5 SETUP WITH CLUSTERED DATA ONTAP (DATA ONTAP 8)
6.1.6 SETUP WITH OnCommand Host Service for VMware ENVIRONMENT
6.2 DATAFABRIC MANAGER SERVER 4.0
6.2.1 SETUP WITH OPERATIONS MANAGER AND PERFORMANCE ADVISOR ENABLED
6.2.2 SETUP WITH PROTECTION MANAGER ENABLED
DATABASE GROWTH
RECOMMENDED OPTIONS
CONCLUSIONS


PURPOSE

This document provides the information that OnCommand 5.0 (formerly known as Operations Manager, Provisioning Manager, and Protection Manager) administrators need to choose the correct system for hosting the DataFabric® Manager server.

OnCommand software provides integrated, policy-based data and storage management for virtual and physical environments. It brings together multiple products, including Operations Manager, Protection Manager, Provisioning Manager, Virtual Storage Console, and SnapManager for Virtual Infrastructure, into a single solution.

The OnCommand management solution is divided into the Core Package and the Host Package. This sizing guide covers only the OnCommand Core Package (also known as the DataFabric® Manager server), not the OnCommand Host Package.

You can manage physical storage objects on primary and secondary storage after installing OnCommand Core Package software. The Core Package software also provides basic monitoring of virtual objects if Host Package software has been installed, because the Host Package contains hypervisor plug-ins that perform virtual object discovery.

Note: The suggestions and recommendations in this report are guidelines only; many environmental factors might influence the choice. This report discusses some of those factors.

TERMINOLOGY

Operations Manager, Performance Advisor, Protection Manager, and Provisioning Manager are capabilities delivered through software licensing of OnCommand Core Package 5.0. To enable these capabilities, there are three installable components:

DataFabric Manager server and its repository (Sybase database) provide central data storage and agent coordination to Operations Manager, Provisioning Manager, and Protection Manager. This component is installed on a server (Windows® or Linux®). Hence Operations Manager, Performance Advisor, Provisioning Manager, and Protection Manager have a common repository: DataFabric Manager.

NetApp Host Agent is an agent that can be installed on one or more open systems hosts to provide the DataFabric Manager server with additional monitoring capabilities for SAN and FSRM features. Also, when it is installed on the same host as the Open Systems SnapVault® agent, it provides a way to start and stop OSSV services.

Operations Manager is a client server application that delivers comprehensive monitoring and management of NetApp storage systems from a single console with alerts, reports, performance tools, and configuration tools to keep your storage infrastructure in line with business requirements for maximum availability and efficiency.

Performance Advisor offers a rich GUI in the NetApp Management Console that delivers a single pane of glass for comprehensive performance management of NetApp storage systems and MultiStore®.

Provisioning Manager (the provisioning capability) enables policy-based provisioning of NAS, SAN, or secondary storage. It also uses NetApp Management Console to provide a rich GUI, and the DataFabric Manager server to enable monitoring, alerting, and active management of the storage entities.

Protection Manager enables protection and disaster recovery functionality. Protection Manager revolutionizes data protection setup by automating discovery, provisioning, and reporting for data protection operations based on Snapshot™ through NetApp Management Console. It can configure and control all Snapshot, SnapMirror®, SnapVault, and Open Systems SnapVault (OSSV) operations.

NetApp Management Console (NMC) is a Java application that can be downloaded from the Setup menu of the Operations Manager GUI. It houses the Performance Advisor, Provisioning Manager, and Protection Manager UI.

INTRODUCTION

In growing NetApp environments, it is important to make sure that as the infrastructure grows, the OnCommand Core server (also known as the DFM server) can scale to meet the increased needs. Also, the NetApp Management Console has been introduced to extend the features from solely monitoring an environment to actively managing it as well. When more management features are used, more resources are needed to support those activities. Specific resource requirements are outlined in the installation guide for each release of OnCommand (using DataFabric Manager server). These are minimum configurations. The number of storage systems, volumes, relationships (such as SnapMirror, SnapVault, and Open Systems SnapVault), and so on needs to be considered when determining the size of the server configuration. To determine the impact of monitoring and managing all these objects, this report looks at some of the factors that affect product performance.

Starting with OnCommand 5.0, the DataFabric Manager server is a true 64-bit application that can scale along with the hardware. In this guide, all sizing is done using the 64-bit DataFabric Manager server installed on a physical server.

PERFORMANCE EXPECTATIONS

FACTORS AFFECTING DATAFABRIC MANAGER SERVER PERFORMANCE

Memory, CPU, and disk are the top factors affecting server performance.

HOW DOES DATAFABRIC MANAGER SERVER USE ITS RESOURCES?

• Memory: The DataFabric Manager server is a 64-bit application. Starting with OnCommand 5.0, the server scales with the hardware (especially available RAM).

• LUN: NetApp recommends installing the database onto NetApp LUNs for better performance, to reduce the RTO to less than an hour, and to give the flexibility to create more frequent Snapshot copies of the database. Note: Storing DFM Database on NFS is not supported.

• CPU: Starting with Operations Manager 4.0, the limitation on CPU cores has been removed, and the product is now bundled with a new Sybase license that allows the database to use an unlimited number of CPU cores by default. This is an attempt to remove bottlenecks that might arise due to CPU performance.

• Disk: The DataFabric Manager server can be operated off of local RAID (striped for performance) disk (although NetApp recommends using SAN storage). In most cases, the CPU is the bottleneck before the disk. However, if the database is sluggish and the CPU is not overwhelmed, then disk I/O is the next consideration for troubleshooting. As with any database application, spindles are your friends.

• Network: Today’s networks are fast enough to carry out the tasks relegated to them by Operations Manager, Provisioning Manager, and Protection Manager. However, if your network is extremely busy and you use quality-of-service devices, then consideration (in the QoS device) of monitoring and management services is necessary.


SIZING FACTORS

• Number of storage systems monitored: This is the total number of FAS storage systems and NearStore® appliances being monitored by the DataFabric Manager server.

• Monitored objects: These are the objects that Operations Manager constantly manages and monitors: qtrees, volumes, aggregates, vFiler™ units, disks, quotas, and so on. All these add to the total count of objects that Operations Manager can manage and monitor. The more objects there are, the more data needs to be collected. For example, quotas: the largest quantity we know of is ~100,000 quotas being monitored, which does complete during the one-day default interval for monitoring these relationships. If you reach this number, consider lengthening the interval beyond the one-day default, or splitting the load across multiple servers.

• Monitoring intervals: NetApp recommends leaving all monitoring intervals at their factory defaults, at least initially. Shortening these intervals adversely affects server performance. Conversely, lengthening the intervals improves server performance. However, these measures should be taken only in extreme cases.

• SnapVault and SnapMirror relationships: The number of relationships should be considered when monitoring from Operations Manager or configuring and managing from Protection Manager. Each relationship has to be constantly managed and monitored for Snapshot schedules, data transfer schedules, backup jobs, lags, and conformance.

• NetApp Host Agents: NetApp Host Agents pose little burden on the DataFabric Manager server when used for host data monitoring purposes, such as SAN implementations. However, when they are used for the optional FSRM data classification, a load is imposed on both the storage systems and the DataFabric Manager server, and therefore it is recommended that FSRM jobs be run consecutively during nonproduction hours. For this reason it is not factored into server sizing.

To see how much memory and CPU each of the DataFabric Manager services is consuming over time, refer to the dfmwatchdog.log file in the log subdirectory of the server installation.

SIZING GUIDE METHODOLOGY:

Starting with OnCommand 5.0, we have adopted a sensitivity-based testing approach to size the requirements across various workloads for the DFM server. This sizing guide provides basic sensitivity data for the key parameters affecting the performance and scalability of OnCommand 5.0.

• Show sensitivity to one parameter at a time for key parameters affecting performance.

• Show performance data for key configurations to illustrate the performance of multiple variables.

− Indicate the interaction of various parameters used together in varying combinations.

• Extract key guidelines on how to design a UM environment to maximize performance.

Why this sensitivity based approach?

Performance and scalability of the DFM server have multiple dimensions. To design a DFM server that meets your growing needs, you have to take into account the monitoring intervals and object counts, the operations performed in DFM, how dynamic the environment is, the load on the storage systems, database history and deleted objects, and DFM server resources such as CPU and memory.


To gather the sensitivity analysis data, we extract data from 2 different configurations spread across 3 different workloads (100 to 300 storage systems along with 100 to 300 datasets).

Configuration: Operations and Performance Capability (Protection & Provisioning Capability Disabled)
Workloads: 100 Storage Controllers; 200 Storage Controllers; 300 Storage Controllers

Configuration: Operations, Protection, Provisioning & Performance Capability
Workloads: 100 Datasets and 100 Storage Controllers; 200 Datasets and 200 Storage Controllers; 300 Datasets and 300 Storage Controllers

Typical workflows are listed in Table 3 (Operations Manager) and Table 4 (Provisioning Manager and Protection Manager). We measure sensitivity against monitored object counts, with DFM monitoring spanning 100, 200, and 300 storage systems. In accordance with the suggested best practices, the first configuration combines the Operations Manager and Performance Advisor capabilities as one typical configuration. The second configuration includes all features of the OnCommand Core Package: Operations Manager, Performance Advisor, Protection Manager, and Provisioning Manager (OM + PA + PM + PM).

Note: To emulate an actual customer setup, we have also included PA in the provisioning and protection configuration. This is only to simulate the maximum load and is in no way recommended for actual environments. The recommendations in the best practices guide (BPG) still apply.


SENSITIVITY ANALYSIS AND RESULTS.

Configuration 1: Operations & Performance capability of OnCommand Core Package

The graph for this configuration gives the sensitivity analysis for the Operations and Performance Advisor capabilities of OnCommand 5.0. In this configuration, the protection and provisioning capabilities are disabled.

Configuration Details:

OnCommand Core Package server configuration (also known as the DFM server):

Operating System: Red Hat Enterprise Linux Server release 5.6 (Tikanga) 2.6.18-238.el5 x86_64

Hardware Configuration: IBM Blade Server HS21 based on 2 Intel® Quad Core Xeon(R) CPU X5570 @ 2.93GHz with 64 GB RAM

DataFabric Manager Database: Hosted on NetApp FC LUN

Figure: Response time in seconds (Y axis, 0 to 900) versus number of storage systems (X axis: 100, 200, 300) for the Command Line Interface, OnCommand Console, and Operations Manager Console. Data labels on the chart: 235.687, 283.287, 733.224; 250.122, 366.899, 818.902; 141.009, 188.854, 269.342.


Monitored Object Count Details:

Monitored Object Description   Count
Storage Systems                100      200      300
Aggregates                     710      1405     1993
Volumes                        6597     15214    22259
Qtrees                         22939    44872    61876
LUNs                           7492     14706    20231
vFiler Units                   205      391      259
Disks                          2686     5359     7494
User Quota                     52226    105904   145495
Resource Groups                100      200      299
SnapMirror Relationship        653      653      653
SnapVault Relationship         522      954      954
Script Job                     118      244      363
Interface                      204      402      560

DataFabric Manager Server Licenses: Core
Performance Advisor: Enabled
Monitor Interval: Default


Configuration 2: Operations, Protection, Provisioning & Performance Capability of OnCommand Core Package

The graph for this configuration gives the sensitivity analysis for the Operations, Protection, Provisioning, and Performance capabilities of OnCommand Core Package 5.0. In this configuration, the protection and provisioning capabilities are enabled.

Configuration Details:

Operating System: Red Hat Enterprise Linux Server release 5.5 (Tikanga) 2.6.18-194.el5 x86_64

Hardware Configuration: IBM Blade Server HS21 based on 4 Intel® Quad Core Xeon(R) CPU E5620 @ 2.40GHz with 48 GB RAM



Figure: Response time in seconds (Y axis, 0 to 2500) versus number of storage systems (X axis: 100, 200, 300) for the Command Line Interface, Operations Manager Console, and OnCommand Console. Data labels on the chart: 271.199, 338.801, 457.804; 214.46, 364.45, 623.828; 467.774, 800.3, 1113.22.


Monitored Object Count Details:

Monitored Object Description                      Count
Storage Systems                                   100      200      300
Dataset                                           100      200      300
Resource Groups                                   100      200      299
Aggregates                                        683      1357     1980
Volumes                                           5867     18096    19746
Disks                                             2687     5358     7956
LUNs                                              8076     15291    21866
vFiler Units                                      15       30       45
Qtrees                                            24159    49296    68862
User Quota                                        52226    105904   145495
DataProtection Managed SnapVault relationships    676      1904     2750
DataProtection Managed SnapMirror relationships   705      1492     1641
DataProtection Managed OSSV relationships         414      837      1237

DataFabric Manager Server Licenses: Core
Protection Capability, Provisioning Capability, Performance Advisor Capability: Enabled


For both graphs 1 and 2, the response time is measured against 3 interfaces:

• Command Line Interface

• OnCommand Console

• Operations Manager Console

The X axis represents the number of storage systems for each workload listed in the tables above. The Y axis represents the sum of the response times for all of the typical workflows in the tables mentioned above.

Inference:

Beyond 200 nodes, the response times rise steeply and no longer grow linearly. Hence, we infer that at 200 nodes all 3 interfaces behave optimally, and that this is the sweet spot.
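The inference can be checked numerically with a minimal sketch. It uses the data labels from graph 1 and assumes that each group of three labels is one interface measured at 100, 200, and 300 storage systems; the names `series_a`, `series_b`, and `series_c` are placeholders, because the mapping of label groups to interfaces is an assumption rather than something stated in the report.

```python
# Data labels as they appear on graph 1, grouped in the order they occur.
# ASSUMPTION: each triple is one interface's total response time at
# 100, 200 and 300 storage systems; the series names are placeholders.
GRAPH1_SERIES = {
    "series_a": (235.687, 283.287, 733.224),
    "series_b": (250.122, 366.899, 818.902),
    "series_c": (141.009, 188.854, 269.342),
}


def increments(series):
    """Return the added response time for 100->200 and 200->300 systems."""
    at_100, at_200, at_300 = series
    return at_200 - at_100, at_300 - at_200


def growth_ratio(series):
    """Ratio of the second increment to the first; 1.0 means linear growth."""
    first, second = increments(series)
    return second / first


for name, series in GRAPH1_SERIES.items():
    first, second = increments(series)
    print(f"{name}: +{first:.1f}s then +{second:.1f}s "
          f"(ratio {growth_ratio(series):.1f})")
```

A ratio well above 1.0 means that the step from 200 to 300 storage systems costs more than the step from 100 to 200, which is the pattern the inference describes.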


TEST SUMMARY

NOTE: The results tabulated below are aligned with the DFM 4.0 sizing practices. OnCommand 5.0 results follow the same practice.

6.1 DATAFABRIC MANAGER SERVER 5.0

Extensive testing was performed in order to determine the optimum point to which a DataFabric Manager server could be loaded. The scope was to:

• Validate performance of Operations Manager along with Performance Advisor, Protection Manager, and Provisioning Manager enabled on a single DataFabric Manager Server.

• Validate performance of Operations Manager along with Performance Advisor in scaled setup enabled on a single DataFabric Manager Server (without Protection Manager and Provisioning Manager enabled).

These tests give a basic idea of the number of storage systems, or the total number of objects, that Operations Manager can handle with optimum performance and without failures. In addition, a series of tests was performed and response times were gathered for the UI screens. An ample set of samples was taken to arrive at the averaged response times.

The following two sections document the test results of the above-mentioned validation exercise.

6.1.1. SETUP WITH OPERATIONS MANAGER ALONG WITH PERFORMANCE ADVISOR, PROTECTION MANAGER, AND PROVISIONING MANAGER ENABLED

A key customer requirement is the ability to install the entire DataFabric Manager stack on a single server. The concern is whether the server can scale when all licenses are turned on: Operations Manager, Performance Advisor, Protection Manager, and Provisioning Manager. To find the maximum load that a DataFabric Manager server can handle, a series of tests was conducted to determine the optimum point. The following tables summarize the setup and the maximum object counts that a DataFabric Manager server can handle.

Table 1) DataFabric Manager Server Setup

Operating System: RHEL AS release 4 (Nahant Update 8) 2.6.9-89.ELsmp x86_64

Hardware Configuration: IBM Blade Server HS21 based on 2 Intel® Dual Core 2.83 GHz CPU (Xeon, 5160) and 16GB



Table 2) Managed Setup

Objects                               Count               Remarks
Storage Systems                       41                  Monitor 41 Data ONTAP® systems
Aggregates                            150                 ~4 aggregates per node
Volumes                               1,500
Qtrees                                6,000
LUNs                                  2,200
vFiler Units                          82                  2 vFiler units per node
Disks                                 1000

Operations Manager
Host Agents                           10                  OSSV/SRM Hosts
Resource Groups                       127
Alarms                                Configured for all Critical, Worse and Emergency events
SRM File Walk                         1.5 million files   On 10 SRM Paths
Performance Advisor                   Enabled
DataFabric Manager Global Options     Default
Events                                ~288,164

Data Management
Total Data sets                       120
SAN Data set (Mirror Policy)          16                  Mirror transfers once every 2 hours between 09:00 and 20:00
NAS Data set (Mirror Policy)          16                  Mirror transfers once every 2 hours between 08:00 and 19:00
SAN Data set (Backup Policy)          36                  Backup once daily between 20:00 and 02:00
NAS Data set (Backup Policy)          36                  Backup once daily between 20:00 and 02:00
Open System Data set                  16                  8 OSSV Hosts, 150 Directories/Host. Backup once daily between 03:00 and 04:00
DP managed SnapVault relationships    1,600               Based on planned Backup Data sets
DP managed SnapMirror relationships   200                 Based on planned Mirror Data sets
OSSV relationships                    1,200               Based on planned OSSV Datasets

General
DataFabric Manager Server Licenses    Operations Manager, Protection Manager, Provisioning Manager, Disaster Recovery, FSRM
Clients                               4                   2 NMC and 2 Web UI clients continuously accessing the DataFabric Manager server

The object counts above are per DataFabric Manager server instance. If any of these object counts is higher in your currently managed environment, add DataFabric Manager server instances. For example, if you have more than 41 storage systems with object counts higher than those shown in the table above, you need another DataFabric Manager server instance to manage them. If you have 120 storage systems with high object counts compared to the table above, install 3 DataFabric Manager server instances and configure each one to manage up to 41 storage systems, each with object counts approximately equal to or less than what is shown as manageable above. The same holds true for the number of data sets and relationships. If you have more than 120 data sets, add another DataFabric Manager server instance to manage the additional data sets and relationships.

The flip side of this discussion is: what if I have far fewer data sets and relationships but more storage systems? For example, if you have 800 relationships with 200 data sets and 50 storage systems, you should be able to manage with a single DataFabric Manager server instance.
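As a rough aid for comparing an environment with the tested setup, the sketch below flags object counts that exceed the values in Table 2. The dictionary keys, the function name, and the example environment are hypothetical; only the numeric limits come from the table. It is a first-pass check, not a sizing rule, because the text above notes that lower counts in one dimension can offset higher counts in another.

```python
# Tested maxima taken from Table 2 (all licenses enabled, one server).
# The key names are illustrative and not part of any NetApp tool.
TESTED_MAXIMA = {
    "storage_systems": 41,
    "aggregates": 150,
    "volumes": 1500,
    "qtrees": 6000,
    "luns": 2200,
    "data_sets": 120,
}


def counts_above_tested(environment, tested=TESTED_MAXIMA):
    """Return {object_type: (actual, tested)} for counts above the tested value.

    Object types missing from `environment` are treated as zero.
    """
    return {
        name: (environment.get(name, 0), limit)
        for name, limit in tested.items()
        if environment.get(name, 0) > limit
    }


# Hypothetical environment used only to exercise the check.
example = {"storage_systems": 120, "volumes": 1200, "data_sets": 90}
print(counts_above_tested(example))
```

An empty result means that every supplied count is at or below the tested value; any entry in the result is a dimension to review before committing to a single server.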

PERFORMANCE CHECK RESULTS

The tables below capture the response times for the frequently used Operations Manager, Provisioning Manager, and Protection Manager workflows using the GUI.

Table 3) Operations Manager

Report Names Response Time (Sec)

Summary 5.59

Appliance 1.28

Events 1.64

File System 1.47

Aggregate 1.20

Volumes 1.14

Qtree 1.11

Table 4) Provisioning Manager and Protection Manager

Workflow Response Time (Sec)

Loading of Provisioning Dashboards page 2.13

Loading of Protection Manager Dashboards page 2.74

Loading of Data sets Overview page 3.74

Loading of Protection Policies page 1.70

Loading of Resource Pool page 2.19

Loading of Hosts page 2.15

Loading of vFiler page 2.14

Loading of OSSV Hosts page 2.20

Loading the members of a Data set 1.01

Loading of Events page with at least 1,000 provisioning events 12.55

Loading of Jobs page with at least 200 provisioning jobs 11.76

Generating the Operations Manager canned report Mirror Data Transfer Weekly Report 1.36

Generating the Operations Manager canned report Backup Data 1.34

6.1.2 SETUP WITH OPERATIONS MANAGER AND PERFORMANCE ADVISOR ENABLED (PROTECTION MANAGER AND PROVISIONING MANAGER DISABLED)

This setup was used to determine the optimum point to which a DataFabric Manager server can be loaded with Operations Manager and Performance Advisor enabled. Several tests were performed to arrive at the numbers below.


Table 5) DataFabric Manager Server Setup

Operating System: Windows Server 2008 R2 Enterprise x64 base

Hardware Configuration: IBM Blade Server HS21 based on 2 Intel Dual Core 2.4 GHz CPU (Xeon, 5160) and 48GB

DataFabric Manager Database: Hosted on NetApp FC LUN



Table 6) Managed Setup

Objects                              Count                Remarks
Storage Systems                      300                  250 Data ONTAP Systems (versions 7.2.5.1 or 7.3.1)
Aggregates                           1934                 7 aggregates per node
Volumes                              17465                ~35 volumes per node
Qtrees                               68310                ~200 qtrees per node
LUNs                                 22067                ~40 LUNs per node
Disks                                7704                 56 disks per node
Host Users                           15867
Snapshot copies                      120540               ~7 Snapshot copies per volume
UserQuota                            112729
SnapVault Rels                       3580
SnapMirror Rels                      948
Host Agents                          7
Resource Groups                      920
Alarms                               401
SRM File Walk                        4.23 million files   On 19 SRM paths
Performance Thresholds               12                   lun:avg_latency, volume:san_latency, system:cpu_busy, processor:processor_busy, system:avg_processor_busy, nfsv3:nfsv3_read_latency, nfsv3:nfsv3_write_latency, nfsv3:nfsv3_ops, system:load_total_mbps, volume:avg_latency, volume:write_latency, volume:read_latency
DataFabric Manager Server Licenses   Core, FSRM
DataFabric Manager Global Options    All Global options set to default; perfDataExportEnabled = Yes
Performance Advisor                  Enabled              Performance data and QTree monitoring enabled on all Data ONTAP 7-Mode storage systems
Events                               ~426031


The object counts above are per DataFabric Manager server instance. If you have more than 300 storage systems, install another DataFabric Manager server instance to manage them. For example, if you have 400 storage systems, install 2 DataFabric Manager server instances and configure each one to manage around 200 storage systems. Now consider a different case where the number of storage systems is 300, but other object counts such as aggregates, volumes, and disks are much lower than in the table above. In such cases, one instance of DataFabric Manager should be able to scale. To summarize: if the total object counts are lower than tested, one server should scale; if they are higher than tested, add another instance of DataFabric Manager to share the load.
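The arithmetic behind these examples can be sketched as follows, assuming that the storage system count is the only dimension being split and that the load is spread evenly. The function name and its parameters are illustrative, not part of any NetApp tool.

```python
import math


def plan_instances(storage_systems, tested_per_instance=300):
    """Return (instances, systems_per_instance) for an even split.

    tested_per_instance defaults to the 300 storage systems tested in this
    section; pass 41 for the all-licenses setup in section 6.1.1.
    """
    if storage_systems <= 0:
        raise ValueError("storage_systems must be positive")
    instances = math.ceil(storage_systems / tested_per_instance)
    per_instance = math.ceil(storage_systems / instances)
    return instances, per_instance


print(plan_instances(400))       # 400 systems against the 300-system test
print(plan_instances(120, 41))   # 120 systems against the 41-system test
```

For 400 storage systems this gives 2 instances with 200 systems each, which matches the example in the text. Object counts other than storage systems still need to be compared with the managed setup table separately.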


Along with testing how far Operations Manager and Performance Advisor can scale, response times for the Web UI and for NMC were recorded. The Web UI figures are for reports opened from the Global Summary page. The response times are shown below.

Table 7) Performance results of the Operations Manager console

Report Names Response Time (Sec)


Summary 104.875

Appliance 60.04

Events 59.298

File System 60.17

Aggregate 60.154

Volumes 55.495

Qtree 63.208

Performance Advisor

Dashboard page 19

Hierarchy with logical objects 8.48

Hierarchy with physical objects 8.89

6.1.3 SETUP WITH PROTECTION MANAGER ENABLED (PROVISIONING MANAGER AND OPERATIONS MANAGER ENABLED. PERFORMANCE ADVISOR DISABLED)

The main focus of these tests was to measure the ability of Protection Manager to drive backups. With that in mind, the test script used the following steps.

• Installed DataFabric Manager and added primary and secondary storage systems. Initially we added only a handful of storage systems in DataFabric Manager so we could focus on the performance of Protection Manager itself.

• Created primary volumes and qtrees on the specified primary aggregates. The number of volumes and qtrees depended on whether medium or large load configuration was being tested.

• Selected one of the data protection policies for testing and created data sets with that policy.

• Created a single resource pool and added all the secondary aggregates to it. Attached the resource pool to secondary nodes of all the data sets.

• Added primary volumes to data sets. Volumes were equally distributed among the data sets. The data sets were in suspended mode.

• Started conforming five data sets at a time. This is done by first resuming five data sets and then resuming more as any of the data sets reach conformance.

• Waited for all the data sets to conform.


• Populated primaries with data using the Data ONTAP mkfile command. The amount of data depended on whether a medium or large load was being tested.

• Started backups for all the data sets simultaneously by using the Protection Manager scheduler.

• Waited for the backup jobs to finish and measured data throughput of backups.
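The conformance step in the list above keeps five data sets conforming at a time and resumes another as soon as one finishes. The following is a minimal simulation of that sliding window, not the actual test script: the per-data-set durations are hypothetical inputs, and the function only models the order in which data sets are resumed.

```python
import heapq


def conform_in_batches(durations, window=5):
    """Simulate resuming data sets so that at most `window` conform at once.

    durations: hypothetical conformance time for each data set (any unit).
    Returns (finish_times, peak_concurrency).
    """
    in_flight = []  # min-heap of (finish_time, data_set_index)
    finish_times = [0.0] * len(durations)
    clock = 0.0
    peak = 0
    for index, duration in enumerate(durations):
        if len(in_flight) == window:
            # Wait for the earliest data set to reach conformance.
            clock, done = heapq.heappop(in_flight)
            finish_times[done] = clock
        # Resume the next suspended data set at the current time.
        heapq.heappush(in_flight, (clock + duration, index))
        peak = max(peak, len(in_flight))
    while in_flight:
        clock, done = heapq.heappop(in_flight)
        finish_times[done] = clock
    return finish_times, peak


# Twelve data sets that each take one time unit to conform.
times, peak = conform_in_batches([1.0] * 12)
print(max(times), peak)
```

With equal durations the data sets finish in waves of five, so the total time grows with the number of data sets divided by the window size.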

The test configurations were used to test the scalability of Protection Manager alone. The primary use of this server is Protection Manager only. The server also has Operations Manager and Provisioning Manager licenses enabled, but these are toned down to discovery and monitoring purposes only.

CONFIGURATION 1

Table 8) DataFabric Manager Server setup


        Number of Data Sets   Number of Primary Storage Systems   Number of Secondary Storage Systems   Number of Primary Volumes   Number of Primary Qtrees   Protocol
Load    600                   4                                   4                                     1,200                       2,400                      Volume SnapMirror
Load    600                   4                                   4                                     1,200                       2,400                      SnapVault
Load    600                   4                                   4                                     1,200                       2,400                      Qtree SnapMirror

The above configuration was tested with the following policies:

• Mirror Policy

• Backup policy with qtree SnapMirror

• Backup policy with SnapVault

CONFIGURATION 2: STRESS TEST

Table 10) DataFabric Manager Server Hardware Configuration

Operating System Redhat Linux ES 5

Hardware Configuration 2 Intel quad Core Xeon 5440 at 2.83 GHz CPU, with 16GB RAM


Operating System Redhat Linux AS 4

Hardware Configuration 2 Intel quad Core Xeon 5440 at 2.83 GHz CPU with 16GB RAM




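        Number of Data Sets   Number of Primary Storage Systems   Number of Secondary Storage Systems   Number of Primary Volumes   Number of Primary Qtrees   Protocol
Load    500                   4                                   4                                     5,000                       ----                       SnapVault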

The above configuration was tested with the following policy:

• Backup Policy

Table 12) NMC (NetApp management console) hardware and software configuration

Operating System Windows XP SP 2

Hardware Configuration Pentium® 4 3.2Ghz, Pentium 4 3.2Ghz

CONFIGURATION

• The events table refresh test ran with 25,000 rows in the events table.

• The compound data source load test measured the time until the first set of data was returned from the server when loading 1,274 resource members.

• The volume list test measured the start JAPI of volume listing against 497 volumes.

• The load backups test loaded 78 backups.

• The load files test loaded 4 files within one volume.

• The data set loading test measured the JAPI of data set listing for 103 data sets.

PROTECTION MANAGER TEST RESULTS

Protection Manager 4.0 has proven to be far more scalable and responsive than its predecessor (3.8). Compared to the previous test bed (only 400 data sets with 4,000 relationships), we have increased the number of data sets (600) and relationships (5,000) without compromising the DataFabric Manager server performance.

Table 13) Protection Manager Test Results

DataFabric Manager   Protection Manager 5.0 Conformance (Data Sets in HH:MM)   Minutes per Data Set
VSM                  600 in 04:00                                              0.40
QSM                  600 in 07:30                                              0.75
SnapVault            600 in 08:14                                              0.82
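The last column of Table 13 is consistent with dividing the conformance time in minutes by the number of data sets. For example, 07:30 for 600 data sets gives 450 / 600 = 0.75. The sketch below reproduces that arithmetic; the function name is illustrative.

```python
def minutes_per_data_set(data_sets, elapsed_hhmm):
    """Convert an HH:MM conformance time into minutes per data set."""
    hours, minutes = (int(part) for part in elapsed_hhmm.split(":"))
    total_minutes = hours * 60 + minutes
    return total_minutes / data_sets


for policy, elapsed in [("VSM", "04:00"), ("QSM", "07:30"), ("SnapVault", "08:14")]:
    print(policy, round(minutes_per_data_set(600, elapsed), 2))
```

The inverse of each value gives data sets per minute, for example 1 / 0.75, or about 1.33 data sets per minute for QSM.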

GUI TEST RESULTS (NMC)

The performance of the NMC client is more or less the same as that of its predecessor. As mentioned before, the performance of NMC was improved significantly in 3.8, and it is maintained in 4.0. The table below records the time taken in seconds for commonly used screens.

Table 14) NMC Test results

Report Name Protection Manager 4.0 (Seconds)

Events table refresh 1

Compound Data-source load 0.6

Volume List (API) 1

Enable Restore Button 1.8

Load Backups 1.7

Load Files 0.5

Data Set Load 1

6.1.4 SETUP WITH SELECTIVE LICENSES ENABLED

This section attempts to provide an idea of how to size environments that do not match the above provided configurations and managed setups. There can be two such scenarios:

1. Object counts less than tested: All environments with object counts lower than the managed setup tables provided in sections 6.2.1 and 6.2.2 can use a single instance of DataFabric Manager to manage their environments smoothly. For example, an environment with around 2,000 volumes, 1,000 aggregates, and 9,000 qtrees has much lower counts than the tested setup, so a single server with Operations Manager and Performance Advisor can manage the entire environment even though there are 300 storage systems.

2. Object counts more than tested: These are environments with object counts beyond the counts provided in sections 6.2.1 and 6.2.2. As an example, consider an environment with 80 storage systems that needs only Operations Manager, Performance Advisor, and Protection Manager enabled. For Protection Manager, consider around 3,000 relationships in total. This kind of environment needs two instances of DataFabric Manager server installed as shown below:

i. The first instance can have only Operations Manager and Performance Advisor enabled: this server can be used only for the purpose of alerting, monitoring, reporting, and performance management.

ii. The second instance can then be primarily made available for Protection Manager and Provisioning Manager: this server can be made exclusively available for data protection. Make sure that Operations Manager is configured only to discover and monitor storage systems, with no active management. Disable Performance Advisor completely for better performance (refer to the Recommended Options section).

In both scenarios, what matters for sizing is the object counts: all objects managed by Operations Manager, and the data set and relationship counts managed by Protection Manager.

6.1.5 SETUP WITH CLUSTERED DATA ONTAP (DATA ONTAP 8)

Operations Manager 4.0 is the first enterprise-class tool available for managing Data ONTAP 8 in both 7-Mode and Cluster-Mode. The tables below list the hardware and operating system used and the number of objects that Operations Manager was able to manage. This setup was tested with all licenses disabled other than Operations Manager.

Check the Data ONTAP 8.0 documentation for the scalability of 8.0 storage systems. At the time of this release, we recommend using up to four Data ONTAP 8.0 cluster systems in your environment.


Operating System Microsoft Windows 2003 Enterprise Edition SP2 (Build 3790) x64 base




Objects Count

Controllers 4 (FAS3070s)

Vservers 42

Aggregates 18

Volumes 1,752

Ports 16

Logical Interfaces 49

Snapshot copies 17,350

6.1.6 SETUP WITH OnCommand Host Service for VMware ENVIRONMENT

The DataFabric Manager server provides monitoring and manages backup and recovery for VMware environments. This configuration was used to determine the optimum point up to which a DataFabric Manager server can be used with OnCommand Host Service for VMware for monitoring and for managing backup and recovery.

Several tests were performed to arrive at the below-mentioned numbers.


Operating System RHEL AS release 4 (Nahant Update 8) 2.6.9-89.ELsmp x86_64


IBM Blade Server HS21 based on 2 Intel® Dual Core 2.83 GHz CPU (Xeon, 5160) and 16GB



Table 1) OnCommand Host Service for VMware & vCenter Configuration

Operating System Microsoft Windows Server 2008 R2 Enterprise Build 7600 x64-based PC


IBM Blade Server HS21 based on 2 Intel Dual Core 2.4 GHz CPU (Xeon, 5160) and 48GB

vCenter Version 4.1



VMware

ESX Server 24

Virtual Machines 1005 ~40 Virtual Machine per ESX



Datastores 91

Operations Manager

Storage Systems 17 NetApp Controllers.

Host Service 1

Primary Storage Systems 4 Configured for all Critical, Worse and Emergency events

Secondary Storage Systems 1 On 10 SRM Paths

Resource Group 242

Storage Service 37

Aggregate 41

Qtree 7060

Disks 431

Volumes 1642

Performance Advisor Disabled


DataFabric Manager Global Options Default

Data Management

Total Data sets 203

Physical Dataset (Mirror Policy) 52 Mirror transfers once every 2hours between 09:00 to 20:00

Virtual Data set (Backup Policy) 151 Backup once daily between 20:00 and 02:00

DP managed SnapVault relationships 1018 Based on planned Backup Data sets

DP managed SnapMirror relationships 0 Based on planned Mirror Data sets

General


Operations Manager

Protection Manager

Provisioning Manager

Clients 4 2 NMC and 2 Web UI will be continuously accessing the DataFabric Manager Server

In a VMware environment, there are two separate GUIs: the OnCommand console, from which you can view physical-to-virtual relationships as well as manage backup and recovery for VMware environments, and the OnCommand Plug-in for VMware, from which you can manage virtual environments.


Along with testing how far the DataFabric Manager server can scale in a VMware environment, certain response times for the OnCommand Console, OnCommand Plug-in for VMware, CLI, discovery, backup, and restore were recorded. The response times are shown below.


Table 7) Performance results of OnCommand Console

Tab Name | Firefox 3.6 (Seconds) | Internet Explorer 8.0 (Seconds)
Dashboard | 29.4 | 29
Datasets | 10.9 | 8.2
Jobs | 9.3 | 9.2
Events | 7 | 6.3
Policies | 2.6 | 1.6
Reports | 1.8 | 1.3
Events | 9 | 9.8
Restore operation pop up window Time | 2.1 | 2.4
Alarms | 3.6 | 2.5
Groups | 13.1 | 14.4
Backup | 30.2 | 31.7
Virtual Centers Inventory | 4.6 | 5.3
Datacenters Inventory | 5.9 | 5.3
ESX servers Inventory | 1.4 | 2.6
VMware VMs Inventory | 15.3 | 10.6
Datastores Inventory | 16.5 | 13.6

Table 7) Performance results of OnCommand Plug-in for VMware.

Page Name | Response Time in Seconds
Datasets | 17.9
Policies | 9.9
Unprotected resource | 10
Restore | 305
Jobs | 16

Table 16) Performance results of CLI Executions

CLI Listing | Response Time (Seconds)
Backups | 16.6
Virtual Machine | 5
Virtual Disk | 4
Datastores | 0.8
Hypervisors | 0.5
Host Service | 2.2
Virtual Center | 0.5
Datacenters | 0.52
Power shell backup list | 127


Table 16) Performance results of Discovery, Backup & Restore

Discovery, Backup & Restore | Time in HH:MM:SS
Discover HS | 00:02:45

Restore From | Time in HH:MM:SS | Remarks (Datastore/VM Size)
Local backup of a nfs datastores with 5 VMs each | 00:37:00 | 25GB
Local backup of a vmfs datastores with 20 VMs each | 01:11:00 | 75GB
Remote backup of a nfs datastores with 5 VMs each | 00:24:00 | 25GB
Remote backup of a vmfs datastores with 20 VMs each | 05:18:00 | 7GB
Remote backup restore 1 VM residing on nfs datastore | 00:12:00 |
Remote backup restore 1 VM residing on vmfs datastore | 00:15:00 |

Local Backup Of | Time in HH:MM:SS (VMware Snapshot Enabled) | Time in HH:MM:SS (VMware Snapshot Disabled)
Dataset with 2 nfs datastores with 20 VMs each | 00:30:10 | 00:10:00
Dataset with 1 nfs datastore with 100 VMs | 00:46:14 | 00:14:00
Dataset with 10 nfs datastore with 5 VMs each | 00:45:10 | 00:10:00
Dataset with 1 VM residing on nfs datastore | 00:10:00 | 00:05:00
Dataset with 1 nfs datastore having 80 VMs | 00:38:00 | 00:22:00
Dataset with 1 nfs datastore having 320 VMs | 06:45:00 | 00:27:00
Dataset with 2 vmfs datastores with 40 VMs each | 00:16:00 | 00:09:00
Dataset with 10 vmfs datastores with 5 VMs each | 00:18:00 | 00:10:00
Dataset with 1 VM residing on vmfs datastore | 00:11:00 | 00:08:00

Remote Backup Of | Time in HH:MM:SS | Remarks (Data Transfer)
Dataset with 2 nfs datastores with 20 VMs each | 01:15:00 | 38GB
Dataset with 1 nfs datastore with 100 VMs | 01:30:00 | 3GB
Dataset with 10 nfs datastore with 5 VMs each | 02:01:00 | 39GB
Dataset with 1 VM residing on nfs datastore | 03:45:00 | 8GB
Dataset with 1 nfs datastore having 80 VMs | 10:07:00 | 545GB
Dataset with 1 nfs datastore having 320 VMs | 12:02:00 | 333GB
Dataset with 2 vmfs datastores with 20 VMs each | 03:37:00 | 39GB
Dataset with 10 vmfs datastores with 5 VMs each | 03:21:00 | 21GB
Dataset with 1 VM residing on vmfs datastore | 05:20:00 | 19GB

6.2 DATAFABRIC MANAGER SERVER 4.0

According to the scalability results for DataFabric Manager 4.0, there were no issues with the response time of the Operations Manager GUI. By comparison, response times in OnCommand 5.0 were considerably longer on Windows but better on Linux.


6.2.1 SETUP WITH OPERATIONS MANAGER AND PERFORMANCE ADVISOR ENABLED


Operating System Microsoft® Windows 2003 Enterprise Edition SP2 (Build 3790) x64 base





Storage Systems 250 250 Data ONTAP Systems (versions 7.2.5.1 or 7.3.1)

Aggregates 1,749 7 aggregates per node

Volumes 9,490 ~ 35 volumes per node

Qtrees 50,400 ~ 200 qtrees per node

LUNs 9,600 ~ 40 LUNs per node


Disks 14,000 56 disks per node

Host Users 2,000

Snapshot copies 47,400 5 Snapshot copies per volume

Host Agents 20

Resource Groups 250

Alarms 20

SRM File Walk 15 million files On 10 SRM paths

Performance Thresholds 12

lun:avg_latency

volume:san_latency

system:cpu_busy

processor:processor_busy

system:avg_processor_busy

nfsv3:nfsv3_read_latency

nfsv3:nfsv3_write_latency

nfsv3:nfsv3_ops

system:load_total_mbps

volume:avg_latency

volume:write_latency

volume:read_latency


Core

FSRM

All Global options set to default

perfDataExportEnabled = Yes

Performance Advisor Enabled Performance data and QTree monitoring enabled on all Data ONTAP 7-Mode storage systems

Events ~ 580,800


The response time of Operations Manager reports should be less than 15 seconds.

OPERATIONS MANAGER

The response time of the Operations Manager Web UI in DataFabric Manager Server 3.8 is slightly slower than in DataFabric Manager 3.7, but the difference is negligible.

Table 19) Response time

Table 7) Performance results


Summary 7.58

Appliance 1.15

Events 11.47

File System 2.73

Aggregate 1.47

Volumes 1.66

Qtree 3.2

Performance Advisor

Dashboard page 19

Hierarchy with logical objects 8.48

Hierarchy with physical objects 8.89

6.2.2 SETUP WITH PROTECTION MANAGER ENABLED

The main focus of these tests was to measure the ability of Protection Manager to drive backups. With that in mind, the test script used the steps as described in Section 6.1.3.

CONFIGURATION 1




2 Intel quad Core Xeon 5440 at 2.83 GHz CPU with 16GB RAM


DataFabric Manager DB on NetApp FC LUN


Number of Data Sets





Data Transfer Size

Load 400 4 1 or 2 800 4,000 1200GB

The above configuration was tested with the following policies:

• Backup policy with SnapVault


• Backup policy with qtree SnapMirror

• Mirror policy

CONFIGURATION 2




1 Intel quad Core Xeon 5440 at 2.83 GHz with 16GB of RAM


DataFabric Manager DB on NetApp FC LUN

Table 23) Managed setup

Number of Data Sets



Number of OSSV Relationships

Data Transfer Size

Load 50 400 to 1,000 1 400 to 2,000 2MB to 20MB

The above configuration was tested with the following policy:

• “Remote backups only” policy for Open Systems SnapVault

NETAPP MANAGEMENT CONSOLE (NMC) HARDWARE AND SOFTWARE CONFIGURATION

HARDWARE CONFIGURATION

• Virtual Machine

• Xeon 3.20GHz

• 1GB RAM

OPERATING SYSTEM

Windows XP SP 2

PROTECTION MANAGER TEST RESULTS

Protection Manager 4.0 has proven to be far more scalable and responsive than its predecessor (3.8). Compared with the previous test bed (only 200 data sets), we doubled the number of data sets to 400, with 4,000 relationships, without sacrificing DataFabric Manager server performance.

GUI TEST RESULTS (NMC)

• The performance on the Events page has significantly improved. It only takes 2 – 3 seconds to populate 6,000 plus events.

• The performance of most of the overview pages in NMC has improved. Previously, when a new event came in, the entire table of data had to be refreshed; now only the table row associated with the event is refreshed. This significantly improves performance and memory usage, especially on virtual machines.

• Performance improved:

- The time taken to enable the restore button from the data set overview page

- Listing the backups in the restore wizard

- Listing the files contained in a backup in the restore wizard. It now only takes a couple of seconds to populate 200 plus files.


DATABASE GROWTH

Space in the DataFabric Manager database is dominated by history tables. A rough breakdown is that 75% of the file is history tables and about 25% is free pages and overhead. The remaining tables, including qtree and quota histories, tend to only use a small percentage of the file.

From a regression analysis of history table contents, the following formula estimates the space consumed as a function of the number of hosts, aggregates, and volumes:

(# hosts * 2.25MB) + (# vols * 98KB) - (# aggrs * 270KB) = bytes/table

For example, suppose that a customer has 250 hosts, 1,587 volumes, and 1,587 aggregates. The customer expects that the number of hosts will grow to 350. To accommodate this growth, the customer should plan for 2,200 volumes and 2,200 aggregates:

(350 * 2.25MB) + (2200 * 98KB) - (2200 * 270KB) = 409MB/table

DataFabric Manager keeps two each of daily, weekly, monthly, and quarterly tables, plus an open-ended number of yearly tables. At the end of the year, there are nine filled history tables, and DataFabric Manager adds a new yearly table each year. If we add the 25% overhead (free pages and internal bookkeeping), each table uses about 509MB. Nine tables mean 4.58GB, plus a new 509MB yearly table each year.

The bottom line is that after a year, the DataFabric Manager database will be about 4.6GB, and it will grow by about 509MB/year.
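The history table estimate can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the function name `history_table_mb` is an assumption, and it assumes 1MB = 1,000KB, which is the conversion that reproduces the figure of about 409MB per table in the example. The 509MB per-table figure, including overhead, is taken directly from the text rather than derived.

```python
def history_table_mb(hosts: int, volumes: int, aggregates: int) -> float:
    """Estimate the size of one history table in MB (assumes 1MB = 1,000KB)."""
    host_mb = hosts * 2.25               # 2.25MB per host
    volume_mb = volumes * 98 / 1000      # 98KB per volume
    aggregate_mb = aggregates * 270 / 1000  # 270KB per aggregate (subtracted)
    return host_mb + volume_mb - aggregate_mb


# Planned growth from the example: 350 hosts, 2,200 volumes, 2,200 aggregates.
per_table_mb = history_table_mb(hosts=350, volumes=2200, aggregates=2200)
print(f"Estimated history table size: {per_table_mb:.1f}MB")

# Per-table size including overhead, as quoted in the text.
TABLE_WITH_OVERHEAD_MB = 509
FILLED_TABLES_AFTER_ONE_YEAR = 9

first_year_gb = FILLED_TABLES_AFTER_ONE_YEAR * TABLE_WITH_OVERHEAD_MB / 1000
print(f"Database size after one year: about {first_year_gb:.2f}GB")
```

Substituting your own planned host, volume, and aggregate counts gives a first-order estimate of the per-table size for capacity planning.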

If Performance Advisor is used, additional space is consumed on the DataFabric Manager server. This space is not directly part of the DataFabric Manager database but is stored in a special directory.

NetApp recommends storing the database on a NetApp LUN and creating more frequent Snapshot copies, so that RTO is reduced to less than one hour. As the database grows with the use of Protection Manager and Provisioning Manager, the volumes can be easily resized, or thin provisioning can be used.

PERFORMANCE ADVISOR DATA REPOSITORY GROWTH

From the moment the DataFabric Manager server is started and a storage system is enabled for performance monitoring, it begins to collect data for the predefined views configured in Performance Advisor, such as the views used in its dashboard. This data collection for Performance Advisor is stored on the DataFabric Manager server but not in the database. Performance Advisor keeps a separate accounting of the data it collects.

For each counter that is part of a view, one set of data is collected. This collection of data can range anywhere from one-minute intervals (the default is one minute) down to real-time data collection. Each instance of Performance Advisor opened on desktop queries the data for a view from the same “view repository” as the others. This scheme reduces the overall amount of storage and resources necessary. For some views, the data is recycled every seven days. For other views, such as CPUs, the data is kept for a year and then recycled. For real-time views, the data is recycled each time the real-time view is disabled.

Amount of disk space needed per monitored system for Performance Advisor:

From the scaled configuration from section 6.3, the object count per host is derived as follows:

• Disks per controller: 51

• Aggregates per controller: 4

• Volumes per controller: 33

• LUNs per controller: 31

• vFiler units per controller: 2

The following object counts are assumed:

• Number of network interfaces per controller: 2

• Number of FCP targets per controller: 1

• Number of processors per controller: 2

With the above setup, the amount of space required to monitor one Data ONTAP 7.2.3 controller is 529MB per year. For monitoring N hosts, the space requirement would be (529 x N) MB of disk space.

Other assumptions:

• There are no custom views.

• Qtree basic counter group is disabled.

• DataFabric Manager version is 3.7.

• Percentage of volumes with priority queues: 10%.
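The (529 x N) MB rule can be applied directly when planning disk space for the Performance Advisor repository. The following is a minimal sketch under the assumptions listed above; the function name `pa_repository_mb` and the controller count of 250 are illustrative choices, not values prescribed by this guide.

```python
# Space needed to monitor one controller for a year, as stated in the text.
PA_MB_PER_CONTROLLER = 529


def pa_repository_mb(controllers: int) -> int:
    """Return the estimated Performance Advisor disk space in MB for N controllers."""
    if controllers < 0:
        raise ValueError("controllers must be non-negative")
    return PA_MB_PER_CONTROLLER * controllers


# Illustrative environment with 250 monitored controllers.
example_mb = pa_repository_mb(250)
print(f"Estimated repository size: {example_mb}MB (about {example_mb / 1000:.1f}GB)")
```

Environments with custom views, or with object counts per controller well above those listed, should expect the estimate to be low.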

RECOMMENDED OPTIONS

NetApp recommends the following options when using Protection Manager:

• dpmaxActiveDataTransfers=x for all storage systems (use dfm host set)

The value of “x” depends on the Data ONTAP version, replication protocol, and platform. The correct value should always be less than or equal to the maximum supported value for these factors. For the 7.3 releases, a table is available on the NOW™ site at http://now.netapp.com/NOW/knowledge/docs/ontap/rel731/html/ontap/onlinebk/protecting/reference/r_oc_prot_max-simultaneous-replication-ops.html

There is an NDMP stream count limit of 40 streams for Data ONTAP 7.2.4 and earlier. The stream count is 128 for Data ONTAP 7.3 and increases for newer Data ONTAP releases. Depending on the Data ONTAP version, you can increase the number.

• dpScheduledJobExpiration=12h

Suppose that 150 jobs are started in Protection Manager and that 100 of those jobs finish in 12 hours. The remaining 50 jobs are dropped, and an error is logged in the dfmscheduler.log file. There is no other record of those dropped jobs, and no retry is attempted after 12 hours. Depending on the number of jobs, you might therefore want to set this value higher to leave time for retries.

• statusUpdateInterval=5mins

Setting this dfbm option to 5 minutes instead of the default 30 seconds reduces job progress updates from every 30 seconds to every 5 minutes and reduces the load on the DFM database.

• purgeJobsOlderThan=10weeks

Set this dfbm option to a finite value (such as 10 weeks, based on your preference). This option purges jobs older than the specified retention period, which helps to contain the size of the DFM database and improves performance.

Set SNMPv3 as the preferred SNMP version, because this improves the response times for SNMP communication between DFM and the storage system.

Note: SNMPv3 is supported on ONTAP version 7.3 or later and DFM version 3.7 or later.

Increasing the semaphore limit on Linux

Each running job needs a semaphore array to connect to the database. By default, Red Hat Linux has 128 semaphore arrays. The number of data protection jobs that can run simultaneously is limited by the number of remaining semaphore arrays.

On Red Hat Linux, the default limit of 128 semaphore arrays can be increased to 1024 by adding the line below to /etc/sysctl.conf:

kernel.sem=250 32000 32 1024

Recommendation: Increasing the number of semaphore arrays to 1024 is recommended when DFM is running on Red Hat Linux.
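The four values of kernel.sem are, in order, SEMMSL, SEMMNS, SEMOPM, and SEMMNI; the last one is the maximum number of semaphore arrays. The following sketch shows one way to check whether a host meets the recommendation. The helper names are assumptions made for illustration, and the script only reads the setting; it does not change it.

```python
from pathlib import Path

# Order of the fields in kernel.sem and in /proc/sys/kernel/sem.
FIELDS = ("semmsl", "semmns", "semopm", "semmni")
RECOMMENDED_ARRAYS = 1024


def parse_kernel_sem(text: str) -> dict:
    """Parse the four whitespace-separated kernel.sem values into a dict."""
    values = [int(value) for value in text.split()]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} values, got {len(values)}")
    return dict(zip(FIELDS, values))


def semaphore_arrays_ok(text: str, required: int = RECOMMENDED_ARRAYS) -> bool:
    """Return True if SEMMNI (number of semaphore arrays) meets the requirement."""
    return parse_kernel_sem(text)["semmni"] >= required


recommended = parse_kernel_sem("250 32000 32 1024")
print(f"Recommended semaphore arrays: {recommended['semmni']}")

if __name__ == "__main__":
    proc_file = Path("/proc/sys/kernel/sem")
    if proc_file.exists():  # present on Linux only
        current = proc_file.read_text()
        print(f"Current setting meets recommendation: {semaphore_arrays_ok(current)}")
```

After editing /etc/sysctl.conf, the new value typically takes effect once the settings are reloaded or the server is rebooted.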

To enable a scalable architecture, NetApp recommends running Operations Manager and Performance Advisor on one server and installing Protection Manager and Provisioning Manager on a second server. On the second server, disable Performance Advisor and do not perform any activities using Operations Manager. Optionally, on the second server, the following options can be disabled in Operations Manager through the command-line interface.

• agentMonInterval

• ccMonInterval

• envMonInterval

• fcMonInterval

• hostRBACMonInterval

• opsMonInterval

• SANHostMonInterval

• srmMonInterval

• cfMonInterval

• clusterMonInterval

• cpuMonInterval

• statusMonInterval

• vserverMonInterval

Considerable load on the second server can be avoided by discovering only the storage systems and MultiStore instances that are involved in data protection.

Table 24) Splitting decision table.

Object Type | Number
Volumes | 12000
Qtrees | 150000
Relationship (SV/QSM/VSM/OSSV) | 3000
Number of Dataset | 300
Number of Userquota | 150000

When any one of the above object counts is exceeded in your environment, split the DFM server.

When more than one of the object counts is exceeded in your environment, definitely split the DFM server.
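The split decision can be expressed as a simple threshold check against the limits in Table 24. The sketch below is illustrative only; the dictionary keys, the function names, and the sample environment are assumptions, and a missing count is treated as zero.

```python
# Limits from Table 24.
SPLIT_THRESHOLDS = {
    "volumes": 12000,
    "qtrees": 150000,
    "relationships": 3000,  # SV/QSM/VSM/OSSV
    "datasets": 300,
    "user_quotas": 150000,
}


def exceeded_limits(counts: dict) -> list:
    """Return the object types whose counts exceed the Table 24 limits."""
    return [
        name
        for name, limit in SPLIT_THRESHOLDS.items()
        if counts.get(name, 0) > limit
    ]


def split_recommendation(counts: dict) -> str:
    """Map the number of exceeded limits to the guidance in the text."""
    exceeded = exceeded_limits(counts)
    if len(exceeded) > 1:
        return "definitely split"
    if exceeded:
        return "split"
    return "single server"


# Hypothetical environment used only to exercise the check.
environment = {"volumes": 9500, "qtrees": 160000, "relationships": 3500, "datasets": 120}
print(exceeded_limits(environment))
print(split_recommendation(environment))
```

Running the check periodically against current object counts can flag when an environment is approaching the point where a second DataFabric Manager server is warranted.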

When the difference between the outputs of the following commands is more than 2x, contact NGS to prune your database and improve response times.

dfm volume list -a and dfm volume list is 3x or more

dfm qtree list -a and dfm qtree list is 2x or more

dfm lun list -a and dfm lun list is 2x or more

dfm host list -a and dfm host list is 2x or more
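The pruning check compares the number of objects listed with the -a option against the number listed without it. The sketch below assumes that the two counts have already been obtained, for example by counting the rows returned by each command; it does not invoke the dfm CLI itself. The function name, the factor table, and the sample counts are hypothetical, and the factors follow the per-command values listed above.

```python
# Factor at or above which pruning is suggested, per object type.
PRUNE_FACTORS = {"volume": 3.0, "qtree": 2.0, "lun": 2.0, "host": 2.0}


def needs_pruning(object_type: str, count_with_a: int, count_without_a: int) -> bool:
    """Return True if the '-a' listing is large enough relative to the plain listing."""
    if count_without_a <= 0:
        raise ValueError("count_without_a must be positive")
    ratio = count_with_a / count_without_a
    return ratio >= PRUNE_FACTORS[object_type]


# Hypothetical counts: (rows from 'list -a', rows from 'list').
sample_counts = {
    "volume": (30500, 9800),
    "qtree": (61000, 50400),
    "lun": (19500, 9600),
    "host": (260, 250),
}

to_prune = [
    name
    for name, (with_a, without_a) in sample_counts.items()
    if needs_pruning(name, with_a, without_a)
]
print(f"Object types suggesting a database prune: {to_prune}")
```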

RECOMMENDED CONFIGURATION FOR A LARGE INFRASTRUCTURE (100 + STORAGE SYSTEMS):

HARDWARE CONFIGURATION

• Dedicated Server for DFM, with no other application running.

• If the server is a virtual machine, provide the same quality of service, such as dedicated CPU and RAM.

• 2 Intel quad Core Xeon 5440 at 2.83 GHz CPU

• DataFabric Manager server Database 10GB of Free Space

- 20GB free space

- DataFabric Manager server database to reside on NetApp FC LUN

• Minimum 16 GB RAM

SOFTWARE CONFIGURATION

Windows 2000/2003/2008, RedHat Linux ES, SUSE

HARDWARE CONFIGURATION FOR NMC

• Xeon 3.20GHz

• 2 GB RAM

Note: NMC should not be run on the same machine as the DFM server.

OPERATING SYSTEM FOR NMC

Windows XP SP 2

For software requirements, see the compatibility matrix on NOW (NetApp on the Web) at http://now.netapp.com/NOW/knowledge/docs/olio/guides/dfm_compatibility.

CONCLUSIONS

Results might be different with earlier or later releases of DataFabric Manager. When new releases substantially alter these results, this document will be updated accordingly. The tests described here show a slight constant increase in the access times (10x) in the Web UI (file systems, volumes, and disk report pages) of Operations Manager using DataFabric Manager Server 5.0 on Windows, and comparable or better results on Linux, when compared with DataFabric Manager Server 4.0. The response time of NMC has remained the same between DataFabric Manager Server 4.0 and 5.0.

In addition, the following was observed with DataFabric Manager 5.0:

• It is recommended to disable PA when Operations Manager, Protection Manager, and Provisioning Manager are enabled on a single server for a scaled environment as described in section 6.2.1.

• The Linux platform provided better performance when all licenses were enabled.

• For a scaled setup as in section 6.2.1, the number of data sets needs to be restricted to 220 per DataFabric Manager Server instance.

© 2011 NetApp, Inc. All rights reserved. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, DataFabric, Data ONTAP, MultiStore, NearStore, NOW, SnapMirror, Snapshot, SnapVault, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Intel and Pentium are registered trademarks and Xeon is a trademark of Intel Corporation. Linux is a registered trademark of Linus Torvalds. VMware is a registered trademark of VMware, Inc. Microsoft and Windows are registered trademarks of Microsoft Corporation. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. TR-3440

NetApp provides no representations or warranties regarding the accuracy, reliability or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer’s responsibility and depends on the customer’s ability to evaluate and integrate them into the customer’s operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.
