    Catalogic DPX 4.3

    Best Practices Guide

    dpx43211/18/2014bp

    Catalogic Software, Inc., 2014. All rights reserved.

    This publication contains proprietary and confidential material, and is only for use by licensees of Catalogic DPX, Catalogic BEX, or Catalogic ECX proprietary software systems. This publication may not be reproduced in whole or in part, in any form, except with written permission from Catalogic Software.

    Catalogic, Catalogic Software, DPX, BEX, ECX, and NSB are trademarks of Catalogic Software, Inc. Backup Express is a registered trademark of Catalogic Software, Inc. All other company and product names used herein may be the trademarks of their respective owners.


    Table of Contents

    Table of Contents 3

    Chapter 1: Technology and Solution Overview 4
        Audience and Purpose 4

    Chapter 2: NetApp Storage System Guidelines 7
        General Considerations 7
        Storage and Sizing for Secondary Data 10
        Storage Configuration 11
        Existing and Multi-Use Storage 16

    Chapter 3: Managing NetApp Storage Systems 18
        Servers and Data Grouping 18
        Job Creation and Scheduling 20
        Miscellaneous Considerations 28
        External Media Management and Device Control (Tape Libraries) 29
        Troubleshooting and Known issues 30

    Chapter 4: External Resource List 35
        Catalogic 35
        NetApp 35
        VMware 36

    Chapter 5: Conclusion 37

    TRADEMARKS 38

    INDEX 41


    Chapter 1: Technology and Solution Overview

    Catalogic DPX is designed to protect data, applications, and servers using a myriad of storage technologies. This guide specifically describes the combination of DPX software and NetApp storage systems. The hardware and software components are configured to implement a system that protects data on supported client systems to NetApp storage and optionally archives the data to tape. This guide offers specific recommendations for system configuration, as well as general guidelines across all components, including data protection software, storage system hardware and software, and tape library configuration. This ensures that the overall solution operates optimally and fulfills customers' specific data protection needs.

    DPX is compatible with a wide range of NetApp storage offerings, including FAS and V-series hardware, IBM N-series branded hardware, and the software-based NetApp Data ONTAP Edge server. Data ONTAP 7-mode is a supported destination for DPX Block backups. 7-mode and Cluster mode (CDOT) are both supported for NDMP backup.

    For the latest system requirements and compatibility details regarding supported hardware, file systems, applications, operating systems, and service packs, go to System Requirements and Compatibility. Data ONTAP 7.3.x and later is supported; however, it is strongly recommended to run Data ONTAP 8.x or later. Data ONTAP 8.1/8.2 or later is preferred to take advantage of all current fixes and storage efficiency features.

    This guide has been updated for DPX 4.3 and Data ONTAP 8.2. Differences in feature support on prior versions are noted where important.

    Audience and Purpose

    This guide is targeted at DPX implementation professionals and advanced DPX administrators. The guidelines listed are based on deployment and administration experience, as well as the best practices of the respective technology vendors. The document lists known parameters and configurations that lead to a successful DPX implementation. Use it as a tool when architecting a solution that fits a customer's specific data protection needs.

    Implementing these best practice guidelines requires knowledge and understanding of the following published materials:

    DPX Deployment Guide at MySupport

    TR-3487 SnapVault Best Practices Guide

    TR-3466 Open Systems SnapVault (OSSV) Best Practices Guide

    TR-3446 SnapMirror Async Overview and Best Practices Guide

    TR-3505 NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide

    TR-3505i.a When to Select NetApp Deduplication and/or Data Compression Best Practices (available on request from NetApp or partner)

    TR-3965 NetApp Thin Provisioning Deployment and Implementation Guide

    Data ONTAP 8 Documentation


    Data ONTAP 8.2 Data Protection Online Backup and Recovery Guide for 7-mode (lists specific limits for SnapVault, SnapMirror, and other limitations for specific devices)

    Data ONTAP 8.2 Storage Efficiency Management Guide for 7-mode

    Data ONTAP 8.2 Data Protection Tape Backup and Recovery Guide for 7-mode

    Data ONTAP 8.2 MultiStore Management Guide for 7-mode

    Data ONTAP 8.2 Storage Management Guide for 7-mode

    Data ONTAP 8.2 SAN Administration Guide for 7-mode

    For additional information about NetApp licensing, read knowledge base article 42502.

    For additional information about FlexClone, read knowledge base article 45779.

    Catalogic Software documentation and knowledge base articles can be obtained from MySupport. NetApp documents can be obtained from the NetApp Support site and in some cases directly from the NetApp reseller.

    The following are also required to architect a successful data protection plan using DPX:

    The implementation of a monitoring and alerting framework and a storage procurement plan to manage the NetApp storage systems. These requirements are described in TR-3965 and are essential when utilizing thin provisioned space to avoid performance degradation and storage availability disruptions in production environments. NetApp technical support can make specific recommendations on software and procedures necessary to fulfill this requirement.

    Familiarity with the System Requirements and Compatibility information.

    Detailed knowledge of the environment to be protected:

    types of servers, versions of operating systems

    applications to be protected (structured/unstructured data)

    data volatility

    data compressibility

    locations of servers and data (local/remote)

    bandwidth and latency of network links between systems

    Firm understanding of data protection needs including:

    backup frequency and retention needs

    short-term recovery requirements

    long-term archival requirements

    replication/offsite requirements


    disaster recovery needs and facilities

    DPX is integrated with and dependent on the following key NetApp technologies:

    Data ONTAP 8.0 or later with 64-bit aggregates, to support larger data stores and built-in storage efficiency features

    NetApp FlexVol and thin provisioning technology for efficient space management

    iSCSI LUNs and NetApp FlexClone, used to support fast lightweight access to protected data

    NetApp SnapVault Primary, SnapVault Secondary, and NetApp Open Systems SnapVault licensing, used for backup data transfer

    NetApp NearStore licensing to increase data transfer limits on NetApp secondary storage device

    NDMP protocol for Block backup and recovery control, as well as tape operations

    Data ONTAP 8.2.1 or later for non-root account usage and MultiStore (vFiler) integration

    Knowledge of these technologies and how they interoperate is crucial to understanding how the best practice recommendations build a strong foundation for data protection success.

    The best practice guidelines follow the chronological flow of DPX implementation, starting with the initial setup of the NetApp storage system, configuration and sizing of storage requirements, followed by creation and scheduling of backup jobs to fulfill a data protection plan. This document also covers items specific to utilizing tape libraries.


    Chapter 2: NetApp Storage System Guidelines

    General Considerations

    DPX supports Data ONTAP 7.3.x and later. 7-mode is required for the DPX Block solutions, including agent, agentless, NetApp OSSV, controller-to-controller SnapVault, and NDMP backup. NetApp Cluster mode is supported for NDMP tape backup only.

    It is strongly suggested that the NetApp controller run the most recent versions of Data ONTAP to take advantage of storage efficiency features, general improvements with NDMP, resource management, and miscellaneous corrected issues. For older 32-bit controllers, Data ONTAP 7.3.7 is suggested. All newer 64-bit controllers are recommended to run either 8.1.3P1 or 8.2P3 or later. For additional information on known issues and important fixes, see Troubleshooting and Known issues on page 30.

    Note: There is a critical A-SIS related Data ONTAP bug with early versions of Data ONTAP 8.2 described in Troubleshooting and Known issues on page 30.

    When configuring network interfaces, ensure that the management interface, typically e0M, is not on the same subnet as other interfaces intended to transfer data. The e0M interface is typically a low bandwidth interface, 100-Base-T on many controllers. Including the interface in such a subnet, especially when ip.fastpath is enabled, will lead to low performance because the e0M interface may get included in sending or receiving data transfer traffic. Configure the management interface on its own subnet. If it cannot be isolated, the interface can be completely disabled, with management operations taking place over one of the provisioned data interfaces. Disabling ip.fastpath may also be a solution. For more information, see NetApp Management Interface on page 32.
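
    The following Data ONTAP 7-mode commands sketch one way to apply these recommendations. The interface name, addresses, and subnet are hypothetical, and the interface.blocked.mgmt_data_traffic option is only present on releases that support it; verify option names against your Data ONTAP version before use.

        ifconfig e0M 192.0.2.10 netmask 255.255.255.0    # place e0M on a dedicated management subnet
        options interface.blocked.mgmt_data_traffic on   # keep backup data traffic off e0M, if the option is available
        options ip.fastpath.enable off                   # alternative: disable fastpath if e0M cannot be isolated
        ifconfig e0M down                                # or disable e0M entirely and manage over a data interface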

    Data ONTAP 8.x 7-mode high-availability pair controller configurations are supported when each controller is utilized as an independent storage device. However, cluster failover features and storage migration between NetApp nodes are not directly managed by DPX. A controller takeover that occurs during a backup causes that backup to fail. Backup and restore operations to the failover controller are expected to work as long as the failover maintains the SnapVault relationship list and all SnapVault qtrees have been rolled back and quiesced by Data ONTAP. Controller takeover and giveback are operations that occur completely outside of the data protection solution; assuming these operations are properly configured and successfully executed, the data protection software should not need any special configuration for backup and restore operations. The NetApp controllers move the storage and IP addresses needed by DPX.

    Data ONTAP versions earlier than 8.2.1 require the use of root credentials to support DPX backup and restore operations. If the NetApp storage server is only used as an NDMP backup source, a non-root account can be used in any supported Data ONTAP version.

    Data ONTAP versions earlier than 8.2.1 configured with a MultiStore license are limited to using vFiler0 as the backup destination and recovery source. See the SnapVault and MultiStore section of the Data ONTAP 8.2 7-Mode Data Protection Online Backup and Recovery Guide from the NetApp Data ONTAP 8 Documentation.

    Data ONTAP 8.2.1 and later introduces a new NDMP security method used to support non-root accounts and to enable MultiStore configured controllers to use any vFiler for data protection operations. For additional information on setting up vFilers and scanning in nodes with non-root accounts, read knowledge base article 46640. However, note that use of non-root accounts and vFiler access may have security implications that are important to consider. The Deployment Guide makes the following specific security-related recommendations:

    options ndmpd.authtype challenge


    options httpd.admin.ssl.enable on

    The first option instructs the NDMP service to transmit account credentials using an MD5 hash, and the second sets up secure HTTPS access for Data ONTAP API control. Using these options, which protect user credentials over the wire, is strongly suggested; however, the combination of these secure features is only available to the root account and only for vFiler0 where MultiStore is installed. For more information, see Enable Options and Services in the Deployment Guide.

    Non-root account setup, including any account used for vFiler access, requires the use of the new NDMP authtype "plaintext_sso", which is effectively equivalent to "plaintext" in that it transmits NDMP credentials in the clear over the network. Note that vFilers, except for vFiler0, do not have an HTTPS service available to them and only support HTTP API access, which also transmits user credentials in the clear.
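
    As an illustrative sketch only (the role, group, and account names are hypothetical, and the exact capability list should be taken from knowledge base article 46640), a non-root NDMP account on a 7-mode controller is typically created along these lines:

        useradmin role add dpx_ndmp_role -a login-ndmp,login-http-admin,api-*
        useradmin group add dpx_ndmp_group -r dpx_ndmp_role
        useradmin user add dpxbackup -g dpx_ndmp_group
        ndmpd password dpxbackup                  # generates the NDMP-specific password to enter in DPX
        options ndmpd.authtype plaintext_sso      # per the discussion above; required for non-root/vFiler access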

    The root volume (usually /vol/vol0) should not be housed on a large aggregate. It is recommended to leave the root volume in the default NetApp supplied configuration and not expand the aggregate containing the root volume to host other data. If the root volume encounters a data consistency issue and requires a Data ONTAP wafliron correction, the entire aggregate where the root volume is located needs to be scanned. Large aggregates can take several days to run through such maintenance tasks. This condition is not a common occurrence; consult your NetApp implementation engineer or NetApp technical support for any specific questions and concerns about adjusting Data ONTAP's root volume and containing aggregate.

    FlexClone licensing is strongly suggested for all DPX implementations. DPX integrates with FlexClone features to streamline the DPX condense process, assist with recovery features such as Instant Access and virtualization, and avoid the need to break SnapMirror replication relationships. For more information, read knowledge base article 45779, and discuss licensing with your sales representative or data protection engineer. If you choose not to use FlexClone, then consider reviewing the Data ONTAP SAN Administration Guide for 7-mode with respect to setting the snapshot_clone_dependency option on each DPX data volume to avoid errors in the DPX condense process. Without FlexClone, most restore operations using a SnapMirror destination volume require that you break the SnapMirror relationship before the recovery action can succeed.

    NearStore licensing is required to increase resource limits for concurrent backup and restore operations. NearStore is usually included with most modern Data ONTAP versions.

    NetApp storage systems have specific NDMP kernel thread concurrency limits. In general, each data protection task requires an NDMP connection, which uses a Data ONTAP NDMP kernel thread. A DPX job generally initiates a separate task to control a backup or restore operation initiated for a specific device. For example, a Block backup of a Microsoft Windows server with two source volumes consumes two NDMP connections. Similarly, an NDMP tape backup consumes an NDMP session for each source volume in the backup job.

    Total concurrent NDMP operations are the sum of all backup, restore, NetApp OSSV, and SnapVault Primary transfer operations performed by a NetApp storage system. Each of these types of operations has its own specific limits; however, concurrency of all these operations is bounded by the system's NDMP kernel thread limitation.

    When architecting new backup jobs, it is important to account for all concurrent tasks that may already be running at the time a new job starts. Ensure that all concurrent jobs do not exceed the system's NDMP kernel thread limit. It is strongly advised to reserve at least 20 NDMP kernel threads at any given time as a buffer for unexpected job overlaps, emergency restore operations, and other ad hoc Data ONTAP data transfers.

    Very large client servers can generate a significant number of tasks in a backup job. Large Microsoft Exchange DAG clusters are especially prone to this. When the number of DAG data devices is large and replicated to many DAG hosts, there is a multiplicative effect on task generation that must be accounted for. For example, if each node of a 4-node DAG cluster contains 25 disk devices used for Exchange data, this translates into roughly 100 tasks to back up the entire cluster.


    Thus, a single job containing such a cluster could affect other backup jobs running in parallel. Consider job scheduling and setup strategies that effectively avoid NDMP task concurrency concerns.

    NDMP backup/restore is limited to 40 concurrent operations. However, each NDMP transfer also counts against the NDMP kernel thread limits mentioned previously.

    Exceeding NDMP kernel thread or concurrent operation limits may lead to job failure.

    When architecting a DPX backup solution to work with NetApp functionality managed outside of DPX, such as NetApp SnapMirror, qtree SnapMirror, Volume Copy, externally triggered NetApp SnapVault transfers, SnapDrive, SnapManager, and Snap Creator, confirm the specific technical limits that apply to your storage system models. Give special consideration when DPX is added to an existing NetApp storage system that serves multiple purposes, for example, primary and secondary storage. The above-mentioned functionality generally does not count against NDMP kernel thread limits; however, other storage-system-specific Data ONTAP limits may apply.

    System limits and current usage can be determined directly from the Data ONTAP command line interface using the following commands:

    priv set advanced

    rsm show_limits

    Most of these limits are subject to system queuing when concurrent resource requests exceed system maximums. However, note that backup and restore operations consume resources bound by the storage controller limits, especially the SV SRC and SV DST limits reported by rsm show_limits. Exceeding these limits may result in DPX job failures. One common conflict exists when an aggressive backup schedule, utilizing the SnapVault protocol, is run in parallel with aggressive SnapMirror schedules.

    rsm show_limits prints a detailed review of system resources available and their cost. At the top of the output, this command prints Reservations, Tokens, and Transfers summaries. Reservations refer to resources that are reserved for specific operations. Reservations for volume SnapMirror are reported in the VSM figure and are controlled by the option replication.volume.reserved_transfers. SnapVault is reported by the QSM reserve figure and is controlled by the option replication.logical.reserved_transfers. Careful use of these options and reservations is recommended, as reserving these resources will prevent other operations from running, even when no reserved transfers are in progress; the reservations remain idle. The Transfers section displays real-time information about current system use. Each operation has an assigned cost that, when underway, removes Avail_Tokens from the system resource pool. The MP VSM SRC/DST figures report resources used for volume SnapMirror transfers. SV SRC/DST figures report SnapVault protocol use, including DPX agent Block backup and controller-to-controller backup. Legacy SV SRC/DST figures report on NetApp OSSV agent transfers.
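
    For example, the reservation options mentioned above can be inspected and adjusted from the 7-mode command line; the values shown here are purely illustrative and should be chosen for the specific controller:

        options replication.volume.reserved_transfers       # show the current volume SnapMirror reservation
        options replication.volume.reserved_transfers 2     # reserve two volume SnapMirror transfers
        options replication.logical.reserved_transfers 0    # leave SnapVault/QSM transfers unreserved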

    For a secondary NetApp system dedicated to DPX storage, the NDMP kernel thread use is generally the limiting factor for agent-based backup and NDMP tape use. The rsm show_limits resource figures are mainly of concern if you are coordinating multiple controller operations such as running SnapVault and SnapMirror operations in parallel, using vol copy, coordinating OSSV transfers outside of the DPX product, or using other NetApp technologies in a mixed storage environment.

    Agentless backups are not generally constrained by NDMP kernel thread limits, as these operations transfer data directly to and from LUNs over either iSCSI or Fibre Channel. Agentless backup also does not use any resources reported by rsm show_limits. NetApp LUN concurrency limits are significantly higher and do not have an effect on this functionality.


    Consider how concurrent agentless data transfers to NetApp storage systems could affect overall performance; however, these jobs can be freely run concurrently with other agent-based jobs.

    Customers familiar with the NetApp DFM and Protection Manager products may know that those products document SnapVault fan-in limitations that are quite low, typically four relationships or fewer. DPX is different from those products, and the published limits for DFM and Protection Manager do not apply to DPX. It is typical for DPX to manage more than four relationships fanning into a single volume, including DPX client backup, OSSV agent, and controller-to-controller SnapVault. DPX implements SnapVault control in conformance with all NetApp guidelines and best practices, and Catalogic has verified that its use of SnapVault and fan-in is well within the limits and expectations for Data ONTAP.

    Storage and Sizing for Secondary Data

    Any physical disk drives supported by NetApp can be used for secondary storage. It is typical to see secondary storage needs met with lower-cost and larger capacity SATA drives. When architecting DPX, consider the usable space available at the volume level after aggregate creation.

    It is strongly recommended to follow NetApp's best practices for provisioning disks and creating storage aggregates. It is generally recommended to take a conservative approach, use the typical RAID-DP options, and provision the recommended number of hot spares. The following are not recommended:

    Using RAID4 with aggregate setup

    Eliminating hot spares

    Extending the aggregate used for the root/boot volume

    Storage tuning is a useful method to increase the usable space for secondary storage. These are topics that should be reviewed and approved by NetApp support and the NetApp hardware implementation engineer. They can assess the storage configuration risks to the Enterprise.

    Storage needs for DPX depend on the size of existing data, frequency of backup, retention period, and change rate. Consult your Catalogic Software sales engineer for approximate storage requirement estimates for your specific environment and data protection policies. It is advised to take a conservative approach for initial storage provisioning, as it can be difficult to estimate what an environment's change rate and growth will be over time. Additionally, note that storage efficiency savings are not absolute and are inherently data dependent. A-SIS and compression may not be appropriate for all secondary storage volumes, and the savings achieved with either are highly dependent on similarity and compressibility of the source data.

    Short-term iSCSI restore operations, for example IA map and BMR, generally do not consume much space. However, longer term use, such as long running RRP restores or use of IV for I/O intensive large data sets, could consume significant space in the volume containing the LUN. You may either reserve aggregate space to account for such use cases or regularly monitor aggregate space usage to avoid out-of-space conditions.

    FlexClone volume clones temporarily count against the storage system's maximum volume limitation. This is of special concern if the total number of volumes used on a storage system for both primary and secondary data is very close to the storage system's published limits.


    Storage Configuration

    Create the largest 64-bit aggregates the system can support. This maximizes usable space by minimizing the number of drives dedicated to meeting RAID parity and hot-spare requirements. 64-bit aggregates are required to take maximum advantage of various storage efficiency options.

    Aggregates for secondary storage may span storage shelves; this helps to create aggregates of the largest size supported by the storage controller.
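
    As a minimal 7-mode sketch (the aggregate name, disk count, and RAID options are hypothetical and must be sized for the actual controller and shelves):

        aggr create dpx_aggr1 -B 64 -t raid_dp 48    # large 64-bit RAID-DP aggregate built from 48 spindles
        aggr status dpx_aggr1 -v                     # verify the RAID type, disk count, and 64-bit block format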

    Because DPX does not use aggregate level snapshots, it is highly suggested that aggregate level snapshots be disabled on aggregates hosting secondary backup data. Aggregate level snapshots unnecessarily trap blocks that otherwise expire and are removed in the course of normal operations. Before disabling aggregate level snapshots, check that the NetApp storage system does not require the use of these snapshots for other Data ONTAP features such as NetApp SyncMirror.

    Disable snapshot reservation and NetApp scheduled snapshots on all volumes that contain backup data managed by DPX. The snapshot reservation wastes space in this case, and snapshot scheduling can have unintended side effects, such as retaining blocks of data that should be expired and using up limited snapshot copies that are needed for new backups.
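
    The aggregate-level and volume-level snapshot settings described above can typically be applied from the 7-mode command line as follows (aggregate and volume names are hypothetical):

        snap sched -A dpx_aggr1 0 0 0    # disable scheduled aggregate snapshots
        snap reserve -A dpx_aggr1 0      # remove the aggregate snapshot reserve
        snap sched dpx_vol01 0 0 0       # disable scheduled volume snapshots on the backup volume
        snap reserve dpx_vol01 0         # set the volume snapshot reserve to 0%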

    Data ONTAP 8.0.x cannot migrate existing SnapVault data from 32-bit to 64-bit aggregates. It is advised to create 64-bit aggregates and re-base any existing SnapVault data to the 64-bit aggregate. Once the retention period requirements are met on the new destination, the 32-bit aggregate can be removed and the disks merged into an existing or new 64-bit aggregate.

    Data ONTAP 8.1 and later can automatically convert 32-bit aggregates to 64-bit. You must add enough disks to the aggregate such that the newly expanded storage exceeds the 32-bit storage limit. The aggregate converts automatically and in the background. A storage administrator can expect to see some performance degradation on aggregates that are in the process of migrating to 64-bit; however, all existing data and SnapVault relationships should be unaffected.

    NetApp systems have a file size and LUN size limitation of 16 TB. This limitation applies to all of the recent versions of Data ONTAP and all NetApp controller modules. This 16 TB limitation is the limit for any single volume you need to protect. For example, a client machine configured for agent-based backup cannot have any one file system larger than 16 TB. For agentless backup, the VM cannot have a VMDK that is larger than 16 TB. Other methods supported by DPX, such as file level or OSSV backup, can be used to protect file systems exceeding the 16 TB limit.

    Use thin provisioning with proper monitoring, event alerting, and mitigation plans in place. See General Considerations on page 7. Proactively monitoring space utilization at the volume and aggregate level is necessary to avoid the aggregate filling up. An aggregate running out of space has significant effects on any operation that the aggregate supports, including performance degradation and I/O errors. An aggregate that runs out of space sends all transferring SnapVault relationships into a quiescing or rollback state that deadlocks on resource contention. The only solutions are to add more space to the aggregate or to kill the SnapVault relationship and delete affected data. It is suggested that you use space monitoring and threshold alerting tools to closely monitor storage system space utilization.

    A general guideline is to create thin provisioned destination volumes that are two to four times the space required for the initial base backup of the source data. Actual space required depends on the retention duration, backup frequency, and the expected change rate of the source data. For a typical server that is backed up once per day, a 14-day retention would require two to three times destination storage and a 30-day retention three to four times destination storage. For more complex backup/retention/change rate scenarios, contact Catalogic Software sales engineering or professional services for assistance.
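
    For example (hypothetical names and sizes), a server with roughly 2 TB of source data and a 14-day retention might be given a thin provisioned destination volume of about 6 TB:

        vol create dpx_vol01 -s none dpx_aggr1 6t    # thin provisioned destination volume (no space guarantee)
        vol options dpx_vol01                        # confirm that the space guarantee is set to none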


    Enable deduplication or A-SIS on all volumes used for secondary storage:

    Check the version of Data ONTAP in use. Older versions of Data ONTAP 8.2 must be upgraded to Data ONTAP 8.2P3 or later to avoid a potential data corruption issue when using A-SIS. See Troubleshooting and Known issues on page 30.

    For agent-based backup target volumes, disable the default automatic deduplication schedule. The completion of the SnapVault process automatically starts a deduplication task following the data transfer. There is no way to alter or control the automatic SnapVault deduplication process other than to disable A-SIS storage efficiency features at the volume level.

    For agentless backup target volumes, enable scheduled A-SIS deduplication on the volume if it is not already enabled. If a deduplication run is not scheduled, the data in the volume does not benefit from additional storage efficiency gains even if the destination volume is configured to support this. Schedule the operation to occur after the data transfer has taken place. In general, schedule a deduplication operation to occur one to two hours after the job is expected to complete. Scheduling frequency or exact timing is not critical; however, it is suggested to run deduplication at least once per day and preferably to complete before any volume SnapMirror operations.

    Change deduplication configuration and scheduling using the NetApp OnCommand System Manager utility or the command line interface (see the command-line sketch after this list):

    For agent-based backup, use the NetApp OnCommand System Manager user interface utility to select the On-demand option for the volume. The same can also be accomplished using sis config -s - path to disable all deduplication schedules. The deduplication process is automatically triggered by the completion of a SnapVault operation and cannot be scheduled or overridden. If deduplication is undesirable, for reasons such as the data does not deduplicate well or there are performance concerns, it can be disabled on a volume-by-volume basis or across the entire NetApp storage system.

    For agentless backup, it is suggested to use the OnCommand System Manager Scheduled volume option. See also the sis config manual page for command line scheduling options. Configure a schedule to begin the deduplication process after the backup job is complete, or at some point during the day where starting such a process has minimal overlap with other processes that may consume storage system resources.

    Current versions of Data ONTAP 8.x and later support up to eight concurrent deduplication processes across a storage system controller and one active process per volume. Data ONTAP queues any outstanding deduplication requests.

    The Automated schedule requires that the default data growth threshold of 20% be crossed before initiating deduplication. See TR-3505 NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide. This is set using the OnCommand System Manager Automated volume option or by issuing sis config -s auto path from the command line.

    It is not generally suggested to use the Automated deduplication setting for all volumes. Consider using this feature on some of the volumes if the number of concurrent deduplication processes is of concern or if the average size of incremental data transfer generally exceeds the 20% threshold. Automated deduplication is not suggested for volumes containing DPX agent Block data transferred through SnapVault, as the SnapVault protocol triggers A-SIS automatically.
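
    A minimal command-line sketch of the settings described above, using hypothetical volume names (schedule syntax per the sis config manual page):

        sis on /vol/dpx_vol01                      # enable A-SIS deduplication on the volume
        sis config -s - /vol/dpx_vol01             # agent-based target: clear the schedule; SnapVault triggers dedup
        sis config -s sun-sat@23 /vol/dpx_vol02    # agentless target: run dedup daily at 23:00, after backups finish
        sis config -s auto /vol/dpx_vol03          # optional: run only after roughly 20% new data has been written
        sis status /vol/dpx_vol01                  # check progress and last-run status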

    Compression features may be considered to help optimize secondary storage usage. Enabling compression may impact storage system CPU and memory usage patterns and should be tested within your specific environment prior to ongoing use. See TR-3505 NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide for NetApp recommendations as well as the additional suggestions for overall DPX implementation in this document.


    Inline compression is available in Data ONTAP 8.0 and later and should be tested on a volume-by-volume basis until an acceptable balance between resource use, performance, and data protection goals is achieved. Data ONTAP 8.1 and later offer the post-process compression feature, which can also be considered and tested to help defer compression operations and help normalize storage system resource usage throughout the day.
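
    If testing shows compression is worthwhile on a given volume, the 7-mode sis command is typically used to enable it (volume name hypothetical; deduplication must already be enabled on the volume):

        sis config -C true -I false /vol/dpx_vol01    # post-process (deferred) compression only
        sis config -C true -I true /vol/dpx_vol01     # add inline compression only after testing the CPU impact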

    Disable volume auto-grow for secondary data volumes used by DPX. It is generally better to thin provision and over-size the volume rather than use auto-grow features. In the field, auto-grow does not seem to work well with SnapVault data transfers, generally causing them to fail and enter a quiesced state when the auto-grow action is initiated.

    Disable auto-delete of older snapshot copies. Leaving this option enabled can potentially lead to recovery point data loss if the volume or aggregate starts to run out of space. During low space conditions, Data ONTAP attempts to delete older snapshots to reclaim space. The loss of DPX-created snapshots is not synchronized with the DPX catalog; thus, the DPX catalog reflects recovery points that appear to be available but have in fact been removed from the secondary storage.

    Do not configure snapshot reserve or fractional reserve for secondary storage volumes containing DPX Block backup data. Configuring this does not interfere with any operations, but it is unnecessary. Fractional reserve is a primary storage feature that is not used with DPX data. Snapshot reserve is not necessary since all recovery points are retained in snapshots. Setting a snapshot reserve removes the reserved percentage from certain storage reporting and only serves to confuse storage administration.
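
    These per-volume settings can typically be applied as follows (hypothetical volume name):

        vol autosize dpx_vol01 off                    # disable volume auto-grow
        snap autodelete dpx_vol01 off                 # do not let Data ONTAP delete DPX recovery points
        vol options dpx_vol01 fractional_reserve 0    # no fractional reserve on secondary data volumes
        snap reserve dpx_vol01 0                      # no snapshot reserve; recovery points live in snapshots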

    There are no explicit requirements with regard to volume naming conventions; however, the following recommendations may help simplify ongoing management of the DPX solution:

    Keep volume names as short as possible because NetApp limits qtree path names to a total of 63 characters. qtrees are used for primary storage (CIFS/NFS), NetApp OSSV backups, and all other DPX Block backup data repositories:

    Data ONTAP imposes limitations on the total path name length, and some of the file names created during the backup process can be lengthy. Keeping the volume name short helps avoid conflicts in the backup and recovery process.

    There are some recovery processes that may require keying in a volume name manually or selecting it from a list. Shorter names are easier to identify and find.

    qtree names generated by the data protection solution are constructed from a combination of the backup job name, node name, and a task-specific identifier. The NetApp volume path name, backup job name, and the server node name can be controlled by the user. The task-specific identifier is an internal reference that is generated based on the job type, the device type, or both.

    Avoid naming conventions where volume names are very similar. For example, avoid using volume names that are prefixed or suffixed with long strings of identical characters. Doing so generally makes it difficult to sort, search, and enter volume names.

    It is best to make volume names as unique and as readable as possible, preferably using some consistent naming convention that is obvious and easy to understand for all individuals involved in managing and maintaining the solution.

    Consider adding a short prefix that associates the volume with some function. Examples include a designation of location, department, or server function. Additionally, consider suffixing the volume name with something that identifies the retention period. Large storage systems with many aggregates may also benefit from a keyword or letter to designate the purpose or location of a specific volume.


    Avoid special characters, for example *, &, !, non-printable ASCII, and other extended character sets when naming volumes, aggregates, jobs, and logical node names, as these may be misinterpreted by the software. Data ONTAP does not allow qtrees to contain non-ASCII characters.

    Monitor aggregate and volume space usage on an ongoing basis and avoid situations where the aggregate or individual volumes run out of space. Full or nearly-full aggregates can suffer from performance degradation, and it is important to configure your storage system to avoid such circumstances. It can be challenging to correct a low space condition without deleting data, adding new storage, or moving data around. A volume running out of space is easier to address provided the containing aggregate is not full or nearly full; however, a full volume affects all backup and restore operations for which it is a destination. A volume running out of space also leads to qtree rollback operations, which generally require adding more space to correct and, depending on the data size, can take several hours or days to remedy. See Storage and Sizing for Secondary Data on page 10 for the space monitoring, alerting, and mitigation plan that must be in place to properly manage NetApp storage servers using thin provisioning.

    Consider the Data ONTAP total volume count limits. Lower-end NetApp storage systems are limited to 200 volumes per controller, while higher-end models have a limit of 500 volumes per controller. This includes volumes used for any purpose, such as primary storage, secondary storage, the root volume, and volume FlexClones. This is an important factor to consider when grouping servers into volumes and in any scenario where NetApp storage is not dedicated entirely to DPX.

    Configure and enable storage efficiency features, such as deduplication and compression, on the volume prior to running the first backup job. The greatest benefits are realized when new or empty volumes are used for the initial base backups, followed by scheduled incremental backups. Although it may be possible to run deduplication and compression on existing data, this does not generally result in immediate storage savings and may require additional space and I/O for processing. When carrying out storage efficiency post-process operations on existing data, maximum benefits are typically not attained until after all data trapped in existing snapshots has expired.

    Enabling inline compression indiscriminately on all configured volumes is not recommended. Inline compression algorithms consume CPU and memory resources that can overburden a NetApp storage system, resulting in degraded performance. Test compression features prior to ongoing use on specific volumes containing representative data. This is especially true for lower-end FAS 2xxx series models and storage systems with mixed workloads, for example, production data and backup on a single storage system. Conduct testing to determine if compression should be enabled, on which volumes, and how the use of the features affects overall storage system performance. The backup secondary storage use case is much different from typical primary storage I/O use; backup tasks send a continuous stream of data, all of which is subject to compression as the controller receives it. Inline compression is not recommended when the controller CPU use is above 50% utilization. Inline compression can slow down the backup process by as much as half or more, based on controller size and available capacity.

    As data compressibility and performance impact are inversely related, consider disabling compression on volumes where efficiency gains from compression are not significant. Leaving compression enabled in such cases does not produce meaningful storage efficiency gains; however, it does consume storage system resources. Similarly, do not attempt to compress data that is already highly compressed (file servers with a lot of JPG images, ZIP files). Note that the more compressible the data is, the lower the system resource overhead and performance impact on the storage system.

    Deferred volume compression has also delivered favorable results. Deferred compression is a scheduled task which is monitored, limited, and controlled by Data ONTAP, similar in concept to how A-SIS is controlled. Data ONTAP schedules and limits resources consumed by deferred compression so that it has minimal effect on other important controller operations such as backup and primary storage use.

    Consider the impact of deduplication and compression on NDMP tape backup:

    Dump backup data is rehydrated and uncompressed in memory before being written to tape. This inflates the amount of data which goes to tape and consequently requires media equal to the amount of logical data being stored.


    With SMTape backup, all volume attributes are preserved and the data is written to tape with deduplication and compression savings intact. Note that the storage system used for the restore operation needs the same or a later Data ONTAP release, with all options and licenses enabled, for the restored data to be accessible.

    For a more detailed description of Dump and SMTape backup methods, see the DPX Deployment Guide at MySupport.

    Consider storage efficiency features such as deduplication, compression, or both on agentless backup destination volumes when the VMware source of that data resides on an NFS data store. For VMware NFS data stores, the initial base backup may transfer the entire allocated VMDK, even if the virtual disk is thin provisioned and not all of the source blocks are occupied by data. Although the base backup needs to transfer the entire VMDK footprint once, deduplication or compression or both should eliminate the destination storage impact of the unallocated blocks, thus saving space on the NetApp volume. Ongoing incremental backups transfer only blocks changed since the initial base backup. The above-described behavior is consistent with the VMware Changed Block Tracking features detailed in VMware KB article 1020128.

    Inline compression can be used as a tool to reduce the required landing zone for agentless backup data originating from NFS datastores where initial CBT tracking is not possible. This does not reduce the amount of base backup data sent to the controller, but it may drastically reduce the initial storage required to hold the base backup. Once the base backup is completed, inline compression can be disabled or replaced with deferred compression.

    DPX can track VM locations when VMs are part of a higher-level object, such as a resource group. Note that VMware does not maintain CBT when a VM is storage vMotioned to another datastore. If a storage vMotion occurs, the CBT is lost, and this results in a DPX base backup transfer for that affected VM. If the moved storage is hosted on an NFS datastore, then the base transfer might also include all data for the device, not just allocated blocks. If you expect VMs to be moved frequently, account for this in your secondary storage plan. You will either need to monitor and extend the volume size each time a base transfer is needed, or perhaps back up the VM resource group on a short retention cycle which expires and frees up space frequently. The agent-based solution is preferred here, as agent-based backup continues incrementally regardless of how or when the VM's storage is migrated.

    When utilizing SnapMirror between NetApp storage systems, consider the bandwidth needed to transfer the base backup data sets and a prudent synchronization schedule:

    If tape is available, consider using SMTape backup and restore to seed volume SnapMirror operations.

    Where tape is not available or is inconvenient, configure the SnapMirror relationship on empty volumes, before any other data transfer takes place. This establishes the relationship and initializes checkpoint facilities. Subsequent updates benefit from checkpoint restart should data transfer interruptions occur. SnapMirror can be established after the source volumes contain data; however, this requires that all of the source volume data be successfully transferred first before checkpoint data can be established. An interruption in the SnapMirror initialization may require a larger set of data to transfer from the beginning.

    Configure SnapMirror synchronization to occur once per day, after the backup is completed (see the snapmirror.conf sketch after this list). Do not attempt to configure SnapMirror synchronization to run multiple times per day, as this is not necessary, wastes available system resources, and can interfere with other important operations. A conservative approach would be to configure the volume SnapMirror update to occur 12 hours after the backup has completed.

    Consider running SnapMirror schedules outside of the backup window when possible. This avoids the controller resource conflicts between SnapVault and SnapMirror operations indicated in General Considerations on page 7.
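
    A minimal 7-mode sketch of a once-daily update schedule, assuming hypothetical controller and volume names (the destination volume must exist and be restricted before initialization):

        vol create dpx_vol01_mir dst_aggr1 6t                                  # destination volume on the target controller
        vol restrict dpx_vol01_mir                                             # SnapMirror destinations must be restricted
        snapmirror initialize -S src_filer:dpx_vol01 dst_filer:dpx_vol01_mir   # baseline transfer (empty volumes preferred)

        # /etc/snapmirror.conf on the destination: minute hour day-of-month day-of-week
        src_filer:dpx_vol01 dst_filer:dpx_vol01_mir - 0 14 * *                 # update once daily at 14:00, after backups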


    It is not recommended to use SnapMirror Sync or Semi-Sync with DPX. These Data ONTAP features are intended to be used as near-real-time storage mirroring for primary storage use cases. The SnapMirror Sync and Semi-Sync features do not add any additional protection to DPX secondary storage volumes. Using them may impose unnecessary load on your controller when backup, restore, A-SIS, compression, or condense operations are underway. Additionally, it is unknown whether data on the SnapMirror Sync/Semi-Sync destination will be usable for restore if the mirror is interrupted in the middle of a backup, snapshot, or A-SIS operation. If you desire to replicate secondary storage, asynchronous SnapMirror is the preferred and supported method.

    Creating FlexClone copies from SnapMirror destination snapshots may lead to SnapMirror replication errors if the source DPX host continues a normal cycle of backup and condense. If a recovery operation such as an IA, IV, or agentless recovery lasts for a very long time, the SnapMirror source may want to expire that snapshot and replicate these changes to the SnapMirror destination. The replication fails since the destination has snapshot data being held busy by the recovery operation. This condition can also occur if a recovery operation fails in a way that leaves FlexClone volumes behind or if a manual FlexClone is created from the SnapMirror destination. This is a known and expected behavior of Data ONTAP. You will need to either remove the FlexClone copy or arrange to manually split off that clone; note that the clone split is a data copy operation that could take significant time.

    iSCSI is a core requirement of the DPX data protection solution. iSCSI is mainly used for block restore, verification, BMR, and agentless backup/restore. If the NetApp controller is not configured for iSCSI or is isolated from other production networks, DPX may not be able to perform desired functions. NetApp interfaces that block iSCSI traffic via the option interface.blocked.iscsi may fail backup and restore operations that require the use of that interface. Entities that would require access to a NetApp interface for iSCSI access include hosts attempting IA maps, ESX hosts attempting agent or agentless virtualization restore, virtualization proxy nodes coordinating agentless backups, and BMR.

    VMware ESX servers must be able to interact with iSCSI. The ESX server must have the software iSCSI initiator installed and enabled. The ESX server must have one or more local NICs that can host a vmkernel service; iSCSI LUN attachment to the ESX server takes place on an available vmkernel interface. The vmkernel requirement disqualifies use of any NICs that are used for high availability and clustering functions where the interface and available IP addresses are not usable by the local ESX server. For agentless backup, your designated virtualization proxy nodes, which could be physical or virtual hosts, must have access to a NetApp interface to write iSCSI data. Additionally, the individual VMs hosted on an ESX cluster must have the ability to route to and access iSCSI sources on a routable NetApp interface to perform restore operations.
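
    On ESXi 5.x hosts, the software iSCSI initiator state can typically be checked and enabled with esxcli, as in the illustrative sketch below; vmkernel port binding and adapter names vary by environment and are omitted here:

        esxcli iscsi software get                    # check whether the software iSCSI initiator is enabled
        esxcli iscsi software set --enabled=true     # enable the software iSCSI initiator
        esxcli iscsi adapter list                    # confirm that the software iSCSI adapter is present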

    Existing and Multi-Use Storage

    For NetApp systems that host primary (production) and secondary (backup) data, additional considerations apply. DPX backup jobs produce a different load characteristic than typical production storage use cases, for example file sharing, application back-end data storage, and VMware storage. Block-level transfers produce concurrent data streams that are non-cacheable sequential write operations. Consider scheduling backup jobs outside of peak production use hours to minimize degrading the performance of critical applications.

    When architecting a data protection solution, note that hosting primary and secondary data on the same storage system introduces a critical single point of failure. A loss or failure of the sole storage system results in the loss of both production and backup data. If a storage system is to host primary and secondary data, it is strongly suggested that the secondary data be periodically moved to another location via tape or transmitted offsite via SnapMirror replication.

    Do not attempt to use primary storage volumes as backup destinations.


    Avoid mixing primary and secondary storage within the same aggregate. Create separate aggregates for primary and secondary storage needs. This helps isolate performance issues and prevent out-of-space conditions.

    Where aggregates must host both primary and secondary data, it is strongly recommended to have monitoring, alerting, and space mitigation facilities in place as described previously. If these facilities are not in place and you do not have predictable and known storage growth patterns, consider utilizing space reservation on your primary data volumes, especially where LUNs are configured. Although this conservative approach may require allocating more space, it helps protect the primary data volumes from I/O errors if the secondary data fills the aggregate.

    Give special consideration to backup of NetApp volumes containing primary LUN data. This includes NDMP backup or controller-to-controller SnapVault backup of volumes containing live LUN data. Typical scenarios include Fibre Channel or iSCSI attached LUNs used directly by applications such as SQL Server, Exchange, or VMware ESX data stores. When designing a solution that manages the backup of LUN data, be aware that no attempt is made to quiesce the source LUNs, nor are the contents of the LUNs cataloged or searchable. Review how the LUN data can be placed into a mode that results in an application-consistent backup, and confirm that the tools, systems, and procedures necessary to restore the LUN data are available in a usable way to the appropriate application. Data consistency is usually accomplished by quiescing the LUN data in some way such that a valid volume snapshot can be taken. DPX can then back up this application-consistent data either via Data ONTAP SnapVault or NDMP backup, or the data can be replicated via SnapMirror. Some applications such as SnapManager can arrange to create a predictable Snapshot copy name, which is used to build a scheduled NDMP backup job.

    Similar consideration applies to NFS data stores used by ESX servers for VMDK virtual machine storage. When backing up these volumes directly with either NDMP tape backup or SnapVault Data ONTAP primary-to-secondary transfer, DPX does not quiesce VMDK storage nor catalog the contents of VMDK files. This is similar to the behavior described previously for volumes containing primary LUN storage.


    Chapter 3: Managing NetApp Storage Systems

    Install NetApp OnCommand System Manager onto one or more management workstations. This is the NetApp recommended free tool for storage system management and monitoring. Do not rely on the built-in web-based NetApp FilerView application, as it is unavailable with Data ONTAP 8.1.0 or later.

    Install a full-featured telnet/SSH terminal emulator that can easily capture output. The Windows telnet client is generally cumbersome to use. SSH is preferred for its security and ability to be scripted. UNIX and Linux machines are especially useful for SSH and scripting; however, Windows variants do exist. For an example of SSH scripting, read knowledge base article 46630. RSH can also be used for scripting purposes; it is much easier to set up, but much less secure than using SSH.

    Install an FTP client. Command-line FTP utilities are acceptable if you are familiar with them. Otherwise, a graphical FTP client is easier to use. An FTP client is useful for occasionally collecting log files from the NetApp device.

    Configure and test FTP access to the NetApp root volume (usually /vol/vol0). This is required for log collection if troubleshooting is necessary. The FTP password is not tied to the system's NDMP or system password and can easily get out of sync. It is suggested to have the FTP and NDMP/system passwords be the same, so that included support and troubleshooting utilities can get equal access as needed. As indicated previously, this should use the system's root account, as the included software and tools all pull NetApp configuration information from a common place. If root account use with FTP is unacceptable, there may be some log collecting tools that will not work. However, the NetApp log files can usually be collected manually, either by an alternate FTP account or via other methods such as CIFS and NFS sharing. For more information regarding FTP access errors and NetApp log files, read knowledge base articles 45648 and 41798. If FTP access to your controller does not work, call NetApp technical support for assistance. Depending on what kinds of access your controller is licensed for, this could be a delicate operation to correct. Note that troubleshooting NetApp operations often requires reviewing a large number of logs by both NetApp Support and Catalogic Software Data Protection Technical Support, and the OnCommand System Manager tools currently do not permit downloading these files.
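
    A minimal sketch of enabling FTP on a 7-mode controller so that logs under the root volume (for example /vol/vol0/etc) can be collected; verify access afterward with your FTP client:

        options ftpd.enable on    # enable the FTP server on the controller
        options ftpd.enable       # confirm the setting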

    It is extremely important that the NetApp storage administrators understand that they should not delete, rename, relocate, or alter DPX snapshots or underlying destination volumes in any way. Do not use tools such as FilerView, System Manager, or the Data ONTAP command line interface to alter backup data in any way. This includes attempting to manually clean up space when aggregates or volumes fill up, unless instructed to do so by Catalogic Software Data Protection Technical Support. DPX manages the lifecycle of snapshots and will add, remove, and clone snapshots as needed and requested through the management console. The appropriate way to manage recovery data is to either alter retention periods or remove servers/jobs from within the management console. The space is reclaimed after the next condense operation.

    Servers and Data Grouping

    The master server component of DPX can generally support approximately 300 nodes protected via agent-based backups. This estimate assumes an average of two to three source volumes per server and once-daily backups.

    Master server capacity is ultimately dependent on overall task scheduling and resources, so if the total number of backup tasks or the job frequency requirements are higher, the master server capacity is lower.


    Agentless backups are bound by different system constraints. Agent-based and agentless jobs can be freely intermixed in the backup schedule. Agentless backups do not conflict with NDMP kernel thread usage nor with other Data ONTAP limits for SnapVault and SnapMirror.

    For a predominantly agent-based deployment, it is recommended to deploy no more than 300 nodes on a master server. If a small number of agentless nodes is needed, 20 nodes or fewer, these can be added.

    For mixed agent/agentless environments, it is recommended to limit the agent-based nodes to 220, with a maximum of 400 nodes including agentless. For large agentless environments, the 400-node maximum applies. Where necessary, lower and distribute the agent-based load across multiple masters to make room for additional agentless nodes.

    Multiple NetApp storage systems can be configured for use with a single master server/Enterprise.

Each NetApp storage system added to the backup Enterprise should have a dedicated client node, otherwise known as an NDMP proxy server. Do not use a single server as a proxy for more than one NetApp controller. For small Enterprise configurations, 50 nodes or less, the master server can be the proxy for the NetApp controller. For larger Enterprises, avoid using the master server as a proxy for any NetApp controller. Instead, choose an individual DPX client machine to serve as a dedicated proxy node for each NetApp controller in the Enterprise, preferably on the same subnet as that controller.

An NDMP proxy may be a virtual machine; however, in larger environments, CPU and memory resources may need to be reserved for this VM to ensure adequate performance.

When considering where to locate an NDMP proxy server, give preference to nodes on the same subnet as the NetApp controller.

Configuring multiple master servers with a single common storage controller is strongly discouraged. Sharing a single storage controller across multiple master servers/Enterprises complicates job scheduling and makes resource conflict management significantly more difficult. It is possible to scan a single NetApp controller into multiple Enterprises. However, care must be taken to ensure that the master server jobs are carefully scheduled to avoid exhausting NDMP kernel thread resources and causing SnapVault and SnapMirror resource contention.

Servers using DPX agent-based backup must be current on all recommended vendor operating system patches and application patches. Defragment Windows servers prior to running the DPX base backup. Configure Linux servers to fulfill all minimum requirements for LVM2 and free extents prior to DPX installation.

A Block backup job must back data up to its own dedicated volume. A NetApp volume must not host data for more than one backup job. Do not configure multiple backup jobs to share volumes; otherwise data retention, condense operations, and reclaiming of space will be problematic. Sharing volumes across backup jobs also complicates deduplication and compression operations on the NetApp. See Storage Configuration on page 11 for more information on the NetApp file/LUN limitation.

Large implementations (>100 clients) should strongly consider grouping servers with like function, similar data, or both into backup jobs. Design job groupings around:

    Geographic location: Do not mix local and remote hosts in a single backup job.

    Data retention: Combine servers with similar data retention requirements.

Source server data size: Do not combine server backups with drastically different data sizes. Ideally, for data transfer concurrency and job completion, all incremental backup transfers within the job would be roughly the same size, and therefore complete around the same time.


Server type or server function: Avoid mixing dissimilar operating systems or dissimilar applications, as there may be limited deduplication benefits across diverse operating system platforms and data sets.

    Cluster nodes: Consider creating independent jobs for large clusters with many nodes and/or physical volumes.

Business function: Consider organizational separation; however, note that grouping servers with disparate characteristics as described above may not benefit from deduplication.

Application functionality: Consider breaking up large application servers into multiple jobs that focus on specific resources. This is especially helpful for large Microsoft Exchange DAG clusters, but can also be used for Microsoft SQL Server. Break jobs up with a focus on server devices, as the Block backup is inherently device centric. Avoid creating such jobs where data is shared on common devices, as each backup job creates its own base backup and consumes the necessary secondary storage space to protect it.

In smaller implementations, it is recommended to dedicate a backup job for each server, backing up each job to a single volume. The primary reason is to improve scheduling flexibility and maintain stricter separation between servers, groups, and departments. Note the previously mentioned Data ONTAP model-specific volume limits. In addition, not grouping servers in a single job reduces potential deduplication benefits.

A DPX node cannot perform block-level backup for attached CIFS/NFS mount points. DPX can back up CIFS/NFS primary data on a NetApp controller either via Data ONTAP SnapVault Primary backup to a Secondary system or NDMP backup to tape. Alternately, this data can be SnapMirrored to a second NetApp controller and then the SnapMirror destination volume can be backed up via NDMP tape backup.

For remote site server backup, tuning the TCP socket keepalive settings for the master and clients is strongly recommended.

For more information about socket keepalive settings for Windows, Linux, and Solaris, read knowledge base article 46021. It is recommended to reduce the keepalive transmission interval to 10 minutes or less. The overhead that TCP keepalive packets generate is negligible even for very low bandwidth links, and reducing this setting helps avoid many common firewall, router, and latency issues that may time out idle or very slow moving network connections. Tuning keepalive settings on the master server is generally beneficial and has no adverse effect on the local high performance network. All remote clients should receive similar tuning. Local nodes generally do not need this kind of tuning.
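
The keepalive change itself is made at the operating system level as described in knowledge base article 46021, but the behavior being tuned can be illustrated in a few lines. The sketch below is Python and Linux-specific, and the values are examples only; it shows a connection configured to send its first keepalive probe after 10 minutes of idle time, which is the effect the OS-level tuning is meant to achieve for DPX connections.

```python
# Illustration of the keepalive behavior the OS-level tuning provides: an idle
# TCP connection starts probing after 10 minutes instead of the Linux default
# of 2 hours. The real recommendation is to change the system-wide settings per
# knowledge base article 46021; the socket options and values here are examples.
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
# Linux-specific options; not available on every platform.
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 600)   # idle seconds before the first probe
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 60)   # seconds between probes
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)      # failed probes before the connection drops
```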

    Job Creation and Scheduling

    General NetApp Storage Recommendations

NetApp storage systems impose a limit of 255 snapshots per volume. When considering job frequency and retention time, plan for no more than 245 snapshots in a volume. NetApp requires a few snapshots for general activities such as maintaining SnapVault relationships, managing roll back operations, Dump/SMTape tape backup requests, SnapMirror transfers, and others. Each backup job run results in one snapshot copy that preserves that recovery point-in-time; this is regardless of the number of source servers, tasks, or amount of data contained in the destination volume.
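
Because every run of a job leaves exactly one snapshot on its destination volume, the 245-snapshot planning ceiling converts directly into a retention limit for a given job frequency. A minimal sketch of that arithmetic (the frequencies and retention periods shown are examples):

```python
# Each run of a backup job leaves one snapshot on its destination volume, so
# snapshots required ≈ runs per day × retention days. Keep the result at or
# below 245 to stay inside the 255-snapshot Data ONTAP limit while leaving
# room for SnapVault/SnapMirror housekeeping snapshots.

def snapshots_required(runs_per_day, retention_days):
    return runs_per_day * retention_days

for runs, days in [(1, 245), (4, 60), (4, 90)]:
    needed = snapshots_required(runs, days)
    verdict = "ok" if needed <= 245 else "exceeds 245; shorten retention or reduce frequency"
    print(f"{runs}/day for {days} days -> {needed} snapshots ({verdict})")
```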

Also consider the size of the storage system when grouping servers into jobs and mixing job types. Higher-end storage systems such as the FAS6xxx series have higher resource limits and more CPU and memory to accommodate greater concurrency. Lower-end storage systems such as the FAS2040 have lower resource limits, which in turn necessitate less aggressive concurrency and more conservative job distribution and scheduling. Some features such as online compression might not be appropriate for lower-end storage controllers.

Note that backup traffic is not like typical primary storage use. Backup traffic generally consists of a continuous stream of sequential data writes that do not benefit from caching. Primary storage I/O is generally more random access and can benefit tremendously from caching. Backup traffic generally requires more CPU resources than typical primary storage use cases.

    DPX Agent-Based Backup

For agent-based backup jobs, the number of concurrent tasks is limited by Data ONTAP's internal NDMP kernel thread resources. Each DPX task corresponds to one NDMP session, which in turn uses one NDMP kernel thread. Concurrency of jobs and tasks must be planned and scheduled to ensure that the storage system's resources are not exhausted. DPX tasks include the following:

NetApp OSSV backup: Each path backed up corresponds to one task.

    NDMP backup: Each path backed up corresponds to one task.

Restore: Restore jobs are similar to backup jobs for task usage; however, restores are usually performed on an ad hoc basis.

Linux BMR: Each target disk that transfers data corresponds to one task; however, BMR is usually an ad hoc operation.

The total number of concurrent tasks must not exceed the storage system limits. Jobs can consist of one or more servers, and each source volume on each server uses at least one task. Thus, populate and spread out the schedule such that the total number of concurrent tasks running across jobs at any given time does not exceed the storage system's internal limits.
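
When populating the schedule, it can help to total the tasks of all jobs that overlap in a given window and compare the sum with the controller's NDMP kernel thread limit. The sketch below illustrates that bookkeeping; the thread limit shown is a placeholder and must be replaced with the documented figure for your Data ONTAP model.

```python
# Rough NDMP concurrency planning: each source volume in an agent-based job
# consumes one task, and each running task holds one NDMP kernel thread on the
# controller. Sum the tasks of every job that can run at the same time and
# compare against the controller's documented limit.

ndmp_thread_limit = 128   # placeholder; use the limit documented for your Data ONTAP model

# job name -> number of source volumes (tasks) the job launches
overlapping_jobs = {
    "win_file_servers": 24,
    "linux_app_tier": 36,
    "exchange_dag_a": 40,
}

total_tasks = sum(overlapping_jobs.values())
print(f"{total_tasks} concurrent tasks vs controller limit {ndmp_thread_limit}")
if total_tasks > ndmp_thread_limit:
    print("Stagger job start times or split jobs to stay under the limit.")
```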

In environments where SnapMirror scheduling is used and/or other NetApp data transfers are coordinated outside of DPX, you must account for this additional resource use and adjust the job schedule concurrency to accommodate it. See NetApp Storage System Guidelines on page 7 for additional details and references to Data ONTAP's resource limits.

A recommended starting point for DPX SnapVault job creation is four to eight servers per job, backing up to a single volume. However, the optimal server count depends on the number of source volumes, each requiring a single task, the amount of data to be backed up, and the expected duration of the incremental data transfer. See Servers and Data Grouping on page 18 for additional details on how to group servers into jobs.

The range of servers per job is flexible and should be adjusted based upon the data and servers included, as described above. For example, if the environment has many very small servers with few source volumes, it is recommended to group more of such servers into a single job. Conversely, for servers with large data footprints and many source volumes, such as large clusters, configure fewer servers per job. Isolate clusters to their own jobs. Very large servers, especially those with a high data change rate, may benefit from being isolated in their own jobs.

Any job that is likely to initiate more than 75 parallel tasks is better broken down into several jobs, which can then be scheduled to reduce the overall parallel operations against a single controller. Very large application clusters, such as large Exchange DAG implementations, also benefit from splitting the data protection work across multiple jobs where possible.


Avoid mixing very large servers with small servers. It is better to group servers of similar size where all such parallel data transfers start and end around the same time; see Servers and Data Grouping on page 18. Mixing different sized servers still works; however, the NetApp snapshot does not take place until all of the servers in a backup volume have completed their transfer. Thus, the smaller servers that finished sooner may wait for a very long time before their data is truly protected in a snapshot.

Some agent environments may require data transfer throttling to help manage job or server bandwidth utilization. Consider the following where bandwidth management is needed:

For Windows 2008 nodes and later, node throttling is best handled with a local group policy. This applies a QoS setting to the node that can be global to the system or limited to DPX related transfers. For additional information, read knowledge base article 45991.

The DPX Block backup job Set Source Options dialog offers a throttling parameter that caps the maximum data transfer rate of each task in a DPX job. This management console parameter applies to each individual task; the total bandwidth consumed is limited to the number of tasks multiplied by the per-task limit (a worked example of this arithmetic follows this list).

There is a new node throttling option for the nibbler process that can be used on nodes with Windows 2003 and later, where local group policies are not desired. The option utilizes Windows QoS features on a per-process level. For more information, read knowledge base article 46361.

NetApp OSSV agents also offer bandwidth management parameters. Consult the Overview section of the NetApp TR-3466 Open Systems SnapVault (OSSV) Best Practices Guide. Modification of several parameters in the source server's snapvault.cfg, wan.cfg, and server.cfg files is necessary. Note that you cannot mix the throttling methods above; you must select only one for a job or a node. Attempting to mix methods may result in erratic or unpredictable transfer rates.
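
Because the Block backup throttle described above is applied per task, the aggregate bandwidth a job can consume is simply the per-task cap multiplied by the number of tasks transferring at once. The sketch below illustrates that arithmetic with example values and units; consult the management console and your link capacity for the real figures.

```python
# The DPX Block backup throttle caps each task individually, so the worst case
# for a job is per-task cap × concurrent tasks. Values and units are examples.

per_task_cap_mbit = 50        # per-task throttle entered in Set Source Options (example)
concurrent_tasks = 12         # source volumes transferring at the same time

aggregate_mbit = per_task_cap_mbit * concurrent_tasks
print(f"Worst-case job bandwidth: {aggregate_mbit} Mbit/s")

# To stay within a known WAN budget, derive the per-task cap instead:
wan_budget_mbit = 300
print(f"Per-task cap for that budget: {wan_budget_mbit / concurrent_tasks:.0f} Mbit/s")
```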

DPX provides agents which are designed specifically to handle backup and restore integration for key applications. The block-level agent generally covers application support for Active Directory, Exchange, SQL Server, SharePoint, and Oracle. The DPX agent interacts directly with VSS on Windows platforms to quiesce applications for backup and to coordinate restore. The Linux agent integrates with the application and LVM2 to coordinate backup and facilitate restore.

    NDMP Tape Backup

NDMP tape jobs share the pool of NDMP kernel threads and also have their own specific NDMP data transfer limitations. See NetApp Storage System Guidelines on page 7. When laying out NDMP tape and agent-based SnapVault jobs, create the jobs and populate the schedule such that the total number of NDMP kernel threads is not exhausted at any given time and the total number of concurrent NDMP tape operations is also not exceeded.

Note that NDMP backup has some specific limitations in regards to encryption and tape migration. NDMP encryption requires that the tape drive be connected to the NetApp controller. Tape migration can only be performed via the use of a DPX device server. Additionally, the migration of NDMP tape data has some limitations on the type of NDMP data, for both SMTape and Dump.

The following table describes the limitations of NDMP data migration:


Backup Type | Encrypted Backup | Migration Supported | Migration with Encryption
----------- | ---------------- | ------------------- | -------------------------
Dump | Y | Physically connect tape libraries to controller nodes and device services and then install on each device server. | N
Dump | N | Use automatic setup for multiple device servers and tape libraries using the Device Configuration Wizard. Note: The Device Configuration Wizard performs all the steps described in the Manual process and acts on the entire Enterprise. | Y
SMTape | Y | N | N
SMTape | N | Y | Y

For a comprehensive review of NDMP backup and restore covering primary data, secondary data, and NetApp Cluster, see NDMP Backup and Restore in the Reference Guide. Note that NDMP dump of DPX Block level client data, including agent and agentless backup, does not support incremental or differential backup due to how Data ONTAP tracks files by date for backup inclusion. All NDMP dump backups for DPX client data must be defined as full backups. DPX support for SMTape at this writing only covers full volume backup.

    NetApp OSSV Agent and SnapVault Controller Management

NetApp OSSV agent backup is very similar to DPX Block backup with respect to job tasks and NDMP kernel thread consumption. SnapVault controller-to-controller backup management also consumes NDMP kernel threads. By abiding by the NDMP kernel thread limit suggestions above, OSSV and SnapVault management jobs can be freely intermixed with other jobs.

In addition to NDMP kernel threads, these backups also consume resources as described previously within rsm show_limits. Use caution when attempting to mix these jobs with other NetApp operations that are not directly controlled by DPX; for example SnapMirror, SnapVault, Volume copy, etc., outside of the DPX management console.

It is strongly recommended that the host servers have the latest version of the OSSV agent installed. V3.0.1P3 is the minimum requirement, but newer versions for specific platforms may exist. If the version you have access to on the NetApp support portal is not this minimum version or later, you may need to use the search facility at the bottom of the software download window to find the latest version for your platform.

If you cannot locate the appropriate OSSV software version, contact NetApp technical support, specify the platform you are interested in, and indicate that you need access to the latest version, v3.0.1P3 or later.

    Agentless Backup

The NetApp storage system constraints described above do not apply to agentless backups. Agentless backups work via iSCSI and are not subject to NDMP kernel threads or resource contention between SnapVault and SnapMirror. Agentless backup is bound by controller limits relating to LUNs and iSCSI. However, iSCSI is inherently limited by the bandwidth allotted to it via available network controllers. Although agentless and iSCSI are bound by limits that differ from SnapVault, the NetApp snapshot still does not take place until after all tasks in the job complete; thus it is still prudent to group servers of similar size where possible.

With any disk and network transfer architecture design, the available resources and limiting factors are different for each environment. The master server, backup job, and proxy suggestions provided below are generalities that can be extended if system resources and bandwidth are available for heavier workloads. Start with the suggested figures and incrementally increase them as needed until a balance between resources and backup speed is achieved. For agentless workloads, the general resource limits to consider are ESX server performance for VM I/O activity, VM snapshotting, Network File Copy (NFC) access, VMDK read performance, and network interface performance. Proxies are generally limited by ESX NFC operations, storage performance, and network bandwidth. The NetApp storage is generally limited by iSCSI network transfer performance and the other familiar constraints on memory, CPU, and controller limits.

VMware's vSphere 5 documentation contains many references describing these methods and their associated limits. See the vSphere 5 Documentation Center for the various VM backup transport methods and ESX/ESXi NFC connection limits.

Note: DPX only supports backup of VMs controlled by a vCenter, so only the "through vCenter Server" figures apply.

At the start of each job, a resolution phase takes place. The job resolution connects to each vCenter to query for the VMs to back up. This query can take some time depending on the number of VMware resources, such as VMs, resource groups, etc., that the environment contains. The VMs are then matched up against the capabilities of the proxies defined in the job. The SAN method is always preferred. The Hot-add method is the next preference, with NBD network transfer being the lowest. Proxies hosted on virtual machines automatically discover the appropriate method to use; this is controlled by the ESX server. There are DPX proxy parameters that can be used to prevent use of a particular method, for example, prevent Hot-add use in favor of NBD; however, these settings are generally discouraged. It is better to use the highest performance proxy method the environment has available and/or select specific proxy nodes in a job definition to achieve the desired results. Note that there are no options to promote a proxy to be recognized as a specific type that the ESX API does not recognize. For additional information on proxy server optional settings, contact Catalogic Software Data Protection Technical Support.
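
The preference order described above can be summarized in a few lines. The following sketch is only an approximation for illustration; the actual matching is performed by DPX using what the ESX API reports for each proxy, and the field names used here are hypothetical.

```python
# Illustrative approximation of the proxy preference described above: SAN first,
# then Hot-add, then NBD. This is a sketch of the ordering, not the product logic.

METHOD_PREFERENCE = ["SAN", "Hot-add", "NBD"]

def choose_proxy(vm_datastore, proxies):
    """Pick the most-preferred proxy that can reach the VM's datastore."""
    candidates = [p for p in proxies if vm_datastore in p["reachable_datastores"]]
    candidates.sort(key=lambda p: METHOD_PREFERENCE.index(p["method"]))
    return candidates[0] if candidates else None

proxies = [
    {"name": "proxy-nbd1", "method": "NBD", "reachable_datastores": ["ds1", "ds2"]},
    {"name": "proxy-san1", "method": "SAN", "reachable_datastores": ["ds1"]},
]
print(choose_proxy("ds1", proxies)["name"])   # proxy-san1: SAN wins when both can serve the VM
```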

    The agentless proxy node function is divided into three different access methods:

SAN: Applies only to proxy nodes with VMFS LUN attached storage. This is usually a physical node with a Fiber Channel HBA that can attach to VMFS SAN storage; however, iSCSI can also be used. The proxy coordinates VM snapshots, reads the VMDK data directly from the LUN, and transfers the backup to the NetApp controller via iSCSI.

Hot-add: Applies to virtual proxy nodes only. The virtual proxy node must exist on an ESX server that has access to all of the datastores for the VMs you intend to back up. The proxy node must reside on a datastore that has a block size similar to the VM datastores to be accessed for backup. The proxy coordinates the VM snapshot, directly attaches to and reads the VM's VMDK data, and transmits this to the NetApp controller via iSCSI.

NBD: Applies to physical or virtual proxy nodes. All data transfers take place across the network. Consequently, this method is generally the lowest performing option for backup. The proxy connects to the vCenter, requests a snapshot of the VM, connects to the ESX server, reads the VMDK data over the network, and retransmits this data to the NetApp controller via iSCSI. The VMDK data is read using the Network File Copy protocol, and each version of ESX has specific resource limits on how many NFC connections it supports.

All proxy servers are required to have iSCSI access to the NetApp controller. The proxy must also have access to the vCenter node. NBD proxies must have access to each ESX server. For restore operations, the ESX server must permit the ESX iSCSI software initiator to have access to the NetApp iSCSI storage interface.


Note that DPX attempts to match as many nodes to a preferred proxy as possible. The proxy type and order preference is important to remember to avoid unexpected issues. For example, in a job that contains 10 NBD proxies and one SAN proxy, where all VM node storage can be accessed via SAN, all VMs prefer the SAN proxy and ignore the other 10.

Before reading data, a VMware snapshot is performed for each VM being backed up. For VMware supported Windows guests with the VMware Tools service installed, the snapshot process invokes VSS to quiesce VSS-aware applications. The snapshot process may take time or possibly fail if the VM is under heavy I/O load. VM snapshots impose processing overhead on an ESX server, so it is important to consider how many snapshots an ESX server concurrently performs within a short period of time. The backup job schedule should correspond to a time when the VMs have a lower level of activity and the overall ESX server is under light load.

    For reading data, the following guidelines are offered:

SAN is limited by the number of simultaneous read operations made to the storage. Even for larger Enterprises where a proxy might open hundreds of VMDK files, read operations are not expected to impose any specific SAN fabric issues. The backup operation is generally limited by the maximum number of open file handles the proxy OS supports and the available bandwidth for reading data. Fiber Channel SAN is preferred here; however, iSCSI can also be used. Note that the data is written to the NetApp controller via iSCSI, thus if there is only one interface available for iSCSI traffic, the read and write operations share the same bandwidth, which limits the performance gains the SAN method was designed to provide.

Hot-add is similar to SAN in that read operations are generally limited by the number of open files permitted by VMware and the bandwidth available on the ESX server to read data from the datastores. The VM hosting the proxy must be on a datastore of identical block size to the VM datastores being backed up. The VM also has other VMware configuration requirements for Hot-add, including that all VM disks participating in backup are required to be SCSI and not IDE. The proxy VM must also have permission to read the datastores of interest, and the VM itself must reside in the same datacenter as the virtual machines it is backing up.

NBD has tighter limits for concurrent data access to vCenter and ESX servers. The ESX server has version-specific NFC connection limits that cannot be exceeded. The link provided above describes the specific limits for each ESX version. The NBD proxy must have network access to connect directly to both the vCenter server and to the ESX servers where the VMs exist.

For writing data, the primary resource limitation is the maximum network bandwidth between the proxy and the NetApp. Write performance is also limited by overall NetApp controller performance such as I/O load, CPU use, etc. One proxy attempts to write multiple parallel streams, and multiple proxies may operate in parallel against one NetApp. Since all iSCSI connections attach to one target, it is likely that all such transfers move data across the same NetApp controller network interface. It is recommended to configure and test which NetApp interface is receiving the inbound iSCSI backup traffic. NetApp can be configured with single port management interfaces, isolated/dedicated network interfaces, and interfaces that bond multiple links into one logical network connection. Ensure that the data protection iSCSI traffic uses the interfaces you intend.

Consult your NetApp documentation at NetApp Support for the specific controller limits pertaining to connected iSCSI hosts and LUNs. LUN maximums are shared between iSCSI and Fiber Channel. Lower-end controllers max out at 128 hosts per controller and 1024 LUNs. Larger controllers can handle up to 512 iSCSI hosts and 2048 LUNs. Each proxy that connects to a NetApp controller for iSCSI transfer counts as a host, and each VMDK backup in progress corresponds to one LUN on the controller.
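
These ceilings translate into simple arithmetic for a planned backup window: each proxy writing to the controller counts as one iSCSI host, and each VMDK in flight counts as one LUN. The sketch below applies example limits for a larger controller; substitute the documented figures for your model.

```python
# Each proxy writing to the controller is one iSCSI host; each VMDK being backed
# up at that moment is one LUN. Compare a planned window against the controller's
# documented limits (example figures for a larger controller shown below).

controller_limits = {"iscsi_hosts": 512, "luns": 2048}   # replace with your model's limits

proxies_in_window = 8
vmdk_transfers_in_window = 180

print("hosts ok:", proxies_in_window <= controller_limits["iscsi_hosts"])
print("LUNs ok:", vmdk_transfers_in_window <= controller_limits["luns"])
# Remember the practical guidance later in this section: keep concurrent iSCSI
# transfers to roughly 50 per controller regardless of these ceilings.
```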

Avoid using NIC bonding on physical proxy nodes. In the field, NIC bonding has at times shown itself to be unreliable, especially where the expectation is to increase network throughput. Also note that NetApp systems can have trouble with NIC bonded hosts when the ip.fastpath option is enabled; see Troubleshooting and Known issues on page 30.


Although job and task concurrency are theoretically similar to agent-based backup, the pattern of load on the master server is different. Agentless jobs query source vCenters for the entire VM enterprise at the start of each job. For smaller, less complex Enterprises of less than 100 nodes, this should not be an issue. For larger and more complex Enterprises, the environment discovery can take some time and processing to accomplish. For this reason, it is recommended to avoid scheduling large numbers of agentless jobs to start at the same time. Rather, it is good policy to determine how long the job resolution phase takes and space job starts apart in the schedule by at least this time. If job spacing for all jobs is not possible, limit the number of simultaneous job starts to five or less and observe the load imposed on the vCenter server and master server when simultaneous jobs begin. Based on performance, adjust the simultaneous job starts as needed. The number of concurrent jobs running at any one time can remain the same as the agent-based figures suggest.
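
One way to apply this guidance is to time the resolution phase of a representative job and then offset agentless job start times by at least that amount. A minimal scheduling sketch with hypothetical job names and times:

```python
# Stagger agentless job starts by the observed resolution time so the vCenter
# query load of one job finishes before the next job begins. Values are examples.
from datetime import datetime, timedelta

first_start = datetime(2014, 11, 18, 22, 0)    # first job start in the backup window
resolution_time = timedelta(minutes=12)        # measured resolution phase for this environment
jobs = ["vdi_pool_a", "vdi_pool_b", "app_vms", "db_vms"]

for i, job in enumerate(jobs):
    start = first_start + i * resolution_time
    print(job, start.strftime("%H:%M"))
```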

From a host perspective, consider how busy the VMs might be at the time of backup and how much data they are likely to transfer. Consider constructing smaller or even single-node backup jobs for servers that will be I/O intense or generally have a lot of data to regularly transfer. For smaller nodes and nodes with little or no data to transmit, consider batching these into larger jobs. VDI nodes are generally very lightweight and usually good candidates for batching into larger jobs.

Creating backup jobs can be more manageable and flexible if the VMs are organized into higher level containers such as Resource Groups; each high level container then gets a backup job to protect it. Using VMware containers such as resource groups also adds flexibility, as the backup job dynamically discovers all VMs in the group; all VM additions and deletions are accommodated.

For any agentless deployment, note the overall number of concurrent parallel iSCSI transfers a NetApp controller is hosting. Although the LUN and host limits for NetApp are fairly high, the published NetApp figures do not account for all of these connections participating in concurrent bulk inbound transfer of data. When architecting a job, strive to limit concurrent iSCSI transfers to about 50 and adjust this figure as needed based on the controller size and the performance observed on the controller.

For environments where NBD is the only option, consider a conservative approach to job creation. Each ESX server must not exceed its maximum number of parallel NFC connections or the tasks in the job fail. Recall that each VM is a task, each task utilizes one thread per VMDK, each VMDK consumes a VMware NFC stream, and each VMDK uses an iSCSI LUN to the NetApp for data transfer. Do not create jobs where the sum of all servers and protected devices exceeds the ESX server NFC limit. To generalize this case, the recommendation is five to ten servers per job of average size (one to three VMDK files) and to avoid job concurrency against a common ESX server. Evaluate the job tasks needed against the NFC resource limits and adjust your job accordingly. Since the limiting factor here is ESX NFC connections, job concurrency can be configured if jobs running in parallel back up VMs from different ESX servers.
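
For NBD planning, the per-ESX-server NFC connection limit is the figure to protect. The sketch below totals the VMDKs a backup window would open against each ESX server; the NFC limit shown is a placeholder to be replaced with the value for your ESX/ESXi version from the VMware documentation referenced earlier.

```python
# NBD planning: every VMDK backed up over NBD holds one NFC connection on the
# ESX server that owns the VM. Total the VMDKs per ESX server for all jobs
# running at once and keep the count under that server's NFC limit.

nfc_limit_per_esx = 52   # placeholder; use the figure for your ESX/ESXi version

# VM name -> (ESX server, number of VMDKs)
vms_in_window = {
    "web01": ("esx-01", 2), "web02": ("esx-01", 2),
    "sql01": ("esx-02", 4), "file01": ("esx-01", 3),
}

per_esx = {}
for host, vmdks in vms_in_window.values():
    per_esx[host] = per_esx.get(host, 0) + vmdks

for host, count in per_esx.items():
    status = "ok" if count <= nfc_limit_per_esx else "over NFC limit; split the job"
    print(f"{host}: {count} NFC connections ({status})")
```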

The Hot-add transfer resource limitation is higher, but likely not as high as SAN. To generalize this case, limit the number of concurrent backups against a single ESX server to 25 concurrent servers of average size (one to three VMDK devices). This may be accomplished in a single job or in multiple jobs having their job startups spread out as indicated above.

SAN is generally considered the highest performing solution. Your resource limitations are likely to relate to the bandwidth available for the proxies to read and the network bandwidth the NetApp controller can accept. Using the NetApp iSCSI recommendation reviewed previously, limit concurrent iSCSI VMDK transfers for a single NetApp controller to 50 hosts. Adjust the job based on the resources consumed and data transfer performance. With a higher number of parallel nodes in a job, note the load imposed upon the ESX servers during the VM snapshot phase and adjust accordingly.

The number of proxies you select for backup jobs depends on the available data paths and backup methods available to them. For a medium to large environment, it is strongly advised to plan out your proxy use and avoid selecting all proxies on your jobs. Large Enterprises with large numbers of proxies may consume significant time in the resolution phase reading the VMs in the environment and matching them up against proxy capabilities.


NBD has the lowest resource limits. If the NBD proxies are virtual, limit the job to two to four proxies; this is meant to avoid excessive network interface traffic for the ESX server. If the NBD nodes are physical, limit the proxy number to 10 or less, not to exceed the number of nodes being backed up in the job. The number of proxies can be adjusted; however, given the limits applied to NFC transfers, increasing the number of proxies does not necessarily improve performance.

Hot-add has higher performance but does consume resources from an ESX server. Limit the number of Hot-add proxies in a job to 10 and adjust this as needed based on ESX load and NetApp iSCSI performance. If your ESX datastores have varying block sizes, limit proxy selection to only include proxies hosted on VMFS datastores with identical block size.

SAN is the highest performing method. On a relatively fast SAN with a high performing NetApp iSCSI interface, start with 15 proxies for large backup jobs and adjust this as resources permit.

    If your ESX storage is hosted on NFS, the NBD proxy method is the only option available.

In general, it is inefficient to configure more proxies in a job than there are nodes in the job. Proxies can be used in parallel for multiple concurrent jobs; however, note that the proxy resources and performance are shared across all requests. So, it may be a better use of resources to define jobs that do not overlap proxy use for concurrent job runs.

Agentless backup of Windows VSS enabled applications may be possible but should be fully validated with the application vendor and tested (backup and restore) prior to production deployment. VMware makes specific requirements on the VM to support VSS backup; most notably, IDE disks are not supported, dynamic disks are not supported, and SCSI controllers must have an equal number of available IDs to perform backup. Some applica


Recommended