EMC Proven Professional Knowledge Sharing 2010
Design Considerations for Block Based Storage Virtualization Applications

Venugopal Reddy
Global Solutions Architect, EMC
[email protected]
Table of Contents

Introduction
Virtualization – Virtual Entities
    Virtual Target
    Virtual Initiator
    Virtual LUN
    ITL
Virtualization hardware
Applications
    Recoverpoint
        SANTAP Implementation
        Brocade Implementation
    Invista
    Storage Encryption
        Cisco Storage Media Encryption
        Brocade Disk and Tape Encryption
Design Considerations
    Recoverpoint
        SAN design
        Fabric Splitter Sizing
        Journal volume design
        Sizing the RPAs (Appliances)
        Sizing the WAN Pipe
        Databases in Consistency Groups
        Recoverpoint over Invista
        Splitter configuration limits
        Storage performance with RecoverPoint
        SANTAP Performance
        FAP performance
        Network latency and BW requirements
        Statistics and Bottlenecks
    Storage Encryption with Cisco SME
    Storage Encryption with Brocade Encryption Services
Future Directions
Conclusion
References

Disclaimer: The views, processes, or methodologies published in this compilation are those of the authors. They do not necessarily reflect EMC Corporation’s views, processes, or methodologies.
Introduction
The tumultuous global events during recent years accentuate the need for organizations
to optimize their information technology budgets by increasing resource utilization,
reducing infrastructure costs, and accelerating deployment cycles. Simultaneously,
regulatory framework and compliance requirements are imposing new demands on the
ways we store, access, and manage data.
“Network based Storage Virtualization,” the term describing virtualization implemented in
Storage Area Networks (SANs), is an enabling technology that has spawned several
innovative applications that are beginning to form a framework to optimize information
life cycle management in data centers and to solve the challenges mentioned above.
These innovations include the ability to create and manage dynamic storage resource
pools, perform seamless data migrations, encrypt stored data, and enable long-distance
replication and serverless backups.
The core functionality of data virtualization in a SAN is the ability to map, redirect, or
copy data within the intelligent SAN switch. Operating on block-level I/O in this way
provides unprecedented flexibility and facilitates
some of these innovations. Products such as EMC’s Recoverpoint and Invista, and
Storage Encryption solutions from Brocade and Cisco are some of these block based
storage virtualization applications. They implement virtualization in the “network” to
achieve innovative uses of data without impacting I/O performance. The intelligent
appliances and SAN switches implementing the virtualization layer provide high
performance and scalability to match the needs of enterprise class application
environments.
Network based storage virtualization, while enabling a number of innovative applications,
poses a number of important design and deployment challenges. The implementation of
the technology requires specialized ‘virtualizing modules’ in the SAN switches that are
highly vendor specific and hardware dependent. The inherent complexities and the
interdisciplinary nature of the technologies also call for special considerations when
designing, deploying and maintaining these new generation applications.
Successful deployment of network based storage virtualization applications requires
meticulous planning and design, and an intimate understanding of the intelligence
implemented in the SAN layer. This article provides a practitioner’s perspective on
implementing these solutions and aims to:
1. Extend the understanding of the mechanisms of various virtual entities that exist
in network based virtualization
2. Provide insight into a broad range of applications that benefit from virtualization
in the SANs
3. Share recommendations and best practices for successful virtualized
infrastructure planning, design and deployment considering scalability, availability
and reliability
4. Describe techniques to maintain the virtualized infrastructure while sustaining
performance
Virtualization – Virtual Entities
The principle behind virtualization is to capture I/O in flight between the host and the
target within the fabric. Once the I/O is intercepted, you can perform various operations
on it, for example making a copy, redirecting the I/O, or encrypting the data.
Virtualization hardware within the fabric creates virtual entities by implementing this
mechanism of intercepting the I/O. There are four main components of virtual entities:
• Virtual Targets (VT)
• Virtual Initiators (VI)
• Virtual LUNs (vLUN)
• Initiator-Target-LUN (ITL) nexus
Virtual Target
A VT is a virtual entity created in the fabric that is presented to the real host as a target. In
some cases, the VT can assume the identity (WWN) of the real storage target (as with
SANTAP on Cisco switches), or it can have a different identity. The physical host performs
I/O on these virtual targets; the virtualization modules intercept the I/O.
Virtual Initiator
A VI is a virtual entity created in the fabric that is presented to the physical target as a host
HBA. In some cases, the VI can assume the identity (WWN) of the real host HBA (as with
SANTAP on Cisco switches), or it can have a different identity. The Virtual Initiator
performs I/O on the physical storage targets; the virtualization module acts as the
‘intermediary’ between the physical Host Initiator (HI) and Physical Target (PT).
Virtual LUN
A vLUN is a virtual entity created on the VT. The physical Host Initiator performs its I/O on
the vLUN. The virtualization hardware intercepts the I/O operation to the vLUN and
then redirects it to the physical target (through the VI) after performing the necessary
operations (copy, map, encrypt, etc.). A vLUN can be created from one or more physical
LUNs from disparate arrays. Features such as striping, mirroring, and concatenation are
implemented at this layer.
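To make the remapping concrete, here is a minimal Python sketch of how a vLUN layer might translate a virtual LBA into a (physical LUN, physical LBA) pair for concatenation and striping. The function names and layouts are hypothetical illustrations, not vendor code.

```python
# Hypothetical sketch of vLUN address remapping (not vendor code).
# A vLUN built from several physical LUNs must translate each
# virtual LBA into a (physical LUN, physical LBA) pair.

def concat_map(vlba, lun_sizes):
    """Concatenation: physical LUNs are laid out back to back."""
    for lun, size in enumerate(lun_sizes):
        if vlba < size:
            return lun, vlba
        vlba -= size
    raise ValueError("virtual LBA beyond vLUN capacity")

def stripe_map(vlba, num_luns, stripe_blocks):
    """Striping: blocks rotate across LUNs in stripe_blocks units."""
    stripe_index = vlba // stripe_blocks       # which stripe unit
    offset = vlba % stripe_blocks              # offset inside the unit
    lun = stripe_index % num_luns              # round-robin LUN choice
    plba = (stripe_index // num_luns) * stripe_blocks + offset
    return lun, plba

# A vLUN concatenated from three 1000-block LUNs:
print(concat_map(1500, [1000, 1000, 1000]))   # (1, 500): second LUN, block 500
# The same capacity striped across 3 LUNs with 100-block stripes:
print(stripe_map(1500, 3, 100))               # (0, 500)
```

Mirroring would simply return one such pair per mirror leg for the same virtual LBA.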
ITL
A front-end ITL (FITL) is a virtual entity that makes up a nexus between the physical host HBA,
the Virtual Target, and the virtual LUN. Front-end ITL counts and their placement in the
fabric play an important role in optimizing the performance and scalability of the virtualized
application; ITL counts define the scalability limits of the virtualization hardware.
A back-end ITL (BITL) is a nexus between the Virtual Initiator, the Physical Target, and the
physical LUN. The virtualization framework in the fabric manages the mapping between a
FITL and a BITL.
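The FITL-to-BITL relationship can be pictured as a simple lookup table. The Python sketch below is a hypothetical model of that mapping; the WWPN names and structure are invented for illustration, not an actual vendor data structure.

```python
# Hypothetical model of front-end to back-end ITL mapping
# (illustrative only; real switch firmware is vendor specific).
from dataclasses import dataclass

@dataclass(frozen=True)
class ITL:
    initiator: str   # WWPN of the initiator (HI or VI)
    target: str      # WWPN of the target (VT or PT)
    lun: int

# Front-end nexus: physical host -> virtual target -> vLUN
fitl = ITL(initiator="hi_wwpn", target="vt_wwpn", lun=0)
# Back-end nexus: virtual initiator -> physical target -> physical LUN
bitl = ITL(initiator="vi_wwpn", target="pt_wwpn", lun=5)

# The virtualization framework maintains the FITL -> BITL map;
# each host I/O arriving on a FITL is re-issued on the mapped BITL.
itl_map = {fitl: bitl}
assert itl_map[ITL("hi_wwpn", "vt_wwpn", 0)].target == "pt_wwpn"
```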
Virtualization hardware
In the current market, virtualization hardware comes from two major fibre channel switch
equipment vendors, Brocade and Cisco. The virtualization modules are available as
blade modules that can be inserted into directors and expandable switches, or in the
form of standalone switches.
Brocade hardware
Virtualization hardware from Brocade comes in the following forms:
1) AP-7600B Storage Application Services switch
2) PB-48k-AP4-18 Storage Application Services blade
3) ES5832 Encryption switch
4) PB-DCX-16EB Encryption Blade
Brocade Application platforms - the standalone AP-7600B switch or PB-48k-AP4-18
blade for the Connectrix ED-48000B, or ED-DCX-B director implement the Storage
Application Services (SAS) framework in the SAN using the Fabric Application
Programming (FAP) layer. Brocade's SAS API, implemented on these specialized
modules, provides hardware-implemented primitives such as mirroring, copying, extent
maps, striping, concatenation, copy-on-write, resync, and dirty region logging. These
primitives enable virtualization applications to implement features such as mirroring,
replication, snapshots, migration, and backup.
The ES-5832B and PB-DCX-16EB implement encryption services for data-at-rest disk array
LUNs using the IEEE P1619 standard with Advanced Encryption Standard (AES) 256-bit
algorithms. I/O redirection using VIs and VTs enables data encryption and compression by
six encryption/compression FPGAs in the blade or switch.
Cisco Hardware
Cisco’s virtualization hardware comes in the following forms:
1) Storage Services Module (SSM)
2) MDS-PBFI-1804 (18/4 port) Multi Services Module
3) 9222i MSM switch
Virtualization functionality is enabled through the Storage Services Module (SSM) line
card that you can insert into any modular switch within the Cisco MDS 9000 family. Each
SSM contains a virtualization daughter card that hosts four virtualization engines. Each
engine has two Data Path Processors (DPPs) and supports storage virtualization, volume
management, reliable replication writes (Cisco SANTAP), SCSI Flow services, Fibre
Channel Write Acceleration, and Network-Accelerated Serverless Backup (NASB).
Storage Media Encryption (SME), or encryption of data on the Tape Libraries, is
facilitated by Cisco’s encryption engines integrated on the Cisco MDS 9000 18/4-Port
Multiservice Module (MSM), MDS-PBFI-1804 and Cisco MDS 9222i Multiservice Module
Switches. SAN-OS’s FC-Redirect creates VIs and VTs to enable data encryption without
any fabric reconfiguration. These Multiservice Modules and switches also support Cisco
SANTAP.
Applications
This section briefly describes the applications that use virtualization features in the
intelligent switches.
Recoverpoint
Recoverpoint is a distance replication and disaster recovery solution that runs on out-of-
band appliances, providing both local and remote protection. The salient features of the
solution are:
1) Heterogeneous storage replication for tiered disaster recovery
2) Innovative data Journaling and Application Data consistency
3) Support of FC and IP replication
Recoverpoint uses splitter technology for replication where a copy of the host writes is
sent to the out-of-band intelligent appliances. The splitters can be host, fabric, or array
based. This article focuses on fabric-based splitters. A Cisco fabric splitter uses SANTAP
technology on its SSM and MSM modules, and a Brocade fabric-based splitter uses the
AP7600B or PB-48k-AP4-18 as the splitting engine.
SANTAP Implementation
SANTAP implemented on Cisco SSM / MSM modules ‘taps’ a copy of the I/O to be sent
to the Recoverpoint appliance. SANTAP in the SSM / MSM module is responsible for
creating the virtual entities in the fabric. SANTAP currently utilizes two VSANs: a
Frontend VSAN where the Host Initiators and the Virtual targets reside, and a Back end
VSAN where the physical targets, VIs and appliance HBAs are situated. In addition to
the host VIs, SANTAP creates a group of virtual entities called Control Virtual Targets
(CVT).
SANTAP Communication
The CVT is the portal through which the appliance (RPA) communicates with SANTAP. In an
SSM module, when a CVT is created in the back-end VSAN, 10 virtual WWNs are
created. Of these, 8 Virtual Initiators (VIs) represent the 8 Data Path Processors (DPPs,
ASICs on the module), the remaining VI represents the Control Path Processor (CPP)
Initiator, and the lone Virtual Target (VT) created represents the CPP Target.
Communications between the SANTAP service and the RPA fit into three classes:
1. Control messages from the RPA to the SANTAP service
2. Control messages from the SANTAP service to the RPA
3. Data traffic (reliable writes) mirrored from a host issuing a write to a storage
array.
The first two classes of communication are messages/notifications between the devices
to control various aspects of the SANTAP service. Both the SANTAP service (CPP VI
and CPP VT) and the RPA appear as both a standard SCSI initiator and target. SCSI
Write operations are used between the SANTAP service and the RPA to convey control
messages.
The appliance discovers SANTAP service when it logs into the FC fabric and queries the
name server (FCNS). Once discovered, the RPA issues Port Login (PLOGI) and
Process Login (PRLI) commands, followed by the standard SCSI device-discovery
process. The SANTAP service (CPP target) responds to a SCSI Inquiry with Vendor
Information set to "CISCO MDS" and Product Identification set to "CISCO SANTAP
CVT."
The Cisco SANTAP service initially issues a pending write log (PWL) to the RPA when
mirroring a host write to the appliance. The PWL is a short SCSI command (several
bytes) consisting only of the write operation’s metadata (the LBA number). Once the RPA
acknowledges the PWL, the Cisco SANTAP service simultaneously performs a write
I/O to both the RPA and the target device (storage array). The RPA then acknowledges
the write I/O. Finally, the pending write log entry is cleared with another short PWL
command. Communications and operations on MSM modules and switches are similar.
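The reliable-write sequence above can be sketched step by step. The following Python model only encodes the message ordering described in the text; the function and class names are invented for illustration, not part of SANTAP or the RPA software.

```python
# Illustrative model of the SANTAP reliable-write sequence
# (invented names; only the message ordering follows the text).

def santap_reliable_write(lba, data, rpa, array):
    log = []
    # 1. Issue a pending write log (metadata only) to the RPA.
    log.append(("PWL_SET", lba))
    rpa.ack_pwl(lba)
    # 2. After the PWL ack, write simultaneously to RPA and array.
    rpa.write(lba, data)
    array.write(lba, data)
    log.append(("WRITE", lba))
    # 3. The RPA acknowledges the write I/O.
    rpa.ack_write(lba)
    # 4. Clear the pending write log with another short command.
    log.append(("PWL_CLEAR", lba))
    return log

class _Stub:
    """Stand-in device that accepts any call and does nothing."""
    def __getattr__(self, name):
        return lambda *args, **kwargs: None

steps = santap_reliable_write(42, b"payload", _Stub(), _Stub())
print([s[0] for s in steps])   # ['PWL_SET', 'WRITE', 'PWL_CLEAR']
```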
Brocade Implementation
RecoverPoint on AP7600B and PB-48k-AP4-18 can be deployed in two modes:
• Multi VI mode
• Frame redirect Mode
Multi VI mode
In this mode, the HIs are zoned with the Virtual Target (created on the switch when an HI is
bound to a VI) and the VI is zoned with the PT. Because of this, you need to mask the VIs
on the PT and reorganize the zones. When the host sends an I/O to the VT, the DPP on
the switch intercepts the I/O. The VI then sends one copy to the Physical Target
and the other to the appliance.
Frame Redirect
Frame redirect ensures that a copy of the I/O can be sent to the Recoverpoint appliance.
The feature uses a combination of Redirect zones and Name Server changes to map
real device WWNs to the FCIDs of the virtual entities. This allows redirecting a flow
between a host and target to the appliance without reconfiguring them. When you
perform binding between an HI and a PT, a new redirect (RD) zone is created. The RD
zones have a prefix of “lsan_” and will contain the HI, PT, VI and VT.
The RD Zone is part of the defined zone configuration and will not appear in the effective
zone configuration. When you create the first RD zone (using the bind_host_initiators
command on the RPA), two additional zone objects are created: a base zone
"red_______base" and a "r_e_d_i_r_c__fg" zone configuration. These additional zone
objects are required by the Frame Redirect implementation and must remain on the
switch as long as other RD zones are defined.
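The zoning side effects of a bind operation can be modeled as follows. The zone object names ("lsan_" prefix, "red_______base", "r_e_d_i_r_c__fg") come from the text, but the bind function and the RD zone name suffix are hypothetical illustrations.

```python
# Hypothetical model of frame-redirect zone bookkeeping.
# Zone object names ("lsan_" prefix, base zone, zone config) are
# those given in the text; the logic itself is illustrative only.

def bind(defined_config, hi, pt, vi, vt):
    """Model the zone objects created by binding an HI to a PT."""
    if "red_______base" not in defined_config:
        # The first RD zone also creates the base zone and zone config.
        defined_config["red_______base"] = []
        defined_config["r_e_d_i_r_c__fg"] = []
    # RD zone name: "lsan_" prefix; the suffix here is invented.
    rd_zone = f"lsan_{hi}_{pt}"
    defined_config[rd_zone] = [hi, pt, vi, vt]
    return rd_zone

cfg = {}
zone = bind(cfg, "hi1", "pt1", "vi1", "vt1")
print(zone)          # lsan_hi1_pt1
# A second bind reuses the existing base zone and zone config:
bind(cfg, "hi2", "pt2", "vi2", "vt2")
print(sorted(cfg))   # base zone and config appear exactly once
```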
Invista
EMC Invista is a network-based storage virtualization solution that utilizes intelligent
Fibre Channel switches to implement centralized storage virtualization services that
span heterogeneous storage systems. Using the virtualization modules in the FC
switches, Invista provides services such as volume management, mirroring, clones, and non-
disruptive data migration across heterogeneous storage systems, with an easy-to-
manage centralized user interface.
Virtual volumes created out of one or more storage systems are presented to the host on
the virtual targets created by Invista. Similarly, Invista VIs perform I/O on the physical
storage systems on behalf of the hosts. I/O remapping occurs in the data path for fast
path commands (read6/write6 and read10/write10) at hardware speeds, with minimal
additional latency. Slow-path commands on the virtual volumes (such as inquiry) are
serviced by the highly available and redundantly configured Invista appliances that
maintain the metadata of the virtualized storage on the highly available LUN in the SAN.
Brocade Implementation
Invista creates 16 VIs and 16 VTs on the virtualizing modules or switches. The VIs are
zoned with the PTs and the VTs are zoned with the HIs. The VIs and VTs are equally
distributed between the two DPPs on the DPC.
Cisco Implementation
Invista creates 9 VIs (one for each DPP and one control VI) and 32 VTs. The VTs are
zoned to the HIs in front-end VSANs, and the VIs are zoned to the storage targets. The SAL
agent installed on the FC switches communicates with the Invista appliances to configure
the intelligent services modules.
Storage Encryption
Storage encryption at the fabric layer is a relatively new application of block-level storage
virtualization. Key advantages of fabric-level storage encryption include:
• The ability to encrypt data at wire speeds
• Central management of Encryption resources
• Simplified, non-disruptive installation and configuration
These encryption solutions are ideal for cases such as:
• Highly sensitive data on the Disk or Tape that needs protection (Data-at-rest).
• Secure data backups for offsite tape storage and long-term archiving
• Centralized management of heterogeneous disk and tape storage environments
• Secure replication of Encrypted data backups to remote facilities
• Scaling data center encryption services by implementing clusters of encryption
blades or switches
Cisco Storage Media Encryption
The Cisco MDS Storage Media Encryption (SME) service enables encryption of data stored
on tape. This protects the backed-up data on the tapes from unauthorized access or
tape loss. SME creates VIs and VTs. An I/O sent to a VT is intercepted, encrypted, and
written to the tape by the MSM module through the VI. SME is a transparent fabric
service and the MSM module can be deployed anywhere in the fabric. It does not need
to be directly in the data path; hence no cabling or configuration changes are required.
Once SME is enabled, traffic that is to be encrypted is redirected to the appropriate
MSM in the fabric using the FC-Redirect service.
FC-Redirect
VIs and VTs are created and placed in the default zone when SME is enabled. When an
HI-PT nexus is configured on the SME, a LOGO (Logoff) is sent to the host to abort any
existing sessions and exchanges to the physical target that may be in transit. The host
then performs another PLOGI, but the MSM module intercepts it and redirects it to VT.
The VI corresponding to the VT then performs a PLOGI on behalf of the Host, and
continues through the PRLI and discovery sequence. Once complete, the VT
acknowledges the host’s PLOGI request and accepts the host’s PRLI request. From then
on, the VT intercepts the host I/O sent to the PT; the encryption module encrypts the
data and forwards it to the VI, which sends the encrypted data to the PT. This is
transparent to the HI and PT.
Brocade Disk and Tape Encryption
Similar to the frame redirect option in a Recoverpoint deployment, the Brocade encryption
engine uses RD zones for encryption. The HI gets the FCID of the VT when it queries for
the FCID of the PT, and the PT gets the FCID of the VI when it queries for the FCID of the
HI. The I/O intercepted by the VT is encrypted by the encryption engine and is written to
the PT by the VI.
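The symmetric write/read path can be illustrated with a toy cipher. XOR stands in for AES-256 here purely for demonstration; the real encryption engines implement IEEE P1619 XTS-AES in hardware, and the function names below are invented.

```python
# Toy illustration of the transparent encrypt/decrypt data path.
# XOR stands in for AES-256; real engines use hardware XTS-AES.

KEY = bytes(range(1, 33))               # stand-in for a 256-bit key

def xor_cipher(data, key=KEY):
    """Toy symmetric cipher: the same operation encrypts and decrypts."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def write_path(host_data):
    """VT intercepts the host write; the VI writes ciphertext to the PT."""
    return xor_cipher(host_data)        # what lands on the PT

def read_path(stored_data):
    """VI reads ciphertext from the PT; the VT returns plaintext to the HI."""
    return xor_cipher(stored_data)

block = b"confidential block data"
assert read_path(write_path(block)) == block   # transparent to HI and PT
assert write_path(block) != block              # data at rest is not plaintext
```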
The reverse happens when data is read from the PT. In addition, there is another entity
named CryptoTarget Container that binds all these virtual entities. A CryptoTarget
Container holds configuration information for a single target, including:
• Target port, initiator, and LUN settings
• Interfaces between the encryption engine and targets
• The initiators that access storage devices
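A CryptoTarget Container can be pictured as a small record grouping these settings. The structure below is a hypothetical illustration of the fields listed above, not Brocade's actual configuration schema.

```python
# Hypothetical illustration of a CryptoTarget Container record
# (field names follow the list above; not Brocade's schema).
from dataclasses import dataclass, field

@dataclass
class CryptoTargetContainer:
    target_port: str                       # the single PT it covers
    encryption_engine: str                 # EE interface bound to it
    initiators: list = field(default_factory=list)
    lun_settings: dict = field(default_factory=dict)   # lun -> policy

ctc = CryptoTargetContainer(
    target_port="pt_wwpn",
    encryption_engine="ee0",
    initiators=["hi_wwpn"],
    lun_settings={0: "encrypt"},
)
assert ctc.lun_settings[0] == "encrypt"
```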
Design Considerations
We will discuss a few design considerations for deploying the above-mentioned
applications using virtualization in the fabrics. These considerations stem from
experience deploying these applications and should be considered complementary to the
product documentation provided by the vendors.
Recoverpoint
Consider four major components when designing a replication solution using
RecoverPoint.
• RecoverPoint Appliances (RPA) — RecoverPoint appliances are Linux based
boxes and are instrumental for replication activities. They accept “split” data and,
based on policy settings, apply bandwidth reduction techniques, ensure write
order fidelity, guarantee data consistency, and route the data to the appropriate
destination volume, either via IP or Fibre Channel. The RPA also acts as the sole
management interface to the RecoverPoint installation.
• RecoverPoint Journal Volumes – Journal volumes are dedicated LUNs on both
Production and Target sides used to stage small aperture, incremental snapshots
of host data. As the personality of production and target can change during
failover and failback scenarios, Journal volumes are required on all sides of
Replication (production, CDP and CRR).
• Intelligent Fabric Splitter — The RecoverPoint splitter driver is a use-specific,
small footprint software that enables continuous data protection (CDP) and
continuous remote replication (CRR). The splitter driver can be loaded on a host,
on an Intelligent Blade within a SAN director, or on a CLARiiON® array. The
intelligent fabric splitter is the intelligent-switch hardware that contains
specialized port-level processors (ASICs) to perform virtualization operations on
IO at line speed. As mentioned in the previous sections, this functionality is
available from two vendors: Brocade and Cisco. Brocade’s intelligent switch, the
AP-7600, can be linked through ISLs to a new or existing SAN. Cisco’s intelligent
blades are the Storage Services Module (SSM) and the MultiServices Module
(MSM) that can be installed in MDS 9513, 9509, 9506, 9216i, 9216A, or 9222i.
• Remote Replication — Two RecoverPoint Appliance (RPA) clusters can be
connected via TCP/IP or FC to perform replication to a remote location. RPA
clusters connected via TCP/IP for remote communication will transfer “split” data
via IP to the remote cluster. The target cluster’s distance from the source is only
limited by the physical limitations of TCP/IP. RPA clusters can also be connected
remotely via Fibre Channel. They can reside on the same fabric or on different
fabrics, as long as the two clusters can be zoned together. The target cluster’s
distance from the source is again only limited by FC’s physical limitations. RPA
clusters can support distance extension hardware (i.e., DWDM) to extend the
distance between clusters.
SAN design
Deciding where to place the SSM modules or AP-7600 switch / PB-48k-AP4-18 module
in the SAN is one of the most common design considerations in Recoverpoint. You also
have to decide on the location of Recoverpoint appliances on the SAN.
Here are guidelines for placing the Intelligent switch modules and SSM modules:
1) As a best practice, the intelligent modules/switches should be placed nearest to
the storage ports. In Core-Edge fabrics, the intelligent modules/switches should
be connected on the Core Switches. Similarly in Host-edge-Core-Storage-edge
fabrics, the most logical place would be on the Storage edge fabrics. However, if
the modules will be used by multiple storage ports on different storage edge
switches, placing the intelligent modules on the core switches is ideal.
2) As a best practice, the Recoverpoint appliances should be placed as close to the
intelligent modules as possible. In AP-7600B deployments, it is preferable to
place the appliance ports on the switch itself. Similarly, with MDS MSM modules
and switches, the appliance ports should be placed on the module/switch. However,
on MDS switches with SSM module, the appliance should be connected to a
regular line card on the switch on non-shared FC ports. Further, the appliance
ports should not be connected to the ports on SSM modules.
3) Inserting an SSM module in a MDS9513 Director reduces the director port count
to 255. For this reason, placing SSM modules in a 9513 director is not
recommended where scalability of the ports in MDS Directors is a concern.
Complex SAN topologies
A complex topology with numerous switches in the fabrics will most likely be in one of
two designs, each discussed in the next sections:
• Core/edge (hosts on an edge switch and storage on a core switch)
• Edge/core/edge (hosts and storage on an edge switch but on different tiers)
Core/Edge configurations
In this model, you connect hosts to the edge tier switches and storage to the core tier
switch(es). The core tier is the centralized location and connection point for all physical
storage in this model. All IO between the host and storage must flow over the ISLs
between the edge and core. It is a one-hop logical infrastructure (an ISL hop is the link
between two switches).
MDS Configurations
The SSM/MSM blade is located in the core. To minimize latency and increase fabric
efficiency, co-locate the SSM/MSM blade with the storage that it is virtualizing, just as
you would do in a Layer 2 SAN.
In these deployments:
• ISLs between the switches do not have to be connected to the SSM/MSM blade
• Hosts do not have to be connected to the SSM/MSM blade
• Storage should not be connected to the SSM blade
Internal routing between blades in the chassis is not considered an ISL hop. Since the
SSM/MSM is located inside the MDS switch, there is additional latency with the
virtualization ASICs. However, there is no protocol overhead associated with routing
between blades in a chassis.
If you are using a switch with an embedded MSM blade (MDS 9222i switch) for
virtualization, it should be ISLed to the core tier switch. The number of ISLs should be
sized to match the amount of virtualization traffic.
Brocade Configurations
In these configurations, you can locate an AP4-18 blade on the core tier director or you
can use an external AP-7600B switch to ISL to the core switch. The considerations for
locating the blade are similar to the MSM blade in the MDS configuration. The AP-7600B
is an external intelligent switch; it must be linked through ISLs to the core switch.
Physical placement of the RecoverPoint Appliances can be anywhere within the fabric
and need not be connected directly to the intelligent switch although it is the most
commonly employed approach.
When using AP-7600B switches, hosts are connected to the edge tier and storage is
connected to the core tier. The core tier is the centralized location and connection point
for all physical storage in this model. All IO between the host and storage must flow over
the ISLs between the edge and core. It is a one-hop infrastructure for non-virtualized
storage. However, all virtualized storage traffic must pass through at least one of the
ports on the AP-7600B. Therefore, the IO from the host must traverse:
1. an ISL between the edge tier and the Core tier
2. an ISL between the Core Tier and the AP-7600B
3. an ISL back from the AP-7600B to the core tier where the IO is terminated. This
is a three-hop logical topology.
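The hop arithmetic above can be checked with a small calculation. The sketch below counts logical ISL hops for the non-virtualized and virtualized paths in this core/edge layout; the switch names are illustrative.

```python
# Count logical ISL hops along each path in a core/edge fabric
# with an external AP-7600B (simple model of the paths above).

def count_hops(path):
    """Each adjacent pair of distinct switches is one ISL hop."""
    return sum(1 for a, b in zip(path, path[1:]) if a != b)

# Non-virtualized I/O: edge -> core (storage sits on the core switch).
plain = ["edge", "core"]
# Virtualized I/O: edge -> core -> AP-7600B -> back to the core.
virtual = ["edge", "core", "ap7600b", "core"]

print(count_hops(plain))     # 1 hop
print(count_hops(virtual))   # 3 hops
```

The same helper applied to the edge/core/edge designs discussed next would count two hops for non-virtualized traffic.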
Edge/Core/Edge Topologies
MDS configurations:
In this model, you connect hosts to the Connectrix® MDS host edge tier switches and
storage to MDS storage tier switches. The MDS switches or directors act as a
connectivity layer. The core tier is the centralized location and connection point for edge
and storage tiers. All IO between the host and storage must flow over the ISLs between
the tiers. This is a two-hop logical topology.
Within the fabric, hosts can access virtualized and non-virtualized storage. The location
of the physical storage to be virtualized will determine the location of the SSM/MSM
blade. There are two possibilities:
• Storage to be virtualized is located on a single switch
• Storage to be virtualized is located on multiple switches on the storage tier
Note: Physical placement of the RecoverPoint Appliances can be anywhere within the
fabric and need not be connected directly to the intelligent switch.
Storage on a single switch:
If physical storage is located on one edge switch, place the SSM/MSM modules on the
same switch. The SSM/MSM is co-located with the storage that it is virtualizing to
minimize latency and increase fabric efficiency.
Note: Connections to the SSM/MSM are not required.
• ISLs between the switches do not have to be connected to the SSM blade
• Hosts do not have to be connected to the SSM/MSM blade
• Storage should not be connected to the SSM blade
Internal routing between blades in the chassis is not considered an ISL hop since the
SSM/MSM is located inside the storage edge switch. There is additional latency with the
virtualization ASICs; however, there is no protocol overhead associated with routing
between blades in a chassis.
Storage on multiple switches:
If the physical storage is spread among several edge switches, locate a single
SSM/MSM in a centralized location in the fabric to achieve the highest possible
efficiencies. Because the physical storage ports are divided among multiple edge
switches in the storage tier, place the SSM/MSM in the connectivity layer in the core.
Just as with the Core/Edge design (or any design), all virtualized traffic must flow
through the virtualization ASICs. By locating the SSM/MSM in the connectivity layer,
RecoverPoint’s VIs will only need to traverse a single ISL to access physical storage. If
the SSM/MSM was placed in one of the storage tier switches, most or some of the traffic
between the SSM and storage would traverse two or more ISLs.
With MDS switches, since the SSM/MSM is located inside the switch or director, internal
routing between blades in the chassis is not considered an ISL hop. There is additional
latency with the virtualization ASICs but there is no protocol overhead associated with
routing between blades in a chassis.
Brocade Configurations
In a core/edge/core design with Connectrix B, the AP-7600Bs are linked via ISLs to the
storage edge switches.
Storage on a single switch:
In this model, hosts are connected to the host edge tier switches and storage is
connected to storage switches that form the other edge tier. The core Directors are for
connectivity only. All IO between the host and storage must flow over the ISLs in the
core. It is a two-hop infrastructure for non-virtualized storage.
In these cases, when using an AP4-18i blade, it should be located in the directors of the
core tier. If employing an AP-7600B switch, it should be directly ISLed to the core
directors. The considerations are similar to those for the MDS blades and switches.
In the case of AP-7600B switches, all virtualized storage traffic must pass through at
least one of the ports on the AP-7600B. Therefore, the IO from the host must traverse an
ISL between the edge tier switch and the core director. Then, it must traverse the ISL
between the core director and the AP-7600B. Finally, it must traverse an ISL back from
the AP-7600B to the core director where the IO is forwarded to the storage edge switch.
This is a four-hop design that would require an RPQ.
Storage on multiple switches:
You may spread physical storage amongst several edge switches. In these cases, an
AP-7600B or AP4-18i blade must be located in a centralized location in the fabric to achieve
the highest possible efficiencies. Because the physical storage ports are divided among
multiple edge switches in the storage tier, place the AP-7600B or AP4-18i blade in the
connectivity layer in the core.
Note: When RecoverPoint is added to a SAN, not all physical storage has to be
virtualized. Multiple storage ports may have LUNs to be virtualized. However, if more
than 75% of the LUNs to be virtualized reside on a single switch, locate the AP4-18i
blade on that switch or ISL the AP-7600B on that switch.
Fabric Splitter Sizing
Fabric splitters introduce additional limitations that are independent of the RecoverPoint
cluster limitations.
An ITL is an entity used internally by the switch to uniquely identify a LUN, as accessed
by some initiator. It is composed of the following elements:
I – Initiator’s WWN
T – Target’s WWN
L – LUN
All three values are necessary to uniquely identify a LUN given the possibility of LUN-
mapping per host. Because data replication performed by the switch occurs at ITL
granularity, the limits regarding the number of volumes supported by a service are given
in ITLs. In other words, the relevant “count” of replicated entities in the environment
should be the sum of possible paths for all initiator-target-LUN combinations.
The ITL numbers supported by each splitter vary with splitter type and RecoverPoint
release. Use the ITL calculators to compute the ITLs required in a configuration and to
verify that they are under the limit imposed for that configuration.
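As a rough illustration of the count described above, the sketch below (a hypothetical calculator, not the official EMC tool) sums the initiator-target-LUN path combinations; the host, WWN, and target names are made up for the example:

```python
# Hypothetical ITL calculator: counts initiator-target-LUN combinations.
# Each host entry lists its initiator WWNs, the target ports it is zoned
# to, and the number of LUNs masked to it on each target port.
def count_itls(hosts):
    total = 0
    for host in hosts:
        for _initiator in host["initiators"]:
            for _target, lun_count in host["targets"].items():
                total += lun_count
    return total

# Example: one host with 2 HBAs, zoned to 2 target ports, 10 LUNs each.
hosts = [{"initiators": ["wwn_a", "wwn_b"],
          "targets": {"tgt_1": 10, "tgt_2": 10}}]
print(count_itls(hosts))  # 2 initiators x 2 targets x 10 LUNs = 40
```

The result is compared against the ITL limit documented for the splitter type and release in use.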
Journal volume design
Journal volumes can be seen as append-only, log-structured systems where all the data
that is modified on the source volumes is logged on the journals. Understanding
the I/O profile on these journal volumes helps to properly design the journal volumes to
meet performance requirements.
As the hosts write to the production volumes, the RPA consolidates the incoming copies
of the I/Os into consistent snapshots. It then organizes these I/O blocks along with their
metadata and writes them to the Journal in a sequential write. Because the metadata is
stored with the snapshot, RPA can identify where these blocks belong on the target
volume. Similarly, when it’s time to distribute the snapshot to the target volume, RPA
reads the snapshot as a sequential read. RPA then reads the existing data on the target
volumes and follows it with a write on the target volumes to update the data from the
Journal. These reads and writes to the target volumes are random and are based on the
host I/O profile. RPA again bunches all the random reads from the target volumes and
writes them to the journal's undo stream as a sequential write.
The I/O profile on the Journal volumes is sequential, while the I/O on the target volumes
during distribution is random. For every write done on the production volumes, the RPA performs one
random read and one random write on the target volumes. Keep the journal spindles
separate from random I/O profile volumes when designing the Journal volumes. When
allocating the target volumes, design them with their random I/O profile in mind. For
example, if the production host has a 3:1 read to write, random I/O profile, design the
target volumes performance profile to be 60%-70% of the production volumes.
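As a minimal sketch of that rule (each production write becoming one random read plus one random write on the target during distribution), the illustrative function below estimates the target-volume back-end IOPS; the input figures are assumptions for the example:

```python
# Rough target-volume IOPS estimate during distribution: each production
# write becomes one random read plus one random write on the target,
# before any design headroom is added.
def target_iops(production_iops, read_to_write_ratio):
    writes = production_iops / (read_to_write_ratio + 1)
    return 2 * writes  # one random read + one random write per prod write

# Example: 8000 production IOPS with a 3:1 read:write profile.
print(target_iops(8000, 3))  # 2000 writes/s -> 4000 target IOPS
```

With headroom on top of this raw figure, the result lands in the neighborhood of the 60%-70% guideline given above.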
Sizing the RPAs (Appliances)
A single RecoverPoint appliance can sustain an average of 75 MB/s write I/Os up to
peaks of 110 MB/s. Use this throughput figure to calculate the number of appliances
required for the desired replication. A minimum of two RPAs are required for redundancy
in any RecoverPoint solution. The maximum sustainable incoming throughput for a
single cluster is 600 MB/s.
Note: CGs (unless appropriately configured on Version 3.3) run on a single node. From
version 3.3 onwards, consistency groups can span RPAs, providing up to 250 MB/s
throughput per consistency group.
Below version 3.3, if an end user requires CGs that sustain throughput more than the
maximum per node, utilize the parallel bookmark feature.
In addition to throughput rates, calculate the I/O per second (IOPS) when sizing the
appliances. The maximum IOPS a single RPA can sustain is 16,000 IOPS. The
processing power of an eight-node cluster is theoretically 128,000 IOPS.
But, as already mentioned, the other elements of the SAN (in this case the splitter
location) affect the overall environment. In the chart below, for SANTAP (Cisco) and
SAS (Brocade), values represent the cumulative supported IOPS for a pair of blades.
See the RecoverPoint release notes of the version being used for the latest numbers.
Splitter IOPS        3.0 SP1    3.1
Average Sustained
  Host               12000      14000
  SANTAP             11500      11500
  SAS                12000      16000
  Clariion            6000      10000
Burst
  Host               19000+     19000+
  SANTAP             11500      12000
  SAS                18000+     19000
  Clariion            6500+     20000
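Using the figures quoted above (75 MB/s sustained and 16,000 IOPS per appliance, a two-RPA minimum for redundancy, and a 600 MB/s single-cluster ceiling), a back-of-the-envelope sizing sketch might look like this; it is illustrative only, not an official sizing tool:

```python
import math

# Rough RPA-count estimate from the per-appliance figures in the text:
# 75 MB/s sustained throughput, 16,000 IOPS, two-RPA minimum, and a
# 600 MB/s maximum sustainable incoming throughput per cluster.
def rpas_needed(write_mb_s, write_iops):
    if write_mb_s > 600:
        raise ValueError("exceeds 600 MB/s single-cluster maximum")
    by_throughput = math.ceil(write_mb_s / 75)
    by_iops = math.ceil(write_iops / 16000)
    return max(2, by_throughput, by_iops)

print(rpas_needed(300, 40000))  # max(2, 4, 3) = 4 RPAs
```

The final count should still be checked against the splitter IOPS table above, since the splitter can be the lower ceiling.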
Sizing the WAN Pipe
The RecoverPoint WAN network, in the case of remote replication, must be well-
engineered with no packet loss or duplication, as these would lead to undesirable
retransmissions. When planning the network, ensure that the average utilized throughput
doesn’t exceed the available bandwidth. Oversubscribing available bandwidth will lead to
network congestion, causing dropped packets and TCP slow start. Consider network
congestion between switches as well as between the switch and the end device.
Consider user RPO requirements and I/O fluctuations to determine the BW required.
The relevant data to size the WAN pipe are:
• Average incoming I/O for a representative window in MB/s. (24 hrs/7 days/30
days)
• Compression level achievable on the data. (This is often difficult to obtain and
depends on the compressibility of the data. The rule is 2x to 6x.)
Dedicate a segment or pipe for the replication traffic or implement an external QOS
system to ensure bandwidth allocated to replication is available to meet the required
recover point objectives (RPO).
From these numbers, compute the minimal BW requirements of the environment by
dividing the average incoming data rate by the estimated compression level. Allocating
this BW for replication does not guarantee RPO or the frequency of high loads because
the I/O rate can fluctuate throughout the representative window.
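The minimal-bandwidth estimate can be sketched as follows, dividing the average incoming rate by the achievable compression level (the 2x-6x rule of thumb); the rates used are illustrative assumptions:

```python
# Minimal WAN bandwidth sketch: average incoming write rate divided by
# the achievable compression level (rule of thumb: 2x to 6x).
def min_wan_mb_s(avg_incoming_mb_s, compression_ratio):
    return avg_incoming_mb_s / compression_ratio

# Example: 120 MB/s average incoming writes, conservative 2x compression.
print(min_wan_mb_s(120, 2.0))  # 60.0 MB/s minimum
```

As the text notes, this minimum does not guarantee the RPO during bursts, since the I/O rate fluctuates over the representative window.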
Databases in Consistency Groups
Database files and redo logs should be in a single consistency group (CG) or group set.
Place the archive logs in a different consistency group. This separates their spindles from
the database volumes and also facilitates enabling image access on the archive logs CG at
a later point in time than the image on the database CG, which enables recovery.
Further, the archive logs are created at discrete points in time (when the online redo logs
are switched) and are generally used as a whole. Bookmarks taken just after the log
switch should be enough for database recovery and we will not need the intermediate
bookmarks created while the logs are being switched. For this reason, the journal
volumes on the archive log CG need not be big. The archive log CG is also a good place
to enable fast-forward distribution by specifying maximum journal lag.
RecoverPoint over Invista
You can achieve replication of Invista volumes to a remote location utilizing a
RecoverPoint solution. The Invista volumes can be replicated in Virtual-to-Virtual or
Virtual-to-Physical configurations. The following aspects of the design need to be
carefully evaluated when deploying RecoverPoint over Invista in this fashion:
a) Firmware versions on the intelligent modules: The versions of Invista and
RecoverPoint are generally qualified to work only with a specific version of firmware on
the intelligent modules. The firmware on these modules is closely tied to the firmware on
the Fibre Channel switches hosting them. Due to this tight requirement and dependency
on firmware versions, deploy the modules/switches hosting Invista separately from the
modules/switches hosting RecoverPoint. This enables upgrade or downgrade of one
application’s modules independent of other applications’ modules.
b) All RecoverPoint appliances must be able to access all of the Invista volumes that are
being replicated. RecoverPoint Appliance HBAs accessing Invista volumes add to the
ITL count on the Invista configuration. Consider these counts to design correctly without
running into Invista ITL scalability limits. For example, a two-appliance-per-site
RecoverPoint configuration (with two HBAs per fabric per appliance) replicating Invista
volumes will increase the Invista ITL count by 5 times (Invista ITLs + 2 HBAs x 2 RPAs
x Invista ITLs). Similarly, consider the number of ITLs supported on an Invista VT (256
currently) so as to distribute the VTs among RP appliances and front-end hosts carefully.
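The ITL growth in that example can be sketched as below; the function and its inputs are illustrative, not an Invista utility:

```python
# Hypothetical ITL-growth estimate for RecoverPoint over Invista: every
# RPA HBA path that can see the replicated Invista volumes adds a full
# set of ITLs on top of the existing host ITLs.
def invista_itl_total(base_itls, rpas_per_site, hbas_per_fabric_per_rpa):
    rpa_paths = rpas_per_site * hbas_per_fabric_per_rpa
    return base_itls * (1 + rpa_paths)

# The text's example: 2 RPAs per site, 2 HBAs per fabric per appliance,
# for a 5x increase over the original Invista ITL count.
print(invista_itl_total(100, 2, 2))  # 100 * (1 + 4) = 500
```

The total is then compared against the Invista ITL scalability limits, including the per-VT limit of 256 ITLs.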
Splitter configuration limits
Multiple splitters in the fabric:
Most RecoverPoint cluster configurations contain one fabric splitter per fabric. There is
no restriction on the number of clusters in a fabric when the splitters of the clusters are
maintained separately. The architect must be mindful of the following when there is a
need to deploy two or more splitters within a fabric (for the same cluster) for
performance or resiliency reasons.
Brocade:
You can use several splitters in the same fabric, as long as each target port is handled
exclusively by a single splitter.
Cisco:
You can use several splitters in the same fabric:
As long as each target port (represented by a DVT) is handled exclusively by a
single splitter, or
As long as each target port (represented by a DVT) is handled by several splitters,
each on a different VSAN.
Storage performance with RecoverPoint
There are three different entities in RecoverPoint replication that are important for
performance analysis:
1. The replica volume set, a copy of the production volume set (equal in size)
2. The journal volume set; RecoverPoint stripes across these volumes, so
allocating a large number of them from different RAID groups will increase
performance
3. The production volume set that is usually irrelevant in CRR installations.
However, for CDP that is using the same array for the source and target, or for
CRR in test environments that use the same array for both sets, this set must
also be included in the storage performance.
The journal volume set is used by the system for large IOs, almost all of them
sequential, striped across several LUNs for performance. The journal contains two
streams that we will refer to as the “DO” data containing future writes and the “UNDO”
data containing historic writes.
The RecoverPoint system has three modes of distribution at the remote site. The system
switches automatically between these modes based on storage performance.
Journal Storage Performance Configuration Guidelines:
I suggest the following to increase Journal volume performance:
Allocate a special RAID group for the journal. Writes on the Journal volume are
usually sequential with a large write size. Hence, any random I/O profile LUNs
should not be on the same disk spindles as journal volumes.
Configure journal volumes of different CGs on different storage RAID groups so
one group’s journal will not slow down other groups.
Configure journal volumes on a different RAID group than the user volumes.
RAID 5 is generally a good choice for Journal volumes.
Journal volumes can be corrupted if any host other than the RPA writes to them. To
prevent this, ensure that they are not zoned with any hosts other than the
RPAs. Manually load balance between consistency groups so that the CGs on each
RPA generate, on average, approximately the same amount of data.
Ensure that journal volumes are optimized for sequential access (reads and
writes).
The journal speed is also the maximal burst speed that can be handled without
entering a high-load condition. If the WAN allows it, and you have bursts, be sure the
journal volume is fast enough to handle them.
Run a benchmark on the journal (for example, iometer or iorate from the host)
to measure sequential access speed.
At a minimum, remote user volume should be able to sustain the average
production writes load. The system can provide better Recovery Time Objectives
(RTO) when the remote storage can keep more than twice the average of
production writes load. Note: we only care about production writes and not reads.
2010 EMC Proven Professional Knowledge Sharing 25
The maximum replication distribution rate will be at least 20 percent slower than the
journal volume speed.
If the cluster has more RPAs than defined CGs, I recommend
splitting a consistency group into two or more separate groups to gain
performance. Run the parallel group set on different RPAs. The system will
create bookmarks that are consistent across multiple RPAs when configuring
parallel groups (or group sets).
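The journal benchmark suggested above (iometer or iorate) can be roughly approximated with a sequential-write timing sketch such as the following; the path and transfer sizes are illustrative only, and a real benchmark tool should be preferred for sizing decisions:

```python
import os
import time

# Rough stand-in for the suggested journal benchmark: time large
# sequential writes to a file on the journal LUN's file system.
# The path and block sizes here are illustrative, not RecoverPoint tools.
def seq_write_mb_s(path, block_kb=512, blocks=64):
    buf = os.urandom(block_kb * 1024)
    start = time.time()
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(buf)
        f.flush()
        os.fsync(f.fileno())  # force data to the device before stopping the clock
    elapsed = time.time() - start
    return (block_kb * blocks / 1024) / elapsed  # MB/s

print(round(seq_write_mb_s("/tmp/journal_test.bin"), 1))
```

Note that a file-system test only approximates raw-LUN behavior; caching and file-system overhead will skew the result.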
SANTAP Performance
Supported Host ITLs
The number of supported SANTAP sessions or replicated ITLs is limited to 2048. A
session is the object that SANTAP uses to manage the split write stream for a particular
ITL. When an appliance suffers an outage, SANTAP experiences a significant amount of
exception traffic and must process this additional work in a limited amount of time. The
time budget and the amount of work impose a ceiling on the number of sessions that
SANTAP can support. This is why only 2048 host ITLs per SSM or MSM module are
supported.
Distributing the load among SSM DPCs
Create a DVT for every target port utilizing SANTAP services. The DVT is created in the
Front-end (FE or host) VSAN and is assigned to a DPP on the SSM module to manage
(mirror) the writes sent to the target port. DPP (Data Path Processor) is an ASIC on the
SSM module. There are 8 such ASICs per SSM module. Once a host logs into a DVT,
SANTAP will install a DVTLUN for every masked LUN on the target port for this host.
When a DVT is created in the Front-end VSAN and a host logs into that DVT, SANTAP
creates a pseudo initiator for the host on the DPP (that the DVT was assigned to). Once
a pseudo initiator is tied to a DPP it should not be associated with another DPP (as it
leads to a duplicate VI problem and results in non-deterministic behavior). If the host
needs to talk to another DVT, then that DVT also must be created on the same DPP
where the pseudo initiator is installed. This is an important design consideration when
distributing the load across the DPPs.
Utilize as many DPPs as possible in an SSM-based SANTAP design for performance
reasons. Each Front-end and Back-end VSAN combination gets its own DPP. If a set of
hosts talks exclusively to a set of DVTs, place them in a single Front-End VSAN.
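The placement constraint described above, where a host's pseudo initiator pins every DVT it talks to onto a single DPP, can be sketched as a simple assignment routine; the DVT and host names are illustrative, and this is not a Cisco tool:

```python
# Sketch of the DVT-to-DPP placement constraint: once a host's pseudo
# initiator lands on a DPP, every DVT that host talks to must live on
# that same DPP (otherwise a duplicate-VI problem arises).
def assign_dvts(dvt_to_hosts, num_dpps=8):
    host_dpp, dvt_dpp = {}, {}
    next_dpp = 0
    for dvt, hosts in dvt_to_hosts.items():
        pinned = {host_dpp[h] for h in hosts if h in host_dpp}
        if len(pinned) > 1:
            raise ValueError(f"{dvt}: hosts already pinned to different DPPs")
        if pinned:
            dpp = pinned.pop()
        else:
            dpp = next_dpp % num_dpps  # round-robin for unpinned hosts
            next_dpp += 1
        dvt_dpp[dvt] = dpp
        for h in hosts:
            host_dpp[h] = dpp
    return dvt_dpp

print(assign_dvts({"dvt1": ["hostA"], "dvt2": ["hostA", "hostB"],
                   "dvt3": ["hostC"]}))
```

Here dvt2 is forced onto the same DPP as dvt1 because hostA talks to both, while dvt3 can spread to the next DPP.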
Distributing the load in MSM Configurations
MSM configurations require only a single Front-end VSAN. Manual load balancing
across the DPPs is not required, which adds significant flexibility to SANTAP
deployments without performance impact.
FAP performance
In Brocade FAP deployments with Frame Redirection, each binding is alternately
assigned to one of the two DPCs on the virtualizing module or switch. When binding
performance-critical hosts with multiple paths per fabric, bind the initiators of the host
belonging to a fabric one after the other so that they are assigned to different DPCs.
Network latency and BW requirements
In addition to the size considerations mentioned in the WAN sizing section, the WAN
pipe becomes an issue when there is large latency (more than 100 ms) or packet
drops. Configure or tune the number of streams (sockets) used for replication when
there is a high-latency or low-fidelity WAN.
Run the ‘iperf’ command on the RPA (as user boxmgmt) with 1, 5, 10, 20, 40, and 60
sockets to check WAN performance. Set the number of sockets (num_of_streams) on
the RPA to the highest number at which iperf shows an improvement in results (the
maximum is 40). If there is a significant gain moving from 40 to 60 sockets (more than
10%), the WAN link itself is the bottleneck. Note that RecoverPoint will not achieve
better performance than iperf: if the WAN bandwidth is significantly larger than the iperf
results, RecoverPoint will not be able to utilize it, since the WAN suffers from
bottlenecks (for instance, too many packet drops).
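The socket-count tuning described above can be sketched as follows; the measured iperf rates and the 10% significance threshold applied here are illustrative assumptions:

```python
# Sketch of the socket-count tuning: pick the smallest stream count (up
# to the 40-socket maximum) beyond which iperf shows no significant
# (>10%) throughput gain. The measured rates below are illustrative.
def pick_num_streams(results, max_streams=40, gain_threshold=0.10):
    counts = sorted(c for c in results if c <= max_streams)
    best = counts[0]
    for c in counts[1:]:
        if results[c] > results[best] * (1 + gain_threshold):
            best = c
    return best

# Example iperf throughputs in Mbit/s per socket count tested.
measured = {1: 90, 5: 240, 10: 310, 20: 330, 40: 335, 60: 338}
print(pick_num_streams(measured))  # 10: going to 20 sockets adds <10%
```

In this example the 40-to-60-socket gain is also under 10%, so the link itself, not the stream count, is the limit.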
Statistics and Bottlenecks
“Detect_bottlenecks” in RecoverPoint is a simple tool for the user to see if there is a
problem with system performance, identify the problem, and find a possible resolution.
This tool is often used to iteratively resize and tune the various components of a
RecoverPoint system based on the live system’s performance feedback.
RecoverPoint captures and stores both short-term and long-term statistics on its
performance. The statistics are collected on a wide-ranging and comprehensive set of
counters. Only very experienced users can understand these raw statistics.
‘Detect_bottlenecks’ provides a simple and easy-to-understand analysis interface.
‘detect_bottlenecks’ outputs provide the following:
1) System overview based on the long term statistics for the duration specified
2) Observations on the system exceptions (bottlenecks) that cause improper
behavior
Interpreting Detect_bottlenecks output
There are four sections in the detect_bottlenecks output.
1) Overview of the system:
This contains an overview of the system during the time specified. There are four
subsections in the system overview:
Site Overview
Group Overview
Copy Overview
Link Overview
2) Highload Periods:
The output prints how many high load periods the system observed during the period
specified and their times. The system then prints the system overview for the first 3
highload periods.
Each highload period can contain several highloads. An overview is produced for the
highload period and for the first highload in that period.
The overview for each highload and highload period is similar to the system
overview; it contains the Site, affected RPA, affected Copy, and affected Link
overviews during the period of highload.
3) Initialization Periods:
The output prints how many Initialization periods the system observed during the period
specified and their times. The system then prints the system overview for the first 3
Initializations.
The overview for each Initialization period is similar to the overview of the system. That
is, each Initialization period’s overview contains the Site, Group, Copy, and Link
overviews during the period of Initialization.
4) Peak Periods:
This feature detects the largest data peaks during a specified detection period across all
RPAs at the production site. The command returns write volumes on all data transfer
links, where a data transfer link is uniquely defined by an RPA, target copy (local or
remote), and consistency group. Information at this granularity makes it possible to
identify opportunities for reorganizing consistency groups across the available RPAs to
achieve optimal load balancing (and reduce peaks) across the system.
Bottleneck Analysis:
Bottleneck analysis is done on the statistics collated at each of the analysis cycles
(system overview, highload analysis, Initialization analysis). Bottleneck analysis is the
exception analysis where the algorithm deduces exceptions and identifies the root
cause of each exception using analytical formulas.
Storage Encryption with Cisco SME
Several deployment topologies are possible with Cisco SME modules or switches. The
following guidelines related to SAN topology apply when deploying SME clusters:
The existing and new tape libraries must be connected to the MDS 9500 family
switches and the MDS 9200 family switches.
Switches connected to tape libraries must be running the minimum supported
SAN-OS version or later.
The MSM-18/4 module is supported on the MDS 9500 family of switches and the
MDS 9222i switch. The switch must be running the minimum supported SAN-OS
version or later.
Cisco SME requires a minimum of one SME line card in a cluster.
SME modules should be on the target switch whenever possible.
Core-Edge Topology
In core-edge topology, media servers (or the hosts) are at the edge of the network, and
tape libraries are at the core.
In this topology, use SME modules in the core switch if the targets that require SME
services are connected to only one switch in the core. The number of SME line cards
depends on the throughput requirements.
Edge-Core-Edge Topology
In Edge-Core-Edge topology, the hosts and the targets are at the two edges of the
network, connected via core switches.
If the targets that require SME services are connected to only one switch on the edge,
SME modules should be used on that switch and SME should be provisioned on that
switch only. The number of SME line cards depends on throughput requirements.
Sizing Guidelines
1. Each SME interface supports up to 450 MB/s throughput with compression and
encryption enabled.
2. The number of tape drives that can be serviced by an SME module depends on
the throughput of the tape drives. For example, the peak throughput of each LTO-3
drive is 40-60 MB/s with compression and encryption enabled. Each SME
interface should be connected to 6-8 such tape drives for optimal performance.
3. In addition, the actual throughput also depends on the server performance,
number of concurrent SME streams on the SME interface, and the backup data
(compressibility) so appropriate considerations must be made and a benchmark
is recommended.
4. 32 targets, at most, per switch are supported by FC-redirect.
5. Each FC-redirected target can be zoned with a maximum of 16 hosts.
6. A maximum of 1000 FC Redirect entries are available on each line card on which
two hosts or targets are connected.
7. A Cisco MDS 9500 series switch can accommodate multiple SME line cards.
8. A physical fabric can have, at most, one Cisco SME Cluster. Each cluster can
have up to four switches with multiple SME interfaces provisioned and SME
service enabled.
9. The encryption engine processor on the Cisco MSM module also processes the
traffic on the four Gigabit Ethernet ports on the MSM module and performs
IPSec encryption and data compression for Fibre channel over IP connections.
As a result, using FCIP and SME on the same MSM module is not advisable due
to the performance degradation that may result.
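Guidelines 1 and 2 above can be turned into a rough fan-in estimate; the sketch below assumes the quoted 450 MB/s per SME interface and the LTO-3 peak rates, and is illustrative rather than an official sizing formula:

```python
import math

# Rough tape-drive fan-in estimate: a 450 MB/s SME interface against
# LTO-3 drives peaking at 40-60 MB/s supports roughly 6-8 drives per
# interface, matching the guideline in the text.
def drives_per_interface(interface_mb_s=450, drive_peak_mb_s=60):
    return interface_mb_s // drive_peak_mb_s

def sme_interfaces_needed(num_drives, drive_peak_mb_s=60):
    per_if = drives_per_interface(450, drive_peak_mb_s)
    return math.ceil(num_drives / per_if)

print(drives_per_interface())     # 7 drives at a 60 MB/s peak
print(sme_interfaces_needed(20))  # ceil(20 / 7) = 3 interfaces
```

As guideline 3 notes, the actual throughput also depends on server performance, concurrent streams, and data compressibility, so a benchmark should confirm the estimate.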
Storage Encryption with Brocade Encryption Services
You can employ several deployment topologies using Brocade Encryption modules or
switches. Some commonly deployed topologies are:
1. Single fabric deployment - HA cluster
In this topology, two encryption blades or switches can form an HA cluster
providing redundancy in a Single fabric. If one Encryption Blade or switch fails,
the other switch or blade takes over the Crypto Target Container.
2. Single fabric deployment - DEK cluster
In this deployment, the Encryption modules/switches depend on host Multi Path
I/O software for failover. Each Blade or switch services one Host Initiator and
Target pair per host.
3. Dual fabric deployment - HA and DEK cluster
In this model, the HA cluster is used within a fabric and DEK failover is used
between the fabrics for redundancy.
Sizing Guidelines
The following performance numbers dictate the sizing and the number of modules or
switches to be deployed for Brocade Encryption.
For Disk Encryption, the throughput per module/switch is rated at 96 Gbit/sec
(Mix of both encrypted and clear text traffic). Up to 64K concurrent exchanges
can be processed per module or switch.
For Tape Processing, the throughput per module/switch is rated at 48
Gbit/sec. Up to 96 concurrent tape sessions can be processed per module or
switch.
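These rated figures can be turned into a rough module-count estimate; the sketch below is illustrative, treats disk and tape traffic independently, and ignores the concurrent-exchange and tape-session limits, which must be checked separately:

```python
import math

# Rough module-count estimate from the rated figures above: 96 Gbit/s
# per module/switch for disk traffic, 48 Gbit/s for tape traffic.
def encryption_modules_needed(disk_gbit_s=0, tape_gbit_s=0):
    by_disk = math.ceil(disk_gbit_s / 96) if disk_gbit_s else 0
    by_tape = math.ceil(tape_gbit_s / 48) if tape_gbit_s else 0
    return max(by_disk, by_tape, 1)  # at least one module per fabric

print(encryption_modules_needed(disk_gbit_s=200))  # ceil(200/96) = 3
print(encryption_modules_needed(tape_gbit_s=40))   # 1 module suffices
```

An HA-cluster or DEK-cluster topology from the list above would then double the count for redundancy.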
Future Directions
One of the limiting factors of fabric-based block virtualization technologies is that they
require a specialized virtualization hardware switch or module in the fabrics. Even in
highly redundantly designed fabrics, once these modules or switches are inserted into
the fabric, the entire virtualized application infrastructure depends on them for
virtualization services. These virtualization modules can become the performance
bottleneck and single point of failure within a fabric. Although redundancy can be
introduced within the fabrics by adding multiple virtualizing modules and additional host
initiators and storage targets, this design becomes expensive and difficult to maintain.
The limitation stems from the fact that the enabling virtualization technology operates at
the module or switch level.
Fabric Port level virtualization services could alleviate such limitations in the fabrics.
When the virtualization is done at the port level of a fabric, the redundancy designed in
the existing fabrics can remain the same while the intelligent ports can deliver the
virtualization services. Major strides in design and innovation in port-level technology
are needed for this to be feasible. These enhancements are needed for block-level
virtualization applications to be widely deployed in enterprise environments.
Conclusion
This article briefly reviews block-based storage virtualization applications that are being
deployed by enterprises. It describes the specialized SAN hardware that enables the
virtualization features in these applications, and examines the design aspects of
deploying these solutions with performance and scalability in mind.
References
1. RecoverPoint CLI Reference Guide
2. RecoverPoint Administrator’s Guide
3. RecoverPoint Security and Networking Technical Notes
4. Deploying RecoverPoint with SANTAP and SAN-OS Technical Notes
5. Deploying RecoverPoint with the Connectrix AP-7600B and PB-48K-AP4-18
Technical Notes
6. Cisco MDS 9000 Family Storage Media Encryption Configuration Guide
7. Brocade Fabric OS Encryption Administrator’s Guide