
Prepared by: Citrix Solutions Lab

Design Considerations for Citrix XenApp/XenDesktop 7.6 Disaster Recovery

Citrix Solutions Lab White Paper This paper examines the issues and concerns around building a disaster recovery plan and solution, the possible use cases that may occur, and how a team of engineers within the Citrix Solutions Lab approaches building a disaster recovery solution.

November 2015

Table of contents

Section 1: Overview
    Executive summary
    Audience
    Disaster Recovery vs. High Availability
    Defining types of Disaster Recovery
    Defining what is critical

Section 2: Defining the Environment
    Service Descriptions
    User to Site Assignment
        User Counts by Region
    Regional Site 1 – Network Diagram
    Regional Site 2 – Network Diagram
    Cold DR Site – Network Diagram
    Software
    Hardware
        Servers
        Network
    Storage
    Use Cases

Section 3: Deployment
    Configuration Considerations
    Region Server Pools
    Failover Process

Section 4: Conclusion

Section 5: Appendices
    Appendix A
        References
    Appendix B
        High Level Regional Diagrams
    Appendix C
        Identifying Services and Applications for DR/HA

Design Considerations for Citrix XenApp/XenDesktop 7.6 Disaster Recovery

5 citrix.com

Section 1: Overview

Executive summary

There is much conversation around executing disaster recovery for a data center and utilizing high availability wherever possible. But what are the requirements for disaster recovery, and how does it differ from high availability? How do the two work together to ensure your systems and applications stay up and available, no matter what?

This white paper looks at understanding disaster recovery and high availability. As with most things in life, there are trade-offs: the more resilient to failure you want to be, the more it will cost. How do these trade-offs affect you? There is the old-fashioned approach of writing everything of importance to tape, storing the tape off-site, and waiting for a disaster to occur. Tape is a very low-cost option, but it could take days or weeks to rebuild your environment. The other end of the spectrum is utilizing today's technology to make everything active/active, essentially running two complete data centers in two different locations. The two-data-center option is extremely resilient, but also extremely costly. Essentially, you are betting that a disaster will strike at least one of your sites.

What exactly needs to be up and running as quickly as possible after a failure of your data center? Where does high availability come into play to help? This document looks at some of these questions, and asks a few more, to help you understand and make good decisions in building a disaster recovery plan.

This project is not looking at sizing, scaling, or performance, but at design considerations for disaster recovery. In the Solutions Lab, a team of engineers including lab hardware specialists, network specialists, storage specialists, architects, and Citrix experts were challenged to build a disaster recovery solution for a fictitious company defined by Solutions Lab Management. This document shows how the company was defined, how the team architected and then implemented a solution, and some of the issues they uncovered as flaws in their plan or things they did not anticipate. The resulting plan was compared to how companies such as Citrix handle disaster recovery and was found to be very similar. The team had an advantage in that they were able to build the company data center to fit their design, rather than fit a design to an existing data center. Hopefully what they learned and uncovered will assist you as you think about building your own disaster recovery plan.

Note that a major component of any disaster recovery solution is the storage and storage vendor used. The concerns are around the amount of data to be moved between the sites and the acceptable delta between data synchronizations. For this paper, we worked with EMC, utilizing their storage solution to achieve our defined goals.

Audience

This paper was written for IT experts, consultants, and architects tasked with designing a disaster recovery plan.


Disaster Recovery vs. High Availability

Before we can proceed, we need to align on some definitions and terms. For this paper, High Availability (HA) is focused more on the server level, and is configured in such a manner that the end user experiences little to no downtime. Recovery can be automatic, simply failing over to another host, server, or instance of the application. HA is often thought of in terms of N+1: the addition of one more server (physical or virtual) or application than is required. If five physical servers were required to support the workload, then six would be configured, with the load distributed across all six servers. If any single server fails, the remaining five can pick up the workload without significantly affecting the user experience. With software like Citrix XenDesktop, the same approach applies. If one delivery controller/broker, provisioning server, or SQL server is not sufficient to support the workload, a second one is deployed. Depending on the software, this can be either Active/Active, where all instances are actively processing, or Active/Passive, where the standby only becomes active on failure of the first system. In XenDesktop, we always recommend an Active/Active deployment.

Disaster Recovery (DR) implies a complete disaster: no access to the site or region, total failure. Recovery will require manual intervention at some point, and the time allowed to become operational again is defined by the disaster recovery plan. We discuss this further later in this paper.

Defining types of Disaster Recovery

For HA, we talked in terms of Active/Active and Active/Passive, where those terms define how the HA components behave: either all are up and supporting users, or one awaits a failure event and then picks up the load. These terms can be applied to DR as well:

Active/Passive (A/P) – referred to as planned or cold

o Once a disaster strikes the second site must be brought up entirely

o Only as current as the last backup

o Could have hardware sitting idle waiting for disaster

Active/Active (A/A) – referred to as hot sites

o Everything replicated in the disaster site

o Duplicate hardware

o Everything that occurs on the primary site also occurs on the secondary site

o Load balanced

Active/Warm (A/W) – referred to as reactive or warm

o Some components online, ready

o Must define priority recovery

o When disaster occurs, provision capacity as needed

In A/P, depending on how quickly you need to be back up and running, it may be as simple as backing up to tape and, in a disaster, restoring from tape to available hardware. This is the lowest-cost solution, but not very resilient or quick to recover. A/A has duplicate hardware and software running and supporting users; in a multi-site scenario, each site must have enough additional hardware to support the failed-over users. A/A recovers from a disaster much more quickly, but is much more expensive in Capital Expenditure (CAPEX) on hardware. Essentially, each site has a complete duplicate set of under-utilized hardware waiting for a disaster. With A/W, the plan is to define what is critical to the company and must be recovered as quickly as possible, and to have enough capacity at the other site(s) to support that requirement. Once the most critical environment is recovered, the rest of the company can be dealt with. This still requires some extra hardware in each region, but the resources and costs can be managed more closely.


Defining what is critical

In an A/A deployment, the premise is that everything is critical and must be up and running. In an A/P deployment, rapid recovery is, by definition, not expected. For A/W, however, we must define which applications, and which users, are critical. The following terms are used going forward:

Mission Critical (MC) – Highest Priority

o Requires continuous availability

o Breaks in service are very impactful on the company business

o Availability required at almost any price

o Mission critical users are highest priority in event of a failure

Business Critical (BC) – High Priority

o Requires continuous availability, though short breaks in service are not catastrophic

o Availability required for effective business operation

o Business critical users have a less stringent recovery time

Business Operational / Productivity (PR) – Medium Priority

o Contributes to efficient business operation, but a break in service does not greatly affect the business

o Regular users; they may not fail over at all, or fail over only as a final step

As stated earlier, we created a fictitious company for this disaster recovery plan scenario. This company has a single Mission Critical application and a single Business Critical application, and associated users. The “company president” defined the acceptable response times and requirements, including a desire to have a warm failover for mission- and business-critical users, and a passive failover for the rest of the company. The following sections highlight the development and implementation of the plan.


Section 2: Defining the Environment

For this setup, the fictitious business was structured as one business with two regional sites. The business requires availability of both the company database (considered Mission Critical) and Exchange (considered Business Critical). Region 1 focuses on company infrastructure, and Region 2 focuses on a call center. MC and BC users are spread across multiple groups in each region. The setup must also be able to handle the total failure of both Regions 1 and 2 at the same time.

In a single region failure, the recovery goals for our setup are for MC applications and users to be back up and running within two hours with minimal data loss. BC applications and users must be back up and running within four hours with up to 60 minutes of acceptable data loss. If Regions 1 and 2 both fail, the third site must be up and running within five days with no more than 24 hours of acceptable data loss.
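Those targets can be captured as data so a recovery rehearsal can assert against them. The sketch below is illustrative only; the tier names mirror Section 1, and treating "minimal data loss" for MC as a near-zero RPO is our assumption, not a figure stated in this paper.

```python
# Recovery targets from the scenario above: RTO/RPO in minutes.
# MC's "minimal data loss" is encoded as rpo_min=0 (an assumption).
RECOVERY_TARGETS = {
    "MC":      {"rto_min": 2 * 60,   "rpo_min": 0},
    "BC":      {"rto_min": 4 * 60,   "rpo_min": 60},
    "cold_dr": {"rto_min": 5 * 1440, "rpo_min": 24 * 60},  # both regions lost
}

def within_target(tier: str, downtime_min: float, data_loss_min: float) -> bool:
    """Check an observed (or rehearsed) outage against its tier's targets."""
    t = RECOVERY_TARGETS[tier]
    return downtime_min <= t["rto_min"] and data_loss_min <= t["rpo_min"]

# A BC outage recovered in 3.5 hours with 45 minutes of data loss passes:
print(within_target("BC", 210, 45))  # True
```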

For a closer look at the environment by region, see the diagrams in Appendix B at the end of the paper.


Service Descriptions

The following defines our MC, BC, and PR services and applications and our considerations in handling them in our setup.

Service Type: Mission Critical
Service: Microsoft SQL Sample Database – Northwind
Description: The SQL Sample Northwind database is used along with a web server. This represents the call center's mission critical application database.
Configuration Requirements:
    The SQL Sample database is deployed at all locations; replication is handled by the storage backend.
    In case of major failure, the database must be delivered from the DR data center.
    A maintenance message must be presented to external users when the database is not available.

Service Type: Business Critical
Service: Microsoft Exchange / Outlook
Description: Access to email data for business critical users from the Exchange database.
Configuration Requirements:
    The database is replicated between primary and secondary locations using Exchange database copies.
    The database is backed up every 4 hours to storage in the DR location.

Service Type: Business Operational / Productivity
Service: Microsoft Office and file shares
Description: All users use Microsoft Office to create and review documents. Documents are stored on file server shares synced between regions.
Configuration Requirements:
    Microsoft Office is published on XenApp.
    DFS Replication is configured between the primary sites, and a file-based backup is performed to the DR location every 8 hours.
    In case of disaster, a limited set of users must have access to the DR file share location.
    Published Microsoft Office must be made unavailable to users when the file share is not available.


User to Site Assignment

Each regional site in our setup has different types of users. Region 1 is focused on HR and engineering; Region 2 is focused on call center users. A majority of the users run on hosted shared desktops; the remaining users are VDI users, either pooled or dedicated.

User Counts by Region

The table below shows the breakdown of users by region and how they are organized within the regions.

Region 1 User Counts     Mission Critical   Business Critical   Business Operational / PR
Engineering              30                 60                  560
HR                       10                 10                  20
Management               5                  5                   —
Region 1 Grand Total     45                 75                  580

Region 2 User Counts     Mission Critical   Business Critical   Business Operational / PR
Call Center              20                 60                  520
Engineering              10                 —                   50
HR                       5                  25                  —
Management               5                  5                   —
Region 2 Grand Total     40                 90                  570


Regional Site 1 – Network Diagram

For Region 1, the server configuration consisted of:

Three physical servers running XenServer, hosting infrastructure VMs.

Four physical XenApp hosts in a single delivery group, as a 3+1 HA model supporting the business operational users.

Four physical hosts running XenServer configured as a pool, in a 3+1 HA model supporting the mission- and business-critical users. This pool supported the following configuration:

o 30 Windows 8.1 Dedicated VDI VMs

o 90 Windows 8.1 Random Pooled VDI VMs

o 5 Windows 2012 R2 Multi-user XA/HSD VMs supporting 80 users

The Region 2 failover pool in Region 1 is four XenServer hosts in a 3+1 model supporting the following configuration:

o 25 Windows 8.1 Dedicated VDI VMs

o 25 Windows 8.1 Random Pooled VDI VMs

o 5 Windows 2012 R2 Multi-user XA/HSD VMs supporting 80 users

o 3 SQL 2014 VMs in a cluster (Call center database failover)


Regional Site 2 – Network Diagram

For Region 2, the server configuration consisted of:

Three physical servers running XenServer, hosting infrastructure VMs, including the SQL call center cluster.

Four physical XenApp hosts in a single delivery group, as a 3+1 HA model supporting the business operational users.

Four physical hosts running XenServer configured as a pool, in a 3+1 HA model supporting the mission- and business-critical users. This pool supported the following configuration:

o 25 Windows 8.1 Dedicated VDI VMs

o 95 Windows 8.1 Random Pooled VDI VMs

o 5 Windows 2012 R2 Multi-user XA/HSD VMs supporting 80 users

The Region 1 failover pool in Region 2 is four XenServer hosts in a 3+1 model supporting the following configuration:

o 30 Windows 8.1 Dedicated VDI VMs

o 10 Windows 8.1 Random Pooled VDI VMs

o 5 Windows 2012 R2 Multi-user XA/HSD VMs supporting 80 users


Cold DR Site – Network Diagram

Within the DR site, the Region 1 recovery environment was set up with four XenServer hosts in a 3+1 HA model supporting the following configuration:

o Windows 8.1 Dedicated VDI VMs

o Windows 2012 R2 Multi-user XA/HSD VMs

o Infrastructure VMs

The Region 2 recovery environment was set up with four XenServer hosts in a 3+1 HA model supporting:

o Windows 8.1 Dedicated VDI VMs

o Windows 2012 R2 Multi-user XA/HSD VMs

o Infrastructure VMs

Note: The networks for Region 1 and Region 2 in this site are set up with the same IP ranges as in the original regional sites.


Software

The following is a list of the software components deployed in the environment:

Component                                      Version
Virtual Desktop Broker                         XenDesktop 7.6 Platinum Edition FP2
VDI Desktop Provisioning                       Provisioning Services 7.6
Endpoint Client                                Citrix Receiver for Windows 4.2 (ICA)
Web Portal                                     Citrix StoreFront 3.0
License Server                                 Citrix License Server 11.12.1
Office                                         Microsoft Office 2013
Virtual Desktop OS (Pooled VDI)                Microsoft Windows 8.1 x64
Virtual Desktop OS (Hosted Shared Desktops)    Microsoft Windows Server 2012 R2 Datacenter
Database Server                                Microsoft SQL Server 2014
Hypervisor                                     XenServer 6.5 SP1
Network Appliance                              NetScaler VPX, NS11.0: Build 62.10.nc
WAN Optimization                               CloudBridge WAN Accelerator CBVPX 7.4.1
Storage Network                                Brocade 5100 switch
Storage DR                                     For XtremIO: EMC RecoverPoint 4.1 SP2 P1; for Isilon: OneFS 7.2 SyncIQ

Note: All software was updated with the latest hotfixes and patches.

Hardware

Servers

The hardware used in this configuration consisted of blade servers with two-socket Intel Xeon E5-2670 processors @ 2.60 GHz, 192 GB of RAM, and two internal hard drives.


Network

VMs were utilized as site edge devices that helped route traffic between regions. The perimeter network (also known as a DMZ) had a firewall between itself and the internet and another firewall between the perimeter network and production network.

NetScaler Global Server Load Balancing (GSLB) was used to determine which region a user is sent to. If available, users are sent to their primary region; when the primary region is not available, users are sent to their secondary region. A pair of NetScaler VPX appliances per region were utilized for authentication, access, and VPN communications. Additionally, a pair of NetScaler Gateway VPX appliances were utilized per region to allow connectivity into the XenApp/XenDesktop environment. CloudBridge VPX appliances were utilized for traffic acceleration and optimization between regions. NetScaler CloudBridge Connector was configured for IPSec tunneling.

The following diagram is a detailed architectural design of our network implementation.


Storage

Storage was configured using EMC XtremIO all-flash arrays and Isilon clustered NAS systems. The storage network for EMC XtremIO was built with Brocade Fibre Channel SAN switches. The following diagram gives a high-level view for Region 1. As stated previously, failover to a DR site requires manual intervention, so the concern in syncing data comes down to a math problem: how much data do you need to sync between sites, and what size pipe connects them? That determines how long a sync will take. Can you sync in the time allowed? If not, which variable can you change: reduce the amount of data or increase the pipe speed?
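As a worked example of that math (with hypothetical numbers, not measurements from this lab build), the sketch below estimates how long a given data delta takes to replicate over a given WAN link:

```python
def sync_hours(delta_gb: float, link_mbps: float, efficiency: float = 0.7) -> float:
    """Estimate hours needed to replicate a data delta across a WAN link.

    efficiency approximates protocol overhead and link contention;
    0.7 is an assumed planning figure, not a measured value.
    """
    delta_megabits = delta_gb * 8 * 1000           # GB -> megabits (decimal units)
    effective_mbps = link_mbps * efficiency        # usable throughput
    return delta_megabits / effective_mbps / 3600  # seconds -> hours

# A 500 GB delta over a 1 Gbps link needs ~1.6 hours -- inside the
# two-hour MC recovery window, but with little margin; a 4 TB delta
# would need ~12.7 hours and would force a smaller delta or a bigger pipe.
print(f"{sync_hours(500, 1000):.1f} h")   # 1.6 h
print(f"{sync_hours(4000, 1000):.1f} h")  # 12.7 h
```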

One thing to look at is the LUNs, or storage repositories. Our design created separate volumes for mission critical and business critical data, and scheduled syncs accordingly. It is crucial that you work with your storage vendor to get the proper configuration.


Use Cases

The following use cases define the possible scenarios that must be considered and, for our case study, the users that must be supported. At minimum, that means the mission critical and business critical users.

Use Case 1

The sites are configured as Active/Active using NetScaler GSLB.

If the Region 1 site fails, mission- and business-critical users will be able to connect and log on to the Region 2 site with the same data resources as were available in the Region 1 site.

With the Region 1 site back online, NetScaler GSLB will direct users to the correct site, as Region 1 site users log off from the Region 2 site and then log back into the Region 1 site.

A maximum of 120 users will have warm HA failover capability from Region 1 to Region 2.

Use Case 2

The sites are configured as Active/Active using NetScaler GSLB.

If the Region 2 site fails, mission- and business-critical users will be able to connect and log on to the Region 1 site with the same data resources as were available in the Region 2 site.

With the Region 2 site back online, NetScaler GSLB will direct users to the correct site, as Region 2 site users log off from the Region 1 site and then log back into the Region 2 site.

A maximum of 130 users will have warm HA failover capability from Region 2 to Region 1.

Use Case 3 – Cold DR

The sites are configured as Active/Passive, with the goal of failing over only the mission critical users from the Region 1/Region 2 sites to the DR site.

This site will be based on backup data from Region 1 and Region 2 and will go live within 5 days.

Switching to the DR site is a manual process.

When users log in to the DR site, they should find any changes made in their dedicated environment reflected in the DR site environment. There is a potential for data loss between the last site-to-site copy and the failover. Once failed over to the DR site, when Region 1/Region 2 come back online, and after allowing appropriate time for replication between sites, logins should connect to Region 1/Region 2 and the changes should be reflected there.

The cold DR site will contain a subset of the regional sites – including networking, infrastructure, and dedicated VDIs.

o This approach allows us to both easily recover from disaster with backups, and later rebuild regional sites from the DR site data.

Mission Critical users will have primary access to the cold DR site, followed by Business Critical, and then the rest of the company depending on timelines and disaster impact.

o A maximum of 45 users will have cold DR access from Region 1.

o A maximum of 40 users will have cold DR access from Region 2.


Section 3: Deployment

This document is not a step-by-step manual for building this configuration, but a guide to understanding what needs to be done. Wherever possible, Citrix documentation was followed for deployment and configuration. The following configuration sections highlight any deviations or areas of importance to help with a successful deployment.

Implementing the software breaks down into two major areas: first, putting the correct software into each region; second, configuring NetScaler for GSLB.

The process followed for deployment was:

1. Deploy XenServer pools.

2. Create required AD groups and DHCP scopes.

3. Prepare SQL Environment (SQL AlwaysOn). PVS 7.6 adds support for AlwaysOn.

4. Deploy XenDesktop environment.

5. Deploy Storefront servers and connect to XenDesktop.

6. Deploy PVS environment and create required vDisks.

7. Configure NetScaler GSLB, create site and service.

8. Configure NetScaler Gateway in Active/Passive mode and update Storefront configuration.

9. Deploy Microsoft Exchange Environment.

The NetScaler configurations are straightforward; nothing special was done in configuring StoreFront. This was a typical XenDesktop and NetScaler Gateway configuration, with two StoreFront servers load balanced by NetScaler.

NetScaler GSLB is where the focus is:

• Using LB Method StaticProximity: Region 1 users are sent to Region 1 if it is online; otherwise they are sent to Region 2, and vice versa (see the sketch after this list).

• Using location settings in NetScaler to define the primary regions of the client’s local DNS Servers and for the GSLB sites and services.

• Users, regardless of region, use the same Fully Qualified Domain Name (FQDN) (i.e. desktop.domain.com) – NetScaler running ADNS answers authoritatively with the IP of the user's primary site.

• Once redirected to the proper site, the user authenticates at the Access Gateway (AG), and is then redirected to the local StoreFront to get access to resources.
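A minimal sketch of that routing decision (illustrative only; health checking and proximity lookups are actually handled by NetScaler, and the site names here are shorthand, not configuration values):

```python
from typing import Dict, Optional

def pick_site(user_region: str, site_up: Dict[str, bool]) -> Optional[str]:
    """Approximate GSLB StaticProximity with failover between two regions.

    A user goes to their home region when it is up, otherwise to the
    other region; None means neither regional site can take the user
    (the cold DR process of Use Case 3 takes over from there).
    """
    other = {"R1": "R2", "R2": "R1"}[user_region]
    if site_up.get(user_region):
        return user_region
    if site_up.get(other):
        return other
    return None

# A Region 1 user while the Region 1 site is down is sent to Region 2:
print(pick_site("R1", {"R1": False, "R2": True}))  # -> R2
```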

Additionally, NetScaler CloudBridge Connector is configured for IPSec tunneling:

• An IPSec tunnel for AD replication, server/client communication is created using the outbound connection.

• A second IPSec tunnel is created for site to site data replication.


Configuration Considerations

The following defines some of the specific configurations applied to the environment:

XenApp/XenDesktop

Regional Sites R1 and R2

o 2 Delivery Controllers – primary regional site

o FMA services configured with SSL on the Controllers, and XML Service ports changed from HTTP to HTTPS to secure traffic communication

o XD/XA Database on the Always On SQL group

Must have unique Site Database naming

o SSL to VDA feature of XenApp and XenDesktop 7.6

o Hosted shared desktops

5 Machine Catalogs

Physical XA HSD

XA HSD MC

XA HSD BC

XA HSD MC Failover

XA HSD BC Failover

5 Delivery Groups matching the catalogs

o Pooled VDI desktops

4 Machine Catalogs

PR

BC

PR Failover

BC Failover

4 Delivery Groups

o Dedicated VDI Desktops

4 Machine Catalogs

MC

BC

MC Failover

BC Failover

4 Delivery Groups


VDI Virtual Desktops

o Pooled Random VDI Desktops

VDI VMs Streamed from PVS vDisk

o Dedicated VDI Desktops

Static VMs

My Documents must be redirected to a network location on the file share

XenApp / HSD

o Deployed in two models

o Physical Hosts in N+1 HA Model – manually installed on hardware

o Virtualized XA HSD VMs in N+1 HA model – streamed from PVS vDisks

User Profile Manager

o Hosted Shared Desktop user profile data: \\FS01\ProfileData\HSD\#SAMAccountName#

o Hosted Virtual Desktop user profile data: \\FS01\ProfileData\HVD\#SAMAccountName#

o Hosted Virtual Desktop (Mission Critical) user profile data: \\FS01\ProfileData\MC\#SAMAccountName#

o User Profile and Folder redirection Policies

StoreFront VMs

o SSL configured to secure traffic communication

o 2 – StoreFront Servers (HA) and LB by NetScaler VPX

o Authentication is configured on NetScaler Gateway.

License Server VM

o 2 – HA license servers

o SSL configured to secure traffic communication

o Windows 2012 RDS Licenses

o Citrix Licensing Server


Isilon Scale-out NAS for each Site

o 4 - X410 Nodes

o 34 TB HDD + 1.6 TB SSD

o 128 GB RAM (8 x 16 GB)

Provisioning Services

o 2 – PVS Server VMs in HA

o PVS DB Server configured on SQL AlwaysOn

o Utilizing remote storage location for vDisks on each PVS – remote storage attached to PVS VMs as 2nd drive via File Server and SMB/CIFS.

Separate locations for vDisk store for Mission Critical and Business Critical vDisks on File Server via SMB/CIFS

Regular vDisks located on local File Servers

o Multi–homed

Utilizing Guest VLAN as Management interface

Utilizing the PXE VLAN for Streaming interface

o DHCP for PVS network/PXE VLAN

o Cache in device RAM with overflow on hard disk

256MB for Windows 8.1 VDI

2048MB for XA HSD

NetScaler VPX VMs

o 2 – LB VPX in HA mode

LDAP Authentication

AG VIP

VPN

GSLB for regional sites

o 2 - VPXs for LB of StoreFront and XML services


Region Server Pools

The following defines the VM breakdown per region for the different pools required within the infrastructure environment. In all cases, the VMs were balanced across XenServer hosts and configured in an HA model: a minimum of two VMs for each required application.

Region 1:

2 – XenDesktop Brokers

2 - StoreFront VMs

2 - License Server VMs

2 - Provisioning Services

2 - File Server VMs

3 - SQL 2014 Database Server VM – Always On

2 – AD DC VMs

4 - Exchange server VMs

2 Mailbox

2 Client Access

Perimeter Network

1 – Firewall / Router VM

2 - NetScaler VPX VMs – HA Model – User Access

2 – CloudBridge VPX VMs - HA Model - Active/Passive – Site to Site user access WAN optimization

2 – CloudBridge VPX VMs - HA Model - Active/Passive – Site to Site data replication

2 - NetScaler VPX VMs – HA Model – Data Replication

R2 HA Fail-Over Pool

5 – XA HSD VMs

25 – Pooled VDI VMs

25 – Dedicated VDI VMs

3 – SQL Server VMs (Call Center Cluster)


Region 2:

2 – XenDesktop Brokers

2 - StoreFront VMs

2 - License Server VMs

2 - Provisioning Services

3 – SQL 2014 Database Server VMs – SQL Cluster

3 - SQL 2014 Database Server VM – Always On

2 - File Server VMs

2 – AD DC VMs

4 - Exchange server VMs

2 Mailbox

2 Client Access

Perimeter Network

1 – Firewall / Router VM

2 - NetScaler VPX VMs – HA Model – User Access

2 – CloudBridge VPX VMs - HA Model - Active/Passive – Site to Site user access WAN optimization

2 – CloudBridge VPX VMs - HA Model - Active/Passive – Site to Site data replication

2 - NetScaler VPX VMs – HA Model – Data Replication

R1 HA Fail-Over Pool

5 – XA HSD VMs

10 – Pooled VDI VMs

30 – Dedicated VDI VMs


Region 3:

Infrastructure Pool to support Region 3

2 – AD DC VMs

1 VM to handle backups from regions 1 and 2

Region 1 Infrastructure Pool

2 – AD DC VMs

2 – Delivery Controllers

2 – StoreFront VMs

2 – License Server VMs

1 – File Server VM

2 – SQL 2014 Database Server VMs – Always On

4 – Exchange server VMs

2 Mailbox

2 Client Access

Region 2 Infrastructure Pool

2 – AD DC VMs

2 – Delivery Controllers

2 – StoreFront VMs

2 – License Server VMs

1 – File Server VM

2 – SQL 2014 Database Server VM – Always On

2 – SQL 2014 Database Server VM – SQL Cluster

4 – Exchange server VMs

2 Mailbox

2 Client Access

Perimeter Network

1 – Firewall / Router VM

2 – NetScaler VPX VMs – R1/R2 Access

VIP per Region – 2 VIPs

Note: The infrastructure VMs for regions 1 and 2 were duplicated in region 3 for networking purposes. By setting the networks correctly in region 3, once regions 1 and 2 were brought up, no network changes were required in their infrastructure or VHD files.


Failover Process

The dedicated VMs present the biggest challenge in a failure. To address this, VMs are created in both regions for the failover dedicated VMs from the other region; however, no storage is attached to these VMs. In the event of a failure, these VMs are assigned the proper VHD file from the backup storage location. Note that for fail-back after the failed region is back online, the dedicated VM VHD files are deleted in the failed region, then copied back from the failover region and attached to the proper VMs. This ensures the latest version of the dedicated VMs is restarted after fail-back.

Note: In dealing with dedicated VMs, we realized that we had to name the VHD files and associated files carefully to ensure the correct VHD file is connected to the correct VM during failover and fail-back.
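As an illustration of that naming discipline, here is a minimal sketch (assuming a hypothetical "<vm-name>.vhd" convention; the lab's actual file names are not given in this paper) that verifies every dedicated VM pairs with exactly one backed-up VHD before anything is attached:

```python
from pathlib import Path
from typing import Dict, List

def pair_vhds(vm_names: List[str], backup_dir: str) -> Dict[str, Path]:
    """Pair each dedicated VM with its backed-up VHD, assuming a
    '<vm-name>.vhd' naming convention, and fail loudly on any missing
    or ambiguous match rather than risk attaching the wrong disk."""
    pairs: Dict[str, Path] = {}
    for vm in vm_names:
        matches = list(Path(backup_dir).glob(f"{vm}.vhd"))
        if len(matches) != 1:
            raise RuntimeError(
                f"{vm}: expected exactly one VHD, found {len(matches)}")
        pairs[vm] = matches[0]
    return pairs

# Hypothetical usage with made-up VM names and backup share:
# pairs = pair_vhds(["R1-MC-VDI-001", "R1-MC-VDI-002"], r"\\backup\dedicated")
```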

If there is a failure in either Region 1 or Region 2 (what is called a "warm" failover), a few steps need to be taken, and the actions differ depending on the failure. If it is a network access issue, or the Internet is down, the dedicated VMs in the failed region are placed in Maintenance Mode in Citrix Studio and shut down. The latest storage backup of the dedicated VMs must be made available in the surviving region, and the storage for each VM is attached individually to the pre-created VMs already present there. Group policy applied to the dedicated VMs' OU imports the registry value listing the delivery controller host names (the VDA's ListOfDDCs value), allowing VDA registration with the local delivery controllers. The pooled VDI and XA HSD VMs in the surviving site are also taken out of Maintenance Mode and brought online.

For Region 2, the SQL database for the call center application is brought online as well. Depending on the type of failure, you may need to power down the failed region firewall to force failover to the other region.

Once those steps are completed, you boot the Mission Critical and Business Critical user VMs. Mission- and business-critical data is kept in sync between the sites, so you can then communicate availability to your users. End users use the same URL as always, with GSLB redirecting as required.

For fail-back after recovery of the failed region has completed, the steps are to sync all storage back to the failed site, perform the necessary steps for the dedicated VMs, bring the applications back online, and bring up the users.

In a full loss of both Region 1 and Region 2, the DR site, or Region 3, needs to be brought online. The physical servers are powered up, making the XenServer pools accessible. The latest database and Exchange information is imported, and the infrastructure and user VDI VMs are restored and brought online. A new URL is required to log in. Once the site has been brought online, any new information, such as the new access URL, needs to be given to your users.


The following defines the steps required to recover and bring Region 3 online (a sketch of the bring-up dependencies follows the list):

• Active Directory, DNS and DHCP

o Import Domain Controllers from backup and restore Active Directory functionality

o Update DNS Records for Storefront / Access Gateway / Exchange MX

o Create DHCP Scopes

• NetScaler

o Rebuild NetScaler components, NetScaler Gateway

• XenServer

o Turn existing XenServer Pools on

• File Services

o Restore access to file services, user data and UPM.

• XenDesktop Environment

o Import SQL VMs and restore XenDesktop, PVS and Call Center application databases

o Import StoreFront, XenDesktop and PVS VMs and test connectivity to databases

• Exchange Environment

o Import Client Access and Mailbox Servers and restore databases

• External DNS

o Update External DNS records for Access Gateway URLs

o Update External MX records for email

o Update Outlook Anywhere, Active Sync, etc. DNS records
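The order of those steps matters; for example, Appendix C notes that the XenDesktop database and its SQL servers must be up before the Delivery Controllers. A hedged sketch of that ordering as a topological sort (the dependency map below is inferred from the steps above, not taken from the lab's runbook):

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Assumed bring-up dependencies for the cold DR site; each service maps
# to the set of services that must be online before it. Adjust to your
# own environment.
deps = {
    "AD/DNS/DHCP": set(),
    "XenServer pools": set(),
    "NetScaler/Gateway": {"AD/DNS/DHCP"},
    "File services": {"AD/DNS/DHCP", "XenServer pools"},
    "SQL (XD/PVS/Call Center DBs)": {"AD/DNS/DHCP", "XenServer pools"},
    "StoreFront/XenDesktop/PVS": {"SQL (XD/PVS/Call Center DBs)",
                                  "File services"},
    "Exchange": {"AD/DNS/DHCP", "XenServer pools"},
    "External DNS/MX updates": {"NetScaler/Gateway", "Exchange",
                                "StoreFront/XenDesktop/PVS"},
}

# Prints one valid bring-up order respecting every dependency.
print(list(TopologicalSorter(deps).static_order()))
```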


Section 4: Conclusion

As stated in the beginning, the goal of this project was to challenge a group of engineers to create a disaster recovery plan for a fictitious company. This meant understanding what was mission critical, business critical, and normal day-to-day work, and what applications and data needed to be ready in case of a disaster. It also meant understanding user needs for issues like dedicated VMs. This paper highlights and defines some of the issues around creating a disaster recovery environment. It is not a how-to, step-by-step manual, but a guide to help you understand the issues and concerns in disaster recovery, and things to consider when defining your disaster plan. It shows how the Citrix Solutions Lab team of engineers defined, designed, and implemented a DR plan for a fictitious company. This may not be the optimal solution for your company, but you can use it as a baseline of considerations and operational steps when you create a disaster recovery plan for your own company.

During the process of deploying and testing, there were some realizations and changes. The first was around failing back after a failover: how do you handle the data? Do you sync back, or delete and copy back? Our decision was to delete and copy back, ensuring the original site is clean and up to date. Another realization was around the configuration of GSLB and the failed site. Since preparing the failover site for access requires manual intervention, there is potential for GSLB to redirect users to the failover site before it is ready; users could hit a StoreFront before any personal desktops or applications are available to them, although they would have access to any common applications or desktops.

We used two different SQL approaches: AlwaysOn for our infrastructure environment and clustering for our database application. This was done by design in the lab to expose the issues and considerations around both.

Between supporting high availability across the two main regions and maintaining a third region for total failover, the one thing our "company president" was less than thrilled with was the CAPEX cost of hardware not being fully utilized. This is a cost of doing business.

However, with the recent introduction of Citrix Workspace Cloud, an alternative has come up that we are "reworking" our fictitious company toward. Rather than having additional hardware in Regions 1 and 2, what if there were a cloud site running at a minimum, waiting for a region to fail, that could spin up what is needed to support the failure? Essentially, what is needed in the cloud is a NetScaler VPX for connectivity, an AD server, a SQL AlwaysOn server, and an Exchange server. This keeps the mission critical and business critical environments in sync. You can then determine what else may be required to support each region. The one current caveat is that no cloud supports desktop operating systems; VDI users get server operating systems running in a desktop mode. This is not a major issue for pooled VDI users, but it is something to be solved for dedicated VDI users.

Will the cloud work for you? Should you use additional hardware in your regions? What are your recovery times? How much of your environment is actually mission critical? These are questions we hope you are now considering as you build a disaster recovery plan for your company.


Section 5: Appendices

Appendix A

References

EMC Storage

http://www.emc.com/en-us/storage/storage.htm?nav=1

Brocade Storage Network

http://www.brocade.com/en/products-services/storage-networking/fibre-channel.html

XenApp

http://www.citrix.com/products/xenapp/overview.html

XenDesktop

http://www.citrix.com/products/xendesktop/overview.html

NetScaler

http://www.citrix.com/products/netscaler-application-delivery-controller/overview.html

CloudBridge

http://www.citrix.com/products/cloudbridge/overview.html

Citrix CloudBridge Data Sheet:

https://www.citrix.com/content/dam/citrix/en_us/documents/products-solutions/cloudbridge-data-sheet.pdf


Appendix B

High Level Regional Diagrams

[High-level regional diagrams for Region 1 and Region 2 appear here in the original paper.]

Appendix C

Identifying Services and Applications for DR/HA

This section identifies all the applications, services and data items for planning within our setup.

Call Center

Type: Database and App

Description: Main application for call center activity required for company mission critical function

Level: Mission Critical

Primary Location: Region 2 (West Coast), Region 1, R3/DR in case of failover or disaster

Access Methods:

Local Web Browser

Published App Web Browser

Data: SQL Database

Actual test database – the Microsoft SQL sample Northwind database

Data Location: SQL 2014 Cluster

Systems:

SQL 2014 Database servers

Web Servers

Notes:

Database servers and database must be made accessible in R1 and R3/DR in case of fail-over or disaster

Both the database and the web site for it would need to be created

http://businessimpactinc.com/install-northwind-database/

https://msdn.microsoft.com/en-us/library/vstudio/tw738475%28v=vs.100%29.aspx

Exchange

Type: Service

Description: Email service, required for internal and external communication

Level: Business Critical

Primary Location: Region 1 & 2, R3/DR in case of disaster

Some Exchange databases are region specific

Access Methods:

Local Outlook Application

Published Outlook Application

Web Outlook

Data: Exchange Databases


Data Location: Exchange Servers

Systems:

Exchange Mailbox Servers

Exchange Client Access Servers

Notes

Exchange will need to be accessible in DR scenario in R3/DR for mission-critical users

Microsoft Office

Type: Application

Description: Productivity applications for regular office work

Level:

Outlook - Business Critical

other office apps - Productivity

Primary Location: Region 1 & 2, R3/DR in case of disaster

Access Methods:

Local Outlook Application

Published Outlook Application

Web Outlook

Data:

Outlook Data File

Outlook Address Book

Exchange Mailbox

Exchange Address Book

Data Location:

Exchange Servers

User Outlook file location (redirected from My Documents to UPM storage?)

Systems:

Exchange Mailbox Servers

Exchange Client Access Servers

Notes:

Outlook needs to be available in all regions in case of failover for business critical users.

XenDesktop

Type: Service

Description: Virtual Desktop Brokering and management system, required for virtual desktop access and assignment

Level: Mission Critical


Primary Location: Region 1 & 2, R3/DR in case of disaster

Data: XD Site Databases, region specific.

Data Location: SQL Always On HA Group

Systems:

XD Delivery Broker Server VMs

Citrix Licensing Server VMs

Notes:

Must be available in all regions for mission- and business-critical users to be able to access desktops.

For R3/DR, the XenDesktop database and the SQL servers supporting it must be brought up before the XD Delivery Controllers

Licensing server must be available for XenDesktop functionality to allow user connections

StoreFront

Type: Service

Description: Web Portal into the XenDesktop environment, required for user session access

Level: Mission Critical

Primary Location: Region 1 & 2, R3/DR in case of disaster

Access Methods: Web Browser, Citrix Receiver

Data: SF configuration

Data Location: SF servers

Systems: Storefront Server VMs

Notes:

Must be available in all regions for mission- and business-critical users to be able to access desktops.

Provisioning Services

Type: Service

Description: Virtual Desktop VM streaming and deployment system, required for the virtual desktop VMs launch

Level: Mission Critical

Primary Location: Region 1 & 2, R3/DR in case of disaster

Access Methods: PXE and DHCP for the Virtual Desktop VMs

Data:

PVS Farm Databases

vDisks

Data Location:

Farm Database - SQL Always On HA Group


vDisks – File Servers

Systems:

PVS Server VMs

File Servers (for vDisks)

Notes:

Licensing server must be available for PVS functionality to allow virtual desktop launch

User Profiles

Type: Data

Description: User data required for all users work on virtual desktops

Level: Mission Critical

Primary Location: Region 1 & 2, R3/DR in case of disaster

Access Methods: SMB

Data: User personal data, including redirected My Documents

Data Location: UPM File Servers

Systems: File Server VMs


Corporate Headquarters

Fort Lauderdale, FL, USA

Silicon Valley Headquarters

Santa Clara, CA, USA

EMEA Headquarters

Schaffhausen, Switzerland

India Development Center

Bangalore, India

Online Division Headquarters

Santa Barbara, CA, USA

Pacific Headquarters

Hong Kong, China

Latin America Headquarters

Coral Gables, FL, USA

UK Development Center

Chalfont, United Kingdom

About Citrix

Citrix (NASDAQ:CTXS) is leading the transition to software-defining the workplace, uniting virtualization, mobility management, networking and SaaS solutions to enable new ways for businesses and people to work better. Citrix solutions power business mobility through secure, mobile workspaces that provide people with instant access to apps, desktops, data and communications on any device, over any network and cloud. With annual revenue in 2014 of $3.14 billion, Citrix solutions are in use at more than 330,000 organizations and by over 100 million users globally. Learn more at www.citrix.com


Copyright © 2015 Citrix Systems, Inc. All rights reserved. XenApp, XenDesktop, XenServer, CloudBridge, and NetScaler are trademarks of Citrix Systems, Inc. and/or one of its subsidiaries, and may be registered in the U.S. and other countries. Other product and company names mentioned herein may be trademarks of their respective companies.

