Implementation Guide Addendum
Chapter 3
Technology Overview

This chapter includes reviews of the following major technologies used in the DRaaS 2.0 solution:
• InMage ScoutCloud Platform, page 3-5
• VMware Virtual SAN, page 3-16
InMage ScoutCloud Platform

Implementing DRaaS presents several technology options, which have varying levels of cost, complexity, and operational models. A summary of technology options for the implementation is presented in Figure 3-1.
Figure 3-1 Many Approaches to DRaaS
The host-based replication technology illustrated in Figure 3-2 is one of the recommended implementations for Cisco's DRaaS solution architecture. It is delivered in partnership with the InMage ScoutCloud product offering because of the value and the differentiation it provides for delivering DR services for both physical-to-virtual (P2V) and virtual-to-virtual (V2V) workloads.
DRaaS 2.0 InMage and Virtual SAN
Figure 3-2 Host-Based Replication/Recovery
Layer 2 extensions and IP mobility using Overlay Transport Virtualization (OTV) and LISP to support partial failovers and active-active scenarios are part of the VMDC VSA 1.0 architecture. The solution presents heterogeneous, storage- and infrastructure-agnostic data replication capabilities for creating and offering DR services. The system offers CDP-based recovery with the ability to roll back to any point in time, and provides guaranteed application consistency for most of the widely used industry applications. See Figure 3-3.
Figure 3-3 Host-Based DRaaS on VMDC Architecture
The InMage ScoutCloud platform addresses the growing market for cloud-based disaster recovery products, also referred to as the Recovery as a Service (RaaS) market. InMage ScoutCloud leverages next generation recovery technologies including disk-based recovery, CDP, application snapshot API integration, asynchronous replication, application awareness, and WAN optimization.
These next generation recovery technologies are wrapped into a single product offering, enabling MSPs and cloud providers to achieve the fastest time-to-market when offering customers DRaaS with near-zero RPO and RTO capabilities, including:
• Best-in-class data protection;
• A comprehensive P2V and V2V recovery engine that supports all applications;
• A provisioning manager that automates provisioning of recovery for VMs and associated storage combined with a full-fledged multi-tenant portal.
Figure 3-4 shows the InMage ScoutCloud architecture in a DRaaS environment. For more information about InMage ScoutCloud and ScoutCloud as deployed in Cisco-validated DRaaS solution environments, visit the following resources:
• http://www.inmage.com
• http://www.cisco.com/c/en/us/td/docs/solutions/Hybrid_Cloud/DRaaS/1-0/DRaaS_1-0.pdf
Figure 3-4 InMage ScoutCloud Architecture
InMage ScoutCloud Concepts
Continuous Data Protection (CDP)
Continuous data protection refers to a technology that continuously captures or tracks data modifications by saving a copy of every change made to your data, essentially capturing every version of the data that you save. It captures the changes to data and sends them to a separate location, allowing you to restore data to any point in time. CDP-based solutions can provide fine granularities of restorable objects, ranging from crash-consistent images to logical objects such as files, mailboxes, messages, and database files and logs.
Traditional backups require a schedule and restore data to the point at which it was backed up. CDP does not need a schedule because all the data changes on the primary server are tracked and sent to a secondary server asynchronously. Most CDP solutions save byte or block-level differences rather than file-level differences. This means that if you change one byte of a 100 GB file, only the changed byte or block is saved. CDP technology has the following fundamental attributes:
• Data changes on the primary server are continuously captured or tracked.
• All data changes are stored on a separately located secondary server.
• CDP enables data recovery in much less time compared to tape backups or archives.
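The journaling mechanics behind these attributes can be sketched in a few lines. This is an illustrative toy, not InMage code; the class and method names are assumptions for illustration only:

```python
# Toy CDP journal: record every block write with a timestamp so the volume
# can be reconstructed at any point in time (the core CDP property above).
class CDPJournal:
    def __init__(self):
        self.entries = []  # (timestamp, block_id, data), ordered by timestamp

    def record(self, timestamp, block_id, data):
        """Continuously capture a change as it happens on the primary."""
        self.entries.append((timestamp, block_id, data))

    def restore(self, point_in_time):
        """Replay all changes up to the requested point in time."""
        volume = {}
        for ts, block, data in self.entries:
            if ts > point_in_time:
                break
            volume[block] = data
        return volume

journal = CDPJournal()
journal.record(1, "block0", b"v1")
journal.record(5, "block0", b"v2")
journal.record(9, "block1", b"new")

# Roll back to t=5: block0 holds v2, block1 does not exist yet.
print(journal.restore(5))
```

Unlike a scheduled backup, nothing here waits for a backup window; every write is captured as it occurs, and any timestamp in the journal is a valid restore point.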
Disaster Recovery (DR)
Disaster recovery is the process of preparing for recovery or continuation of technology infrastructure critical to an organization after a natural or human-induced disaster. A DR solution using CDP technology replicates your data to a separately located secondary server. If a disaster occurs, you get immediate access to a primary server's data that is up to date to the minute of the disaster.
Application Protection Plan
An efficient application protection plan can protect a customer's critical applications from natural as well as human-induced disasters. Every individual application of an organization should have a unique protection plan, where the application can have single or multiple protections; i.e., the application can be protected locally for backup purposes or protected to remote locations for DR purposes.
Replication Stages
InMage ScoutCloud replicates drive level data in three stages:
• Resyncing (Step I)—In this step, data at the primary server is replicated to the secondary server. This is done only once for each drive that you want to replicate to a secondary server drive.
• Resyncing (Step II)—All data changes made during resynchronization (Step I) are replicated to the secondary server in this step.
• Differential Sync—Differential sync is a continuous process in which any change on the primary server volume is copied to the secondary server volume as it occurs.
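The three stages can be sketched as follows. This is an illustrative toy, not InMage code; all function and variable names are assumptions:

```python
def resync_step1(primary, secondary):
    # Step I: baseline copy of the whole primary drive to the secondary.
    secondary.clear()
    secondary.update(primary)

def resync_step2(secondary, changes_during_resync):
    # Step II: apply the changes that accumulated while Step I was running.
    secondary.update(changes_during_resync)

def differential_sync(secondary, change):
    # Continuous stage: each new primary write is forwarded as it occurs.
    block, data = change
    secondary[block] = data

primary = {"b0": "A", "b1": "B"}
secondary = {}
resync_step1(primary, secondary)
resync_step2(secondary, {"b1": "B2"})    # b1 changed during Step I
differential_sync(secondary, ("b2", "C"))
print(secondary)  # {'b0': 'A', 'b1': 'B2', 'b2': 'C'}
```

The point of the two-phase resync is that the baseline copy in Step I takes time, so writes made during it must be caught up in Step II before the continuous differential stage takes over.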
Consistent Data
In case of DR, the restored data should be consistent with the original data. To ensure the consistency of backup data, consistency tags/bookmarks are issued at the primary server at periodic intervals or on demand.
Journal/Retention or CDP Logs
The retention or CDP logs store information about data changes on the primary server within a specified time period on a separately located secondary server. This timeframe is referred to as the retention window. Consistency points are stored as bookmarks/tags in the retention window. An application can be rolled back to one of the bookmarks/tags in this retention window; alternately, it can be rolled back to any point in time within the retention window. However, only applications rolled back to one of the bookmarks/tags are guaranteed to be consistent. Three types of retention policy are associated with this retention window:
• Time-based—The data in the retention window is overwritten after the specified time period.
• Space-based—The data in the retention window is overwritten once the allotted space is exhausted.
• Time and space-based—The data in the retention window is overwritten when either the specified time elapses or the specified space is exhausted, whichever occurs first.
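A minimal sketch of how such a retention window might be enforced (assumed names and structures, not InMage code):

```python
from collections import deque

def enforce_retention(log, now, max_age=None, max_bytes=None):
    """log: deque of (timestamp, nbytes) change entries, oldest first.
    Discards the oldest entries by age, by total size, or by whichever
    limit is hit first, mirroring the three policies above."""
    def total():
        return sum(n for _, n in log)
    while log:
        ts, _ = log[0]
        too_old = max_age is not None and now - ts > max_age
        too_big = max_bytes is not None and total() > max_bytes
        if too_old or too_big:
            log.popleft()   # overwrite/discard the oldest change
        else:
            break
    return log

log = deque([(0, 100), (50, 100), (90, 100)])
# Time-based: keep only changes from the last 60 time units.
enforce_retention(log, now=100, max_age=60)
print(list(log))  # [(50, 100), (90, 100)]
# Space-based: keep the total size within 100 bytes.
enforce_retention(log, now=100, max_bytes=100)
print(list(log))  # [(90, 100)]
```

Passing both `max_age` and `max_bytes` gives the combined time-and-space policy: whichever limit is exceeded first triggers the discard.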
Sparse Retention
For long-term data retention purposes, the sparse policy is used, which helps save disk space on retention volumes and makes it possible to afford a wider retention window. Depending on the type of policy enforced, the retention window is maintained by discarding older data changes within the retention log files to make room for new data changes.
Failover
This is the process of switching the production server to the secondary server. The failover process can be a planned or an unplanned operation. A planned failover is used for periodic maintenance or software upgrades of primary servers, wherein the data writes to the primary server are stopped. An unplanned failover happens in case of actual failure of the primary server.
Failback
This is the process of restoring the primary server from the secondary server after a planned or unplanned failover. A failover operation is usually followed by a failback operation. In this failback process, the data writes on the secondary server are also restored to the primary server. Scout also supports fast failback where the data changes of the secondary server are not applied to the primary server while restoring.
Snapshot
A snapshot is an exact replica of a primary server's data as it existed at a single point in time in the retention window. The two types of snapshot are physical snapshots and virtual snapshots:
• For a physical snapshot, you can take a snapshot on a physical volume. It requires the intended snapshot volume to be equal to or larger than the secondary server volume (in the replication pair).
• For a virtual snapshot, you can take a snapshot on a virtual volume. Also known as a "vsnap," it requires minimal system resources and is faster to load or unload. These snapshots can be accessed in one of the following modes:
– Read-Only—As the name indicates, read-only snapshots are for informative purposes and are not capable of retaining writes.
– Read-Write—Read/write virtual snapshots retain writes made to them; this is done by maintaining an archive log on a specified part of the local disk.
– Read-Write Tracking—Read/write tracking virtual snapshots go a step further; this is especially useful if a new virtual snapshot has to be updated with the writes of an unmounted virtual snapshot.
Application Consistency
Application Consistency ensures the usability of the application when DR copies of the application's primary server data are used in place of the original data. An application can be rolled back to any bookmark/tag in the retention window. Consistency bookmarks are of the following three types:
• Application bookmarks—This bookmark ensures consistency at the application level. It is issued after flushing the application buffers to the disk.
• File system bookmarks—This bookmark ensures consistency of the data at the file system level. It is issued after flushing the file system cache to the disk.
• User-defined bookmarks—This is a user-defined name for a bookmark associated with an application bookmark, a file system bookmark, or both. Unlike application or file system bookmarks, these are human-readable bookmarks, used by DR administrators to recover the data.
InMage Components

A typical InMage deployment requires several different control servers to be deployed in both the Enterprise and the SP's data centers, as either physical or virtual servers. For new installations, InMage recommends the OS versions shown in Table 3-1.
Table 3-2 describes the InMage control servers deployed in the topology for this project.
Note CentOS 6.4 contains the openssl-10.0.1e-16.el6_5.4.x86_64 package, which is affected by the Heartbleed vulnerability. InMage recommends using ScoutOS 6.2 (containing an OpenSSL version not impacted by Heartbleed) or upgrading SSL on the CentOS host to openssl-1.0.1e-16.el6_5.7.x86_64.
Table 3-1 Recommended OS Versions for InMage Deployments

Platform                           | OS Version
VMware vCenter and vSphere client  | ESXi 5.1
InMage MT Windows                  | Windows 2008 R2 Enterprise Edition
InMage MT Linux                    | CentOS 6.2 64-bit
InMage CX/PS Scout                 | CentOS 6.2 64-bit
InMage vContinuum                  | Windows 2008 R2 Enterprise Edition
Table 3-2 InMage Control Servers Deployed

InMage Device                 | Site       | Operating System       | InMage Version
vContinuum (Master Target)    | SP         | Windows Server 2008 R2 | vCon_Windows_7.1.2.0GA Update_2
Dual Role (PS+CX)             | SP         | CentOS 6.4 *           | V7.1.10.0.GA.3302
Multi-Tenant Portal (MTP/RX)  | SP         | CentOS 6.4 *           | V7.1.10.0.GA.3394.1
Process Server (PS)           | Enterprise | CentOS 6.4 *           | V7.1.10.0.GA.3302
Alternately, SPs can use their own hardened CentOS 6.2 image, as InMage provides a way to customize it by installing the necessary RPMs from their public repository. To confirm the OpenSSL version running, use the rpm -qa | grep openssl command.
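As a sketch, the version comparison behind that check could look like the following. This is a hypothetical helper, not an InMage tool, and its numeric-field parsing is a simplification of full RPM version-comparison rules:

```python
import re

# Patched release named in the Heartbleed note above.
PATCHED = "openssl-1.0.1e-16.el6_5.7.x86_64"

def release_tuple(pkg):
    # Extract the numeric fields of the package string,
    # e.g. "...el6_5.4..." contributes (..., 6, 5, 4, ...).
    return tuple(int(n) for n in re.findall(r"\d+", pkg))

def is_heartbleed_patched(installed):
    """Compare the string reported by `rpm -qa | grep openssl`
    against the patched release."""
    return release_tuple(installed) >= release_tuple(PATCHED)

print(is_heartbleed_patched("openssl-1.0.1e-16.el6_5.4.x86_64"))  # False
print(is_heartbleed_patched("openssl-1.0.1e-16.el6_5.7.x86_64"))  # True
```

For a production check, `rpm`'s own version comparison (rpmvercmp) should be trusted over a string-based approximation like this.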
InMage Master Target
A dedicated VM created on the secondary vSphere server to act as a target for replication is called the master target (MT). It is used as a target for replicating disks from primary VMs, and the MT contains the retention (CDP) data. Retention data is the log of prior changes, from which you can recover a VM to a prior point in time or to a prior application-consistent point.
When the initial replication plan is set up for a server or group of servers, the data volumes on the production servers are created as VMDKs on the target site and are mounted to the MT server for writing data. In the event of a disaster, the VMDK is released by the MT server and will be mounted to the actual recovery servers.
The MT should be of the same OS family as that of primary servers. If primary VMs are Windows, MT has to be Windows. For Linux primary VMs, MT must be a Linux VM.
Windows 2008 R2 is recommended to protect Windows VMs. You can have more than one master target on secondary vSphere servers. To perform failback protection and failback recovery, a MT VM is required on the primary vSphere server. In case of failback, replication is set in reverse direction from recovered secondary VMs back to MT on the primary vSphere server.
Figure 3-5 illustrates the MT functionality.
Figure 3-5 Master Target Functionality
Unified Agent
The Unified Agent (also known as the VX Agent) is a lightweight agent installed on each protected VM or physical server. It offloads the data changes to the CX appliance. The Unified Agent is installed automatically through the vContinuum wizard.
InMage CX Server
CX Server is the combination of Configuration Server (CX-CS) and Process Server (CX-PS). The Scout PS/CS server is an appliance that performs the data movement between primary and secondary servers. It offloads various CPU intensive tasks from the primary server, such as bandwidth management, caching, and compression. It is also used to monitor protection plans through the vContinuum wizard.
The CX Server:
• Is responsible for data offload and WAN optimization functions such as:
– Compression
– Securing the data over WAN
– Data Routing
– Bandwidth Optimization
• Provides centralized UI for configuration and monitoring.
• Provides centralized error reporting using logs, SNMP, and email alerts.
• Is a mandatory component for all environments.
The CX-PS server is deployed in the Enterprise data center; it receives data from the production servers and sends it to the target site. It caches copies of block changes on primary servers residing in the Enterprise data center and sends them to the Master Targets residing in the SP secondary site.
The CX-CS server is deployed on the cloud SP's data center dedicated per customer to manage the replication and recovery. It allows an operator to perform tasks through a web-based UI, such as fault monitoring (e.g., RPO violation, agent heartbeat), configuration, accounting, and performance monitoring.
InMage RX Server
RX is the multi-tenant portal that enables management of all customer services through a single portal and provides:
• Centralized monitoring across all customers.
• Fully re-brandable, ready-to-use, customer-facing dashboard.
• Full-fledged API stack for deeper integration into partner portals.
• Replication health and statistics for each CS server.
• License statistics for each CS server.
• Source for alerts and notifications.
• Provisioning to create logical groups for multiple CS servers to enforce policies on multiple CS servers in one shot.
• Custom report on bandwidth usage for each CS server.
Figure 3-6 illustrates the InMage RX multi-tenant portal.
Figure 3-6 RX: Multi-Tenant Portal
InMage Management Console
The management console/GUI wizard is a Windows-based GUI wizard that guides the InMage administrator through the protection and recovery steps.
• In the case of Windows CX, it is installed along with the CX server. The vContinuum wizard can be installed on the MT.
• In the case of Linux CX, the wizard has to be installed on a Windows 2008 R2, Windows 7, XP, or Vista desktop.
vContinuum is stateless and does not have the current information regarding the replication status; it talks to the CX-CS server to get this information.
The vContinuum wizard is pictured in Figure 3-7.
Figure 3-7 vContinuum Wizard
The vContinuum wizard helps the cloud provider to perform the following tasks:
• Push agents to source production servers
• Create protection plans to protect servers
• Modify existing protection plan
• Perform DR drill
• Resume protection
• Failback
• Offline Sync
Refer to InMage's compatibility matrix for more information on supported operating system versions.
• http://support.inmage.net/partner/poc_blds/14_May_2013/Docs/vContinuum/InMage_Scout_Cloud_vContinuum_Compatibility_Matrix.pdf
InMage Self Service for Enterprise Clients
Self Service can be enabled for the customers in the following ways:
• RX Portal—RX multi-tenant portal also allows the customers to perform recovery of their environments. The SP can control this functionality by enabling the recovery option for the customer user account within the RX.
• vContinuum—vContinuum is a dedicated component deployed per-customer on the SP cloud. The SP can provide access to the vContinuum GUI to their customer who wants total control of the DR process. vContinuum allows customers to perform all the operations required for protecting and recovering the workloads into the cloud.
InMage Operation

InMage ScoutCloud is based on CDP technology that gives it granular DR capabilities to meet the most stringent DR requirements. InMage ScoutCloud can be configured to support long distance DR requirements and operational recovery requirements, and also supports heterogeneous servers running on Windows, Linux, or UNIX. ScoutCloud supports a browser-based management UI allowing all management operations for both application and data recovery across different production servers and applications to be tracked and managed using a common management paradigm. A CLI management interface is also available. Management capabilities are protected using a multi-level security model.
InMage ScoutCloud replicates a production server's data to one or more secondary servers that can be either local or remote and virtual or physical systems. ScoutCloud can be deployed in existing environments without disrupting existing business continuity.
To understand how ScoutCloud works, we will look at a basic configuration with a single primary server and multiple secondary servers communicating with the CX-CS server through a single CX-PS. See Figure 3-8. The CX-CS is deployed in the primary server's LAN as a component whose failure and/or replacement does not affect the production server's operation. The VX and FX components of ScoutCloud are deployed on the primary server, utilizing negligible resources, and send writes asynchronously to the CX-CS as they occur on the primary server. The VX and FX components of Scout are deployed on the secondary server as well, to communicate continuously with the CX-CS.
Figure 3-8 InMage Scout Environment
ScoutCloud protects data by setting replication between the primary server drive/file and the secondary server drive/file. This drive-level replication process happens in stages.
• At the beginning of the replication process, a baseline copy of the primary server's drive that is being protected is created at the secondary server ("Resyncing," Step I).
• Data changes occurring during this step are sent to the secondary server ("Resyncing," Step II).
• Thereafter, Scout captures and sends only the changes in the primary server drive ("Differential Sync," Step III).
This differential sync is a continuous process that is achieved through the VX agents. Scout supports fast resync, in which the replication process starts directly from the differential sync stage instead of from the initial resync stages. Unlike drive-level replication, file- or folder-level replication between the primary and secondary server is a one-time activity, which is achieved through the FX agents.
To facilitate maintenance activities on the primary server, or in the case of an actual failure of the primary server, Scout switches the primary server to a secondary server through a failover mechanism. A failover operation is always followed by a failback operation, i.e., restoring the primary server from the secondary server. Scout uses CDP technology to replicate data so that it can restore data to any point in time. To ensure the consistency of the primary server drive data, consistency tags/bookmarks are issued at the primary server at periodic intervals of time or on demand. The secondary server can be rolled back to any of the consistency bookmarks to ensure consistency of backup/DR data.
InMage ScoutCloud also supports storing snapshots (exact replicas of the primary server's drive data as it existed at a single point in time) on physical or virtual volumes. Snapshots are stored according to the consistency bookmarks applied on the secondary server.
InMage Component Flow
This section looks at the flow of data between the various InMage ScoutCloud components.
Process Server (PS) to Master Target (MT)
Figure 3-9 shows the flow of data from protected servers to the PS server at the Enterprise, to the MT at the service provider.
Figure 3-9 Data Flow: Protected Servers to the PS Server
The Agent that is running on the source-protected servers collects data from the servers as it is created and sends it to the local PS server, which then sends the data to the MT server residing at the SP premises where the data will be stored.
The PS server is also responsible for compressing and encrypting the data before sending it to the MT server. It is capable of caching the data for extended periods of time in WAN failure scenarios.
PS to MT (Reverse Protection)
Figure 3-10 shows the data flow from the recovered servers after a failover into the Enterprise data center.
Figure 3-10 Data Flow: Reverse Protection
The data flow is similar to the server protection scenario. The changed data from the recovered server is collected by the DataTap agent on the server and sent to the CX/PS Scout server on the SP side. That server compresses and encrypts the data and in turn sends it to the MT server on the Enterprise side. An MT server is required on the Enterprise side for this failback scenario.
VMware Virtual SAN

Virtual SAN is a new software-defined storage solution that is fully integrated with vSphere. Virtual SAN aggregates locally attached disks in a vSphere cluster to create a storage solution (a shared datastore) that can be rapidly provisioned from VMware vCenter during VM provisioning operations. It is an example of a hypervisor-converged platform, that is, a solution in which storage and compute for VMs are combined into a single device, with storage provided within the hypervisor itself as opposed to via a storage VM running alongside other virtual machines.
Virtual SAN is an object-based storage system designed to provide virtual machine-centric storage services and capabilities through a Storage Policy-Based Management (SPBM) platform. SPBM and VM storage policies are solutions designed to simplify VM storage placement decisions for vSphere administrators.
Virtual SAN is fully integrated with core vSphere enterprise features such as VMware vSphere High Availability (vSphere HA), VMware vSphere Distributed Resource Scheduler™ (vSphere DRS), and VMware vSphere vMotion®. Its goal is to provide both high availability and scale-out storage functionality. It also can be considered in the context of QoS because VM storage policies can be created to define the levels of performance and availability required on a per-VM basis.
Note This Virtual SAN technology overview has been compiled, directly and indirectly, from resources available on the Virtual SAN resources website, at the following URL: http://www.vmware.com/products/virtual-san/resources.html
The Virtual SAN Shared Datastore

The Virtual SAN shared datastore is constructed with a minimum of three ESXi hosts, each containing at least one SSD (solid-state drive) and one MD (magnetic drive). On each host, the SSD and the MDs form a disk group. The VMware VM files are stored on the MDs, while the SSD handles the read caching and write buffering. The disk group on each host is joined to a single network partition group, shared and controlled between the hosts. Figure 3-11 illustrates a Virtual SAN cluster with the minimum configuration.
Figure 3-11 VMware Virtual SAN
For this test effort, the base Virtual SAN cluster was built with at least three hosts, each having one disk group comprised of one 400GB SSD and four 1TB MDs, controlled by a RAID controller. Each host had a single VMkernel NIC (vmk1), on the 10.32.1.0/24 network, for Virtual SAN communication on a 10Gb physical NIC. Multicast was enabled as required for Virtual SAN control and data traffic. Figure 3-12 illustrates the particular environment built for this test effort. Details of the physical configuration are given in Table 3-1 on page 3-29.
Figure 3-12 Virtual SAN Test Environment
The size and capacity of the Virtual SAN shared datastore are dictated by the number of magnetic disks per disk group in a vSphere host and by the number of vSphere hosts in the cluster. For example, in this test environment the cluster is composed of three vSphere hosts, and each host contains one disk group of four 1TB magnetic disks; the total raw capacity of the Virtual SAN shared datastore is therefore 11.9TB after subtracting the metadata overhead capacity.
Formulae:
• One (1) disk group x four (4) magnetic disks x 1TB x three (3) hosts = 12TB raw capacity
• 12TB raw capacity - 21GB metadata overhead = 11.9TB usable raw capacity
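The arithmetic can be verified quickly (decimal units; the 21GB metadata overhead is taken from the guide as given, and the result rounds to the guide's 11.9TB figure):

```python
def vsan_raw_capacity_tb(disk_groups, mds_per_group, md_size_tb, hosts):
    # Raw capacity = disk groups/host x MDs/group x MD size x hosts.
    return disk_groups * mds_per_group * md_size_tb * hosts

# Test-environment configuration: 1 disk group x 4 x 1TB MDs x 3 hosts.
raw_tb = vsan_raw_capacity_tb(1, 4, 1, 3)
usable_tb = raw_tb - 21 / 1000            # subtract 21GB metadata overhead
print(raw_tb)                              # 12
print(round(usable_tb, 2))                 # 11.98

# Theoretical maximum cited for the C240-M3 build:
# 3 disk groups x 7 MDs x 1TB x 32 hosts.
print(vsan_raw_capacity_tb(3, 7, 1, 32))   # 672
```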
With the Cisco UCS C240-M3 rack-mount servers being used to build the Virtual SAN cluster, the theoretical maximum datastore capacity is roughly 672TB, according to the following formula:
• Three (3) disk groups x seven (7) magnetic disks x 1TB x 32 hosts = 672TB raw capacity
After the Virtual SAN shared datastore has been formed, a number of datastore capabilities are surfaced to vCenter Server. These capabilities, which are based on storage capacity, performance, and availability requirements, are discussed in detail in Virtual SAN Storage Policy-based Management (SPBM), page 3-19. The essential point is that they can be used to create a policy that defines the storage requirements of a virtual machine.
These storage capabilities enable the vSphere administrator to create VM storage policies that specify storage service requirements that must be satisfied by the storage system during VM provisioning operations. This simplifies the VM provisioning operations process by empowering the vSphere administrator to select the correct storage for VMs easily.
Read Caching and Write Buffering

The flash-based device (e.g., SSD) in the Virtual SAN host serves two purposes: caching the reads and buffering the writes coming from the resident VMs. The read cache keeps a cache of commonly accessed disk blocks. This reduces the I/O read latency in the event of a cache hit. The actual block that is read by the application running in the VM might not be on the same vSphere host on which the VM is running.
To handle this behavior, Virtual SAN distributes a directory of cached blocks between the vSphere hosts in the cluster. This enables a vSphere host to determine whether a remote host has data cached that is not in a local cache. If it is not locally cached, the vSphere host can retrieve cached blocks from a remote host in the cluster over the Virtual SAN network. If the block is not in the cache on any Virtual SAN host, it is retrieved directly from the magnetic disks.
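The read path just described can be sketched as a three-tier lookup (assumed names, not VMware code):

```python
def read_block(block, local_cache, cache_directory, remote_caches, disks):
    # 1. Local flash cache on the host running the VM.
    if block in local_cache:
        return local_cache[block], "local cache"
    # 2. Cluster-wide directory of blocks cached on remote hosts.
    owner = cache_directory.get(block)
    if owner is not None:
        return remote_caches[owner][block], f"remote cache on {owner}"
    # 3. Fall back to the magnetic disks.
    return disks[block], "magnetic disk"

local = {"b1": "x"}
directory = {"b2": "host2"}            # which host has each block cached
remote = {"host2": {"b2": "y"}}
disks = {"b1": "x", "b2": "y", "b3": "z"}
print(read_block("b1", local, directory, remote, disks))
print(read_block("b2", local, directory, remote, disks))
print(read_block("b3", local, directory, remote, disks))
```

The distributed directory is what lets a host avoid a magnetic-disk read when another host in the cluster already holds the block in flash; only a miss at all three tiers reaches the MDs.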
The write cache performs as a nonvolatile write buffer. The fact that Virtual SAN can use flash-based storage devices for writes also reduces the latency for write operations.
Because all the write operations go to flash storage, Virtual SAN ensures that a copy of the data exists elsewhere in the cluster. All VMs deployed onto Virtual SAN inherit the default availability policy settings, ensuring that at least one additional copy of the VM data is available. This includes the write cache contents.
After writes have been initiated by the application running inside of the guest operating system (OS), they are sent in parallel to both the local write cache on the owning host and the write cache on the remote hosts. The write must be committed to the flash storage on both hosts before it is acknowledged.
This means that in the event of a host failure, a copy of the data exists on another flash device in the Virtual SAN cluster and no data loss will occur. The VM accesses the replicated copy of the data on another host in the cluster via the Virtual SAN network.
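The write path above, where a guest write is committed in parallel to the local and remote flash buffers and acknowledged only after both commits succeed, can be sketched as follows. All names here (`FlashWriteBuffer`, `replicated_write`) are illustrative assumptions, not VMware interfaces.

```python
# Illustrative sketch of the Virtual SAN write path: a write goes in
# parallel to the owning host's flash write buffer and a replica host's
# buffer, and is acknowledged only once both copies are durable.

class FlashWriteBuffer:
    def __init__(self, name, fail=False):
        self.name = name
        self.committed = {}
        self._fail = fail  # simulate a host/device failure

    def commit(self, block_id, data):
        if self._fail:
            return False
        self.committed[block_id] = data
        return True


def replicated_write(block_id, data, local_buffer, replica_buffer):
    """Acknowledge the guest write only once both flash copies are durable."""
    ok_local = local_buffer.commit(block_id, data)
    ok_remote = replica_buffer.commit(block_id, data)
    return ok_local and ok_remote
```

If either commit fails, the write is not acknowledged to the guest, which models why a single host failure cannot lose acknowledged data.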
Virtual SAN Storage Policy-based Management (SPBM)

All VMs deployed on a Virtual SAN cluster must use a VM Storage Policy; if none is administratively defined, the default is applied. VM Storage Policies define the requirements of the application running in the VM from an availability, sizing, and performance perspective. Virtual SAN defines five VM Storage Policy requirements, as described in Table 3-3.
Virtual SAN Recommendations and Limits

The following are the limits and recommendations for Virtual SAN as of this paper's publication date.
Limits:
• Maximum 32 hosts per Virtual SAN cluster
• Maximum 5 disk groups per host
• Maximum 7 magnetic disks (MDs) per disk group
• Maximum 1 SSD per disk group
Recommendations:
• Each cluster host shares an identical hardware configuration
• Each cluster host has a like number of disk groups
• SSD-to-MD capacity ratio of 1:10 of the anticipated consumed storage capacity before the Number of Failures to Tolerate (FTT) is considered
• Use NL-SAS drives or better due to higher queue depths than SATA drives
• RAID controller queue depth >256 (highly recommended)
• N+2 cluster node redundancy for production environments
• Each cluster host has a single Virtual SAN-enabled VMkernel NIC

Table 3-3 VM Storage Policy Requirements

Policy | Definition | Default | Maximum
Number of Disk Stripes Per Object | The number of MDs across which each replica of a storage object is distributed. | 1 | 12
Flash Read Cache Reservation | Flash capacity reserved as read cache for the storage object. | 0% | 100%
Number of Failures to Tolerate | The number of host, disk, or network failures a storage object can tolerate. For n failures tolerated, n+1 copies of the object are created and 2n+1 hosts contributing storage are required. | 1 | 3 (in an 8-host cluster)
Force Provisioning | Determines whether the object is provisioned even if currently available resources do not satisfy the VM Storage Policy requirements. | Disabled | Enabled
Object Space Reservation | The percentage of the logical size of the storage object that should be reserved (thick provisioned) upon VM provisioning; the remainder of the storage object is thin provisioned. | 0% | 100%
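As a quick sanity check, the per-cluster limits listed above (32 hosts, 5 disk groups per host, 7 MDs and 1 SSD per disk group) can be sketched as a small validator. The dictionary layout used for hosts and disk groups is an assumption for this illustration, not a VMware data model.

```python
# Sketch: validate a proposed cluster layout against the Virtual SAN
# limits stated in this section. Returns human-readable violations.

MAX_HOSTS = 32
MAX_DISK_GROUPS_PER_HOST = 5
MAX_MDS_PER_DISK_GROUP = 7
MAX_SSDS_PER_DISK_GROUP = 1


def validate_cluster(hosts):
    """hosts: {host_name: [{"ssds": n, "mds": n}, ...disk groups]}."""
    problems = []
    if len(hosts) > MAX_HOSTS:
        problems.append(f"{len(hosts)} hosts exceeds limit of {MAX_HOSTS}")
    for host, groups in hosts.items():
        if len(groups) > MAX_DISK_GROUPS_PER_HOST:
            problems.append(
                f"{host}: {len(groups)} disk groups exceeds "
                f"{MAX_DISK_GROUPS_PER_HOST}")
        for i, group in enumerate(groups):
            if group["mds"] > MAX_MDS_PER_DISK_GROUP:
                problems.append(
                    f"{host} group {i}: {group['mds']} MDs exceeds "
                    f"{MAX_MDS_PER_DISK_GROUP}")
            if group["ssds"] > MAX_SSDS_PER_DISK_GROUP:
                problems.append(
                    f"{host} group {i}: {group['ssds']} SSDs exceeds "
                    f"{MAX_SSDS_PER_DISK_GROUP}")
    return problems
```

A compliant three-host layout with one disk group of 1 SSD and 7 MDs per host produces no violations; a disk group with 8 MDs or 2 SSDs is flagged.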
Virtual SAN Requirements

An abbreviated listing of the requirements for running a Virtual SAN virtual storage environment follows:
• vCenter Server: Minimum version 5.5 Update 1
• vSphere: Minimum version 5.5
• Hosts: Minimum three (3) ESXi hosts
• Disk Controller:
– SAS or SATA HBA, or
– RAID controller
– Must function in either pass-through (preferred) or RAID 0 mode
– Queue depth >256 highly recommended
• Hard Disk Drives: Minimum one (1) SAS, NL-SAS, or SATA magnetic hard drive per host
• Flash-Based Devices: Minimum one (1) SAS, SATA, or PCI-E SSD per host
• Network Interface Cards: Minimum one (1) 1Gb or 10Gb (recommended) network adapter per host.
• Virtual Switch: VMware VDS or VSS, or Cisco Nexus 1000v
• VMkernel Network: VMkernel port per host for Virtual SAN communication
Defining VM Requirements

When the Virtual SAN cluster is created, the shared Virtual SAN datastore (which has a set of capabilities that are surfaced up to vCenter) is also created.
When a vSphere administrator begins to design a virtual machine, that design is influenced by the application it will be hosting. This application might potentially have many sets of requirements, including storage requirements.
The vSphere administrator uses a VM storage policy to specify and contain the application's storage requirements in the form of storage capabilities that will be attached to the VM hosting the application; the specific storage requirements will be based on capabilities surfaced by the storage system. In effect, the storage system provides the capabilities, and VMs consume those capabilities via requirements written in the VM storage policy.
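The producer/consumer relationship above, where the storage system surfaces capabilities and a VM storage policy states requirements against them, can be sketched as a simple matching check. The capability names used here (`stripeWidth`, `hostFailuresToTolerate`) are illustrative assumptions, not the literal SPBM schema.

```python
# Sketch of policy matching: a datastore is suitable for a VM only if
# every requirement in the VM storage policy is covered by a capability
# the storage system surfaces.

def satisfies(capabilities, policy):
    """capabilities: {name: value offered by the datastore}
    policy:       {name: minimum value required by the VM}
    Returns True if every requirement is met."""
    return all(
        name in capabilities and capabilities[name] >= required
        for name, required in policy.items()
    )
```

For example, a datastore surfacing `{"stripeWidth": 12, "hostFailuresToTolerate": 3}` satisfies a policy requiring one host failure to tolerate, but not a policy requiring five, and not a policy naming a capability the datastore does not surface.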
Distributed RAID

In traditional storage environments, Redundant Array of Independent Disks (RAID) refers to disk redundancy inside the storage chassis to withstand the failure of one or more disk drives.
Virtual SAN uses the concept of distributed RAID, by which a vSphere cluster can contend with the failure of a vSphere host, or of a component within a host (for example, magnetic disks, flash-based devices, and network interfaces), and continue to provide complete functionality for all virtual machines. Availability is defined on a per-VM basis through the use of VM storage policies.
20DRaaS 2.0 InMage and Virtual SAN
Implementation Guide Addedum
Chapter 3 Technology Overview
vSphere administrators can specify the number of host component failures that a VM can tolerate within the Virtual SAN cluster. If a vSphere administrator sets zero as the number of failures to tolerate in the VM storage policy, one host or disk failure can affect the availability of the virtual machine.
Using VM storage policies along with the Virtual SAN distributed RAID architecture, VMs and copies of their contents are distributed across multiple vSphere hosts in the cluster. In this case, it is not necessary to migrate data from a failed node to a surviving host in the cluster in the event of a failure.
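The Number of Failures to Tolerate arithmetic from Table 3-3 (tolerating n failures requires n+1 replicas of each object and 2n+1 hosts contributing storage) is simple enough to capture directly. This helper is a sketch of that arithmetic only.

```python
# FTT arithmetic from Table 3-3: n failures tolerated requires n + 1
# replicas and 2n + 1 hosts contributing storage (the extra hosts hold
# witness components for quorum).

def ftt_requirements(failures_to_tolerate):
    if failures_to_tolerate < 0:
        raise ValueError("FTT must be non-negative")
    replicas = failures_to_tolerate + 1
    min_hosts = 2 * failures_to_tolerate + 1
    return replicas, min_hosts
```

So the default FTT of 1 implies 2 replicas on a minimum of 3 hosts, and the maximum FTT of 3 implies 4 replicas on a minimum of 7 hosts. An FTT of 0 yields a single copy, which is why one host or disk failure can then affect VM availability.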
Virtual SAN Storage Objects and Components

While a VM is traditionally understood as a set of files (.vmx, .vmdk, etc.), the Virtual SAN datastore is an object datastore, so a VM on Virtual SAN is made up of a set of objects. For VMs on Virtual SAN, four kinds of Virtual SAN objects exist:
• The VM home or "namespace" directory
• A swap object (if the VM is powered on)
• Virtual disks/VMDKs
• Delta-disks created for snapshots (each delta-disk is an object)
Note The VM namespace directory holds all VM files (.vmx, log files, etc.), excluding VM disks, deltas, and swap, all of which are maintained as separate objects.
Note It is important to understand how Virtual SAN objects and components are built and distributed because soft limitations exist; exceeding those limitations may affect performance.
Each object is deployed on Virtual SAN as a distributed RAID tree, and each leaf of the tree is said to be a component. The policies relevant to Virtual SAN object and component counts and limitations include the Failures-to-Tolerate (FTT) policy and the Stripe-Width policy. For example, deploying a VM with a Stripe Width of two means that a RAID-0 stripe is configured across two magnetic disks for the VM disk. Similarly, if the FTT policy for that VM is configured as one, a RAID-1 mirror of the VM components is set up across hosts.
Figure 3-13 represents a possible layout for the components in the above scenario. The stripe components form a RAID-0 configuration, which is then mirrored across hosts using a RAID-1 configuration.
Figure 3-13 Sample Component Layout for VM on Virtual SAN
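The RAID tree just described (each of the FTT+1 replicas a RAID-0 stripe across Stripe-Width magnetic disks, mirrored RAID-1 across hosts) can be sketched as follows. The nested-dictionary tree shape and component names are illustrative assumptions, not the on-disk format.

```python
# Sketch of the distributed RAID tree for a VMDK object: FTT + 1 replicas,
# each a RAID-0 stripe across `stripe_width` components, mirrored in a
# RAID-1 configuration across hosts.

def build_raid_tree(ftt, stripe_width):
    replicas = []
    for r in range(ftt + 1):
        stripe = {
            "type": "RAID-0",
            "components": [f"replica{r}-stripe{s}"
                           for s in range(stripe_width)],
        }
        replicas.append(stripe)
    return {"type": "RAID-1", "replicas": replicas}


def data_component_count(tree):
    """Count leaf (data) components in the tree; witnesses are extra."""
    return sum(len(r["components"]) for r in tree["replicas"])
```

For the scenario in Figure 3-13 (FTT of one, Stripe Width of two), this yields a RAID-1 mirror of two RAID-0 stripes, or four data components, with witness components added on top.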
Following are some considerations to keep in mind when working with objects and components:
• Each VM has, potentially, four kinds of objects: Namespace; VMDK; Swap; Snapshot delta-disk
– Namespace—Every VM has a namespace object, and only one
– VMDK—Every VM will have one VMDK object for each attached virtual disk
– Swap—Every powered-on VM will have a swap object
– Delta-disk—Every VM will have one delta-disk object for each snapshot created
• Of the four kinds of objects, only the VMDKs and delta-disks inherit the Stripe Width policy administratively applied to the VM. Because performance is not a major requirement for the namespace or swap objects, their Stripe Width is always set to 1.
• Witness components will be created to arbitrate between remaining copies should a failure occur so that two identical copies of data are not activated at the same time. Witnesses are not objects but are components within each object RAID tree. More information on witnesses is provided below.
Note VMware recommends the default settings for NumberOfFailuresToTolerate and Stripe Width.
Witness Components
As mentioned above, witnesses are components that are deployed to arbitrate between the remaining copies of data should a failure occur within the Virtual SAN cluster, ensuring no split-brain scenarios occur. At first glance, the way witnesses are deployed may seem illogical, but the governing algorithm is not very complex and is worth describing here.

Witness deployment is not predicated on any FailuresToTolerate or Stripe Width policy setting. Rather, witness components come in three types (Primary, Secondary, and Tiebreaker) and are deployed based on the following three rules:
• Primary Witnesses—Primary Witnesses need at least (2 * FTT) + 1 nodes in a cluster to be able to tolerate FTT number of node / disk failures. If after placing all the data components, we do not have the required number of nodes in the configuration, primary witnesses are placed on exclusive nodes until there are (2*FTT)+1 nodes in the configuration.
• Secondary Witnesses—Secondary Witnesses are created to make sure that every node has equal voting power towards quorum. This is important because every node failure should affect the quorum equally. Secondary witnesses are added so that every node gets an equal number of components; this includes the nodes that only hold primary witnesses. Therefore, the total count of data components + witnesses on each node is equalized in this step.
• Tiebreaker Witnesses—If, after adding primary and secondary witnesses, we end up with an even number of total components (data + witnesses) in the configuration, then we add one Tiebreaker Witness to make the total component count odd.
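The three rules above can be sketched as a small placement routine operating on a map of node names to data-component counts. This is an illustration of the rules only, under the assumption of a simplified data model; it is not the actual Virtual SAN placement code.

```python
# Sketch of the three witness-placement rules: primary witnesses extend
# the configuration to (2*FTT)+1 nodes, secondary witnesses equalize
# per-node component counts for quorum, and a tiebreaker makes the total
# component count odd.

def place_witnesses(data_components, ftt, spare_nodes):
    """data_components: {node: count of data components on that node}
    spare_nodes: nodes holding no data, available for witnesses.
    Returns (per-node component counts after witnesses, tiebreakers)."""
    placement = dict(data_components)
    nodes_needed = 2 * ftt + 1

    # Rule 1: primary witnesses on exclusive nodes until (2*FTT)+1 nodes
    # participate in the configuration.
    spares = list(spare_nodes)
    while len(placement) < nodes_needed and spares:
        placement[spares.pop(0)] = 1

    # Rule 2: secondary witnesses equalize every node's component count
    # so that each node has equal voting power toward quorum.
    target = max(placement.values())
    for node in placement:
        placement[node] = target

    # Rule 3: if the total component count is even, add one tiebreaker.
    total = sum(placement.values())
    tiebreaker = 1 if total % 2 == 0 else 0
    return placement, tiebreaker
```

For instance, with FTT of 1 and one data component on each of two nodes, one primary witness is placed on a third node and no tiebreaker is needed (three components, odd); with data components spread across four nodes, no primary witnesses are needed but a tiebreaker is added to make the total odd.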
Note This is all that will be said about witness functionality here, although Chapter 3, “Summary of Caveats” demonstrates these three rules in action in deployment examples for this project. This paper is indebted to Rawlinson's blog post on this topic, from which the three rules were quoted verbatim and to which the reader is encouraged to go to gain an even better understanding. http://www.punchingclouds.com/2014/04/01/vmware-virtual-san-witness-component-deployment-logic/
Flash-Based Devices in Virtual SAN

Flash-based devices serve two purposes in Virtual SAN: they build the flash tier in the form of a read cache and a write buffer, which dramatically improves the performance of VMs. In some respects, Virtual SAN can be compared to a number of "hybrid" storage solutions on the market, which also use a combination of flash-based devices and magnetic disk storage to boost I/O performance and can scale out based on low-cost magnetic disk storage.
Read Cache
The read cache keeps a cache of commonly accessed disk blocks. This reduces the I/O read latency in the event of a cache hit. The actual block that is read by the application running in the VM might not be on the same vSphere host on which the VM is running.
To handle this behavior, Virtual SAN distributes a directory of cached blocks between the vSphere hosts in the cluster. This enables a vSphere host to determine whether a remote host has data cached that is not in a local cache. If that is the case, the vSphere host can retrieve cached blocks from a remote host in the cluster over the Virtual SAN network. If the block is not in the cache on any Virtual SAN host, it is retrieved directly from the magnetic disks.
Write Cache (Write Buffer)
The write cache performs as a nonvolatile write buffer. The fact that Virtual SAN can use flash-based storage devices for writes also reduces the latency for write operations.
Because all the write operations go to flash storage, Virtual SAN ensures that a copy of the data exists elsewhere in the cluster. All VMs deployed onto Virtual SAN inherit the default availability policy settings, ensuring that at least one additional copy of the VM data is available. This includes the write cache contents.
After writes have been initiated by the application running inside of the guest operating system (OS), they are sent in parallel to both the local write cache on the owning host and the write cache on the remote hosts. The write must be committed to the flash storage on both hosts before it is acknowledged.
This means that in the event of a host failure, a copy of the data exists on another flash device in the Virtual SAN cluster and no data loss will occur. The VM accesses the replicated copy of the data on another host in the cluster via the Virtual SAN network.