VMware SDDC on IBM Cloud - Advanced
Detailed Design
Date: 16th March 2016
Version: 1.1
Table of Contents
1 Introduction ............................................................................................................................... 8
1.1 Pre-requisites...................................................................................................................... 8
1.2 Summary of Changes ......................................................................................................... 9
2 System Context ...................................................................................................................... 10
2.1 Actors ................................................................................................................................ 10
2.2 Systems ............................................................................................................................ 11
3 Architecture Overview ............................................................................................................ 12
3.1 Physical Infrastructure ...................................................................................................... 12
3.2 Virtual Infrastructure ......................................................................................................... 13
3.3 Infrastructure Management .............................................................................................. 13
3.4 Common Services ............................................................................................................ 13
3.5 Cloud Management Services ........................................................................................... 13
3.6 Operational Services ........................................................................................................ 13
3.7 Business Services ............................................................................................................ 14
4 Logical Operational Model ...................................................................................................... 15
4.1 Logical Operational Model Structure ................................................................................ 15
4.2 Central Cloud .................................................................................................................... 17
4.3 Physical Infrastructure ...................................................................................................... 17
4.3.1 Cluster Architecture ................................................................................................. 17
4.3.2 Physical Network ..................................................................................................... 19
4.3.3 Physical Storage ...................................................................................................... 19
4.4 Virtual Infrastructure ......................................................................................................... 20
4.4.1 Compute Virtualization ............................................................................................. 20
4.4.2 Storage Virtualization ............................................................................................... 21
4.4.3 Network Virtualization .............................................................................................. 21
4.5 Infrastructure Management .............................................................................................. 26
4.5.1 Compute Management ............................................................................................ 26
4.5.2 Storage Management .............................................................................................. 26
4.5.3 Network Management .............................................................................................. 27
4.6 Common Services ............................................................................................................ 27
4.6.1 Identity and Access Services ................................................................................... 27
4.6.2 Domain Name Services ........................................................................................... 28
4.6.3 NTP Services ........................................................................................................... 28
4.6.4 SMTP Services ........................................................................................................ 28
4.6.5 Certificate Authority Services ................................................................................... 28
4.7 Cloud Management Services ........................................................................................... 28
4.7.1 Service Catalog ........................................................................................................ 28
4.7.2 Self-Service Portal ................................................................................................... 28
4.7.3 Infrastructure and Process Orchestration ................................................................ 29
4.7.4 Software Orchestration ............................................................................................ 29
4.8 Operational Services ........................................................................................................ 29
4.8.1 Backup and Restore ................................................................................................ 29
4.8.2 Disaster Recovery .................................................................................................... 30
4.8.3 Monitoring ................................................................................................................ 32
4.8.4 Log Consolidation and Analysis ............................................................................... 33
4.8.5 Patching ................................................................................................................... 34
4.9 Business Services ............................................................................................................ 34
4.9.1 Business Management ............................................................................................ 34
4.9.2 IT Financials ............................................................................................................. 34
4.9.3 IT Benchmarking ...................................................................................................... 34
4.10 Cloud Region ................................................................................................................ 35
5 Physical Operational Model .................................................................................................... 36
5.1 Physical Layer .................................................................................................................. 36
5.1.1 Compute .................................................................................................................. 36
5.1.2 Storage .................................................................................................................... 38
5.1.3 Network .................................................................................................................... 38
5.2 Virtual Infrastructure ......................................................................................................... 39
5.2.1 Compute Virtualization ............................................................................................. 39
5.2.2 Storage Virtualization ............................................................................................... 39
5.2.3 Network Virtualization .............................................................................................. 42
5.3 Infrastructure Management .............................................................................................. 63
5.3.1 vCenter Server Instances ........................................................................................ 63
5.4 Common Services ............................................................................................................ 66
5.4.1 Identity and Access Services ................................................................................... 66
5.4.2 Domain Name Services ........................................................................................... 68
5.4.2.2 DNS Configuration Requirements ....................................................................... 69
5.4.3 NTP Services ........................................................................................................... 70
5.4.4 SMTP Services ........................................................................................................ 70
5.4.5 Certificate Authority Services ................................................................................... 71
5.5 Cloud Management Services ........................................................................................... 72
5.5.1 Cloud Management Physical Design ....................................................................... 74
5.5.2 vRealize Automation Supporting Infrastructure ....................................................... 80
5.5.3 vRealize Automation Cloud Tenant Design ............................................................. 80
5.5.4 vRealize Automation vSphere Integration Design ................................................... 85
5.5.5 Infrastructure Source Endpoints .............................................................................. 89
5.5.6 Virtualization Compute Resources .......................................................................... 89
5.5.7 Process Orchestration ............................................................................................. 90
5.5.8 Software Orchestration ............................................................................................ 94
5.5.9 Infrastructure Orchestration ..................................................................................... 96
5.6 Operational Services ........................................................................................................ 96
5.6.1 Backup and Restore ................................................................................................ 96
5.6.2 Disaster Recovery .................................................................................................. 101
5.6.3 Monitoring .............................................................................................................. 106
5.6.4 Log Consolidation and Analysis ............................................................................. 114
5.6.5 Patching ................................................................................................................. 121
5.7 Business Services .......................................................................................................... 122
Appendix A – Bare Metal Summary ............................................................................................ 124
Management Cluster Nodes .................................................................................................... 124
Compute Cluster Nodes ........................................................................................................... 124
Edge Cluster Nodes ................................................................................................................. 125
Appendix B – Software Bill of Materials ...................................................................................... 126
Appendix C – Management Virtual Machine Summary............................................................... 128
Appendix D – Maximum Configurations ...................................................................................... 131
Appendix E – Compatibility Guide ............................................................................................... 132
Browsers .................................................................................................................................. 132
Guest Operating Systems ........................................................................................................ 132
Table of Figures
Figure 1 VMware SDDC on IBM Cloud Introduction 8
Figure 2 VMware SDDC on IBM Cloud System Context 10
Figure 3 VMware SDDC on IBM Cloud Architecture Overview 12
Figure 4 Logical Structure View 15
Figure 5 Component Interaction Diagram 16
Figure 6 Logical Operational Model 17
Figure 7 Logical Cluster Structure 18
Figure 8 Network Virtualization 22
Figure 9 Dual-Region Data Protection Architecture 30
Figure 10 Disaster Recovery Architecture 31
Figure 11 vRealize Operations Manager Architecture 32
Figure 12 Physical Operational Model - Virtual Servers, Networking and Clusters 36
Figure 13 Network connections per physical node 39
Figure 14 VSAN concept 40
Figure 15 Network Switch Design for Management Hosts 44
Figure 16 Network Switch Design for Edge Hosts 47
Figure 17 Network Switch Design for Compute Hosts 49
Figure 18 Network Virtualization Conceptual Design 54
Figure 19 Cluster Design for NSX for vSphere 55
Figure 20 Virtual Application Network Components and Design 59
Figure 21 vRA Virtual Network Design 60
Figure 22 Virtual Application Network Configuration in Central Cloud and Cloud Region 62
Figure 23 vCenter Server and PSC Deployment Model 64
Figure 24 vRealize Automation Conceptual Design 72
Figure 25 vRealize Automation Design Overview for Central Cloud 75
Figure 26 vRealize Automation Design Overview for Additional Cloud Regions 76
Figure 27 Tenant Design for Single Region 81
Figure 28 Tenant Design for Two Regions 82
Figure 29 vRealize Automation Integration with vSphere Endpoint – Central Cloud 87
Figure 30 vRealize Automation Integration with vSphere Endpoint – Central Cloud and a Cloud
Region (Region A) 88
Figure 31 Template Synchronization 90
Figure 32 Software Orchestration Logical Design 95
Figure 33 vSphere Data Protection Logical Design 97
Figure 34 Logical Network Design for Cross-Region Deployment with Management Application
Network Container 102
Figure 35 Logical Design of vRealize Operations Manager Central Cloud and a Cloud Region
(Region A) Deployment 107
Figure 36 Networking Design of the vRealize Operations Manager Deployment 110
Figure 37 Application Virtual Networks in the vRealize Operations Manager Topology 111
Figure 38 Logical Design of vRealize Log Insight 114
Figure 39 Networking Design for the vRealize Log Insight Deployment 116
Figure 40 Application Virtual Networks in the vRealize Log Insight Topology 117
Figure 41 vRealize Business Logical Design 122
List of Tables
Table 1 VMware SDDC on IBM Cloud Interfaced Actors 10
Table 2 VMware SDDC on IBM Cloud Interfaced Systems 11
Table 3 vRealize Operations Manager Logical Node Architecture 33
Table 4 NFS Configuration for vSphere Data Protection and vRealize Log Insight 38
Table 5 VSAN disk table 40
Table 6 VSAN policies 41
Table 7 VSAN object policy defaults 42
Table 8 VLAN Mapping to Traffic Types 42
Table 9 Management Cluster Distributed Switch 43
Table 10 Management Cluster Distributed Switch Port Group Configuration Settings 43
Table 11 Management Virtual Switch Port Groups and VLANs 45
Table 12 Management VMkernel Adapter 45
Table 13 Edge Cluster Distributed Switch 46
Table 14 Management Cluster Distributed Switch Port Group Configuration Settings 46
Table 15 Edge Virtual Switch Port Groups and VLANs 48
Table 16 Edge VMkernel Adapter 48
Table 17 Compute Cluster Distributed Switch 48
Table 18 Compute Cluster Distributed Switch Port Group Configuration Settings 49
Table 19 Compute Virtual Switch Port Groups and VLANs 50
Table 20 Compute VMkernel Adapter 50
Table 21 NSX Components Sizing 53
Table 22 Load Balancer Features 57
Table 23 Management Applications IP Addressing 61
Table 24 OSPF Area ID 62
Table 25 Specifications for Management vCenter Server Appliance 65
Table 26 Specifications for Platform Service Controller for Management Cluster 65
Table 27 Specifications for Compute and Edge vCenter Server Appliance 65
Table 28 Specifications for Platform Service Controller for Management Cluster 65
Table 29 Requirements for Active Directory Service 67
Table 30 Authentication types used 68
Table 31 Server Sizing 68
Table 32 Domain Naming Example 69
Table 33 SoftLayer DNS servers 70
Table 34 Time sources 70
Table 35 Root CA and Subordinate CA sizing 72
Table 36 Cloud Management Services Components 73
Table 37 Load Balancer Application Profile 78
Table 38 Load Balancer Service Monitoring Configuration 79
Table 39 Load Balancer Pool Specifications 79
Table 40 Virtual Server Characteristics 79
Table 41 Base Windows Server Blueprint 83
Table 42 Base Windows Blueprint Sizing 84
Table 43 Base Linux Server Blueprint 84
Table 44 Base Linux Blueprint Sizing 84
Table 45 vRealize Integration with vSphere 85
Table 46 vRealize Orchestrator Default Configuration Ports 91
Table 47 vRealize Orchestrator Default External Communication Ports 91
Table 48 vRO Service Monitor Specifications 93
Table 49 vRO Service Pool Characteristics 93
Table 50 vRO Virtual Server Characteristics 93
Table 51 Software Orchestration Components Sizing 96
Table 52 VMware vSphere Data Protection Performance 97
Table 53 Backup Jobs in Central Cloud 99
Table 54 Backup Jobs in Additional Cloud Region 101
Table 55 SRM Windows server sizing 105
Table 56 vSphere Replication Appliance 106
Table 57 Analytics Cluster Node Configurations 107
Table 58 DRS Cluster Anti-Affinity Rule for vRealize Operations Manager Nodes 108
Table 59 Remote Collector Node Sizes 108
Table 60 DRS Cluster Anti-Affinity Rule for vRealize Operations Remote Collector Nodes 109
Table 61 IP Subnets in the Application Virtual Network of vRealize Operations Manager 111
Table 62 DNS Names for the Application Virtual Networks 112
Table 63 Node Sizing 115
Table 64 IP Subnets in the Application Isolated Networks 117
Table 65 Example DNS names of Log Insight nodes 117
Table 66 Virtual Disk Configuration in the vRealize Log Insight Virtual Appliance 118
Table 67 Compute Resources for vUM vCenter Managing the Management Cluster 121
Table 68 Compute Resources for vUM vCenter Managing the Compute and Edge Clusters 121
Table 69 Management - Bare Metal Bill of Materials 124
Table 70 Compute - Bare Metal Bill of Materials 124
Table 71 Edge - Bare Metal Bill of Materials 125
Table 72 Software Bill of Materials 126
Table 73 List of Management Cluster Virtual Machines and Sizes 128
Table 74 List of Default Edge Cluster Virtual Machines 130
1 Introduction
VMware Software Defined Data Center (SDDC) on IBM Cloud allows existing VMware virtualized datacenter clients to extend into the IBM Cloud. This permits uses such as capacity expansion into the cloud (and contraction when no longer needed), migration to the cloud, disaster recovery to the cloud, backup into the cloud, and the ability to stand up a dedicated cloud environment for development, test, training or lab purposes.
This document details the design of the Advanced version of VMware SDDC on IBM Cloud, which targets designs requiring high levels of scalability and multiple regions.
Figure 1 VMware SDDC on IBM Cloud Introduction
1.1 Pre-requisites
The design requires the following pre-requisites:
- Client is required to acquire all necessary software licenses and/or keys for all products used in this design prior to commencement of implementation
- Client is required to provide a SoftLayer account
- Client is responsible for SoftLayer related charges incurred as a result of this design's implementation
- Client is responsible for connectivity from this design to any on premises environment or systems
- Client is responsible for connectivity into this design for access by administrators and end users
- Client is responsible for acquiring and providing a domain name
- Client is responsible for providing hostname prefixes for the SoftLayer bare metal devices provisioned through this design
- Client is responsible for providing connection details and necessary credentials for any external systems that are to be integrated with this design (refer to the system context for options)
- Client is responsible for licensing of any software products provisioned with the design
1.2 Summary of Changes
This section records the history of significant changes to this document. Only the most significant changes
are described here.
Version 1.0, 16th Feb 2016
Authors: Simon Kofkin-Hansen, Richard Ehrhardt, Razvan Ionescu, Daniel de Araujo, Frank Chodacki, Bob Kellenberger, Bryan Buckland, Christopher Moss, Daniel Arrieta Alvarez
Description of change: Initial Release of Document

Version 1.1, 16th March 2016
Description of change: Minor reported spelling and grammar corrections.

Version 1.2, 30th Sept 2016
Description of change: Corrected NIC model number in Appendix A; added Appendix F.
2 System Context
When the VMware SDDC on IBM Cloud design is depicted as a single object, the following external actors and systems interface with the design.
Figure 2 VMware SDDC on IBM Cloud System Context
2.1 Actors
The actors that interface with the design are described in the following table. There is not a direct
correlation between actors and persons. An actor role may be performed by one or more persons.
Alternatively, one person may perform more than one actor role.
Table 1 VMware SDDC on IBM Cloud Interfaced Actors
Cloud Admin: The cloud admin or administrator is responsible for maintaining the cloud services. This includes:
- Assigning virtual resources to groups
- Maintaining the cloud software platform
- System administrator roles

Service Provider: Manages the cloud services that are provided to the client users. This includes:
- Service catalog configuration
- Defining roles
- Defining groups
- Configuring user access
- Tenant administrator roles

User: Consumes the services that the cloud admin allows access to. This typically includes:
- Provisioning VMs
- De-provisioning VMs
- Provisioning patterns
- De-provisioning patterns
- Starting, stopping and restarting VMs and patterns
2.2 Systems
The systems that interface with the design are described in the following table.
Table 2 VMware SDDC on IBM Cloud Interfaced Systems
SoftLayer: SoftLayer provides the bare metal, physical networking and NFS storage, in addition to the automation to build the design when ordered.

Client On Premises vSphere: The design is able to connect to an existing vSphere environment on a client premises to enable hybrid capabilities.

Client SMTP Relay: The design connects its SMTP server to a client's SMTP relay service to provide notifications on aspects such as the process orchestration.

Client Authentication: The design is able to connect to an existing client authentication system to establish a trust relationship which extends the client's authentication system into the cloud for use by the cloud management platform.

Client DNS: The design is able to connect to a client's domain name service (DNS) to extend the domain service into the cloud for use by the cloud management platform.

NTP Service: The design requires an external NTP service to provide time synchronization services for use by the cloud management platform.

Patch Repo: There are a number of internet based patch repositories that the cloud management platform applications need to connect to in order to maintain the security and stability of the cloud environment.
3 Architecture Overview
VMware SDDC on IBM Cloud provides VMware automation technology on SoftLayer. This includes
virtual networking, virtual storage, process orchestration, infrastructure orchestration and software
orchestration. It also provides the tools for management of the services providing these functions. The
architecture consists of at least one central cloud region built on SoftLayer, which provides the main portal
for users and administration, plus it can include one or more cloud regions which are managed by the
central cloud and provide additional functionality for remote locations. The architecture is scaled out within
a region, or by adding regions.
Figure 3 VMware SDDC on IBM Cloud Architecture Overview
3.1 Physical Infrastructure
The physical infrastructure consists of three main components: physical compute, physical network and
physical storage. The physical compute provides the physical processing and memory that is used by the
virtualization infrastructure. The physical network provides the network connectivity into the environment
that is then consumed by the network virtualization. The physical storage provides the raw storage capacity
consumed by the virtualization infrastructure. For this design the physical infrastructure components are
provided by SoftLayer bare metal and all components are supported on the VMware Hardware
Compatibility Guide (HCG).
3.2 Virtual Infrastructure
The physical infrastructure is consumed by the virtual infrastructure. The virtual infrastructure mirrors the physical infrastructure with three components: compute virtualization, storage virtualization and network virtualization. Each of these interfaces with the respective component in the physical infrastructure. The
virtual infrastructure is installed on each physical device to form a node, for example a compute node. All
the virtual resources interface to the virtual infrastructure for access to the physical infrastructure. The
virtual infrastructure is accessed by either the cloud admin or the infrastructure management component.
3.3 Infrastructure Management
Infrastructure management provides the logic to ensure that the maximum benefit is derived from the virtual infrastructure. This includes functions such as pooling virtual infrastructure and moving virtual resources off a node for maintenance or in the case of node failure. It controls placement of virtual resources on nodes to balance load and to satisfy business rules. It is accessed directly only by the cloud admin; all other access to this component is through APIs called from other components.
3.4 Common Services
Common services provide the services which are consumed by the other cloud management services. These include identity and access services, SMTP services, NTP services, domain name services and certificate
authority. This component is also the primary interface to external systems. Common services can connect
to the client’s DNS for requests outside the domain managed by the cloud services. It connects to the
external NTP service to synchronize its NTP service with an outside stratum. A trust relationship can be
established between the common services and the client’s authentication service for common authentication
to the cloud services.
3.5 Cloud Management Services
The cloud management services provide the primary interface to the users to consume cloud services in
addition to the orchestration engines to process the service requests. The self-service portal is used as the
primary interface to view the available cloud services (the service catalog) as well as to obtain a view of
existing cloud resources that are deployed. The service catalog is the list of available services that are
managed by the service provider. The service provider is able to determine which services are available to
specific users or groups. The process orchestration engine controls the steps required to perform a service.
This includes actions such as obtaining an approval or connecting to an operational service system as part
of the process. The process orchestration engine calls the infrastructure orchestration engine to orchestrate
the build of the virtual resources for a service. The software orchestration engine builds the software that
runs on the virtual resources.
3.6 Operational Services
Operational services provide monitoring, patching, log consolidation, log analysis, disaster recovery and
backup services for the cloud management platform. The monitoring looks for issues with the cloud
management platform and notifies the cloud admin via alerts on the operations console as well as emails via
the external SMTP relay. The patching connects to the external patch repository in order to obtain update
information in support of the security or stability of the cloud management platform. Log consolidation
collects the logs from the cloud management platform into a central repository which the log analysis
service then operates on to provide the cloud admin with diagnostic information. The backup service keeps
copies of the cloud management platform outside of the virtual infrastructure so it can be restored in the
event of failure or corruption. The disaster recovery service requires at least one cloud region, separate from the central cloud, to which the cloud management platform is replicated. In the event of failure at the primary site, the cloud management platform is restarted in that cloud region.
3.7 Business Services
The business services component provides the service provider with analytics on IT financials, business
management and benchmarking aspects of the cloud. The IT financials provides the service provider with
details of the total cost of cloud ownership. The business management functions provide metering and
chargeback capabilities for the service provider by user or group. The benchmarking functions provide the
ability for service providers to analyze where the IT spend for the cloud is going, where it could be in the
future and paths that need to be taken to improve.
4 Logical Operational Model
The logical operational model provides guidance as to the design elements required to meet the functional
requirements.
4.1 Logical Operational Model Structure
The design consists of two distinct elements: a central cloud, through which the user and service provider manage the entire cloud, and, optionally, one or more associated cloud regions. Only the central cloud
contains the self-service portal. Additional regions (cloud regions) are added to provide remote sites, or
additional capacity beyond that of a single central cloud within the same site. Each cloud region is
configured into the central cloud for management. On premises vSphere environments are connected to
SoftLayer via either a VPN connection over the internet or dedicated links to form additional cloud regions.
The design of on premises vSphere environments is outside the scope of this document.
Figure 4 Logical Structure View
Within a central cloud, the components interact with each other as follows:
Figure 5 Component Interaction Diagram
Both the central cloud and any additional cloud regions are built on SoftLayer.
4.2 Central Cloud
The central cloud hosts the primary portal through which users access the cloud services. It has connections
to all remote regions.
The functions in a central cloud map to the following software products described in more detail in this
section.
[Figure 6 shows the mapping of logical functions to products: SoftLayer provides the physical infrastructure; ESXi, VSAN and NSX provide compute, storage and network virtualization; vCenter provides infrastructure management; Microsoft Active Directory and related services provide the common services; and the VMware vRealize Suite provides the cloud management, operational and business services, including vRealize Log Insight (log consolidation and analysis), vRealize Operations Manager (monitoring), vSphere Update Manager (patching), Site Recovery Manager (disaster recovery), vSphere Data Protection (backup and restore) and vRealize Business (IT financials and business management).]
Figure 6 Logical Operational Model
4.3 Physical Infrastructure
The physical infrastructure is broken up into compute, storage and network. The compute and storage areas are combined in the cluster architecture. The network area is described in the physical network section.
4.3.1 Cluster Architecture
This design splits the physical layer into clusters. A cluster represents the aggregate of the compute and
memory resources of all the hosts in the cluster. All hosts in the cluster share network and storage
resources. The use of clusters allows workloads (user or management) to be placed onto specific hardware.
Each cluster is managed as a single entity, so user workloads can be managed separately from management
workloads.
The design differentiates between the following types of clusters:
Compute cluster (one or more)
Management cluster
Edge cluster
Storage cluster
Figure 7 Logical Cluster Structure
4.3.1.1.1 Compute Clusters
Compute clusters host the VMware SDDC on IBM Cloud users’ virtual machines (sometimes
referred to as workloads or payloads). Each compute cluster is built using SoftLayer bare metal.
The environment is scaled by adding nodes to the initial compute cluster up to the maximum number
of nodes per cluster (refer to the physical operational model for details). Once the maximum has
been reached, additional compute clusters are added to the environment.
4.3.1.1.2 Management Cluster
The management cluster houses the virtual machines that manage the cloud. Like the compute
clusters, the management cluster is built using SoftLayer bare metal. These servers host vCenter
Server, NSX Manager, NSX Controller, vRealize Operations Manager, vRealize Log Insight,
vRealize Automation, and other shared management components.
4.3.1.1.3 Edge Cluster
Edge clusters connect the virtual networks (overlay networks) provided by NSX for vSphere and
the external networks. This includes both north-south (into the environment from outside) and
east-west (between management and compute clusters) communications.
Edge clusters provide the following main functions:
Support on-ramp and off-ramp connectivity to physical networks
Connect to client on premises environments
4.3.1.1.4 Storage Cluster
A storage cluster provides network-accessible storage via NFS. This is used for backup and log
archive purposes. The compute, management and edge clusters utilize Virtual Storage Area
Network (VSAN) which aggregates disks located in each node of the clusters.
4.3.2 Physical Network
The physical and layer 2 networking is handled by SoftLayer. The SoftLayer physical fabric provides a
robust IP transport layer with the following characteristics:
Simplicity
Scalability
High bandwidth
Fault-tolerant transport
4.3.2.1.1 Simplicity
The network infrastructure at SoftLayer is simplified and standardized on three physical networks
containing public, private, and out of band management (IPMI) traffic. Both the private and public
networks are deployed to utilize up to 20Gbps bandwidth per physical host. The out of band
management network is connected with a 1Gbps link per host.
Upon ordering infrastructure components within SoftLayer, VLANs are provisioned for each of
the three networks mentioned. If VLANs already exist within an environment and there is enough space for a bare metal device to be placed in the same pod, SoftLayer automatically assigns the new device an IP address on the same VLAN. The design incorporates SoftLayer portable subnets to
provide IP addressing for virtual machines as well as IP addresses for the bare metal hosts.
SoftLayer has also standardized the networking infrastructure using best-of-breed networking
vendors. As a result, SoftLayer is able to implement and reuse automation patterns to setup,
configure, and monitor the network infrastructure. Some of this automation has been exposed via
API and is used by this design to simplify management tasks.
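As an illustration of the exposed API, and not part of the automation delivered with this design, the following Python sketch uses the publicly available SoftLayer Python client to list the bare metal servers on an account. The credentials shown are placeholders.

```python
# Minimal sketch: query SoftLayer bare metal inventory with the official
# "softlayer" Python client. Credentials are placeholders only.
import SoftLayer

client = SoftLayer.create_client_from_env(username='SL_API_USER',
                                          api_key='SL_API_KEY')
hw_mgr = SoftLayer.HardwareManager(client)

# List the bare metal servers on the account with their private IP addresses.
for server in hw_mgr.list_hardware():
    print(server.get('hostname'), server.get('primaryBackendIpAddress'))
```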
4.3.2.1.2 Scalability
The SoftLayer network is designed in a multi-tier model. Each rack in a SoftLayer datacenter
contains 2 frontend customer switches (FCS) and 2 backend customer switches (BCS) connected
to the public and private networks, respectively. These customer switches then connect to separate,
peered aggregation switches; the aggregation switches are then attached to a pair of separate routers
for L3 networking. This multi-tier design allows the network to scale across racks, rows, and pods
within the SoftLayer datacenter.
4.3.2.1.3 High Bandwidth
Every upstream network port in the SoftLayer datacenter has multiple 10Gbps or 40Gbps
connections. Every rack is terminated with multiple 10Gbps or 40Gbps connections to the public
Internet and multiple 10Gbps or 40Gbps connections to the private network.
4.3.2.1.4 Fault-tolerant transport
Redundancy is provided at the server level using 2x10Gbps NICs. Additionally, the backend
server, frontend server, aggregation switches and routers are redundantly connected.
4.3.3 Physical Storage
There are two types of storage used within this design: VSAN and NFS.
operating system and any applications running on it could be impacted. By using vSphere
virtualization, virtual machines can be re-started on remaining hosts in a cluster in the event of the
catastrophic failure of a host. This can also be used by the cloud admin to take a host offline for
maintenance without affecting workloads on the cluster.
4.4.1.4 Performance
The amount of physical resources available to virtual machines can be controlled through vCenter
resource pools. This allows for different resource pools which can have higher or lower priority of
physical resources based on share allocation.
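As a hedged illustration of this mechanism, the following pyVmomi sketch creates a high-priority resource pool under a cluster's root resource pool. The vCenter hostname, credentials and cluster name are placeholders, not values mandated by this design.

```python
# Illustrative pyVmomi sketch: create a resource pool with high CPU and memory
# shares under an assumed cluster named 'Compute-Cluster-01'.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()   # lab use only; use valid certificates in production
si = SmartConnect(host='vcenter.example.local', user='administrator@vsphere.local',
                  pwd='password', sslContext=ctx)
content = si.RetrieveContent()

# Locate the cluster by name.
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.ClusterComputeResource], True)
cluster = next(c for c in view.view if c.name == 'Compute-Cluster-01')
view.Destroy()

def alloc(shares_level):
    # Expandable reservation, no hard limit; priority is expressed through shares.
    return vim.ResourceAllocationInfo(
        reservation=0, expandableReservation=True, limit=-1,
        shares=vim.SharesInfo(level=shares_level))

spec = vim.ResourceConfigSpec(cpuAllocation=alloc('high'),
                              memoryAllocation=alloc('high'))
cluster.resourcePool.CreateResourcePool(name='gold-workloads', spec=spec)
Disconnect(si)
```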
4.4.1.5 Workload Movement
By linking a central cloud with one or more cloud regions, vCenter Server allows a virtual
machine to be migrated to a remote cloud region or from cloud region to cloud region. This is also
possible from an on premises installation attached to the same environment. This enables
workloads to be migrated from client premises into the cloud and back again.
4.4.2 Storage Virtualization
Storage virtualization provides two levels of virtualization. The first is the virtualization of the storage
arrays and the second is the virtualization of the block storage used by virtual machines.
4.4.2.1 Virtual Storage Area Network (VSAN)
Virtual Storage Area Networking (VSAN) emulates a physical storage area network entirely
within the virtualization layer. Each host in the cluster contains local drives that are combined in
software to behave as a single disk array that is shared between all the hosts in the cluster as a
shared datastore.
Since there is no physical storage area network, VSAN has the advantage of fewer components
(no external drive array, fiber cabling, etc.). It allows ease of scaling when adding new compute nodes and reduces administration, because tasks such as LUN allocation are no longer necessary. In addition, VSAN provides high performance, since local disk is used and disk I/O is spread
out across all hosts within a cluster.
Storage policies are used to define storage attributes such as performance and protection levels.
The policy is set per virtual machine allowing great flexibility with the service levels available.
4.4.2.2 Virtual Machine Disks (VMDK)
Each virtual machine has at least one virtual machine disk (VMDK). Additional disks can be
added to a virtual machine. The virtual disks are provisioned on to the datastores provided by
VSAN. All virtual disks are thin provisioned, so unused disk space within a single virtual disk
does not take up datastore disk capacity.
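The following pyVmomi sketch illustrates how a thin-provisioned virtual disk can be added to a virtual machine. It assumes an already-connected session (see the resource pool example above) and an existing VM object with a SCSI controller; the size and unit number are arbitrary examples.

```python
# Illustrative pyVmomi sketch: add a thin-provisioned VMDK to an existing VM.
from pyVmomi import vim

def add_thin_disk(vm, size_gb, unit_number):
    # Reuse the VM's existing SCSI controller for the new disk.
    controller = next(dev for dev in vm.config.hardware.device
                      if isinstance(dev, vim.vm.device.VirtualSCSIController))

    disk = vim.vm.device.VirtualDisk(
        capacityInKB=size_gb * 1024 * 1024,
        controllerKey=controller.key,
        unitNumber=unit_number,
        backing=vim.vm.device.VirtualDisk.FlatVer2BackingInfo(
            diskMode='persistent',
            thinProvisioned=True))      # unused space does not consume datastore capacity

    change = vim.vm.device.VirtualDeviceSpec(
        operation=vim.vm.device.VirtualDeviceSpec.Operation.add,
        fileOperation=vim.vm.device.VirtualDeviceSpec.FileOperation.create,
        device=disk)

    return vm.ReconfigVM_Task(spec=vim.vm.ConfigSpec(deviceChange=[change]))
```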
4.4.3 Network Virtualization
Network virtualization provides a network overlay that exists within the virtual layer. As a result, it can provide more rapid provisioning, deployment, re-configuration and tear-down than is possible with physical devices.
4.4.3.1 Network Virtualization Components
The network virtualization architecture of this design utilizes VMware NSX for vSphere and
vSphere Distributed Switches (vDS). The virtualized network is organized hierarchically, with the
following components from bottom to top:
Data plane with the NSX vSwitch and additional components
Control plane with the NSX Controller
Management plane with the NSX Manager
Consumption plane with a Cloud management portal
Figure 8 Network Virtualization
4.4.3.1.1 Distributed Virtual Switches
This design implements vSphere distributed switches. vSphere Distributed Switch (vDS) offers
several enhancements over standard virtual switches.
Centralized management. Because distributed switches are created and managed
centrally on a vCenter Server system, they make the switch configuration more consistent
across ESXi hosts. Centralized management saves time, reduces mistakes, and lowers
operational costs.
Additional features. Distributed switches offer features that are not available on
standard virtual switches. Some of these features can be useful to the applications and
services that are running in the organization’s infrastructure. For example, NetFlow and
port mirroring provide monitoring and troubleshooting capabilities to the virtual
infrastructure.
The distributed virtual switch implements health checks. The health check service helps identify and
troubleshoot configuration errors in vSphere distributed switches.
Health check helps identify the following common configuration errors:
Mismatched VLAN trunks between a vSphere distributed switch and physical switch.
Mismatched MTU settings between physical network adapters, distributed switches, and
physical switch ports.
Mismatched virtual switch teaming policies for the physical switch port-channel settings.
Health check monitors VLAN, MTU, and teaming policies:
VLANs. Checks whether the VLAN settings on the distributed switch match the trunk
port configuration on the connected physical switch ports.
MTU. For each VLAN, checks whether the physical access switch port MTU jumbo
frame setting matches the distributed switch MTU setting.
Teaming policies. Checks whether the connected access ports of the physical switch that
participate in an EtherChannel are paired with distributed ports whose teaming policy is
IP hash.
Health check is limited to the access switch port to which the distributed switch uplink connects.
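For illustration only, the following pyVmomi sketch enables the VLAN/MTU and teaming health checks on a distributed switch object; the one-minute interval is an example value, not a recommendation from this design.

```python
# Illustrative pyVmomi sketch: enable vDS health checks. "dvs" is assumed to be
# a vim.dvs.VmwareDistributedVirtualSwitch object looked up beforehand.
from pyVmomi import vim

def enable_dvs_health_check(dvs, interval_minutes=1):
    config = [
        vim.dvs.VmwareDistributedVirtualSwitch.VlanMtuHealthCheckConfig(
            enable=True, interval=interval_minutes),
        vim.dvs.VmwareDistributedVirtualSwitch.TeamingHealthCheckConfig(
            enable=True, interval=interval_minutes),
    ]
    return dvs.UpdateDVSHealthCheckConfig_Task(healthCheckConfig=config)
```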
With network I/O control, the distributed switch allocates bandwidth for the following system
traffic types:
vSphere vMotion traffic
Management traffic
VMware vSphere Replication traffic
NFS traffic
VMware Virtual SAN traffic
vSphere Data Protection backup traffic
Virtual machine traffic
Fault tolerance traffic
iSCSI traffic
Network I/O control details
The bandwidth for each network resource pool is controlled by setting the physical adapter shares
and host limits. The bandwidth for virtual machines is controlled by bandwidth reservation for an
individual VM, similar to the way memory and CPU reservation is used.
The physical adapter shares assigned to a network resource pool determine the share of the total
available bandwidth guaranteed to the traffic that is associated with that network resource
pool. The share of transmit bandwidth that is available to a network resource pool is determined
by these factors:
The network resource pool's shares.
Other network resource pools that are actively transmitting.
4.4.3.1.2 Data Plane
The NSX data plane consists of the NSX vSwitch, which is based on the vSphere Distributed
Switch (vDS) and includes additional components. These components include kernel modules
(VIBs), which run within the ESXi kernel and provide services such as virtual distributed
router (VDR) and distributed firewall (DFW). The NSX kernel modules also enable Virtual
Extensible LAN (VXLAN) capabilities.
The NSX vSwitch abstracts the physical network and provides access-level switching in the
hypervisor. It is central to network virtualization because it enables logical networks that
are independent of physical constructs such as VLAN. The NSX vSwitch provides multiple
benefits.
Three types of overlay networking capabilities:
Creation of a flexible logical Layer 2 overlay over existing IP networks on
existing physical infrastructure.
Support for east/west and north/south communication while maintaining
isolation between tenants.
Support for application workloads and virtual machines that operate as if they
were connected to a physical Layer 2 network.
Support for VXLAN and centralized network configuration.
A comprehensive toolkit for traffic management, monitoring and troubleshooting within a
virtual network which includes port mirroring, NetFlow/IPFIX, configuration backup and
restore, network health check, Quality of Service (QoS), and Link Aggregation Control
Protocol (LACP)
In addition to the NSX vSwitch, the data plane also includes gateway devices (NSX Edge
gateways), which can provide Layer 2 bridging from the logical networking space (VXLAN) to
the physical network (VLAN). NSX Edge gateway devices offer Layer 2, Layer 3, perimeter firewall, load-balancing and other services such as Secure Sockets Layer virtual private network (SSL VPN) and Dynamic Host Configuration Protocol (DHCP).
4.4.3.1.3 Control Plane
The NSX control plane runs in the NSX Controller, which enables unicast VXLAN and control-
plane programming of elements such as VDR (virtual distributed router). Unicast support is
necessary because the multicast IP range per VLAN is limited within SoftLayer. The number of
multicast or unicast IPs determines the number of VXLANs that can be provisioned.
In all cases the controller is part of the control plane and does not have any data plane
traffic passing through it. The controller nodes are deployed in a cluster per NSX Manager to
enable high availability and scalability. A failure of one or all controller nodes does not impact
data plane traffic.
4.4.3.1.4 Management Plane
The NSX management plane consists of the NSX Manager, which is the single point
of configuration, and the REST API entry-points. NSX Manager integrates with vCenter. There is
one NSX Manager per vCenter Server.
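As a simple illustration of the REST API entry point (the hostname and credentials are placeholders), the following Python sketch queries the NSX Manager for its deployed controller nodes.

```python
# Illustrative sketch of the NSX for vSphere REST API: list NSX Controller nodes.
# Certificate verification is disabled here for brevity only.
import requests

NSX_MANAGER = 'https://nsxmanager.example.local'
resp = requests.get(NSX_MANAGER + '/api/2.0/vdn/controller',
                    auth=('admin', 'password'), verify=False)
resp.raise_for_status()
print(resp.text)   # XML document describing each controller node
```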
4.4.3.1.5 Consumption Plane
Different actors interact with NSX for vSphere to access and manage the associated services in
different ways:
Cloud admin can manage the NSX environment from the vSphere Web Client.
Users can consume the network virtualization capabilities of NSX for vSphere through the
CMP (vRealize Automation) UI when deploying applications.
4.4.3.2 Network Virtualization Services
Network virtualization services include logical switches, logical routers, logical firewall, and other
components of NSX for vSphere.
4.4.3.2.1 Logical Switches
Cloud deployments have a variety of applications that are used across multiple tenants. These
applications and tenants require isolation from each other for security, fault isolation, and
overlapping IP addresses. The NSX for vSphere logical switch creates logical broadcast domains
or segments to which an application or tenant virtual machine can be logically wired. This allows
for flexibility and speed of deployment while still providing all the characteristics of a physical
network's broadcast domains (VLANs) without physical Layer 2 sprawl or spanning tree issues.
A logical switch is distributed and can span arbitrarily large compute clusters. This allows for
virtual machine mobility (migration with vMotion) within a region and between regions, without
limitations of the physical Layer 2 (VLAN) boundary.
4.4.3.2.2 Logical Routers
Dynamic routing provides the necessary forwarding information between Layer 2 broadcast
domains, thereby allowing the cloud admin to decrease the size of Layer 2 broadcast domains and
improve network efficiency and scale. NSX for vSphere extends this intelligence to where the
workloads reside for east/west routing. This allows more direct VM-to-VM communication
without the costly need to extend hops. At the same time, logical routers provide north/south
connectivity, thereby enabling users to access public networks.
4.4.3.2.3 Logical Firewall
NSX for vSphere Logical Firewall provides security mechanisms for dynamic virtual datacenters.
The Distributed Firewall component of Logical Firewall allows a cloud admin to segment
virtual datacenter entities like virtual machines based on VM names and attributes, user
identity, vCenter objects like datacenters, and hosts, or based on traditional networking
attributes like IP addresses, port groups, and so on.
The Edge Firewall component helps a cloud admin to meet key perimeter security
requirements, such as building DMZs based on IP/VLAN constructs, tenant-to-tenant
isolation in multi-tenant virtual datacenters, Network Address Translation (NAT), partner
(extranet) VPNs, and user-based SSL VPNs.
The Flow Monitoring feature displays network activity between virtual machines at the
application protocol level. The cloud admin can use this information to audit network traffic,
define and refine firewall policies, and identify threats to a client’s network.
4.4.3.2.4 Logical Virtual Private Networks (VPNs)
SSL VPN-Plus allows remote users to access private corporate applications. IPSec VPN offers
site-to-site connectivity between an NSX Edge instance and remote sites. L2 VPN allows users to
extend their datacenter by allowing virtual machines to retain network identity across geographical
boundaries.
4.4.3.2.5 Logical Load Balancers
The NSX Edge load balancer enables network traffic to follow multiple paths to a specific
destination. It distributes incoming service requests evenly among multiple servers in such a way
that the load distribution is transparent to users. Load balancing thus helps in achieving optimal
resource utilization, maximizing throughput, minimizing response time, and avoiding overload.
NSX Edge provides load balancing up to Layer 7.
4.4.3.2.6 Service Composer
Service Composer helps provision and assign network and security services to applications in a
virtual infrastructure. The service provider maps these services to a security group, and the
services are applied to the virtual machines in the security group.
4.5 Infrastructure Management
The infrastructure management element manages the compute, network and storage virtual resources
provided by the lower layer. It also provides consolidation services to the upper layers for operational
services. These functions are provided by VMware vCenter Server.
4.5.1 Compute Management
In this design, VMware vCenter is employed to centralize the management of the compute resources within
each ESXi host. While the ESXi hosts can be managed individually, placing them under vCenter control
enables the following capabilities:
Centralized control and visibility of all aspects within managed ESXi hosts and virtual machines.
Provides the single pane of glass interface view via the vCenter web client for compute, network
and storage management.
Proactive Optimization. Enables allocation and optimization of resources for maximum efficiency across the ESXi hosts. See section 4.4.1 Compute Virtualization for optimization features enabled by vCenter Server.
Extended management function for other integrated products and services such as VMware NSX, vSphere Data Protection, vSphere Update Manager and others as “snap-ins” extending the vCenter web interface.
Monitoring, alerting, scheduling. Cloud admins can view events and alerts within the vCenter web client, and configure scheduled actions.
Automation engine. VMware vCenter is the engine which performs the tasks given to it via the
vSphere API web interface. VMware vRealize Automation and vRealize Orchestrator are examples of applications that drive vCenter actions via the API.
4.5.2 Storage Management
VMware vCenter enables centralized storage management within this design which allows for
configuration and management of the following storage types:
Local disk storage. Local hard disk drives (HDD) or solid state drives (SSD) that are attached to
the local ESXi hosts.
Storage area network (SAN) attached storage. Remote block storage that is attached to the ESXi host via Fibre Channel or TCP/IP protocols.
Network attached storage (NAS). File-based storage that is attached to the ESXi hosts via the NFS protocol.
Virtual SAN storage. Configured within the cluster object in vCenter, this enables the aggregation of local disk storage across all ESXi hosts within a given cluster into a shared pool of storage. Once configured, an outage of the vCenter server does not affect the availability of VSAN storage to the cluster.
Within this design, vCenter management of storage is primarily focused on NAS and VSAN storage, as SAN storage is not employed. Only the ESXi host OS and swap space use local non-VSAN disk storage.
4.5.2.1 NFS Storage management
vCenter is responsible for configuring the mounting of NFS datastores on each ESXi host within a cluster. This ensures that the datastore remains accessible to any virtual machine with virtual disk files (VMDK) residing on the NFS-based datastore, should a vMotion of the virtual machine from one ESXi host to another occur within the cluster.
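The following pyVmomi sketch illustrates this configuration by mounting the same NFS export on every host in a cluster; the NFS server, export path and datastore name are placeholders rather than values defined by this design.

```python
# Illustrative pyVmomi sketch: mount one NFS export on every host in a cluster
# so that VMs with VMDKs on that datastore can vMotion freely between hosts.
from pyVmomi import vim

def mount_nfs_on_cluster(cluster, remote_host, remote_path, datastore_name):
    spec = vim.host.NasVolume.Specification(
        remoteHost=remote_host,       # placeholder NFS endpoint
        remotePath=remote_path,       # placeholder export path
        localPath=datastore_name,
        accessMode='readWrite',
        type='NFS')
    for host in cluster.host:
        host.configManager.datastoreSystem.CreateNasDatastore(spec)

# Example (placeholder values):
# mount_nfs_on_cluster(cluster, 'nfs.example.local', '/export/backup01', 'nfs-backup01')
```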
4.5.2.2 VSAN Storage management
The vCenter interface or web API is used to configure VSAN datastores for a particular cluster at the cluster object level. Configuring VSAN within vCenter involves the following areas of configuration:
Licensing. Prior to enabling VSAN, a valid license within the vCenter licensing section is
required.
VSAN network. Used to configure the network VSAN will use for its backplane network. Virtual machines are made storage fault tolerant across the ESXi hosts' local disks on this network.
Disk group configuration. On each ESXi host that contributes its local disks to a Virtual
SAN cluster, disks are organized into disk groups. A disk group is a main unit of storage
on a host. Each disk group includes one SSD and one or multiple HDDs.
VSAN Policies. Storage policies define the virtual machine storage characteristics.
Storage characteristics specify different levels of service for different virtual machines.
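For illustration only, the following pyVmomi sketch enables VSAN on a cluster object with automatic disk claiming; storage policies themselves are defined separately through the storage policy (SPBM) service.

```python
# Illustrative pyVmomi sketch: enable VSAN with automatic disk claiming.
# "cluster" is assumed to be a vim.ClusterComputeResource looked up beforehand.
from pyVmomi import vim

def enable_vsan(cluster):
    vsan_config = vim.vsan.cluster.ConfigInfo(
        enabled=True,
        defaultConfig=vim.vsan.cluster.ConfigInfo.HostDefaultInfo(
            autoClaimStorage=True))   # hosts contribute their eligible local SSDs/HDDs
    spec = vim.cluster.ConfigSpecEx(vsanConfig=vsan_config)
    return cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)
```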
4.5.3 Network Management
vCenter Server is used to create standard and distributed virtual switches. The virtual switches connect virtual machine (VM) network interfaces to portgroups, allowing for communication between VMs hosted on the same host or on different hosts. To establish communication between hosts, virtual switches need to be connected to physical uplinks, which are the network interfaces of the ESXi hosts. VMs connected to the same virtual switch and hosted on the same host can communicate directly without the need for an external uplink.
vCenter Server enables creation of distributed portgroups for virtual machines (aggregated virtual ports
with a particular set of specifications).
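As a hedged example of this capability, the following pyVmomi sketch creates a distributed port group with a VLAN tag on an existing distributed switch object; the port group name and VLAN ID are placeholders.

```python
# Illustrative pyVmomi sketch: create an early-binding distributed port group
# with a VLAN tag. "dvs" is assumed to be an existing distributed switch object.
from pyVmomi import vim

def create_portgroup(dvs, name='dvPG-Example', vlan_id=1101, ports=128):
    pg_spec = vim.dvs.DistributedVirtualPortgroup.ConfigSpec(
        name=name,
        type='earlyBinding',          # static port binding
        numPorts=ports,
        defaultPortConfig=vim.dvs.VmwareDistributedVirtualSwitch.VmwarePortConfigPolicy(
            vlan=vim.dvs.VmwareDistributedVirtualSwitch.VlanIdSpec(
                vlanId=vlan_id, inherited=False)))
    return dvs.AddDVPortgroup_Task(spec=[pg_spec])
```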
4.6 Common Services
Common services provide the services used by other services in the cloud management platform. This
includes identity and access services, domain name services, NTP services, SMTP services and Certificate
Authority Services.
4.6.1 Identity and Access Services
In this design, Microsoft (MS) Active Directory (AD) is employed to provide the authentication and directory services back end for the VMware Platform Services Controller (PSC) and the VMware Identity Appliance. Within this design the VMware software components authenticate against the identity appliance, which in
turn authenticates against the MS AD service. The AD in this design can be extended to other regions by
adding an additional AD server for that particular region’s subdomain.
4.6.2 Domain Name Services
Domain Name Services (DNS) within this design are for the cloud management and infrastructure components only. DNS provides host-name-to-IP resolution for the cloud management platform and service resolution for the AD components. When an instance of this design is tied to a customer's on-premises solution, this design's DNS servers are referenced by the on-premises DNS infrastructure and also act as a proxy for the customer's DNS infrastructure.
4.6.3 NTP Services
This design's NTP servers synchronize to the SoftLayer infrastructure NTP server, one stratum below it. They ensure that all physical and virtual components are synchronized in time for the needs of authentication, replication, clustering, log synchronization and certificate services.
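For illustration only, the following pyVmomi-based sketch shows how an ESXi host could be pointed at the design's NTP source and the NTP daemon restarted. The host name and NTP server addresses are hypothetical placeholders.

    # Illustrative sketch only: set the NTP server list on an ESXi host and restart ntpd.
    # Assumes a pyVmomi ServiceInstance 'si'; names and addresses are hypothetical placeholders.
    from pyVmomi import vim

    def configure_ntp(si, host_name, ntp_servers):
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(content.rootFolder, [vim.HostSystem], True)
        host = next(h for h in view.view if h.name == host_name)
        view.Destroy()
        dt_cfg = vim.HostDateTimeConfig(ntpConfig=vim.HostNtpConfig(server=ntp_servers))
        host.configManager.dateTimeSystem.UpdateDateTimeConfig(config=dt_cfg)
        host.configManager.serviceSystem.RestartService(id='ntpd')   # apply the new servers

    # Example: configure_ntp(si, 'esxi01.example.local', ['ntp0.example.local', 'ntp1.example.local'])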
4.6.4 SMTP Services
Simple Mail Transfer Protocol (SMTP) is utilized within this design by various components for outbound notification only. For inbound email requirements (vRealize Automation, vRealize Business), the customer's email servers must be configured.
4.6.5 Certificate Authority Services
An enterprise Certificate Authority (CA), based on the Microsoft (MS) CA services built into MS Windows, is employed in this solution to replace the self-signed certificates on the web interfaces within this design.
4.7 Cloud Management Services
The cloud management services provide the service catalog, self-service portal and orchestration. This is
provided by VMware vRealize Automation, vRealize Orchestrator and Rapid Deployment Services (RDS)
pattern automation.
4.7.1 Service Catalog
The service catalog is published through the self-service portal and allows users to request the provided services. These can include provisioning new virtual machines from templates, provisioning new environments consisting of one or more virtual machines with software products as blueprints (also known as patterns), or managing existing deployed resources. Advanced services are also available through the service catalog by calling the orchestration component for process orchestration.
The service provider role is able to customize the services available to users as well as publish additional
services.
4.7.2 Self-Service Portal
The self-service portal provides a single point of access for users to the VMware SDDC on IBM Cloud solution. Authentication to the portal is performed against the Active Directory service.
4.7.3 Infrastructure and Process Orchestration
Orchestration is provided by vRealize Orchestrator. It allows tasks and remediation actions to be automated, including integration with third-party IT operations software.
vRealize Orchestrator consists of:
Workflow designer which incorporates an easy-to-use drag and drop interface to assemble
workflows. The designer runs on Windows, Linux and Mac OS desktops.
Scripting designer which allows for new building blocks to be created or imported for the vRealize
Orchestrator platform.
Orchestration engine which runs the workflows and associated scripts.
The default implementation includes a built-in workflow library with common tasks. Workflows can be versioned and packaged to assist with change management.
4.7.4 Software Orchestration
Software Orchestration is provided by a Rapid Deployment Services (RDS) solution with IBM Open
Patterns. RDS implements a distributed file repository and the configuration management tools to deliver
IBM Open Patterns on deployed workloads. IBM Open Patterns describe the pre-defined architecture of
an application. For each component of the application (for example, a database or web server), the pattern defines:
Pre-installation on an operating system
Pre-integration across components
Pre-configured and tuned settings
Pre-configured monitoring
Pre-configured security
Lifecycle management
4.8 Operational Services
Operational services provide management of the cloud services. This includes backup & restore, disaster
recovery, monitoring, log consolidation & analysis and patching functions.
4.8.1 Backup and Restore
The data protection service protects the infrastructure that provides the virtualization, operations, security
and cloud services. It does not protect any deployed user virtual machines.
Data protection solutions provide the following functions in the design:
Back up and restore virtual machines and database applications.
Store data according to company retention policies.
Inform administrators about backup and restore activities through reports.
vSphere Data Protection provides the data protection service in each region. This is separate from disaster recovery and applies even if only the central cloud exists.
An FTP server is used to back up NSX Manager. The FTP server supports the SFTP and FTP protocols.
Figure 9 Dual-Region Data Protection Architecture
4.8.2 Disaster Recovery
The disaster recovery service adds to the data protection service by protecting the management services in
the case of a complete site failure. It is an optional service to provide additional protection.
Since this requires more than one site, it is only applicable where a central cloud and at least one cloud
region has been included.
VMware Site Recovery Manager (SRM) and vSphere Replication are used to provide this service, together with maintaining the same IP addressing for the cloud management services at both sites.
Note: Each central cloud or cloud region in this design is equivalent to the site construct in Site Recovery
Manager.
Since the central cloud contains the portal and manages the services in all the regions, the following
applications are in scope of disaster recovery protection:
vRealize Automation together with VMware vRealize Orchestrator
Analytics cluster of vRealize Operations Manager
The services that support the services at each site do not require disaster recovery protection. This includes:
vSphere, NSX and vCenter services, which manage the services at the local site only.
Authentication, DNS and NTP, which are distributed to the cloud regions anyway.
vRealize Log Insight and Software Orchestration, which are replicated to all cloud regions.
Figure 10 Disaster Recovery Architecture
4.8.3 Monitoring
vRealize Operations Manager is used to track and analyze the operation of multiple data sources within the design by using specialized analytics algorithms. These algorithms help vRealize Operations Manager learn and predict the behavior of every object it monitors. Users access this information by using views, reports, and dashboards.
vRealize Operations Manager contains functional elements that collaborate for data analysis and storage,
and support creating clusters of nodes with different roles.
Figure 11 vRealize Operations Manager Architecture
For high availability and scalability, several vRealize Operations Manager instances are deployed in the
management cluster where they have the following roles:
Master Node. Required initial node in the cluster. In large-scale environments the master node
manages all other nodes. In small-scale environments, the master node is the single standalone
vRealize Operations Manager node.
Master Replica Node. Enables high availability of the master node.
Data Node. Enables scale-out of vRealize Operations Manager in larger environments. Data
nodes have adapters installed to perform collection and analysis. Data nodes also host vRealize
Operations Manager management packs.
Remote Collector Node. Enables navigation through firewalls, interfaces with a remote data
source, reduces bandwidth across regions, or reduces the load on the vRealize Operations
Manager analytics cluster. Remote collector nodes only gather objects for the inventory and
forward collected data to the data nodes. Remote collector nodes do not store data or perform
analysis. In addition, they can be installed on a different operating system than the rest of the
cluster nodes.
The master and master replica nodes are data nodes with extended capabilities.
vRealize Operations Manager forms two types of cluster according to the nodes that participate:
Analytics cluster. Tracks, analyzes, and predicts the operation of monitored systems. Consists of a master node, data nodes, and a master replica node.
Remote collector cluster. Only collects diagnostics data without storage or analysis. Consists only of remote collector nodes.
The functional components of a vRealize Operations Manager instance interact to provide analysis
of diagnostics data from the datacenter and visualize the result in the Web user interface.
Table 3 vRealize Operations Manager Logical Node Architecture
Architecture Component Description
Admin / Product UI server. The UI server is a Web
application that serves as both user and administration
interface.
REST API / Collector. The Collector collects data
from all components in the datacenter.
Controller. The Controller handles the data flow between the UI server, the Collector, and the analytics engine.
Analytics. The Analytics engine creates all
associations and correlations between various data sets,
handles all super metric calculations, performs all
capacity planning functions, and is responsible for
triggering alerts.
Persistence. The persistence layer handles the read and
write operations on the underlying databases across all
nodes.
FSDB. The File System Database (FSDB) stores
collected metrics in raw format. FSDB is available in
all the nodes.
xDB (HIS). The xDB stores data from the Historical
Inventory Service (HIS). This component is available
only on the master and master replica nodes.
Global xDB. The Global xDB stores user preferences, alerts, alarms, and customization related to vRealize Operations Manager. This component is available only on the master and master replica nodes.
4.8.4 Log Consolidation and Analysis
Log consolidation and analysis provides consolidation of the logs that are produced by each of the cloud
services together with analysis of those logs. For this design, this function is provided by vRealize Log
Insight.
vRealize Log Insight provides real-time log management and log analysis with machine learning-based
intelligent grouping, high-performance searching, and troubleshooting across physical, virtual, and cloud
environments.
vRealize Log Insight collects data from ESXi hosts using the syslog protocol. It connects to vCenter Server
to collect events, tasks, and alarms data, and integrates with vRealize Operations Manager to send
notification events and enable launch in context.
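For illustration only, the following pyVmomi-based sketch shows how an ESXi host's syslog target could be pointed at a vRealize Log Insight instance by setting the standard Syslog.global.logHost advanced option. The host name and Log Insight address are hypothetical placeholders.

    # Illustrative sketch only: forward an ESXi host's syslog to vRealize Log Insight.
    # Assumes a pyVmomi ServiceInstance 'si'; names and addresses are hypothetical placeholders.
    from pyVmomi import vim

    def set_syslog_target(si, host_name, syslog_url):
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(content.rootFolder, [vim.HostSystem], True)
        host = next(h for h in view.view if h.name == host_name)
        view.Destroy()
        opt = vim.option.OptionValue(key='Syslog.global.logHost', value=syslog_url)
        host.configManager.advancedOption.UpdateOptions(changedValue=[opt])

    # Example: set_syslog_target(si, 'esxi01.example.local', 'udp://loginsight.example.local:514')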
4.8.5 Patching
Patching of the VMware software components is achieved with VMware Update Manager. This includes
the VMware ESXi hosts, virtual appliances and management tooling in the design. It connects to the
internet to obtain the latest vulnerability patches and automatically applies user-defined patches to the
relevant components to eliminate the vulnerabilities.
4.9 Business Services
Business services are those services that provide business functions. This includes business management, IT financials and IT benchmarking. vRealize Business (VRB) is configured to provide financial information, reporting and modeling. VRB integrates with vRealize Automation.
4.9.1 Business Management
vRealize Business provides the following business management capabilities:
Automatic private cloud metering
Costing and pricing
4.9.2 IT Financials
vRealize Business provides the following capability for financial management:
Automatic service catalog pricing (Integrated with vRealize Automation)
Private cloud consumption analysis
Out-of-the-box reporting (Exportable data set)
4.9.3 IT Benchmarking
Additionally, vRealize Business can assist in modeling cost projections across cloud environments, including:
Private cloud and public cloud cost comparison
4.10 Cloud Region
The cloud region is a child instance of the design. It is not standalone and requires a central cloud to
provide the cloud management services. Provisioning and management of virtual resources is done through
the central cloud.
The cloud management services do not exist in a cloud region; they are provided by the central cloud. The operational management services in a cloud region contain collectors and relays that pass information back to the central cloud.
5 Physical Operational Model
The physical operational model elaborates on the logical operational model by applying the non-functional requirements to it.
Figure 12 Physical Operational Model - Virtual Servers, Networking and Clusters
5.1 Physical Layer
5.1.1 Compute
The design leverages SoftLayer to provide the compute. This allows for flexibility in provisioning bare metal: compute nodes can be deployed rapidly without waiting for orders to be delivered, and the same nodes can be decommissioned without waiting for depreciation schedules or reselling.
SoftLayer offers a variety of bare metal Intel-based hardware, from 1U to 4U chassis sizes, from 2GB to 3TB of memory, and from 4 to 48 CPU cores. For this design, a 2U server has been selected to allow for the lowest cost of entry, while still allowing for scaling up to 10,000 deployed VMs in a single central cloud or cloud region.
For security, and to isolate management, network and user workloads (resources), this design uses three vSphere cluster types with the following functions:
Management Cluster
Edge Cluster
Compute Clusters
This allows each function to be scaled independently of the others as the deployment is scaled out, making for a more effective use of resources.
5.1.1.1 Management Cluster
The management cluster is the heart of operation and control of the design. It is sized from the onset of deployment to allow for Compute cluster expansion and additional feature expansion without requiring additional server nodes.
It consists of 4 nodes of the following specification:
2 x 12 core CPUs (24 cores total), plus Hyperthreading
256GB RAM
~6.3TB Usable VSAN storage
~1TB Usable Local Disk for operating system
Refer to Appendix A – Bare Metal Summary for hardware details.
5.1.1.2 Edge Cluster
VMware NSX is an integral part of the design. “Edge” virtual appliances are used where VPN end points or load balancing is required. Edge virtual appliances are either dynamically provisioned as user applications are “spun up” with patterns, or preconfigured to support management functions. In either case they are deployed to the Edge cluster. This ensures that network connectivity and performance are not affected by varying workloads in the other clusters.
The Edge cluster is sized from the onset of deployment to allow for Compute cluster expansion without requiring additional servers. As the Edge virtual appliances are small, VSAN storage requirements are held to a minimum while maintaining redundancy and performance.
It consists of 4 nodes of the following specification:
2 x 6 core CPUs (12 cores total), plus Hyperthreading
128GB RAM
~ 3.2TB Usable VSAN storage
~ 1TB Usable Local Disk for operating system
Refer to Appendix A – Bare Metal Summary for hardware details.
5.1.1.3 Compute Cluster
As users or administrators within the customer's organization deploy the applications they require via vRealize Automation, the requested compute workloads are deployed to this cluster. With “scale up” as well as “scale out” in mind, resource-intensive applications and other mixed workloads can be absorbed. Additional clusters are provisioned when the capacity of each cluster is reached.
It consists of 4 nodes of the following specification:
2 x 12 core CPUs (24 cores total), plus Hyperthreading
512GB RAM
~6.3TB Usable VSAN storage
~1TB Usable Local Disk for operating system
Refer to Appendix A – Bare Metal Summary for hardware details.
Each node supports 80 VMs of 2 vCPU, 8GB RAM and 70GB disk, taking into account the following:
CPU over-commit of 7
90% CPU usage limit on ESXi
Memory over-commit of 1.6
80% memory usage limit on ESXi
No disk over-commit
A maximum of 48 nodes per cluster and 3 clusters supports 10,000 VMs.
Cluster sizing is based on reserving one node per 15 nodes in case of failure. Up to 15 nodes, a single node is reserved; between 16 and 30 nodes, two nodes are kept in reserve; and so on up to 45 nodes, where three nodes are in reserve. A worked example of the per-node capacity arithmetic follows.
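The per-node figure can be sanity-checked with simple arithmetic. The short Python sketch below applies the over-commit ratios and usage limits above to the compute node specification. The assumption that the CPU over-commit ratio is applied to logical (hyperthreaded) threads is ours, made only for illustration; memory is the binding constraint either way.

    # Rough capacity check for a compute node (illustrative only).
    # Assumption (ours, for illustration): the CPU over-commit ratio is applied to logical
    # threads (cores x 2 with Hyperthreading). Memory figures come straight from the design.
    import math

    cores, threads_per_core = 24, 2          # 2 x 12-core CPUs with Hyperthreading
    ram_gb = 512
    cpu_overcommit, cpu_limit = 7, 0.90      # over-commit of 7, 90% CPU usage limit
    mem_overcommit, mem_limit = 1.6, 0.80    # over-commit of 1.6, 80% memory usage limit
    vm_vcpu, vm_ram_gb = 2, 8                # reference VM: 2 vCPU, 8GB RAM

    vcpu_capacity = cores * threads_per_core * cpu_overcommit * cpu_limit   # ~302 vCPUs
    ram_capacity = ram_gb * mem_overcommit * mem_limit                      # ~655 GB

    vms_by_cpu = int(vcpu_capacity // vm_vcpu)    # ~151 VMs by CPU
    vms_by_ram = int(ram_capacity // vm_ram_gb)   # ~81 VMs by memory (the binding constraint)
    print(min(vms_by_cpu, vms_by_ram))            # ~81, consistent with the design figure of 80 VMs per node

    # Failure reserve: one node held back per 15 nodes in a cluster.
    def reserved_nodes(total_nodes):
        return math.ceil(total_nodes / 15)

    print(reserved_nodes(15), reserved_nodes(16), reserved_nodes(45))   # 1 2 3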
5.1.2 Storage
Network File System (NFS) is a file system protocol that allows a user on a client computer to access files
over a network much like local storage is accessed. In this case, the client computer is an ESXi host, and
the storage is provided by a NFS-capable external storage array.
The management cluster uses VMware Virtual SAN for primary storage and NFS for secondary storage.
For compute clusters, the decision on which technology to use is based on the performance, capacity, and
capabilities (replication, deduplication, compression, etc.) required by the workloads that are running in the
clusters.
For this design, additional storage for the ESXi management cluster is connected to SoftLayer File Storage
on Performance Storage via 10 Gbps links with Jumbo Frames (MTU 9000) enabled. For the virtual
machines that provide the backup service and log archiving (i.e., vSphere Data Protection and vRealize Log
Insight) the NFS datastores are configured in the following manner:
Table 4 NFS Configuration for vSphere Data Protection and vRealize Log Insight
Product Configuration
vSphere Data Protection vSphere Data Protection is I/O intensive and is placed on its own,
unique NFS datastore sized at 4TB
vRealize Log Insight vRealize Log Insight uses NFS datastores sized at 1TB for archive
storage and can be shared with other virtual machines.
5.1.3 Network
The design leverages SoftLayer physical networking and this is broken up into server connections, VLANs
and MTU packet sizing.
5.1.3.1 Server connections
Each compute node (physical server) within the design has two 10Gb Ethernet connections into each SoftLayer Top of Rack (ToR) switch (public and private), set up as individual (un-bonded) connections for a total of 4 x 10Gbps connections. This allows each network interface card (NIC) connection to work independently of the others.
Figure 13 Network connections per physical node
5.1.3.2 VLANs
Four VLANs are included in the design, including the public network. They are as follows:
Private VLAN 1 (default / untagged)
Private VLAN 2 (trunked / tagged)
Private VLAN 3 (trunked / tagged )
Public VLAN 4 (default / untagged)
5.1.3.3 MTU Sizing
The private network connections are configured to use a jumbo frame MTU size of 9000. This is the maximum MTU allowed by both VMware and SoftLayer and improves performance for large data transfers such as storage and vMotion. The public network connections use the standard Ethernet MTU frame size of 1500. This must be maintained, as any change may cause packet fragmentation over the internet.
5.2 Virtual Infrastructure
Virtual infrastructure consists of compute, storage and network virtualization.
5.2.1 Compute Virtualization
This design uses VMware vSphere ESXi version 6.0 u1 to virtualize the management, compute, and edge
servers. The ESXi hypervisor is installed on the 2x1TB RAID-1 disk array contained on each server.
RAID-1 is used in this design to provide redundancy for the vSphere hypervisor.
5.2.2 Storage Virtualization
For this design, VMware Virtual Storage Area Network (VSAN) storage is employed for all storage needs
with the exception of vRealize Log Insight log archival storage and VMware Data Protection backup
storage. VSAN allows for the local storage across multiple ESXi hosts within a vSphere cluster to be
represented as a single virtual machine datastore. VSAN supports only SATA, SAS HDD, and PCIe
storage. In each node, regardless of which cluster it belongs to, two 1TB SATA drives are excluded from VSAN and used to house the ESXi installation.
Figure 14 VSAN concept
5.2.2.1 RAID Controller
As of this version of the design, only the Avago MegaRAID 9361-8i RAID controller within SoftLayer hardware is supported by VMware VSAN. Disk caching is disabled and the controller is set to JBOD mode for all VSAN drives.
Disks and disk groups
Depending on the cluster function (Management, Edge, Compute), the number and sizes of the disks change. Each VSAN disk group requires a solid state disk (SSD) for the cache layer. Within all clusters, a 2U server type is employed with a maximum capacity of 12 drive slots. Excluding the ESXi OS drives, the following is the drive layout for each cluster type:
Table 5 VSAN disk table
Cluster type | VSAN disk groups | SSDs | SSD + HDD per disk group | SSD size | HDDs | HDD size
Management | 2 | 2 | 1 + 4 | 1,200GB | 8 | 2,000GB
Edge | 1 | 1 | 1 + 4 | 1,200GB | 4 | 2,000GB
Compute | 2 | 2 | 1 + 4 | 1,200GB | 8 | 2,000GB
5.2.2.2 Virtual network setup
For this design the VSAN traffic traverses between ESXi hosts on a dedicated private VLAN; no other traffic occupies the VSAN VLAN. The two network adapters attached to the private network switch are configured within vSphere as a virtual distributed switch (vDS) with both network adapters as uplinks. A dedicated VSAN kernel port configured for the VSAN VLAN resides within the vDS. Jumbo frames are enabled for the private vDS.
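For illustration only, the following pyVmomi-based sketch shows how the MTU of the private vDS could be raised to 9000 for jumbo frames. The switch name is a hypothetical placeholder.

    # Illustrative sketch only: raise the MTU of a distributed switch to 9000 (jumbo frames).
    # Assumes a pyVmomi ServiceInstance 'si'; the switch name is a hypothetical placeholder.
    from pyVmomi import vim

    def set_dvs_mtu(si, dvs_name, mtu=9000):
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.dvs.VmwareDistributedVirtualSwitch], True)
        dvs = next(d for d in view.view if d.name == dvs_name)
        view.Destroy()
        spec = vim.dvs.VmwareDistributedVirtualSwitch.ConfigSpec(
            configVersion=dvs.config.configVersion,   # required so the reconfigure is not rejected
            maxMtu=mtu)
        return dvs.ReconfigureDvs_Task(spec)

    # Example: set_dvs_mtu(si, 'vDS-Mgmt-Priv')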
5.2.2.3 Virtual SAN Policy Design
Once VMware Virtual SAN is enabled and configured, create storage policies that define the
virtual machine storage characteristics. Storage characteristics specify different levels of service
for different virtual machines. The default storage policy tolerates a single failure and has a single
disk stripe. Use the default unless a client’s environment requires policies with non-default
behavior. If a custom policy is configured, Virtual SAN will guarantee it; however, if Virtual SAN
cannot guarantee a policy, it is not possible to provision a virtual machine that uses the policy
unless it is enabled to force provisioning. This design will use the default policy.
Table 6 VSAN policies
Capability: Number of failures to tolerate
Use case: Redundancy
Value: Default 1, Max 3
Comments: A standard RAID 1 mirrored configuration that provides redundancy for a virtual machine disk. The higher the value, the more failures can be tolerated. For n failures tolerated, n+1 copies of the disk are created, and 2n+1 hosts contributing storage are required. A higher n value indicates that more replicas of virtual machines are made, which can consume more disk space than expected.

Capability: Number of disk stripes per object
Use case: Performance
Value: Default 1, Max 12
Comments: A standard RAID 0 stripe configuration used to increase performance for a virtual machine disk. This setting defines the number of HDDs on which each replica of a storage object is striped. If the value is higher than 1, increased performance can result. However, an increase in system resource usage might also result.

Capability: Flash read cache reservation (%)
Use case: Performance
Value: Default 0, Max 100%
Comments: Flash capacity reserved as read cache for the storage object, as a percentage of the logical object size that will be reserved for that object. Only use this setting for workloads if a client must address read performance issues. The downside of this setting is that other objects cannot use a reserved cache. VMware recommends not using these reservations unless it is absolutely necessary because unreserved flash is shared fairly among all objects.

Capability: Object space reservation (%)
Use case: Thick provisioning
Value: Default 0, Max 100%
Comments: The percentage of the storage object that will be thick provisioned upon VM creation. The remainder of the storage will be thin provisioned. This setting is useful if a predictable amount of storage will always be filled by an object, cutting back on repeatable disk growth operations for all but new or non-predictable storage use.

Capability: Force provisioning
Use case: Override policy
Value: Default No
Comments: Force provisioning allows for provisioning to occur even if the currently available cluster resources cannot satisfy the current policy. Force provisioning is useful in case of a planned expansion of the Virtual SAN cluster, during which provisioning of VMs must continue. Virtual SAN automatically tries to bring the object into compliance as resources become available.
By default, policies are configured based on application requirements. However, they are applied
differently depending on the object.
Table 7 VSAN object policy defaults
Object | Policy | Comments
Virtual machine namespace | Failures-to-Tolerate: 1 | Configurable. Changes are not recommended.
Swap | Failures-to-Tolerate: 1 | Configurable. Changes are not recommended.
Virtual disk(s) | User-Configured Storage Policy | Can be any storage policy configured on the system.
Virtual disk snapshot(s) | Uses virtual disk policy | Same as virtual disk policy by default. Changes are not recommended.
If a user-configured policy is not specified, the default system policy of 1 failure to tolerate and 1
disk stripe is used for virtual disk(s) and virtual disk snapshot(s). Policy defaults for the VM
namespace and swap are set statically and are not configurable to ensure appropriate protection for
these critical virtual machine components. Policies must be configured based on the application’s
business requirements. Policies give Virtual SAN its power because it can adjust how a disk
performs on the fly based on the policies configured.
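The failures-to-tolerate rule above drives both the minimum host count and the raw capacity consumed. The small worked example below applies only the n+1 copies and 2n+1 hosts relationship stated in Table 6.

    # Worked example of the VSAN failures-to-tolerate (FTT) rule from Table 6 (illustrative only).
    def vsan_ftt(vmdk_gb, ftt):
        copies = ftt + 1           # n + 1 replicas of the object
        hosts = 2 * ftt + 1        # at least 2n + 1 hosts contributing storage
        raw_gb = vmdk_gb * copies  # raw capacity consumed by the replicas, ignoring other overhead
        return copies, hosts, raw_gb

    # A 70GB virtual disk with the default policy (FTT = 1):
    print(vsan_ftt(70, 1))   # (2, 3, 140) -> 2 copies, at least 3 hosts, ~140GB raw capacity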
5.2.3 Network Virtualization
This design uses vSphere virtual distributed switches and VMware NSX for vSphere to implement virtual
networking. By using NSX, this design implements software-defined networking.
5.2.3.1 Virtual Distributed Switch Design
The design uses a minimum number of switches. Clusters connected to the public network are configured with two distributed virtual switches. Clusters restricted to the private network have only one distributed virtual switch.
Separating different types of traffic is required to reduce contention and latency. Separate networks are also required for access security. VLANs are used to segment physical network functions. This design uses four (4) VLANs: three (3) for private network traffic and one (1) for public network traffic. Traffic separation is detailed in the section below.
Table 8 VLAN Mapping to Traffic Types
VLAN Traffic Type
VLAN1 ESXi Management, VXLAN (VTEP)
VLAN2 VSAN
VLAN3 vMotion, NFS, vSphere Replication
VLAN4 All – Internet access
Traffic from workloads will travel on VXLAN backed logical switches.
5.2.3.1.1 Management Cluster Distributed Switches
The management cluster uses a dual vSphere Distributed Switch with the configuration settings
shown in this section.
Table 9 Management Cluster Distributed Switch
vSphere Distributed Switch Name: vDS-Mgmt-Priv
Function: ESXi management, Network IP Storage (NFS), Virtual SAN, vSphere vMotion, VXLAN Tunnel Endpoint (VTEP), vSphere Replication / vSphere Replication NFC
Network I/O Control: Enabled
Load Balancing Mode: Based on source MAC hash
Number of Physical NIC Ports: 2
MTU: 9,000 (Jumbo Frame)

vSphere Distributed Switch Name: vDS-Mgmt-Pub
Function: External management traffic (North-South)
Network I/O Control: Enabled
Load Balancing Mode: Based on source MAC hash
Number of Physical NIC Ports: 2
MTU: 1,500 (default)
Table 10 Management Cluster Distributed Switch Port Group Configuration Settings
Parameter Setting
Load balancing Route based on the source MAC hash
Failover detection Link status only
Notify switches Enabled
Failback No
Failover order Active uplinks: Uplink1, Uplink2
Management cluster hosts are connected to both private and public networks. There are 2
distributed virtual switches: one for private network and one for public network. Each switch has a
dedicated pair of 10Gbps network adapters.
Figure 15 Network Switch Design for Management Hosts
Table 11 Management Virtual Switch Port Groups and VLANs
vSphere Distributed Switch | Port Group Name | Teaming | Uplinks | VLAN ID
vDS-Mgmt-Priv | vDS-Mgmt-Priv-Management | Source MAC hash | Active: 0, 1 | VLAN1
vDS-Mgmt-Priv | vDS-Mgmt-Priv-vMotion | Source MAC hash | Active: 0, 1 | VLAN3
vDS-Mgmt-Priv | vDS-Mgmt-Priv-VSAN | Source MAC hash | Active: 0, 1 | VLAN2
vDS-Mgmt-Priv | vDS-Mgmt-Priv-VTEP | Source MAC hash | Active: 0, 1 | VLAN1
5.2.3.1.2 Edge Cluster Distributed Switches
The edge cluster uses a dual vSphere Distributed Switch with the configuration settings shown in
this section.
Table 13 Edge Cluster Distributed Switch
vSphere Distributed Switch Name: vDS-Edge-Priv
Function: ESXi management, Virtual SAN, vSphere vMotion, VXLAN Tunnel Endpoint (VTEP)
Network I/O Control: Enabled
Load Balancing Mode: Based on source MAC hash
Number of Physical NIC Ports: 2
MTU: 9,000 (Jumbo Frame)

vSphere Distributed Switch Name: vDS-Edge-Pub
Function: External user traffic (North-South)
Network I/O Control: Enabled
Load Balancing Mode: Based on source MAC hash
Number of Physical NIC Ports: 2
MTU: 1,500 (default)
Table 14 Edge Cluster Distributed Switch Port Group Configuration Settings
Parameter Setting
Load balancing Route based on the source MAC hash
Failover detection Link status only
Notify switches Enabled
Failback No
Failover order Active uplinks: Uplink1, Uplink2
Edge cluster hosts are connected to both private and public networks. There are 2 distributed
virtual switches: one for private network and one for public network. Each switch has a dedicated
pair of 10Gbps network adapters.
Figure 16 Network Switch Design for Edge Hosts
Table 15 Edge Virtual Switch Port Groups and VLANs
vSphere Distributed Switch | Port Group Name | Teaming | Uplinks | VLAN ID
vDS-Edge-Priv | vDS-Edge-Priv-Management | Source MAC hash | Active: 0, 1 | VLAN1
vDS-Edge-Priv | vDS-Edge-Priv-vMotion | Source MAC hash | Active: 0, 1 | VLAN3
vDS-Edge-Priv | vDS-Edge-Priv-VSAN | Source MAC hash | Active: 0, 1 | VLAN2
vDS-Edge-Priv | vDS-Edge-Priv-VTEP | Source MAC hash | Active: 0, 1 | VLAN1
vDS-Edge-Pub | vDS-Edge-Pub-External | Source MAC hash | Active: 0, 1 | VLAN4
Table 16 Edge VMkernel Adapter
vSphere Distributed Switch | Network Label | Connected Port Group | Enabled Services | MTU
vDS-Edge-Priv | Management | vDS-Edge-Priv-Management | Management Traffic | 1,500 (default)
vDS-Edge-Priv | vMotion | vDS-Edge-Priv-vMotion | vMotion Traffic | 9,000
vDS-Edge-Priv | VTEP | vDS-Edge-Priv-VTEP | - | 9,000
vDS-Edge-Priv | VSAN | vDS-Edge-Priv-VSAN | VSAN | 9,000
5.2.3.1.3 Compute Cluster Distributed Switches
The compute cluster uses a single vSphere Distributed Switch with the configuration settings
shown in this section.
Table 17 Compute Cluster Distributed Switch
vSphere Distributed Switch Name: vDS-Compute-Priv
Function: ESXi management, Virtual SAN, vSphere vMotion, VXLAN Tunnel Endpoint (VTEP)
Network I/O Control: Enabled
Load Balancing Mode: Based on source MAC hash
Number of Physical NIC Ports: 2
MTU: 9,000 (Jumbo Frame)
Table 18 Compute Cluster Distributed Switch Port Group Configuration Settings
Parameter Setting
Load balancing Route based on the source MAC hash
Failover detection Link status only
Notify switches Enabled
Failback No
Failover order Active uplinks: Uplink1, Uplink2
Compute cluster hosts are connected to the private network. The switch has a dedicated pair of
10Gbps network adapters.
Figure 17 Network Switch Design for Compute Hosts
Table 19 Compute Virtual Switch Port Groups and VLANs
vSphere Distributed Switch | Port Group Name | Teaming | Uplinks | VLAN ID
vDS-Compute-Priv | vDS-Compute-Priv-Management | Source MAC hash | Active: 0, 1 | VLAN1
vDS-Compute-Priv | vDS-Compute-Priv-vMotion | Source MAC hash | Active: 0, 1 | VLAN3
vDS-Compute-Priv | vDS-Compute-Priv-VSAN | Source MAC hash | Active: 0, 1 | VLAN2
vDS-Compute-Priv | vDS-Compute-Priv-VTEP | Source MAC hash | Active: 0, 1 | VLAN1
Table 20 Compute VMkernel Adapter
vSphere Distributed Switch | Network Label | Connected Port Group | Enabled Services | MTU
vDS-Compute-Priv | Management | vDS-Compute-Priv-Management | Management Traffic | 1,500 (default)
vDS-Compute-Priv | vMotion | vDS-Compute-Priv-vMotion | vMotion Traffic | 9,000
vDS-Compute-Priv | VTEP | vDS-Compute-Priv-VTEP | - | 9,000
vDS-Compute-Priv | VSAN | vDS-Compute-Priv-VSAN | VSAN | 9,000
5.2.3.1.4 NIC Teaming
With the exception of VSAN, this design uses NIC teaming to avoid a single point of failure and to provide load balancing. To accomplish this, the design uses an active-active configuration with teaming based on the source MAC hash.
5.2.3.1.5 Network I/O Control
This design uses Network I/O control enabled on all distributed switches with default shares.
Network I/O control increases resiliency and performance of the network.
Utilizing network I/O control, this design uses the distributed switch to allocate bandwidth for the
following system traffic types:
vSphere vMotion traffic
Management traffic
VMware vSphere Replication traffic
NFS traffic
VMware Virtual SAN traffic
vSphere Data Protection backup traffic
Virtual machine traffic
5.2.3.1.6 VXLAN
This design uses VXLAN to create isolated, multi-tenant broadcast domains across datacenter
fabrics and to enable customers to create elastic, logical networks that span physical network
boundaries.
VXLAN works by creating Layer 2 logical networks that are encapsulated in standard Layer 3 IP
packets. A Segment ID in every frame differentiates the VXLAN logical networks from each other
without any need for VLAN tags. As a result, large numbers of isolated Layer 2 VXLAN
networks can coexist on a common Layer 3 infrastructure.
5.2.3.2 Software-Defined Network Design
NSX offers the following Software-Defined Network (SDN) capabilities crucial to support the
cloud management platform operations.
load balancing
firewalls
routing
logical switches
VPN access
Because NSX for vSphere is tied to a vCenter Server domain, this design uses two separate NSX
instances. One instance is tied to the management vCenter Server, and the other instance is tied to
the compute and edge vCenter Server.
SDN capabilities are consumed via the cloud management platform, the vSphere Web Client and the API. The design uses APIs to automate the deployment and configuration of NSX components by the user and cloud admin actors.
5.2.3.3 NSX for vSphere Components
This section describes the NSX for vSphere component configuration.
5.2.3.3.1 NSX Manager
NSX Manager provides the centralized management plane for NSX for vSphere and has a one-to-
one mapping to a vCenter Server instance. This design uses two NSX managers (one for each
vCenter Server in the design).
NSX Manager performs the following functions.
Provides the single point of configuration and the REST API entry-points in a vSphere
environment for NSX for vSphere.
Is responsible for deploying NSX Controller clusters, Edge distributed routers, and Edge
service gateways in the form of OVF appliances, guest introspection services, and so on.
Is responsible for preparing ESXi hosts for NSX for vSphere by installing VXLAN,
distributed routing and firewall kernel modules, and the User World Agent (UWA).
Communicates with NSX Controller clusters via REST and with hosts via the RabbitMQ
message bus. The internal message bus is specific to NSX for vSphere and does not
require setup of additional services.
Generates certificates for the NSX Controller instances and ESXi hosts to secure control
plane communications with mutual authentication.
5.2.3.3.2 NSX Controller
The NSX Controllers perform the following functions:
Provide the control plane to distribute VXLAN and logical routing information to ESXi
hosts.
Include nodes that are clustered for scale-out and high availability.
Slice network information across cluster nodes for redundancy.
Remove requirement of VXLAN Layer 3 multicast in the physical network.
Provide ARP suppression of broadcast traffic in VXLAN networks.
NSX for vSphere control plane communication occurs over the management network.
This design implements a cluster of 3 NSX controllers for each NSX manager enabling high
availability for the controllers.
5.2.3.3.3 NSX vSwitch
The NSX for vSphere data plane consists of the NSX vSwitch. This vSwitch is based on the
vSphere Distributed Switch (vDS) with additional components for add-on services. The add-on
NSX for vSphere components include kernel modules which run within the hypervisor kernel and
provide services such as distributed logical router (DLR) and distributed firewall (DFW), and
enable VXLAN capabilities.
This design uses NSX vSwitch.
5.2.3.3.4 NSX Logical Switching
NSX for vSphere logical switches create logically abstracted segments to which tenant virtual
machines can be connected. A single logical switch is mapped to a unique VXLAN segment and is
distributed across the ESXi hypervisors within a transport zone. It allows line-rate switching in the
hypervisor without the constraints of VLAN sprawl or spanning tree issues.
This design uses NSX Logical Switching for handling compute workloads and connectivity
between different network zones.
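For illustration only, the sketch below shows how a logical switch (virtual wire) might be created against the NSX for vSphere REST API from Python. The NSX Manager address, transport zone (scope) ID and credentials are hypothetical placeholders, and the endpoint path is an assumption based on the published NSX-v API that should be verified against the API guide. Unicast control plane mode matches the design decision described later in this section.

    # Illustrative sketch only: create a VXLAN-backed logical switch via the NSX-v REST API.
    # The URL, scope ID and credentials are hypothetical placeholders; the endpoint path is an
    # assumption based on the NSX for vSphere API and should be verified against the API guide.
    import requests

    NSX_MANAGER = 'https://nsxmgr.example.local'
    TRANSPORT_ZONE_ID = 'vdnscope-1'   # placeholder transport zone (scope) ID

    body = """<virtualWireCreateSpec>
      <name>vxw-mgmt-vra-app</name>
      <description>Virtual application network (example)</description>
      <tenantId>management</tenantId>
      <controlPlaneMode>UNICAST_MODE</controlPlaneMode>
    </virtualWireCreateSpec>"""

    resp = requests.post(
        '{0}/api/2.0/vdn/scopes/{1}/virtualwires'.format(NSX_MANAGER, TRANSPORT_ZONE_ID),
        data=body,
        headers={'Content-Type': 'application/xml'},
        auth=('admin', '********'),
        verify=False)           # lab-only: skip certificate verification
    resp.raise_for_status()
    print('Created logical switch:', resp.text)   # the response body contains the new virtualwire ID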
5.2.3.3.5 Distributed Logical Router
The NSX for vSphere Distributed Logical Router (DLR) is optimized for forwarding in the
virtualized space, that is, forwarding between VMs on VXLAN- or VLAN-backed port groups.
This design does not use distributed logical routers.
5.2.3.3.6 User World Agent
The User World Agent (UWA) is a TCP (SSL) client that facilitates communication between the
ESXi hosts and the NSX Controller instances as well as the retrieval of information from the NSX
Manager via interaction with the message bus agent. UWA is installed on each ESXi host.
5.2.3.3.7 VXLAN Tunnel Endpoint
VXLAN Tunnel Endpoints (VTEPs) are responsible for encapsulating VXLAN traffic as frames
in UDP packets and for the corresponding de-encapsulation. VTEPs take the form of VMkernel
ports with IP addresses and are used to exchange packets with other VTEPs.
This design uses a single VTEP per host.
5.2.3.3.8 Edge Services Gateway
The primary function of the NSX for vSphere Edge services gateway is north-south
communication, but it also offers support for Layer 2, Layer 3, perimeter firewall, load balancing
and other services such as SSL-VPN and DHCP-relay.
In this design, the edges also ensure east-west communication.
5.2.3.3.9 Distributed Firewall
NSX for vSphere includes a distributed kernel-level firewall known as a distributed firewall.
Security enforcement is done at the kernel and VM network adapter level. This enables firewall
rule enforcement in a highly scalable manner without creating bottlenecks on physical appliances.
The distributed firewall has minimal CPU overhead and can perform at line rate.
This design does not automatically implement distributed firewall. The cloud admin actor is able
to enable this feature post implementation if required.
5.2.3.3.10 Logical Load Balancer
The NSX for vSphere logical load balancer provides load balancing services up to Layer 7
(application), allowing distribution of traffic across multiple servers to achieve optimal resource
utilization and availability. The logical load balancer is a service provided by the NSX Edge
services gateway.
This design implements load balancing for management virtual machines.
5.2.3.4 NSX for vSphere Physical Network Requirements
VXLAN packets cannot be fragmented. Since VXLAN adds its own header information, the MTU needs to be at least 1,600.
SoftLayer has a limitation of 136 VXLAN addresses. As such, the VXLAN control plane in this
design uses unicast mode to circumvent this limitation.
The NSX Manager synchronizes with the same NTP server as the rest of the vSphere
environment. This avoids time drift which can cause problems with authentication. The NSX
Manager must be in sync with the vCenter Single Sign-On server.
5.2.3.5 NSX for vSphere Specifications
The following table lists the components involved in the NSX for vSphere solution and the
requirements for installing and running them. The compute and storage requirements have been
taken into account when sizing resources to support the NSX for vSphere solution.
Table 21 NSX Components Sizing
VM | vCPU | Memory | Storage | Quantity per Deployment by Type
NSX Manager | 4 | 12 GB | 60 GB | 1
NSX Controller | 4 | 4 GB | 20 GB | 3
NSX Edge services gateway - Quad Large | 4 | 1 GB | 512 MB | 2
NSX Edge services gateway - X-Large | 6 | 8 GB | 4.5 GB (+4 GB for SWAP) | 2
The Quad Large model is utilized for high performance firewall. X-Large is utilized for both high
performance load balancing and routing.
5.2.3.6 Network Virtualization Conceptual Design
The following diagram depicts the conceptual tenant architecture components and their
relationship.
Figure 18 Network Virtualization Conceptual Design
The conceptual design has the following key components.
External Networks. Connectivity to and from external networks is through a perimeter
firewall. The main external network is the Internet.
Perimeter Firewall. The logical firewall exists at the perimeter of the datacenter. Each
tenant receives either a full instance or partition of an instance to filter external traffic.
This is the primary access method for user data.
Provider Logical Router (PLR). The PLR exists behind the perimeter firewall and
handles north/south traffic that is entering and leaving a tenant.
Tenant Edge services gateways. Edge services gateways that provide routing and
firewalling capabilities.
Internal Non-Tenant Networks. A single management network, which sits behind a
perimeter firewall but not behind the PLR. Enables cloud admin to manage the cloud
environment.
Internal Tenant Networks. Connectivity for the main tenant workload.
5.2.3.7 Cluster Design for NSX for vSphere
Management Stack
In the management stack, the underlying hosts are prepared for NSX for vSphere. The
management stack has these components:
NSX Manager instances for both stacks (management stack and compute/edge stack),
NSX Controller cluster for the management stack,
NSX Edge service gateways for the management stack.
Compute/Edge Stack
In the compute/edge stack, the underlying hosts are prepared for NSX for vSphere. The
compute/edge stack has these components:
NSX Controller cluster for the compute stack
All NSX Edge service gateways of the compute stack that are dedicated to handling the
north/south traffic in the datacenter. A separate edge stack helps prevent VLAN sprawl
because any external VLANs need only be trunked to the hosts in this cluster.
Multiple compute clusters that run the tenant workloads and have the underlying hosts
prepared for NSX for vSphere.
Figure 19 Cluster Design for NSX for vSphere
5.2.3.8 High Availability of NSX for vSphere Components
The NSX Manager instances of both stacks run on the management cluster. vSphere HA protects
the NSX Manager instances by ensuring that the VM is restarted on a different host in the event of
primary host failure.
The NSX Controller nodes of the management stack run on the management cluster and the NSX
for vSphere Controller nodes of the compute stack run on the edge cluster. In both clusters,
vSphere Distributed Resource Scheduler (DRS) rules ensure that NSX for vSphere Controller
nodes do not run on the same host.
The data plane remains active during outages in the management and control planes although the
provisioning and modification of virtual networks is impaired until those planes become available
again.
The NSX Edge services gateways and distributed logical router control VMs of the compute stack
are deployed on the edge cluster. The NSX Edge service gateways and distributed logical router
control VMs of the management stack run on the management cluster.
All NSX Edge components are deployed in NSX for vSphere HA pairs. NSX for vSphere HA
provides better availability than vSphere HA. By default, the VMs fail over within 15 seconds
versus a potential 5 minutes for a restart on another host under vSphere HA.
5.2.3.9 Logical Switch Control Plane Mode Design
The control plane decouples NSX for vSphere from the physical network and handles the
broadcast, unknown unicast, and multicast (BUM) traffic within the logical switches. The control
plane is on top of the transport zone and is inherited by all logical switches that are created within
it, although it is possible to override aspects of the control plane. The following options are
available:
Multicast Mode. The control plane uses multicast IP addresses on the physical network.
Use multicast mode only when upgrading from existing VXLAN deployments. In this
mode, PIM/IGMP must be configured on the physical network.
Unicast Mode. The control plane is handled by the NSX Controllers and all replication
occurs locally on the host. This mode does not require multicast IP addresses or physical
network configuration.
Hybrid Mode. This mode is an optimized version of the unicast mode where local traffic
replication for the subnet is offloaded to the physical network. Hybrid mode requires
IGMP snooping on the first-hop switch and access to an IGMP querier in each VTEP
subnet. Hybrid mode does not require PIM.
Due to SoftLayer constraints on the number of available multicast addresses, this design uses
unicast mode.
5.2.3.10 Transport Zone Design
A transport zone is used to define the scope of a VXLAN overlay network which can span one or
more clusters within a vCenter Server domain. One or more transport zones can be configured in
an NSX for vSphere solution. A transport zone is not meant to delineate a security boundary.
The design implements a single transport zone per NSX instance. Each zone will be scaled to
respect the limit of 100 DLRs per ESXi host. A single transport zone per NSX instance supports
extending networks across resource stacks.
5.2.3.11 Routing Logical Design
The routing logical design has to consider different levels of routing in the environment:
North-south. The Provider Logical Router (PLR) handles the north/south traffic to and
from a tenant
East-west. Internal east-west routing at the layer beneath the PLR deals with the
application workloads.
An NSX Edge services gateway is deployed for each management application to provide routing for the application. The virtual application network is attached directly to the Edge services gateway; no distributed logical routers are deployed beneath it.
5.2.3.12 Firewall Logical Design
In this design, firewall functions are controlled at the NSX Edge services gateway firewall. The Edge services gateway virtual machines act as the entry point to a tenant's virtual network space. NSX Edge services gateways are configured with the firewall enabled to support the functionality required by virtual application networks. External access is via load balancer virtual IPs when public services are provided by redundant management application VMs, and via a DNAT rule when the service is provided by a single VM.
5.2.3.13 Load Balancer Design
The Edge services gateway implements load balancing within NSX for vSphere; it has both a
Layer 4 and a Layer 7 engine that offer different features, summarized in the following table.
Table 22 Load Balancer Features
Feature: Protocols
Layer 4 Engine: TCP
Layer 7 Engine: TCP, HTTP, HTTPS (SSL Pass-through), HTTPS (SSL Offload)

Feature: Load Balancing Method
Layer 4 Engine: Round Robin, Source IP Hash, Least Connection
Layer 7 Engine: Round Robin, Source IP Hash, Least Connection, URI

Feature: Health Checks
Layer 4 Engine: TCP
Layer 7 Engine: TCP, HTTP (GET, OPTION, POST), HTTPS (GET, OPTION, POST)

Feature: Persistence (keeping client connections to the same back-end server)
Layer 4 Engine: TCP: SourceIP
Layer 7 Engine: TCP: SourceIP, MSRDP; HTTP: SourceIP, Cookie; HTTPS: SourceIP, Cookie, ssl_session_id

Feature: Connection Throttling
Layer 4 Engine: No
Layer 7 Engine: Client Side: Maximum concurrent connections, Maximum new connections per second; Server Side: Maximum concurrent connections
The management applications follow a 3-tier client/server architecture with a presentation tier (user interface), a functional process logic tier, and a data tier. This architecture requires a load balancer for presenting end-user facing services. This design implements each of these management applications as its own trust zone and isolates management applications from each other.
Management applications are placed on isolated, VXLAN-backed networks. Each virtual application network is fronted by an NSX Edge services gateway to keep applications isolated. Load balancer interfaces are required for inbound access, and SNAT is required for outbound access. Direct access to virtual application networks is achieved by connecting through Windows machines that are attached directly to the management networks.
Unique addressing is required for all management applications. This approach to network
virtualization service design improves security and mobility of the management applications, and
reduces the integration effort with existing customer networks.
From a software-defined networking design perspective, each management application is treated
as a separate management tenant that requires one or more logical networks via VXLAN segment.
The VXLAN segments are connected to the outside world by using a pair of NSX Edge services
gateways.
Figure 20 Virtual Application Network Components and Design
The NSX Edge services gateway associated with a management application is connected via an uplink connection to a publicly accessible network and has at least one IPv4 address on this network.
As a result, this device offers the following capabilities.
If the IPv4 range that is used for the application-internal network does not overlap with
any other existing IPv4 range, the central DNS service can create DNS entries for nodes
on this internal network. Split DNS is not necessary.
Inbound access to services, such as Web UI, is supported by the load balancer capabilities
of the NSX Edge services gateway.
Application nodes can access the corporate network or the Internet via Source NAT
(SNAT) on the NSX Edge services gateway or via the vSphere management network.
Routed (static or dynamic) access to the vSphere management network is available for
access to the vCenter Server instances.
5.2.3.17 Virtual Network Design Example
The following figure presents the design for vRealize Automation that is implemented by the
architecture. The same design is utilized for other management applications and can be implemented for workloads deployed on the compute clusters.
Figure 21 vRA Virtual Network Design
The design is set up as follows:
vRealize Automation is deployed onto a single Layer 2 segment, which is provided by a
VXLAN virtual wire. Micro segmentation between NSX components is not required and
therefore not used.
The network used by vRealize Automation connects to external networks through NSX
for vSphere. NSX Edge services gateways route traffic between management application
virtual networks and the public network.
Access to the isolated vSphere-Mgmt network is available through the MgmtCentral-
Edge services gateway that provides region-specific routing.
All Edge services gateways are connected over the networkExchange network that acts as
a transit network and as an interface to exchange routing information between the Edge
services gateways. To provide easy mobility of the network used by vRealize
Automation during recovery in another region, this network uses an RFC 1918 isolated
IPv4 subnet and uses Source NAT (SNAT) to access external networks such as the
Internet.
Services such as a Web GUI, which must be available to the users of vRealize
Automation, are accessible via the NSX Edge load balancer on the IPv4 address residing
on the external network.
Each application must use a unique IPv4 range for the application internal network(s). The unique
IPv4 range supports use of the central DNS service for creating DNS entries for nodes on this
internal network. The following table shows an example of how a mapping from management
applications to IPv4 subnets might look.
Table 23 Management Applications IP Addressing
Management Application Internal IPv4 Subnet
vRealize Automation (includes vRealize Orchestrator) 192.168.11.0/24
vRealize Automation Proxy Agents 192.168.12.0/24
192.168.13.0/24
vRealize Operations Manager 192.168.21.0/24
vRealize Operations Remote Collector 192.168.22.0/24
192.168.23.0/24
vRealize Log Insight 192.168.31.0/24
192.168.32.0/24
The management applications vRealize Automation, vRealize Operations Manager, and vRealize
Log Insight divert from the above described setup slightly:
vRealize Automation uses three network containers.
o One container is for the main vRealize Automation application cluster that can
be failed over by using Site Recovery Manager to a secondary region.
o Two additional network containers - one in each region - hold vRealize
Automation proxy agents.
vRealize Operations Manager uses three network containers.
o One container is for the main vRealize Operations Manager analytics cluster that can be failed over to a secondary region by using Site Recovery Manager.
o Two additional network containers - one in each region - are for connecting remote collectors.
vRealize Log Insight does not use Site Recovery Manager to fail over between regions. Instead, a dedicated instance of Log Insight is deployed in each region. To support this configuration, an additional cloud region is required.
5.2.3.18 Routing and Region Connectivity Design
This multi-region design is an extension of the single-region design that takes into account
management component failover. The figure below is an example of how virtual application
networks are built for vRealize Automation in the central cloud and cloud regions. The same
virtual application network configuration is valid for all the other virtual application networks.
Figure 22 Virtual Application Network Configuration in Central Cloud and Cloud Region
5.2.3.18.1 Dynamic Routing
Routing is handled by NSX Edge. The management network Edge services gateways, MgmtCentral-Edge and MgmtRegionA-Edge, work in conjunction with the Edge services gateways that are configured to create a virtual application network for each central cloud or cloud region management component. A dedicated network, networkExchange, facilitates the exchange of transit network traffic and the exchange of routing tables. All NSX Edge services gateways that are attached to the networkExchange network segment run OSPF to exchange routing information.
OSPF route information exchange depends on the area definition, which is based on the local
vSphere-Mgmt network:
Table 24 OSPF Area ID
Network | Region | IP Addressing | Area ID
vSphere-Mgmt | Central Cloud | 172.16.11.0/24 | 16
vSphere-Mgmt | Region A (Cloud Region) | 172.17.11.0/24 | 17
The virtual application network Edge services gateways receive an OSPF default route from the Management Edge services gateway, which has access to the public network. Components directly attached to the vSphere-Mgmt network use the Management Edge services gateway as the default gateway to reach all components in the environment.
All VMs within virtual application networks can route to VMs in other virtual application networks. OSPF Area IDs are based on the second octet of the vSphere management network (a short derivation example appears at the end of this subsection), and MD5 is used for authentication.
NSX Edge appliance size is X-Large and is deployed in High Availability (HA) mode.
A dedicated network for exchange of routing information and traffic between different gateways is
configured. Access to public networks uses SNAT.
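As an illustration of that convention, the area ID can be derived mechanically from the management subnet; the short snippet below reproduces the values in Table 24.

    # Derive the OSPF area ID from the second octet of the vSphere management network.
    import ipaddress

    def ospf_area(mgmt_subnet):
        net = ipaddress.ip_network(mgmt_subnet)
        return int(str(net.network_address).split('.')[1])

    print(ospf_area('172.16.11.0/24'))   # 16 -> Central Cloud
    print(ospf_area('172.17.11.0/24'))   # 17 -> Region A (Cloud Region)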
5.2.3.18.2 Region Connectivity and Dynamic Routing
There is one Management Edge services gateway in each region. The role of the Management
Edge services gateway is to provide connectivity between regions and to exchange region-specific
routes with other connected Management Edge services gateways. The Management Edge services
gateway is configured to exchange OSPF routing data with all of the region-specific virtual application network Edge services gateways, which are attached to their respective networkExchange networks. These routes are consolidated and exchanged with all connected Management Edge
services gateways using iBGP. All Management Edge services gateways belong to the same AS
(Autonomous System) with regards to BGP routing.
In this design, connectivity between two regions uses the direct backend connectivity available
between sites.
5.2.3.18.3 Virtual Application Networks and Dynamic Routing Across Regions
For the components that fail over to another region in the event of a disaster, route redistribution is
disabled in NSX Edge. This prevents the same routes from being advertised in both regions for
their virtual application networks. This makes it possible for the virtual application network Edge
services gateway to participate in region-specific OSPF route exchanges without announcing its
virtual application network route.
To facilitate failover of the virtual application network from one region to the other for some
components, this design creates a shadow virtual application network in the recovery region, an
additional cloud region in this design. This shadow virtual application network is configured
identically, with the same SNAT and DNAT rules and load balancer VIP as needed. The only
difference is the IP addressing used on the networkExchange network and the optional public
interface.
Component virtual application networks can be moved between regions either for testing or during
failover. When a component is moved, the location of that component’s virtual application
network needs to be changed in the VPN configuration. The virtual application network Edge
services gateway also needs to be reconfigured to either start or stop redistributing connected
virtual application networks. The decision to start or stop redistribution depends on whether
the virtual application network Edge services gateway is now the active Edge services gateway or
the failed-over Edge services gateway.
5.3 Infrastructure Management
Infrastructure management is provided by VMware vCenter Server. The configuration of vCenter Server is
described in the following sections.
5.3.1 vCenter Server Instances
The vCenter Server design includes both the vCenter Server instances and the VMware Platform Services
Controller instances.
A VMware Platform Services Controller (PSC) groups a set of infrastructure services including vCenter
Single Sign-On, License Service, Lookup Service, and VMware Certificate Authority. Although the
Platform Services Controller and the associated vCenter Server system can be deployed on the same virtual
machine (embedded Platform Services Controller), this design uses a separate PSC and vCenter Server.
This design includes two vCenter Server instances per central cloud or cloud region; one managing the
management cluster, the other managing the compute and edge clusters. This separation of the management
and compute stack not only provides additional security, but also supports scalability.
The vCenter Server is deployed as the Linux-based vCenter Server Appliance which is functionally
equivalent to the Windows based application, but easier to deploy and maintain.
The two external Platform Services Controller (PSC) instances and two vCenter Server instances are both
housed in the management cluster. The external PSC instances are used to enable replication of data
between PSC instances. To achieve redundancy, the design joins the two Platform Services Controller
instances to the same vCenter Single Sign-On domain, and connects each vCenter Server instance to their
respective Platform Services Controller instance. Joining all Platform Services Controller instances into a
single vCenter Single-Sign-On domain provides this design the capability to share authentication and
license data across all components and regions. Additionally, the vCenter Servers in each region are
configured to utilize Enhanced Linked Mode so that all inventories and actions are available from each
vCenter Server in the region.
In terms of configuration, each vCenter Server system is configured with a static IP address and host name
from the management private portable SoftLayer subnet. The IP address will have a valid (internal) DNS
registration including reverse name resolution. The vCenter Server systems will also maintain network
connections to the following components:
All VMware vSphere Client and vSphere Web Client user interfaces.
Systems running vCenter Server add-on modules, such as NSX or vUM.
Each ESXi host.
Figure 23 vCenter Server and PSC Deployment Model
5.3.1.1 vCenter Server Resiliency
Protecting the vCenter Server system is important because it is the central point of management
and monitoring for the environment. In this design, the vCenter Servers and PSCs are protected
via vSphere HA.
5.3.1.2 Size of vCenter Server Appliances
The size of the vCenter Server appliances is described in the following sections.
5.3.1.2.1 vCenter Server Appliance for Management Cluster
The number of hosts managed by the Management vCenter Server is less than 100 and the
number of virtual machines to be managed is less than 1,000. As a result, this design includes a
“small” vCenter Server configured with the following specifications, which still allows cloud
admins to include additional services within the cluster:
Table 25 Specifications for Management vCenter Server Appliance
vCPUS Memory Disk Space Disk Type
4 16 GB 136 GB Thin
5.3.1.2.2 Platform Services Controller for Management Cluster
The Platform Services controller is deployed onto the management cluster with the following
configuration:
Table 26 Specifications for Platform Service Controller for Management Cluster
vCPUS Memory Disk Space Disk Type
2 2 GB 30 GB Thin
Additionally, the PSC connects to the Active Directory server within this design for common
authentication.
5.3.1.2.3 vCenter Server Appliance for Compute and Edge Clusters
The Compute and Edge vCenter Server in this design manages up to 1,000 hosts or 10,000 virtual
machines, whichever comes first. As a result, this design includes a “large” vCenter Server
configured with the following specifications:
Table 27 Specifications for Compute and Edge vCenter Server Appliance
vCPUS Memory Disk Space Disk Type
16 32 GB 295 GB Thin
5.3.1.2.4 Platform Services Controller for Compute and Edge Clusters
The Platform Services Controller for the vCenter Server managing the Compute and Edge clusters is
deployed onto the management cluster with the following configuration:
Table 28 Specifications for Platform Service Controller for Compute and Edge Clusters
vCPUS Memory Disk Space Disk Type
2 2 GB 30 GB Thin
Additionally, the PSC connects to the Active Directory server within this design for common
authentication.
5.3.1.3 vCenter Database Design
A vCenter Server Appliance can use either a built-in local PostgreSQL database or an external
Oracle database. Both configurations support up to 1,000 hosts or 10,000 virtual machines. This
design will utilize the built-in PostgreSQL to reduce both overhead and Microsoft or Oracle
licensing costs. This also avoids problems with upgrades and support since the ability to attach to
an external database is deprecated for vCenter Server Appliance in the next release.
5.3.1.4 vCenter Networking Design
The vCenter Servers and Platform Services Controllers in each region will be placed onto the
management network to access physical ESXi hosts within the region.
5.3.1.5 Cluster Configuration
Each of the clusters will be configured per the following sections.
5.3.1.5.1 vSphere DRS
This design will utilize vSphere Distributed Resource Scheduling (DRS) in each cluster to initially
place and dynamically migrate virtual machines to achieve balanced compute clusters. The
automation level is set to fully automated so that initial placement and migration recommendations
are executed automatically by vSphere. Note that power management via the Distributed Power
Management feature is not used in this design. Additionally, the migration threshold is set to
partially aggressive so that vCenter will apply priority 1, 2, 3, and 4 recommendations to achieve
at least a moderate improvement in the cluster’s load balance.
5.3.1.5.2 Affinity Rules
In this design, affinity and anti-affinity rules are used to ensure that virtual machines which are
members of a high availability cluster are not placed on the same ESXi host.
5.3.1.5.3 vSphere HA
This design will use vSphere HA in each cluster to detect compute failures and recover virtual
machines running within a cluster. The vSphere HA feature in this design is configured with both
host monitoring and admission control enabled within the cluster. Additionally, each cluster will
reserve 25% of cluster resources as spare capacity for the admission control policy.
By default, the VM restart priority is set to medium and the host isolation response is set to “leave
powered on”. Additionally, VM monitoring is disabled and the datastore heartbeating feature is
configured to include any of the cluster datastores. In this design, the datastore is a VSAN-backed
datastore.
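The 25% admission control reservation can be reasoned about with simple arithmetic. The sketch below, using hypothetical host counts and per-host capacities, shows the usable capacity that remains for virtual machines after the spare capacity is set aside.

```python
def usable_cluster_capacity(hosts: int, cpu_ghz_per_host: float, mem_gb_per_host: float,
                            reserved_fraction: float = 0.25):
    """Return (usable CPU GHz, usable memory GB) after the vSphere HA admission
    control policy reserves a percentage of total cluster resources."""
    total_cpu = hosts * cpu_ghz_per_host
    total_mem = hosts * mem_gb_per_host
    return total_cpu * (1 - reserved_fraction), total_mem * (1 - reserved_fraction)

# Hypothetical 4-host cluster with 40 GHz of CPU and 512 GB of RAM per host
cpu, mem = usable_cluster_capacity(4, 40.0, 512.0)
print(f"Usable capacity: {cpu:.0f} GHz CPU, {mem:.0f} GB RAM")  # 120 GHz, 1536 GB
```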
5.3.1.5.4 vSphere vMotion
vSphere vMotion is enabled in this design by the use of shared storage and network configuration.
5.3.1.5.5 vSphere FT
vSphere FT is not utilized or available in this design.
5.4 Common Services
5.4.1 Identity and Access Services
Microsoft Active Directory (AD) is used as the identity provider (IDP) in this design. In a single region
deployment of this design, two MS Windows 2012 R2 Active Directory server VMs are configured for a
single AD forest. Naming of the domain is derived from input data prior to design provisioning. Each
server has the MS Windows DNS service installed and configured with all forward and reverse lookup
zones as AD-integrated, multi-master enabled zones. Subsequent regions contain only one AD / DNS server
per region, set up as a peer multi-master within its own MS AD site configured within the AD Sites
management console.
5.4.1.1 Forest and subdomains
This design uses Microsoft Active Directory for authentication and authorization to resources
within the tornado.local domain. For a multi-region deployment, the design utilizes a
domain and forest structure to store and manage Active Directory objects per region.
Table 29 Requirements for Active Directory Service
Requirement | Domain Instance | Domain Name | Description
Active Directory configuration | Parent Active Directory | tornado.local | Contains Domain Name System (DNS) server, time server, and universal groups that contain global groups from the child domains and are members of local groups in the child domains.
Active Directory configuration | Central child Active Directory | central.tornado.local | Contains DNS records which replicate to all DNS servers in the forest. This child domain contains all design management users, and global and local groups.
Active Directory configuration | Region-A child Active Directory | regiona.tornado.local | Contains DNS records which replicate to all DNS servers in the forest. This child domain contains all design management users, and global and local groups.
Active Directory users and groups | - | - | The default AD internal LDAP directory structure will be used for users and computer objects: ou=computers, DC=central, DC=tornado, DC=local; ou=users, DC=central, DC=tornado, DC=local
5.4.1.2 AD Functional level
This design’s AD functional level will be set at “Windows 2008”, which is the lowest level
allowed by the other components in this design.
5.4.1.3 AD Trusts
External trusts to customer AD environments are configured as required.
5.4.1.4 Authentication
Within this design, authentication is provided by a mix of the following:
direct AD integration
SAML authentication via the VMware Platform Services Controller (PSC), backed by
AD
local user accounts within the application.
The VMware Platform Services Controller is configured to use the AD servers within this design
for its identity management directory. The following table describes which authentication
method is used for user / administrator access for each component.
Table 30 Authentication types used
Component | Authentication Context | Authentication Authority
vRA | Administration | Active Directory
vRA | Tenant | Active Directory
vROps | Administration / Roles | Active Directory
vCenter Server | Administration / Roles | Active Directory
vRO | Administration / Roles | Active Directory
ESXi | Administration | Active Directory
NSX | Administration | Active Directory
vRI | Administration | Active Directory
vRB | Administration | Active Directory
vRB | Tenant | Active Directory
vUM | Administration | Active Directory
NSX | Administration | Active Directory
NSX | VPN | Active Directory
vDP | Administration | Active Directory
5.4.1.5 AD VM server sizing
Table 31 Server Sizing
Attribute | Specification
Number of CPUs | 4
Memory | 8 GB
Disk size | 80 GB
Number of servers in central cloud | 4
Number of servers per additional cloud region | 2
5.4.2 Domain Name Services
Domain name services (DNS) is a fundamental function of any multi-component network
infrastructure design. DNS not only stores host name to IP address information but also service
type information required by Microsoft Active Directory and other applications. Dynamic DNS
capability is also required for MS AD deployments for the registration of the AD service record
types and the associated subzone creation. The AD-integrated MS DNS server is utilized for this design.
5.4.2.1 DNS Design
In a single region deployment of this design, four MS Windows 2012 R2 Active Directory server
VMs are configured: two for the root of the forest domain and two for the region subdomain. Each
server has the MS Windows DNS service installed and configured with all forward and reverse
lookup zones as AD-integrated, multi-master enabled zones. Subsequent regions contain only one
AD / DNS server per region, set up as a peer multi-master within its own MS AD site
configured within the AD Sites management console. The following applies to all services and
hosts within this design.
All nodes must use static IP addresses.
As a DNS best practice, the IP address, FQDNs and short names of all nodes must be
forward and reverse resolvable.
All nodes must be accessible from the vCenter Server instances and from the machine
that hosts the vSphere Client (if vSphere Client is used instead of vSphere Web Client).
All nodes in the vRealize Operations Manager analytics cluster must have unique host
names.
An NTP source must be used for all cluster nodes.
DNS is an important component for the operation of this design. For a multi-region deployment,
other regions must provide a root and child domains which contain separate DNS records.
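A quick way to validate the forward and reverse resolution requirement listed above is a standard-library check such as the sketch below; the node name is a hypothetical example in this design's naming scheme.

```python
import socket

def check_forward_and_reverse(fqdn: str) -> bool:
    """Verify that a node's FQDN resolves to an IP address and that the
    address resolves back to the same FQDN (reverse lookup)."""
    ip_address = socket.gethostbyname(fqdn)
    reverse_name, _aliases, _addresses = socket.gethostbyaddr(ip_address)
    return reverse_name.lower().rstrip(".") == fqdn.lower().rstrip(".")

# Hypothetical management node in the central cloud child domain
print(check_forward_and_reverse("vcenter-mgmt.central.tornado.local"))
```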
5.4.2.2 DNS Configuration Requirements
The following table shows an example of the domain naming that can be implemented by the
design. The specific domain naming is determined by the client requirements.
Table 32 Domain Naming Example
Requirement | Domain Instance | Description
DNS host entries | tornado.local | Resides in the tornado.local domain.
DNS host entries | central.tornado.local and regiona.tornado.local | DNS servers reside in the central.tornado.local and regiona.tornado.local domains. Configure both DNS servers with the following settings: dynamic updates for the domain set to Nonsecure and secure; zone replication scope for the domain set to All DNS servers in this forest; create all hosts listed in the DNS Names documentation.
Once properly configured, all nodes within this design are resolvable by FQDN.
5.4.2.3 DNS Forwarding
For name resolution outside of the design, the forest root DNS domain servers will have
forwarders configured to the internal SoftLayer DNS servers if no client on-premises DNS is
available. If a client on-premises DNS is available, then that becomes the DNS forwarding address
or addresses.
Table 33 SoftLayer DNS servers
DNS hostname IP v4 address
rs1.service.softlayer.com 10.0.80.11
rs2.service.softlayer.com 10.0.80.12
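A simple way to confirm that the forwarders in Table 33 answer queries is to point a resolver at them directly, for example with the dnspython package (version 2.x assumed); the query name used here is only an example.

```python
import dns.resolver  # pip install dnspython (2.x)

resolver = dns.resolver.Resolver(configure=False)
resolver.nameservers = ["10.0.80.11", "10.0.80.12"]  # SoftLayer DNS servers from Table 33
resolver.timeout = 3
resolver.lifetime = 5

# Example external name; replace with a record the forwarders are expected to resolve
answer = resolver.resolve("www.ibm.com", "A")
for record in answer:
    print(record.address)
```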
5.4.3 NTP Services
Time synchronization is critical for the functionality of many of the services that comprise this design. For
this purpose, Microsoft Windows Active Directory (AD) servers within the solution are configured as NTP
sources for all services within the design with the exception of VMware Data Protection (vDP), vRealize
Log Insight (vRI) and the ESXi hosts. The Windows AD servers are configured to utilize the internal
SoftLayer time server “servertime.service.softlayer.com” as the authoritative time source. The vDP
application server is configured to utilize VMware Tools as its time source per the vDP documentation
recommendation. The ESXi hosts and vRI are configured to point to the SoftLayer NTP server.
Table 34 Time sources
Service / Application Time Source
Active Directory servers servertime.service.softlayer.com
vDP VMware tools
ESXi servertime.service.softlayer.com
vRealize Log Insight servertime.service.softlayer.com
All else AD Servers
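Time drift against the sources in Table 34 can be spot-checked from any management VM, for example with the third-party ntplib package as in this sketch; the acceptable offset threshold is an assumption for illustration only.

```python
import ntplib  # pip install ntplib

client = ntplib.NTPClient()
response = client.request("servertime.service.softlayer.com", version=3)

# Offset (seconds) between the local clock and the authoritative SoftLayer time source
print(f"Offset: {response.offset:+.3f}s, round-trip delay: {response.delay:.3f}s")
if abs(response.offset) > 1.0:  # illustrative threshold only
    print("WARNING: clock drift exceeds 1 second; check NTP configuration")
```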
5.4.4 SMTP Services
SMTP services are required for outbound notifications and may be used for inbound communications to
vRealize Automation. The following sections detail the configuration of SMTP in this design.
5.4.4.1 SMTP outbound
Within this design, email notifications are sent using SMTP for the following products:
vRealize Automation
vRealize Business
vRealize Operations
vRealize Log Insight
vRealize Orchestrator
vCenter Server
VMware Data Protection
An SMTP server service is configured on each Windows AD server, which is utilized as an SMTP
relay for all outbound SMTP notifications. This service is configured to use the default SMTP port
(25) where the connection to the customer’s email servers is over a direct or VPN connection. Because
SoftLayer blocks port 25 outbound, a secure SSL port (465 or 587) or a custom port, provided by the
customer based on the destination email provider, is configured on the SMTP server. This is only
required where a direct connection or VPN from the customer site into the management network is
not available.
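Because port 25 is blocked outbound by SoftLayer, any notification sent directly to an external provider must use a submission port. The sketch below shows a test message sent through a relay on port 587 with STARTTLS; host names, credentials, and addresses are placeholders.

```python
import smtplib
from email.message import EmailMessage

message = EmailMessage()
message["From"] = "notifications@tornado.local"        # placeholder sender
message["To"] = "cloud-admin@example.com"              # placeholder recipient
message["Subject"] = "SMTP relay test"
message.set_content("Test notification routed through the AD SMTP relay.")

# Port 587 (or 465 with SMTP_SSL) is used because SoftLayer blocks outbound port 25
with smtplib.SMTP("smtp-relay.example.com", 587, timeout=10) as relay:
    relay.starttls()
    relay.login("relay-user", "relay-password")        # placeholder credentials
    relay.send_message(message)
```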
5.4.4.2 SMTP inbound
For vRealize Automation, an inbound email server must be configured to handle inbound email
notifications, such as approval responses. This is typically the customer email server with
email accounts created for vRA. Only one global inbound email server, which appears as the
default for all tenants, is needed. The email server provides accounts that are able to be customized
for each user, providing separate email accounts, usernames, and passwords. Each tenant can
configure an override to change these settings. If service provider actors do not override these
settings before enabling notifications, vRealize Automation uses the globally configured email
server.
The connection to the customer email system requires that this email system be network reachable
by the vRA system.
5.4.5 Certificate Authority Services
By default, vSphere 6.0 uses TLS/SSL certificates that are signed by VMCA (VMware Certificate
Authority) residing on the VMware Platform Services Controller appliance (PSC). By default, these
certificates are not trusted by end-user devices or browsers. It is a security best practice to replace at least
user-facing certificates with certificates that are signed by a third-party or enterprise Certificate Authority
(CA). Certificates for machine-to-machine communication can remain as VMCA-signed certificates. For
this design, a Microsoft Active Directory Enterprise integrated certificate authority is used. A two-tier MS
certificate authority design is employed, with one Windows server as the offline root CA and an additional
Windows server as the online subordinate CA.
5.4.5.1 Offline root CA server VM
As per Microsoft best practice, the root CA Windows server does not participate in this design’s
AD domain. Set up the root CA per the following Microsoft information page:
https://technet.microsoft.com/library/hh831348.aspx
Note that while it is not recommended to connect this machine to a network, it will be network
connected until the root CA is generated. Once generated, transfer the root certificate files to the
subordinate CA server, remove the network access from the root CA server, and shut down the OS.
It is recommended that clients export this VM as an OVA to a secure location and remove it from
the vCenter Server’s inventory.
5.4.5.2 Online subordinate CA server VM
Set up the subordinate CA per the following MS link:
https://technet.microsoft.com/library/hh831348.aspx
Note that this system will also be configured for handling certificate requests and distributing
certificate revocation lists (CRLs).
The VMCA is configured with a certificate issued via the MS Active Directory integrated CA so
that all subsequent browser access to VMware services is validated by any clients that are
members of the design’s AD domain or have imported the domain’s trusted root certificate.
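Once the VMCA issues certificates under the enterprise subordinate CA, clients that hold the exported root certificate should be able to validate any VMware service endpoint. The sketch below performs that check with the Python standard library; the host name and root certificate path are placeholders.

```python
import socket
import ssl

HOSTNAME = "vcenter-mgmt.central.tornado.local"   # placeholder service endpoint
ROOT_CA_FILE = "enterprise-root-ca.pem"           # exported offline root CA certificate

# A context that trusts only the enterprise root; the handshake fails unless the
# service presents a certificate chain that terminates at that root.
context = ssl.create_default_context(cafile=ROOT_CA_FILE)

with socket.create_connection((HOSTNAME, 443), timeout=5) as raw_socket:
    with context.wrap_socket(raw_socket, server_hostname=HOSTNAME) as tls_socket:
        peer_cert = tls_socket.getpeercert()
        print("Certificate validated. Issuer:", peer_cert.get("issuer"))
```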
Table 35 Root CA and Subordinate CA sizing
Attribute Specification
OS Windows 2012 R2
Number of CPUs 2
Memory 4 GB
Disk size 60 GB
5.5 Cloud Management Services
The Cloud Management Services provide the management components for the cloud solution. This layer
includes the Service Catalog, which houses the facilities to be deployed, Orchestration which provides the
workflows to get the catalog items deployed, and the Self-Service Portal that empowers the users to take
full advantage of the Software Defined Datacenter. vRealize Automation provides the Portal and the
Catalog, and vRealize Orchestrator takes care of the process orchestration.
The conceptual design of the vRealize Automation Cloud Management Services is illustrated in the
following diagram. Key design components and their descriptions are also provided.
Figure 24 vRealize Automation Conceptual Design
The Cloud Management Services consist of the following elements and components:
Users:
Cloud administrators – Tenant, group, fabric, infrastructure, service, and other administrators as
defined by business policies and organizational structure.
Cloud (or tenant) users – Have direct access to virtual machines to perform operating system-
level operations provided by vRealize Automation IaaS services.
Cloud Management Portal:
vRealize Automation portal, Admin access – The default root tenant portal URL used to set up and
administer tenants and global configuration options.
vRealize Automation portal, Tenant access – Refers to a subtenant and is accessed using a unique
URL, with an appended tenant identifier.
It is also possible for a tenant portal to refer to the default tenant portal in some configurations. In
this case, the URLs are the same and the user interface is contextually controlled by the assigned
RBAC permissions of that user.
Tools and supporting infrastructure:
VM Templates and Blueprints - These are the templates used in authoring the blueprints that
tenants (users) use to provision their cloud workloads.
Provisioning infrastructure - the following are the on-premises and off-premises resources which together
form a hybrid cloud
Virtual – Supported hypervisors and associated management tools.
Cloud – Supported cloud providers and associated API interfaces.
In the above diagram illustrating the conceptual design of the Cloud Management Platform, these
resources are located in the Internal Virtual Resources and the External Cloud Resources
components.
The cloud management services deliver multi-platform and multi-vendor cloud services. The services
available include the following items.
Comprehensive and purpose-built capabilities that provide standardized resources to global
customers in a short time span.
Multi-platform and multi-vendor delivery methods that integrate with existing enterprise
management systems.
Central user-centric and business-aware governance for all physical, virtual, private, and public
cloud services.
Design that meets the customer and business needs and is extensible.
Table 36 Cloud Management Services Components
Component | Services Provided
vRealize Automation Identity Appliance | vCenter Single Sign-On
vRealize Automation virtual appliance | vRealize Automation Portal Web/App Server; vRealize Automation vPostgreSQL Database; vRealize Automation Service Catalog
vRealize Automation IaaS components | vRealize Automation IaaS Web Server; vRealize Automation IaaS Manager Services
Distributed execution components | vRealize Automation Distributed Execution Managers: Orchestrator and Workers
Integration components | vRealize Automation Agent machines
Provisioning infrastructure | vSphere environment; vRealize Orchestrator environment; other supported physical, virtual, or cloud environments
Supporting infrastructure | Microsoft SQL database environment; Active Directory environment; SMTP; NTP; DNS
5.5.1 Cloud Management Physical Design
This design uses NSX logical switches to abstract the vRealize Automation application and its supporting
services. This abstraction allows the application to be hosted in any given region regardless of the
underlying physical infrastructure such as network subnets, compute hardware, or storage types. This
design hosts the vRealize Automation application and its supporting services in the central cloud. The same
instance of the application manages both central cloud and any additional cloud regions.
Figure 25 vRealize Automation Design Overview for Central Cloud
Figure 26 vRealize Automation Design Overview for Additional Cloud Regions
The configuration of these elements is described in the following sections.
5.5.1.1 vRealize Identity Appliance
vRealize Identity Appliance provides an infrastructure-independent failover capability. A single
appliance is deployed in the solution. vSphere HA is used to ensure high availability for Identity
Appliance.
The appliance is configured with 1 vCPU and 2 GB of RAM.
5.5.1.2 vRealize Automation Appliance
The vRealize Automation virtual appliance includes the Web portal and database services.
The vRealize Automation portal allows self-service provisioning and management of cloud
services, as well as authoring blueprints, administration, and governance. The vRealize
Automation virtual appliance uses an embedded PostgreSQL database for catalog persistence and
database replication.
The solution deploys two instances of the vRealize Automation appliance to achieve redundancy.
The database is configured between two vRealize Automation appliances for high availability.
Data is replicated between the embedded PostgreSQL database instances. Database instances are
configured in an active/passive arrangement. In this configuration, manual failover between the two
database instances is required.
Each appliance is configured with 4 vCPU and 16 GB of RAM.
5.5.1.3 vRealize Automation IaaS Web Server
vRealize Automation IaaS Web server provides a user interface within the vRealize Automation
portal web site for the administration and consumption of IaaS components. Two vRealize
Automation IaaS web servers are installed on virtual machines. Each virtual machine runs
Microsoft Windows Server 2012 R2 and it performs Model Manager (Web) and IaaS Web
functions. Each virtual machine is sized to 4 vCPU, 4 GB of RAM and 60 GB HDD.
5.5.1.4 vRealize Automation IaaS Manager Service and DEM Orchestrator Server
The vRealize Automation IaaS Manager Service and Distributed Execution Management (DEM)
server are at the core of the vRealize Automation IaaS platform. The vRealize Automation IaaS
Manager Service and DEM server support several functions.
Manages the integration of vRealize Automation IaaS with external systems and
databases.
Provides multi-tenancy.
Provides business logic to the DEMs.
Manages business logic and execution policies.
Maintains all workflows and their supporting constructs.
A Distributed Execution Manager (DEM) runs the business logic of custom models, interacting
with the database and with external databases and systems as required. DEMs also manage cloud
and physical machines. The DEM Orchestrator monitors the status of the DEM workers. DEM
worker manages the scheduled workflows by creating new workflow instances at the scheduled
time and allows only one instance of a particular scheduled workflow to run at a given time. It also
preprocesses workflows before execution. Preprocessing includes checking preconditions for
workflows and creating the workflow's execution history.
The vRealize Automation IaaS Manager Service and DEM server are separate servers, but are
installed on the same virtual machine.
Two virtual machines are deployed to run both IaaS Manager Service and DEM Orchestrator. The
two servers share the same active/passive application model. Only one manager service can be
active at a time.
Each virtual machine runs Microsoft Windows Server 2012 R2. Each virtual machine is sized to 2
vCPU, 4 GB of RAM and 60 GB HDD.
5.5.1.5 vRealize Automation IaaS DEM Worker Server
vRealize Automation IaaS DEM workers are responsible for the provisioning and deprovisioning
tasks initiated by the vRealize Automation portal. DEM workers communicate with vRealize
Automation endpoints. In this instance, the endpoint is vCenter Server.
Each DEM Worker can process up to 15 concurrent workflows. Beyond this limit, workflows are
queued for execution. The current design implements 2 DEM workers for a total of 30 concurrent
workflows.
DEM Workers are installed on two virtual machines running Microsoft Windows Server 2012 R2.
Each virtual machine is sized to 4 vCPU, 8 GB of RAM and 60 GB HDD.
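The DEM worker count follows directly from the 15-concurrent-workflow limit per worker. A sizing helper such as the sketch below reproduces the arithmetic used here (2 workers for 30 concurrent workflows); the minimum of two workers reflects the redundancy deployed in this design.

```python
import math

WORKFLOWS_PER_DEM_WORKER = 15  # concurrent workflow limit per DEM worker

def dem_workers_required(target_concurrent_workflows: int, minimum_workers: int = 2) -> int:
    """Return the number of DEM workers needed for a target concurrency level,
    never dropping below the redundant minimum deployed in this design."""
    needed = math.ceil(target_concurrent_workflows / WORKFLOWS_PER_DEM_WORKER)
    return max(minimum_workers, needed)

print(dem_workers_required(30))  # 2, as deployed in this design
print(dem_workers_required(45))  # 3, if concurrency requirements grow
```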
5.5.1.6 vRealize Automation IaaS Proxy Agent
The vRealize Automation IaaS Proxy Agent is a Windows program that caches and forwards
information gathered from vCenter Server back to vRealize Automation. The IaaS Proxy Agent
server provides the following functions.
vRealize Automation IaaS Proxy Agent can interact with different types of hypervisors
and public cloud services, such as Hyper-V and AWS. For this design, only the vSphere
agent is used.
vRealize Automation does not virtualize resources by itself, but works with vCenter
Server to provision and manage the virtual machines. It uses vSphere agents to send
commands to and collect data from vCenter Server.
Two vRealize Automation vSphere Proxy Agent virtual machines are deployed in the current
architecture. The virtual machines are deployed on a dedicated virtual network to decouple them
from the main vRealize Automation infrastructure.
Each virtual machine runs Microsoft Windows Server 2012 R2. Each virtual machine is sized to 2
vCPU, 4 GB of RAM and 60 GB HDD.
5.5.1.7 Load Balancer
Session persistence of a load balancer allows the same server to serve all requests after a session is
established with that server. The session persistence is enabled on the load balancer to direct
subsequent requests from each unique session to the same vRealize Automation server in the load
balancer pool.
The load balancer also handles failover for the vRealize Automation Server (Manager Service).
Only one Manager Service is active at any one time. Manual failover of Manager Service is
necessary. Session persistence is not enabled because it is not a required component for the
Manager Service.
The following tables describe load balancer implementation for vRealize Automation components:
Table 37 Load Balancer Application Profile
Server Role | Type | Enable SSL Pass-through | Persistence | Expires in (Seconds)
vRealize Automation vPostgres | TCP | n/a | None | n/a
vRealize Automation | HTTPS (443) | Enabled | Source IP | 120
vRealize Automation IaaS Web | HTTPS (443) | Enabled | Source IP | 120
vRealize Automation IaaS Manager | HTTPS (443) | Enabled | Source IP | 120
vRealize Automation Orchestrator | HTTPS (443) | Enabled | Source IP | 120
Table 38 Load Balancer Service Monitoring Configuration
Monitor | Interval | Timeout | Retries | Type | Method | URL | Receive
vRealize Automation vPostgres | 3 | 9 | 3 | TCP | N/A | N/A | N/A
vRealize Automation | 3 | 9 | 3 | HTTPS (443) | GET | /vcac/services/api/status | REGISTERED
vRealize Automation IaaS Web | 3 | 9 | 3 | HTTPS (443) | GET | N/A | N/A
vRealize Automation IaaS Manager | 3 | 9 | 3 | HTTPS (443) | GET | /VMPS2 | BasicHttpBinding_VMPSProxyAgent_policy
vRealize Automation Orchestrator | 3 | 9 | 3 | HTTPS (443) | GET | /vco/api/status | REGISTERED
Table 39 Load Balancer Pool Specifications
Server Role | Algorithm | Monitors | Members | Port | Monitor Port
vRealize Automation vPostgres | Round Robin | <vRealize Automation vPostgres monitor> | vRealize Automation Postgres nodes | 5432 | 5432
vRealize Automation | Least Connection | <vRealize Automation monitor> | vRealize Automation nodes | 443 | 443
vRealize Automation IaaS Web | Least Connection | <vRealize Automation IaaS Web monitor> | IaaS web nodes | 443 | 443
vRealize Automation IaaS Manager | Least Connection | <vRealize Automation IaaS Manager monitor> | IaaS Manager nodes | 443 | 443
vRealize Automation Orchestrator | Least Connection | <vRealize Automation Orchestrator monitor> | vRealize Orchestrator nodes | 8281 | 8281
Table 40 Virtual Server Characteristics
Protocol | Port | Default Pool | Application Profile
TCP | 5432 | vRealize Automation vPostgres Pool | vRealize Automation vPostgres Profile
HTTPS | 443 | vRealize Automation Pool | vRealize Automation Profile
HTTPS | 443 | vRealize Automation IaaS Web Pool | vRealize Automation IaaS Web Profile
HTTPS | 443 | vRealize Automation IaaS Manager Pool | vRealize Automation IaaS Manager Profile
HTTPS | 8281 | vRealize Automation Orchestrator Pool | vRealize Automation Orchestrator Profile
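The monitors in Table 38 amount to HTTPS GET probes whose response body must contain an expected string. The sketch below reproduces two of those checks with the requests library; the virtual server host names are placeholders, and certificate verification is disabled only because such a probe may run before the enterprise CA chain is trusted.

```python
import requests

# (URL, expected substring) pairs mirroring the Table 38 monitors; host names are placeholders
HEALTH_CHECKS = {
    "vRealize Automation": ("https://vra.tornado.local/vcac/services/api/status", "REGISTERED"),
    "vRealize Orchestrator": ("https://vro.tornado.local:8281/vco/api/status", "REGISTERED"),
}

for component, (url, expected) in HEALTH_CHECKS.items():
    try:
        response = requests.get(url, timeout=9, verify=False)
        healthy = response.status_code == 200 and expected in response.text
    except requests.RequestException:
        healthy = False
    print(f"{component}: {'UP' if healthy else 'DOWN'}")
```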
5.5.2 vRealize Automation Supporting Infrastructure
A number of supporting elements are required for vRealize Automation. The following sections describe
their configuration.
5.5.2.1 Microsoft SQL Server Database
vRealize Automation uses a Microsoft SQL Server database to maintain the vRealize Automation
IaaS elements and the policies. The database also maintains information about the machines it
manages.
For simple failover of the entire vRealize Automation instance from one site to another, the
Microsoft SQL server is running in a virtual machine inside the vRealize Automation virtual
network. The virtual machine is running Microsoft Windows Server 2012 R2 and configured with
8 vCPU, 16 GB of RAM, 80 GB HDD. The Microsoft SQL Server Database version is SQL
Server 2012.
5.5.2.2 PostgreSQL Database
The vRealize Automation appliance uses a PostgreSQL database server to maintain the vRealize
Automation portal elements and services, and the information about the catalog items it manages.
Embedded PostgreSQL within each virtual appliance is utilized.
5.5.2.3 Notifications
System administrators configure default settings for both the outbound and inbound email servers
used to send system notifications. The current solution implements an outbound SMTP server only. The
automation creates a global outbound email server to process outbound email notifications. This
server appears as the default for all tenants.
5.5.3 vRealize Automation Cloud Tenant Design
A tenant is an organizational unit within a vRealize Automation deployment, and can represent a business
unit within an enterprise, or a company that subscribes to cloud services from a service provider. Each
tenant has its own dedicated configuration, although some system-level configuration is shared across
tenants.
5.5.3.1 Single-Tenant and Multi-Tenant Deployments
vRealize Automation supports deployments with a single tenant or multiple tenants. System-wide
configuration is always performed using the default tenant, and can then be applied to one or more
tenants. System-wide configuration specifies defaults for branding and notification providers.
Infrastructure configuration, including the infrastructure sources that are available for
provisioning, can be configured in any tenant and is shared among all tenants. The infrastructure
resources, such as cloud or virtual compute resources or physical machines, can be divided into
fabric groups managed by fabric administrators. The resources in each fabric group can be
allocated to business groups within each tenant by using reservations.
Single-Tenant Deployment — In a single-tenant deployment, all configuration occurs in
the default tenant. Service provider actors can manage users and groups, and configure
tenant-specific branding, notifications, business policies, and catalog offerings. All users
log in to the vRealize Automation console at the same URL, but the features available to
them are determined by their roles.
Multi-Tenant Deployment — In a multi-tenant deployment, the system administrator
creates new tenants for each organization that uses the same vRealize Automation
instance. Tenant users log in to the vRealize Automation console at a URL specific to
their tenant. Tenant-level configuration is segregated from other tenants and from the
default tenant, although users with system-wide roles can view and manage configuration
across multiple tenants.
5.5.3.2 Tenant Design
This design deploys a single tenant containing two business groups.
The first business group is designated for production workloads provisioning.
The second business group is designated for development workloads.
Service provider actors manage users and groups, configure tenant-specific branding,
notifications, business policies, and catalog offerings. All users log in to the vRealize Automation
console at the same URL, but the features available to them are determined by their roles.
The following diagram illustrates the single region tenant design.
Figure 27 Tenant Design for Single Region
The following diagram illustrates the dual-region tenant design.
Figure 28 Tenant Design for Two Regions
The tenant has two business groups. A separate fabric group for each region is created. Each
business group can consume resources in both regions. Access to the default tenant is allowed
only by the system administrator and for the purposes of managing tenants and modifying system-
wide configurations.
The solution automatically configures vRealize Automation based on the requested deployment
type – single region or dual region.
5.5.3.3 Service Design
The service catalog provides a common interface for consumers of IT services to use to request
and manage the services and resources they need.
A service provider actor or service architect can specify information about the service catalog,
such as the service hours, support team, and change window.
The solution implements a service catalog that provides the following services:
Central Cloud. Service catalog that is dedicated to the central cloud.
Additional Cloud Region. Service catalog that is dedicated to an additional cloud
region.
The solution is preinstalled with several catalog items. For a single site configuration, only the
central cloud service catalog is implemented.
5.5.3.4 Catalog Items
Users can browse the service catalog for catalog items they are entitled to request. Several generic
users will be automatically created and entitled to all items in the catalog. The users can be
disabled at a later stage or their permissions modified as appropriate.
For some catalog items, a request results in the provisioning of an item that the user can manage.
For example, the user can request a virtual machine with Windows 2012 preinstalled, and then
manage that virtual machine after it has been provisioned.
The service provider actor defines new catalog items and publishes them to the service catalog. The
service provider actor can then manage the presentation of catalog items to the consumer and
entitle new items to consumers. To make the catalog item available to users, a service provider
actor must entitle the item to the users and groups who should have access to it.
A catalog item is defined in a blueprint, which provides a complete specification of the resource to
be provisioned and the process to initiate when the item is requested. It also defines the options
available to a requester of the item, such as virtual machine specifications or lease duration, or any
additional information that the requester is prompted to provide when submitting the request. The
blueprint also specifies custom properties that are applied to the requested resource.
5.5.3.5 Machine Blueprints
A machine blueprint is the complete specification for a virtual machine. A machine blueprint
determines the machine's attributes, how it is provisioned, and its policy and management settings.
Machine blueprints are published as catalog items in the service catalog.
Machine blueprints can be specific to a business group or shared among groups within a tenant. In
this design the preloaded machine blueprints are shared among business groups. Service provider
actors create shared blueprints that can be entitled to users in any business group within the tenant.
Business group managers create group blueprints that can only be entitled to users within a
specific business group. A business group manager cannot modify or delete shared blueprints.
Service provider actors cannot view or modify group blueprints unless they also have the business
group manager role for the appropriate group.
If a service provider actor sets a shared blueprint's properties so that it can be copied, the business
group manager can also copy the shared blueprint for use as a starting point to create a new group
blueprint.
5.5.3.6 Blueprint Design
The following sections provide details of each service definition that has been included as part of
the current phase of cloud platform deployment.
Table 41 Base Windows Server Blueprint
Service Name | Description
Provisioning Method | When users select this blueprint, vRealize Automation clones a vSphere virtual machine template with preconfigured vCenter customizations.
Entitlement | Both Production and Development business group members.
Approval Process | No approval (pre-approval assumed based on approved access to platform).
Operating System and Version Details | Windows Server 2012 R2
Configuration | Disk: single disk drive
Lease and Archival Details | Lease: Production Blueprints - no expiration date; Development Blueprints - minimum 30 days, maximum 270 days. Archive: 15 days.
Pre- and Post-Deployment Requirements | Email sent to manager confirming service request (include description details).
Table 42 Base Windows Blueprint Sizing
vCPU Memory (GB) Storage (GB)
2 8 70
Table 43 Base Linux Server Blueprint
Service Name | Description
Provisioning Method | When users select this blueprint, vRealize Automation clones a vSphere virtual machine template with preconfigured vCenter customizations.
Entitlement | Both Production and Development business group members.
Approval Process | No approval (pre-approval assumed based on approved access to platform).
Operating System and Version Details | Red Hat Enterprise Server 6
Configuration | Disk: single disk drive
Lease and Archival Details | Lease: Production Blueprints - no expiration date; Development Blueprints - minimum 30 days, maximum 270 days. Archive: 15 days.
Pre- and Post-Deployment Requirements | Email sent to manager confirming service request (include description details).
Table 44 Base Linux Blueprint Sizing
vCPU Memory (GB) Storage (GB)
2 8 70
5.5.3.7 Branding
The solution branding is preconfigured. The cloud admin can change the appearance of the
vRealize Automation console to meet site-specific branding guidelines by changing the logo, the
background color, or information in the header and footer.
5.5.4 vRealize Automation vSphere Integration Design
The following terms apply to vRealize Automation integrated with vSphere. These terms and their meaning
may vary from the way they are used when referring only to vSphere.
Table 45 vRealize Integration with vSphere
Element | Description
vSphere (vCenter Server) endpoint | Provides information required by vRealize Automation IaaS to access vSphere compute resources. It requires the appropriate permissions for the vSphere proxy agent to manage the vCenter Server instance.
Compute resource | Virtual object within vRealize Automation that represents a vCenter Server cluster or resource pool, and datastores or datastore clusters. vRealize Automation provisions the virtual machines requested by business group members on the compute resource. Note: Compute resources are CPU, memory, storage and networks. Datastores and datastore clusters are part of the overall storage resources.
Fabric groups | vRealize Automation IaaS organizes compute resources into fabric groups.
Fabric administrators | Fabric administrators manage compute resources, which are organized into fabric groups.
Compute reservation | A share of compute resources (vSphere cluster, resource pool, datastores, or datastore clusters), such as CPU and memory reserved for use by a particular business group for provisioning virtual machines. Note: vRealize Automation uses the term reservation to define resources (be they memory, storage or networks) in a cluster. This is different than the use of reservation in vCenter Server, where a share is a percentage of total resources, and reservation is a fixed amount.
Storage reservation | Similar to compute reservation (see above), but pertaining only to a share of the available storage resources. In this context, a storage reservation in terms of gigabytes is specified from an existing LUN or Datastore.
Business groups | A collection of virtual machine consumers, usually corresponding to an organization's business units or departments. Only users in the business group can request virtual machines.
Reservation policy | vRealize Automation IaaS determines its reservation (also called virtual reservation) from which a particular virtual machine is provisioned. The reservation policy is a logical label or a pointer to the original reservation. Each virtual reservation can be added to one reservation policy.
Build profile | A set of user defined properties a user is able to apply to a virtual machine when it is provisioned. For example, the operating system used in a blueprint, or the available networks to use for connectivity at the time of provisioning the virtual machine. Build profile properties determine the specification of the virtual machine, the manner in which it is provisioned, operations to perform after it is provisioned, or management information maintained within vRealize Automation.
Blueprint | The complete specification for a virtual machine, determining the machine attributes, the manner in which it is provisioned, and its policy and management settings. Blueprint allows the users of a business group to create virtual machines on a virtual reservation (compute resource) based on the reservation policy, and using platform and cloning types. It also lets a user specify or add machine resources and build profiles.
The following figure shows the logical design constructs discussed in the previous section as they apply to
the deployment of vRealize Automation integrated with vSphere in a single region design.
Figure 29 vRealize Automation Integration with vSphere Endpoint – Central Cloud
The following figure shows the logical design constructs discussed in the previous section as they apply to
the deployment of vRealize Automation integrated with vSphere in a dual region design.
Figure 30 vRealize Automation Integration with vSphere Endpoint – Central Cloud and a Cloud Region (Region
A)
The solution automatically implements the design presented in the figure above for a dual-site
configuration. When a single-site deployment is requested, the automation deploys only the central cloud
configuration.
5.5.5 Infrastructure Source Endpoints
An infrastructure source endpoint is a connection to the infrastructure that provides a set (or multiple sets)
of resources, which can then be made available by IaaS administrators for consumption by users. vRealize
Automation IaaS regularly collects information about known endpoint resources and the virtual resources
provisioned therein. Endpoint resources are referred to as compute resources (or as compute pods— the
terms are often used interchangeably).
Infrastructure data is collected through proxy agents that manage and communicate with the endpoint
resources. This information about the compute resources on each infrastructure endpoint and the machines
provisioned on each compute resource is collected at regular intervals.
During solution deployment, the proxy agents and their associated endpoints are configured
automatically.
5.5.6 Virtualization Compute Resources
A virtualization compute resource is a vRealize Automation object that represents an ESXi host or a cluster
of ESXi hosts (vSphere cluster). When a group member requests a virtual machine, the virtual machine is
provisioned on these compute resources. vRealize Automation regularly collects information about known
compute resources and the virtual machines provisioned on them through the proxy agents. Each region has
one compute cluster. The compute cluster is selected automatically during deployment.
5.5.6.1 Fabric Groups
A fabric group is a logical container of several compute resources, and can be managed by fabric
administrators. A fabric group for each region is created and it includes all the compute resources
and edge resources in that region.
5.5.6.2 Business Groups
A Business group is a collection of machine consumers (users), often corresponding to a line of
business, department, or other organizational unit. To request machines, a vRealize Automation
user must belong to at least one Business group. Each group has access to a set of local blueprints
used to request machines.
Business groups have the following characteristics.
A group must have at least one business group manager, who maintains blueprints for the
group and approves machine requests.
Groups can contain support users, who can request and manage machines on behalf of
other group members.
A vRealize Automation user can be a member of more than one Business group, and can
have different roles in each group.
Two business groups are created, one for production users and one for development users.
5.5.6.3 Reservations
A reservation is a share of one compute resource's available memory, CPU and storage reserved
for use by a particular fabric group. Each reservation is for one fabric group only but the
relationship is many-to-many. A fabric group might have multiple reservations on one compute
resource or reservations on multiple compute resources or both. The solution implements only one
fabric group per region.
Each resource cluster has two reservations, one for production and one for development, allowing
both production and development workloads to be provisioned. An edge reservation in each region
is created and allows NSX to deploy edge services gateways on demand and place them on the
edge cluster.
5.5.6.4 Reservation Policies
Each virtual reservation is added to one reservation policy. The reservation from which a
particular virtual machine is provisioned is determined by vRealize Automation based on the
reservation policy specified in the blueprint (if any), the priorities and current usage of the fabric
group's reservations, and other custom properties.
Two reservation policies are configured in each region, one for production and the other for
development. One edge reservation in each region is created for placement of the edge service
gateway.
5.5.6.5 Template Synchronization
In the case of a single region, no template synchronization is performed. A dual-region deployment allows
provisioning workloads across regions from the same portal using the same single-machine
blueprints.
This design uses vSphere Content Library as the synchronization mechanism for templates across
regions.
Figure 31 Template Synchronization
5.5.7 Process Orchestration
VMware vRealize Orchestrator is a development and process automation and orchestration platform that
provides a library of extensible workflows to allow a cloud admin to create and run automated,
configurable processes to manage the VMware vSphere infrastructure as well as other VMware and third-
party technologies.
5.5.7.1 Directory Services
vRealize Orchestrator instances will use Active Directory LDAP authentication. The only
configuration supported for multi-domain Active Directory is domain tree. Forest and external
trusts are not supported for process orchestration. Multiple domains that have two-way trust, but
are not in the same tree, are not supported and do not work with vRealize Orchestrator.
5.5.7.2 Network Ports
vRealize Orchestrator uses specific network ports to communicate with other systems. The ports
are configured with a default value, which is set by the automation at build time. It is
recommended that these values remain unchanged to ensure supportability of the system in the
future. Firewall ports within the solution will be opened to ensure communication to and between the
components. Firewalls not deployed by the solution need to be configured
appropriately.
Table 46 vRealize Orchestrator Default Configuration Ports
Port | Number | Protocol | Source | Target | Description
HTTPS Server port | 8281 | TCP | End-user external system | vRealize Orchestrator server | The SSL secured HTTP protocol used to connect to the vRealize Orchestrator REST API.
Web configuration HTTPS access port | 8283 | TCP | End-user Web browser | vRealize Orchestrator configuration | The SSL access port for the Web UI for vRealize Orchestrator configuration.
Table 47 vRealize Orchestrator Default External Communication Ports
Port | Number | Protocol | Source | Target | Description
LDAP using SSL | 636 | TCP | vRealize Orchestrator server | LDAP server | Lookup port of the Active Directory server for secure LDAP authentication.
LDAP using Global Catalog | 3268 | TCP | vRealize Orchestrator server | Global Catalog server | Port to which Microsoft Global Catalog server queries are directed.
DNS | 53 | TCP | vRealize Orchestrator server | DNS server | Name resolution.
VMware vCenter Single Sign-On server (PSC) | 443 | TCP | vRealize Orchestrator server | vCenter Single Sign-On server | Port used to communicate with the vCenter Single Sign-On server.
SQL Server | 1433 | TCP | vRealize Orchestrator server | Microsoft SQL server | Port used to communicate with the Microsoft SQL Server or SQL Server Express instances that are configured as the vRealize Orchestrator database.
SMTP Server port | 25 | TCP | vRealize Orchestrator server | SMTP Server | Port used for notifications.
vCenter Server API port | 443 | TCP | vRealize Orchestrator server | VMware vCenter server | The vCenter Server API communication port used by vRealize Orchestrator to obtain virtual infrastructure and virtual machine information from the orchestrated vCenter Server instances.
vCenter Server | 80 | TCP | vRealize Orchestrator server | vCenter Server | Port used to tunnel HTTPS communication.
VMware ESXi | 443 | TCP | vRealize Orchestrator server | ESXi hosts | Workflows using the vCenter Guest Operations API need a direct connection between vRealize Orchestrator and the ESXi hosts the VM is running on.
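Where firewalls are not deployed by the solution, the external ports in Table 47 can be verified from the vRealize Orchestrator appliance with a basic TCP connectivity sketch like the one below; the target host names are placeholders for the design's actual servers.

```python
import socket

# (description, host, port) tuples drawn from Table 47; host names are placeholders
REQUIRED_PORTS = [
    ("LDAP over SSL", "dc01.central.tornado.local", 636),
    ("Global Catalog", "dc01.central.tornado.local", 3268),
    ("DNS", "dc01.central.tornado.local", 53),
    ("PSC / Single Sign-On", "psc01.central.tornado.local", 443),
    ("Microsoft SQL Server", "sql01.central.tornado.local", 1433),
]

for description, host, port in REQUIRED_PORTS:
    try:
        with socket.create_connection((host, port), timeout=3):
            status = "reachable"
    except OSError:
        status = "BLOCKED or unreachable"
    print(f"{description} ({host}:{port}): {status}")
```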
5.5.7.3 vRealize Orchestrator Deployment
Two vRealize Orchestrator appliance instances are required within this solution, each with 2 CPUs, 4
GB of memory, and 16 GB of hard disk. This solution uses the MSSQL database already installed
to support other components. In cluster mode, multiple vRealize Orchestrator instances with
identical server and plug-in configurations work together as a cluster, and share a single
database. The instances are installed behind a load balancer. Although there are 2 instances for
availability, the failover required in a disaster recovery scenario is manual. Please refer to the
vRealize user guide for the process to manually failover these components.
All vRealize Orchestrator server instances communicate with each other by exchanging heartbeats
at a certain time interval. Only active vRealize Orchestrator server instances respond
to client requests and run workflows. If an active vRealize Orchestrator server instance fails to
send heartbeats, it is considered to be non-responsive, and one of the inactive instances takes over
to resume all workflows from the point at which they were interrupted. The heartbeat is
implemented through the shared database, so there are no implications in the network design for a
vRealize Orchestrator cluster. If more than one vRealize Orchestrator node is active in a
cluster, concurrency problems can occur if different users use different vRealize Orchestrator
nodes to modify the same resource. This implementation uses an active-active cluster.
The following tables outline the characteristics for this vRealize Orchestrator active-active cluster
design.
Table 48 vRO Service Monitor Specifications
Monitor | Interval | Timeout | Retries | Type | Send String | Receive String
vco-https-8281 | 3 | 9 | 3 | HTTPS (443) | GET /vco/api/status\r\n | REGISTERED
Table 49 vRO Service Pool Characteristics
Pool Name | Algorithm | Monitors | Members | Port | Monitor Port
vco-pool | Leastconn | vco-https-8281 | vRealize Orchestrator nodes | 8281 | 8281
Table 50 vRO Virtual Server Characteristics
Name | Type | Service Port | Source Address Translation | Default Pool Name
vco-lb-8281 | Performance (Layer 4) | 8281 | Automap | vco-pool
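The vco-https-8281 monitor in Table 48 polls the vRealize Orchestrator REST API status resource and expects the string REGISTERED in the response. The following is a minimal sketch of the same check in Python, which could be used to validate a node before adding it to the pool; the node name is taken from the example host names used elsewhere in this design and is illustrative only.

    import requests

    # Illustrative node URL; substitute the FQDN of the vRealize Orchestrator node to check.
    VRO_NODE = "https://vra01vro01a.tornado.local:8281"

    # This design replaces the default certificates with CA-signed ones, so verify=True
    # assumes the issuing CA certificate is present in the local trust store.
    response = requests.get(VRO_NODE + "/vco/api/status", verify=True, timeout=10)
    response.raise_for_status()

    # The load balancer health monitor looks for the literal string REGISTERED.
    if "REGISTERED" in response.text:
        print("Node is registered and eligible for the vco-pool")
    else:
        print("Node responded but is not registered:", response.text[:200])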
5.5.7.4 SSL Certificates
The vRealize Orchestrator configuration interface uses a secure connection to communicate with
vCenter Server, relational database management systems (RDBMS), LDAP, vCenter Single Sign-
On, and other servers. The required SSL certificates are generated by the certification authority
deployed within the solution.
5.5.7.5 vRealize Orchestrator Plug-Ins
Plug-ins allow vRealize Orchestrator to access and control external technologies and applications.
Exposing an external technology in a vRealize Orchestrator plug-in allows incorporating objects
and functions in workflows that access the objects and functions of the external technology. The
external technologies that can be accessed using plug-ins include virtualization management tools,
email systems, databases, directory services, and remote control interfaces. vRealize Orchestrator
provides a set of standard plug-ins.
The following plug-ins are configured in this design:
vRealize Orchestrator NSX plug-in
vRealize Orchestrator vRealize Automation plug-in
vRealize Orchestrator vCenter Server plug-in
5.5.7.5.1 Multi-Node Plug-In
vRealize Orchestrator comes as a single-site topology product. The multi-node plug-in creates a
primary-secondary relation between vRealize Orchestrator servers that extends the package
management and workflow execution features. This is only enabled when deploying a multi-
region topology. The plug-in contains a set of standard workflows for hierarchical orchestration,
management of vRealize Orchestrator instances, and the scale-out of vRealize Orchestrator
activities.
5.5.7.5.2 vRealize Orchestrator Client
The vRealize Orchestrator client is a desktop application that lets users import packages, create,
run, and schedule workflows, and manage user permissions.
vRealize Orchestrator Client can be installed standalone on a desktop system. Download the
vRealize Orchestrator Client installation files from the vRealize Orchestrator appliance
page: https://vRO_hostname:8281. Alternatively, vRealize Orchestrator Client can be run
using Java WebStart directly from the homepage of the vRealize Orchestrator appliance console.
5.5.7.5.3 vRealize Orchestrator Scalability
A single vRealize Orchestrator instance allows up to 300 concurrent workflow instances in the
running state. Workflow instances that are in the waiting or waiting-event states do not count
toward that number. You can design long running workflows in a way that preserves resources by
using the wait elements of the workflow palette. A single vRealize Orchestrator instance supports
up to 35,000 managed virtual machines in its inventory.
This architecture depicts a clustered vRealize Orchestrator environment. In a clustered
environment, workflows cannot be changed while other vRealize Orchestrator instances are
running. Stop all other vRealize Orchestrator instances before connecting the vRealize
Orchestrator client and changing or developing a new workflow. Failure to do so will result in
inconsistencies within the environment.
This architecture scales out the vRealize Orchestrator environment by having multiple independent
vRealize Orchestrator instances (each with its own database instance). This allows the number of
managed inventory objects to increase.
This solution implements an active-active cluster with two nodes.
5.5.8 Software Orchestration
This solution provides a centralized repository for the software binaries and software orchestration templates
that are implemented on deployed resources. The main software orchestration engines installed are Chef Server
and SaltStack. Each region hosts its own dedicated repository server and software orchestration stack.
Software binaries are replicated between regions using the rsync tool, while the software orchestration
components use their own internal replication mechanisms.
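As an illustration only, the rsync replication from the central repository to a region repository could be driven by a small wrapper similar to the sketch below; the paths and the region host name are assumptions and not part of this design.

    import subprocess

    # Hypothetical source path on the central repository and target region repository host.
    SRC = "/repo/binaries/"
    DEST = "region-repo.regiona.tornado.local:/repo/binaries/"

    # -a preserves permissions and timestamps, -v is verbose, -z compresses in transit,
    # and --delete keeps the region copy an exact mirror of the central repository.
    subprocess.run(["rsync", "-avz", "--delete", SRC, DEST], check=True)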
The following diagram shows a high level view of the software orchestration components.
Figure 32 Software Orchestration Logical Design
A single-region design implements both a central orchestration component and a region orchestration component.
5.5.8.1 Central Software Orchestration
Central software orchestration has no direct interaction with the resources, but provides a central
management point for maintaining the latest binaries and templates by the cloud administrator.
It is responsible for:
Maintaining the latest versions of software binaries
Maintaining the latest versions of software orchestration templates
Keeping the regions up to date with the latest software versions and templates
Inputs and Outputs
The central component pushes software binaries to the region components.
The central component is accessed by the cloud administrator to update software and templates.
Design Rationale
In a hybrid cloud there may be many regions, each with its own software and template
requirements. In addition, to avoid delays during deployment, the software binaries need to be as close to
the resource cluster as possible. This drives the need for a distributed repository of software
binaries. Maintaining these repositories separately and keeping the software versions on each of them
up to date can add considerable support cost. A central repository can update each region on a regular
schedule or on change, without the need to update each one individually.
Implementation Approach
The central software orchestration components are co-located with the other central services to
minimize network access configuration for administrator access.
Other Functions
The central repository is also used as the backup location for the NSX Managers in the central
cloud.
5.5.8.2 Region Software Orchestration
Responsibilities
The region component provides the software orchestration engine that implements software on
deployed resources using defined software orchestration templates. The implementation includes
mounting the software installation media (software binaries), then installing and configuring the
software. The implementation may also involve other actions to configure the deployed resource
according to the software orchestration template requirements, such as setting password rules.
It is responsible for:
Being a repository for software binaries
Being a repository for software orchestration templates
Providing the software orchestration engine that implements software on deployed resources
Inputs and Outputs
A region pushes software and configurations to deployed resources.
A region is called from a cloud region to deploy the software.
Design Rationale
A cloud needs to be able to implement software on deployed resources. This requires software
binaries and software orchestration templates (patterns). The region component provides these functions.
It is kept separate from the cloud region in order to allow for different flavors without impacting the cloud
region design.
Implementation Approach
The region should be co-located with the other region services to minimize network traffic and
response times when deploying software to resources.
Other Functions
The region repository is also used as the backup location for the NSX Managers in the cloud
region.
5.5.8.3 Software Orchestration Components Sizing
The following table presents the sizing for the central and region software orchestration components.
Table 51 Software Orchestration Components Sizing
Server Role | vCPU | RAM (GB) | Disk (GB)
Central repository and Chef server | 2 | 8 | 300
Central Salt master | 2 | 8 | 100
Region repository and Chef server | 4 | 16 | 300
Region Salt master | 4 | 16 | 100
5.5.9 Infrastructure Orchestration
Infrastructure orchestration is handled by vRealize Orchestrator and vRealize Automation, which are
covered in the prior sections.
5.6 Operational Services
Operational services provide the services to enable the support and maintenance of the cloud management
platform. This design does not include operational services for user related virtual resources.
5.6.1 Backup and Restore
Data backup protects the data of this design against data loss, hardware failure, accidental deletion, or other
disaster for each region. For consistent image-level backups, this design uses backup software based upon
the VMware Virtual Disk Development Kit (VDDK), such as vSphere Data Protection (VDP). For this
design, VDP is used to back up the infrastructure VMs with the exception of the NSX components, which
have their own backup and recovery procedure documented in the networking section.
5.6.1.1 Logical Design
vSphere Data Protection protects the virtual infrastructure at the VMware vCenter Server layer.
Because vSphere Data Protection is connected to the Management vCenter Server, it can access all
management ESXi hosts, and can detect the virtual machines that require backups.
Figure 33 vSphere Data Protection Logical Design
5.6.1.2 Backup Datastore
vSphere Data Protection uses deduplication technology to back up virtual environments at data
block level, which enables efficient disk utilization. To optimize backups and leverage the
VMware vSphere Storage APIs, all ESXi hosts must have access to the NFS datastore. The
backup datastore stores all the data that is required to recover services according to a Recovery
Point Objective (RPO).
5.6.1.3 Performance
vSphere Data Protection generates a significant amount of I/O operations, especially when
performing multiple concurrent backups. The storage platform must be able to handle this I/O. If
the storage platform does not meet the performance requirements, it might miss backup windows.
Backup failures and error messages might occur. Run the vSphere Data Protection performance
analysis feature during virtual appliance deployment or after deployment to assess performance.
For this design a dedicated volume on performance storage in SoftLayer is used as a backup
target. The backup volume is NFS mounted to all ESXi hosts in the management cluster.
Table 52 VMware vSphere Data Protection Performance
Total Backup Size Avg Mbps in 4 hours
0.5 TB 306 Mbps
1 TB 611 Mbps
2 TB 1223 Mbps
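The figures in Table 52 correspond to moving the full backup size within the four-hour window. A quick sanity check of that arithmetic, treating 1 TB as 2^40 bytes, is sketched below; the values it prints match the table within rounding.

    def required_mbps(backup_tb, window_hours=4):
        # Average throughput in Mbps needed to move backup_tb terabytes within the window.
        bits = backup_tb * (2 ** 40) * 8            # terabytes -> bits
        return bits / (window_hours * 3600) / 1e6   # bits per second -> Mbps

    for size_tb in (0.5, 1, 2):
        print(size_tb, "TB ->", round(required_mbps(size_tb)), "Mbps")
    # Prints approximately 305, 611 and 1222 Mbps, in line with Table 52.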
5.6.1.4 Volume Sizing
vSphere Data Protection can dynamically expand the destination backup store from 2 TB to 8 TB.
Using an extended backup storage requires additional memory on the vSphere Data Protection
appliance. For this design, an initial fixed NFS datastore size will be utilized. Additional NFS
space can be provisioned via the SoftLayer customer portal as required. Note that an existing
SoftLayer Endurance NFS storage allocation cannot be expanded once provisioned. If additional
storage is required, then a new export must be requested and migrated to.
For this design, 4 TB of storage is initially sized, with 4 CPUs and 12 GB of memory for the vDP
appliance virtual machine in each region.
5.6.1.5 Other Considerations
vSphere Data Protection can protect virtual machines that reside on VMware Virtual SAN from
host failures. The virtual machine storage policy is not backed up with the virtual machine, but a
user is able to restore the storage policy after restoring the virtual machine.
Note: The default Virtual SAN storage policy is configured and includes Number Of
Failures To Tolerate = 1, which means that virtual machine data will be mirrored.
vSphere Data Protection is used to restore virtual machines that failed or need their data reverted
to a previous state.
The FTP server required for backing up NSX Manager will reside on the RDS central repository
server. Backups for NSX Manager are set up with a daily schedule by the NSX Manager.
5.6.1.6 Backup Policies
Use vSphere Data Protection backup policies to specify virtual machine backup options, the
schedule window, and retention policies.
5.6.1.7 Virtual Machine Backup Options
vSphere Data Protection provides the following options for performing a backup of a virtual
machine:
Hot Add. Provides full image backups of virtual machines, regardless of the guest operating
system.
o The virtual machine base disk is attached directly to vSphere Data Protection to back
up data. vSphere Data Protection uses Changed Block Tracking to detect and back
up blocks that are altered.
o The backup and restore performance is faster because the data flow is through the
VMkernel layer instead of over a network connection.
o A quiesced snapshot can be used to redirect the I/O of a virtual machine disk .vmdk
file.
o Hot Add does not work in multi-writer disk mode.
Network Block Device (NBD). Transfers virtual machine data across the network to allow
vSphere Data Protection to back up the data.
o The performance of the virtual machine network traffic might be lower.
o NBD takes a quiesced snapshot. As a result, it might interrupt the I/O operations of
the virtual machine to swap the .vmdk file or consolidate the data after the backup
is complete.
o The time to complete the virtual machine backup might be longer than the backup
window.
o NBD does not work in multi-writer disk mode.
vSphere Data Protection Agent Inside Guest OS. Provides backup of certain applications
that are running in the guest operating system through an installed backup agent.
o Enables application-consistent backup and recovery with Microsoft SQL Server,
Microsoft SharePoint, and Microsoft Exchange support.
o Provides more granularity and flexibility to restore on the file level.
For this design, Hot Add will be used for all backup policies, with the exception of the
infrastructure SQL Server backups, which will utilize the Guest OS agent in addition to local SQL
backups to disk.
5.6.1.8 Schedule Window
Even though vSphere Data Protection uses the Changed Block Tracking technology to optimize
the backup data, do not schedule the backup window during periods when the production storage is in
high demand, to avoid any business impact.
Warning: Do not perform any backup or other administrative activities during the vSphere Data
Protection maintenance window. Restore operations are allowed. By default, the vSphere Data
Protection maintenance window begins at 8 AM local server time and continues uninterrupted
until 8 PM or until the backup jobs are complete. Configure maintenance windows according to IT
organizational policy requirements.
The backup window is set to default and can be modified, based on customer requirements.
5.6.1.9 Retention Policies
Retention policies are properties of a backup job. If virtual machines are grouped by business
priority, it is possible to set the retention requirements according to the business priority.
This design requires that a dedicated NFS datastore be allocated for the vSphere Data Protection
appliance for the backup data in each region. The NFS datastore is allocated from IBM SoftLayer
Performance storage option with an initial allocation of 4 TB.
5.6.1.10 Component Backup Jobs
Backups are configured for each of this design’s management components separately. No
requirement to back up the entire design exists, and this design does not imply such an operation.
Some products can perform internal configuration backups.
Separate from this, NSX is configured to back up to the FTP server within the design. The SQL
Server will also be configured for local backups to disk on a daily basis.
5.6.1.11 Backup Jobs in Central Cloud
If multiple regions are deployed, create a single backup job for the components of a management
application according to the node configuration of the application in the central cloud.
Table 53 Backup Jobs in Central Cloud
Product | Image VM Backup Jobs in Central Cloud | Application VM Backup Jobs in Central Cloud
ESXi | N/A - No Backup |
Platform Services Controller | Part of the vCenter Server backup job |
vCenter Server | Management Job: mgmt01vc01.central.tornado.local, mgmt01psc01.central.tornado.local. Compute Job: comp01vc01.central.tornado.local, comp01psc01.central.tornado.local |
vRealize Automation | vra01vro01a.tornado.local, vra01vro01b.tornado.local, vra01dem01.tornado.local, vra01dem02.tornado.local, vra01ias01.tornado.local, vra01ias02.tornado.local, vra01ims01a.tornado.local, vra01ims01b.tornado.local, vra01iws01a.tornado.local, vra01iws01b.tornado.local, vra01svr01a.tornado.local, vra01svr01b.tornado.local, vra01mssql01.tornado.local, vra01ids01a.tornado.local | vra01mssql01.tornado.local
vRealize Log Insight | vrli-mstr-01, vrli-wrkr-01, vrli-wrkr-02 |
vRealize Operations Manager | vrops-mstrn-01, vrops-repln-02, vrops-datan-03, vrops-datan-04, vrops-rmtcol-01, vrops-rmtcol-02 |
vRealize Orchestrator | Part of the vRealize Automation backup job |
5.6.1.12 Backup Jobs in Additional Cloud Region
Create a single backup job for the components of a management application according to
the node configuration of the application in an additional cloud region.
Table 54 Backup Jobs in Additional Cloud Region
Product | Image VM Backup Jobs in Additional Cloud Region | Application VM Backup Jobs in Additional Cloud Region
ESXi | N/A - No Backup | None
Platform Services Controller | Part of the vCenter Server backup job |
vCenter Server | Management Job: mgmt01vc51.regiona.tornado.local, mgmt01psc51.regiona.tornado.local. Compute Job: comp01vc51.regiona.tornado.local, comp01psc51.regiona.tornado.local |
NSX for vSphere | Management Job: mgmt01nsxm51.regiona.tornado.local. Compute Job: comp01nsxm51.regiona.tornado.local |
vRealize Automation | vra01ias51.tornado.local, vra01ias52.tornado.local |
vRealize Log Insight | vrli-mstr-51, vrli-wrkr-51, vrli-wrkr-52 |
vRealize Operations Manager | vrops-rmtcol-51, vrops-rmtcol-52 |
vRealize Orchestrator | Part of the vRealize Automation backup job |
5.6.2 Disaster Recovery
A Site Recovery Manager instance is required for both the protected region and the recovery region. Site
Recovery Manager is installed after the installation and configuration of vCenter Server and the Platform
Services Controller in the region. Site Recovery Manager takes advantage of vCenter Server and Platform
Services Controller services such as storage management, authentication, authorization, and guest
customization. Site Recovery Manager uses the standard set of vSphere administrative tools to manage
these services. Site Recovery Manager is installed on a dedicated Windows host VM within each region of
the design.
5.6.2.1 Networking Design for Disaster Recovery
Moving a service physically from one region to another represents a networking challenge,
especially if applications have hard-coded IP addresses. Network address space and IP address
assignment considerations require that either the same IP address or a different IP address be used
at the recovery region. In many situations, new IP addresses are assigned, because VLANs do
not typically stretch between regions.
While protecting the management applications, it is possible to simplify the problem of IP address
assignment. This design leverages a load balancer to separate a public network segment and a
private network segment. The private network can remain unchanged and only the external load
balancer interface has to be reassigned.
On the public network segment, the management application is accessible via one or more virtual
IP (VIP) addresses
On the isolated private network segment, the application's virtual machines are isolated
After a failover, the recovered application is available under a different IPv4 address (VIP). The
use of the new IP address requires changes to the DNS records. DNS records are either changed
manually or by using a script in the Site Recovery Manager recovery plan.
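For example, a recovery plan step that re-points a management application VIP could perform a dynamic DNS update. The sketch below uses the dnspython library; the zone, record name, TTL, new VIP, and DNS server address are illustrative, and an Active Directory integrated zone would normally require secure (GSS-TSIG) updates, which are omitted here.

    import dns.update
    import dns.query

    # Hypothetical values: the zone, the application record, the recovery-region VIP,
    # and the DNS server that accepts dynamic updates for the zone.
    update = dns.update.Update("tornado.local")
    update.replace("vra01svr01", 300, "A", "192.168.11.50")
    response = dns.query.tcp(update, "172.16.11.5", timeout=10)
    print(response.rcode())  # 0 (NOERROR) indicates the record was updated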
Figure 34 Logical Network Design for Cross-Region Deployment with Management Application
Network Container
The IPv4 subnets (orange networks) are routed within the vSphere management network of each
region. Nodes on these network segments are reachable from within the SDDC. IPv4 subnets, such
as the subnet for the vRealize Automation primary components, overlap across regions. Make
sure that only the active IPv4 subnet is propagated in the region and beyond. The public facing
Ext-Mgmt network of both regions (grey networks) are reachable by users and provide connection
to external resources, such as Active Directory or DNS.
Load balancing functionality is provided by NSX Edge services gateways. In each region, the
same configuration for the management applications and their Site Recovery Manager shadow will
be used. Active Directory and DNS services must be running in both the protected and recovery
regions.
5.6.2.2 vSphere Replication
In a VMware Virtual SAN environment, array-based replication cannot be used. This design
utilizes vSphere Replication instead to transfer VMs between regions.
Within the design, vSphere Replication uses a VMkernel management interface on the ESXi host
to send replication traffic to the replication site's vSphere Replication appliance. To isolate
vSphere Replication traffic so that it does not impact other vSphere management traffic, the
vSphere Replication network is configured in the following way.
vSphere Replication appliances and vSphere Replication servers are the target for the replication
traffic that originates from the vSphere Replication VMkernel ports.
5.6.2.2.1 Placeholder Virtual Machines
Site Recovery Manager creates a placeholder virtual machine on the recovery region for every
machine from the Site Recovery Manager protection group. Placeholder virtual machine files
contain virtual machine configuration metadata but not virtual machine disks, and the files are
very small. Site Recovery Manager adds the placeholder virtual machines as recovery region
objects in the Management vCenter Server.
5.6.2.2.2 Snapshot Space
To perform failover tests, additional storage is provided for the snapshots of the replicated VMs.
This storage is minimal in the beginning, but grows as test VMs write to their disks. Replication
from the protected region to the recovery region continues during this time. The snapshots created
during testing are deleted after the failover test is complete.
5.6.2.2.3 Messages and Commands for Site Recovery Manager
Site Recovery Manager has options to present users with messages that provide notification and
accept acknowledgement. Site Recovery Manager also provides a mechanism to run commands
and scripts as necessary when executing a recovery plan. Pre-power-on or post-power-on
messages and commands can be inserted to the recovery plans. These messages and commands are
not specific to Site Recovery Manager, but support pausing the execution of the recovery plan to
complete other procedures, or executing customer-specific commands or scripts to enable
automation of recovery tasks.
5.6.2.2.4 Site Recovery Manager Messages
Additional steps are required when building more than the central cloud (i.e.,
adding cloud regions). For example, the environment should be set up such that a message appears
when a recovery plan is initiated and the cloud admin must acknowledge the message before
the recovery plan continues. Messages are specific to each IT organization.
Consider the following example messages and confirmation steps:
Verify that IP address changes are made on the DNS server and that the changes are
propagated.
Verify that the Active Directory services are available.
After the management applications are recovered, perform application tests to verify that
the applications are recovered correctly.
Additionally, confirmation steps can be inserted after every group of services that have a
dependency on other services. These confirmations can be used to pause the recovery plan so that
appropriate verification and testing can be performed before subsequent steps are taken. These
services are defined as follows:
Infrastructure services
Core services
Database services
Middleware services
Application services
Web services
Details on each message are specified in the workflow definition of the individual recovery plan.
5.6.2.2.5 Site Recovery Manager Commands
In this initial phase of the design, custom scripts are out of scope. However, custom scripts can
be run to perform infrastructure configuration updates or configuration changes on the virtual
machine environment. The scripts that a recovery plan executes are located on the Site Recovery
Manager server. The scripts can run against the Site Recovery Manager server or can impact a
virtual machine.
If a script must run in the virtual machine, Site Recovery Manager does not run it directly, but
instructs the virtual machine to do it. The audit trail that Site Recovery Manager provides does not
record the execution of the script because the operation is on the target virtual machine.
Scripts or commands must be available in the path on the virtual machine according to the
following guidelines:
Use full paths to all executables. For example, c:\windows\system32\cmd.exe
instead of cmd.exe.
Call only .exe or .com files from the scripts. Command-line scripts can call only
executables.
To run a batch file, start the shell command with
c:\windows\system32\cmd.exe.
The scripts that are run after powering on a virtual machine are executed under the Local Security
Authority of the Site Recovery Manager server. Store post-power-on scripts on the Site Recovery
Manager virtual machine. Do not store such scripts on a remote network share.
5.6.2.3 Recovery Plans for Site Recovery Manager
A recovery plan is the automated plan (runbook) for full or partial failover from the central cloud
to a cloud region.
5.6.2.3.1 Startup Order and Response Time
Virtual machine priority determines virtual machine startup order.
All priority 1 virtual machines are started before priority 2 virtual machines.
All priority 2 virtual machines are started before priority 3 virtual machines.
All priority 3 virtual machines are started before priority 4 virtual machines.
All priority 4 virtual machines are started before priority 5 virtual machines.
Additionally, a startup order of virtual machines within each priority group can be set.
The following timeout parameters are set:
Response time, which defines the time to wait after the first virtual machine powers on
before proceeding to the next virtual machine in the plan.
Maximum time to wait if the virtual machine fails to power on before proceeding to the
next virtual machine.
The response time values can be adjusted as necessary during execution of the recovery plan test
to determine the appropriate values.
5.6.2.3.2 Recovery Plan Test Network
When a recovery plan is created, the test network options must be configured. The following
options are available.
Isolated Network (Automatically Created). An isolated private network is created
automatically on each ESXi host in the cluster for a virtual machine that is being
recovered. Site Recovery Manager creates a standard switch and a port group on it.
A limitation of this automatic configuration is that a virtual machine connected to the
isolated port group on one ESXi host cannot communicate with a virtual machine on
another ESXi host. This option limits testing scenarios and provides an isolated test
network only for basic virtual machine testing.
Port Group. Selecting an existing port group provides a more granular configuration to
meet the client’s testing requirements. For virtual machines across ESXi hosts to
communicate, distributed switches with uplinks to the production network are used and a
port group is created on the switch that is tagged with a non-routable VLAN. In this way,
the network is isolated and cannot communicate with other production networks.
Because the isolated application networks are fronted by a load balancer, the recovery plan test
network is equal to the recovery plan production network and provides realistic verification of a
recovered management application.
5.6.2.3.3 Sizing
For each region, one SRM server and one vSphere Replication appliance are deployed. The SRM
application is installed on a Windows 2012 R2 virtual machine and utilizes the built-in PostgreSQL
database. The vSphere Replication application is deployed as a virtual appliance.
Table 55 SRM Windows server sizing
Attribute Specification
VM size Medium
Number of CPUs 4
Memory 8 GB
Disk size 60 GB
Table 56 vSphere Replication Appliance
Attribute Specification
VM size Medium
Number of CPUs 2
Memory 4 GB
Disk size 18 GB
5.6.3 Monitoring
Monitoring and Operations Management is a required element of a software-defined datacenter. Monitoring
operations support in vRealize Operations Manager provides capabilities for performance and capacity
management of related infrastructure and cloud management components.
5.6.3.1 Single-Region Logical Design
In this single-region design, vRealize Operations Manager is deployed with the following
configuration:
4-node (large-size) vRealize Operations Manager analytics cluster that is highly available
(HA). This topology provides high availability, scale-out capacity up to eight nodes, and
failover.
2-node (large-size) remote collector cluster. The remote collectors communicate directly
with the data nodes in the vRealize Operations Manager analytics cluster. For a multi-
region design, deploy two remote collectors in each region.
Note that a single region has its own remote collectors, whose role is to ease scalability by
performing the data collection from the applications that are not subject to failover and
periodically sending collected data to the analytics cluster. In cases of multiple regions, the
analytics cluster can be failed over because the analytics cluster is the construct that analyzes and
stores monitoring data. The multi-region configuration supports failover of the analytics cluster by
using Site Recovery Manager. In the event of a disaster, Site Recovery Manager migrates the
analytics cluster nodes to the failover region.
Figure 35 Logical Design of vRealize Operations Manager Central Cloud and a Cloud Region (Region
A) Deployment
5.6.3.2 Physical Design
The vRealize Operations Manager nodes run on the management cluster in each region of this
design.
5.6.3.3 Data Sources
vRealize Operations Manager collects data from the following virtual infrastructure and cloud
management components:
Management vCenter Server
o Platform Services Controller
o vCenter Server
Compute vCenter Server
o Platform Services Controller
o vCenter Server
Management, Edge and Compute ESXi hosts
NSX for vSphere for the management and compute clusters
o NSX Manager
o NSX Controller Instances
o NSX Edge instances
vRealize Automation
o vRealize Orchestrator
o vRealize Automation Components
vRealize Log Insight
vRealize Operations Manager (Self Health Monitoring)
5.6.3.4 vRealize Operations Manager Nodes
The analytics cluster of the vRealize Operations Manager deployment contains the nodes that
analyze and store data from the monitored components.
5.6.3.4.1 Compute for vRealize Operations Manager Nodes
The four-node vRealize Operations Manager analytics cluster is deployed in the management cluster with
an application virtual network. The analytics cluster consists of one master node, one master
replica node, and two data nodes to enable scale out and high availability. The vRealize
Operations Manager analytics cluster is sized according to VMware KB article 2130551 "vRealize
Operations Manager 6.1 Sizing Guidelines" and includes the following management packs:
Management Pack for VMware vCenter Server (installed by default)
Management Pack for NSX for vSphere
Management Pack for Storage Devices
Management Pack for vRealize Log Insight
Management Pack for vRealize Automation
Note that each node in the analytics cluster will be sized identically to support scale-out, high
availability, and design guidance for central cloud or cloud region infrastructure. As a result, the
configuration for each node is as follows:
Table 57 Analytics Cluster Node Configurations
Node vCPU Memory
Master 16 48 GB
Master Replica 16 48 GB
Data 16 48 GB
Data 16 48 GB
5.6.3.4.2 Storage for vRealize Operations Manager Nodes
Each vRealize Operations Manager node in this design requires 266 GB of free space for data. To
collect the required number of metrics, a 1 TB VMDK will be added to each analytics cluster
node.
5.6.3.4.3 Network for vRealize Operations Manager Nodes
In this design, the clusters of vRealize Operations Manager will be placed in application isolated
networks for secure access, load balancing, portability, and functionality-specific subnet
allocation.
5.6.3.4.4 High Availability for vRealize Operations Manager Nodes
To protect the vRealize Operations Manager virtual machines from a host-level failure, this design
configures vSphere DRS to run the virtual machines for the analytics cluster and for the remote
collectors on different hosts in the management cluster.
Table 58 DRS Cluster Anti-Affinity Rule for vRealize Operations Manager Nodes
Rule Attribute Value
Name vropscluster-antiaffinity-rule
Enable rule Yes
Type Separate Virtual Machines
Members vrops master node
vrops master replica node
vrops data node 1
vrops data node 2
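The rule in Table 58 can also be created programmatically. The following is a minimal sketch using the pyVmomi library; it assumes an existing connection to the Management vCenter Server and that the caller has already looked up the management cluster object and the four analytics virtual machines, so the function and its arguments are illustrative rather than part of this design.

    from pyVmomi import vim

    def add_vrops_antiaffinity_rule(cluster, vrops_vms):
        # cluster   -- vim.ClusterComputeResource for the management cluster
        # vrops_vms -- list of vim.VirtualMachine objects for the four analytics nodes
        rule = vim.cluster.AntiAffinityRuleSpec(
            name="vropscluster-antiaffinity-rule",
            enabled=True,
            vm=vrops_vms,
        )
        rule_spec = vim.cluster.RuleSpec(operation="add", info=rule)
        config_spec = vim.cluster.ConfigSpecEx(rulesSpec=[rule_spec])
        # modify=True merges this change into the existing cluster configuration.
        return cluster.ReconfigureComputeResource_Task(config_spec, modify=True)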
5.6.3.5 vRealize Operations Remote Collector Nodes
Unlike the analytics cluster nodes, remote collector nodes have only the collector role. Deploying
two remote collector nodes in each region does not increase the number of monitored objects, but
removes the load from the analytics cluster by collecting metrics from applications that do not fail
over between regions in a multi-region environment. This means that collectors are assigned when
configuring the monitoring solution.
5.6.3.5.1 Compute for vRealize Operations Remote Collector Nodes
In this design, the remote collector nodes are deployed on the management cluster. The nodes are
identically sized and consist of the following compute resources in each region:
Table 59 Remote Collector Node Sizes
Node vCPU Memory
Remote Collector 1 4 16 GB
Remote Collector 2 4 16 GB
5.6.3.5.2 Storage for Remote Collector Nodes
In this design, the remote collector nodes are deployed with thin-provisioned disks. Because remote
collectors do not perform analytics operations or store data, the default VMDK size is sufficient
for this design.
5.6.3.5.3 High Availability for vRealize Operations Manager Nodes
To protect the vRealize Operations Manager virtual machines from a host-level failure, this design
configures vSphere DRS to run the virtual machines for the analytics cluster and for the remote
collectors on different hosts in the management cluster.
Table 60 DRS Cluster Anti-Affinity Rule for vRealize Operations Remote Collector Nodes
Rule Attribute Value
Name vropscollector-antiaffinity-rule
Enable rule Yes
Type Separate Virtual Machines
Members vrops remote collector 1
vrops remote collector 2
5.6.3.6 Networking for vRealize Operations Manager
In this design, the clusters of vRealize Operations Manager will be placed in application isolated
networks for secure access, load balancing, portability, and functionality-specific subnet
allocation.
Figure 36 Networking Design of the vRealize Operations Manager Deployment
5.6.3.6.1 Application Isolated Network Design
In the single-region design, the two logical entities of the vRealize Operations Manager
deployment, the analytics cluster and the remote collectors in the central cloud, are installed in
application isolated networks. For multi-region, a third entity (i.e., remote collectors in an
additional cloud region) is also deployed on an application isolated network. As part of this
configuration, an NSX Edge services gateway will be placed in front of the application isolated
network to provide routing and load balancing. This networking design contains the following
features:
Each application virtual network of vRealize Operations Manager has connection to the
application virtual networks of vRealize Automation and vRealize Log Insight through a
dedicated network called networkExchange, whose role is to support transit traffic and the
exchange of routing tables.
5.6.3.6.3 DNS Names
vRealize Operations Manager node name resolution uses a region-specific suffix, such as
central.tornado.local or regiona.tornado.local, while the analytics node IP
addresses and the load balancer virtual IP address (VIP) are mapped to the root domain suffix
tornado.local. Access from the public network is provided through a VIP, the traffic to
which is handled by the NSX Edge services gateway. The following table shows example names
that can be used in single- and multi-region designs.
Table 62 DNS Names for the Application Virtual Networks
vRealize Operations Manager DNS Name | Node Type
vrops-cluster-01.tornado.local | Virtual IP of the analytics cluster
vrops-mstrn-01.tornado.local | Master node in the analytics cluster
vrops-repln-02.tornado.local | Master replica node in the analytics cluster
vrops-datan-03.tornado.local | First data node in the analytics cluster
vrops-datan-04.tornado.local | Second data node in the analytics cluster
vrops-rmtcol-01.central.tornado.local | First remote collector node in the central cloud
vrops-rmtcol-02.central.tornado.local | Second remote collector node in the central cloud
vrops-rmtcol-51.regiona.tornado.local | First remote collector node in any additional cloud region
vrops-rmtcol-52.regiona.tornado.local | Second remote collector node in any additional cloud region
5.6.3.6.4 Networking for Failover and Load Balancing
Each node in the vRealize Operations Manager analytics cluster runs a Tomcat server instance for
access to the product user interface. By default, vRealize Operations Manager does not provide a
solution for load-balanced UI users’ sessions across nodes in the cluster. The lack of load
balancing for users’ sessions results in the following limitations:
Cloud admins must know the URL of each node to access the UI. As a result, a single node
might be overloaded if all cloud admin actors access it at the same time.
Each node supports up to four simultaneous cloud admin sessions.
Taking a node offline for maintenance might cause an outage. Cloud admins cannot access
the UI of the node when the node is offline.
To avoid such problems, the analytics cluster is placed behind an NSX load balancer that is
configured to allow up to four connections per node. The load balancer must distribute the load
evenly to all cluster nodes. In addition, the load balancer is configured to redirect service requests
from the UI on port 80 to port 443. Load balancing and access to and from the public network is
not required for the remote collector nodes.
5.6.3.7 Security and Authentication
vRealize Operations Manager can use several sources for authentication. These sources include an
Active Directory service, vCenter Server, and local user inventory. Active Directory is used as the
primary authentication and authorization method in this design.
5.6.3.8 Identity Sources
The cloud admin will authenticate in vRealize Operations Manager using Active Directory
authentication. This provides access to vRealize Operations Manager by using standard Active
Directory accounts and ensures that authentication is available even if vCenter Server becomes
unavailable.
5.6.3.9 Encryption
Access to all vRealize Operations Manager Web interfaces requires an SSL connection. By
default, vRealize Operations Manager uses a self-signed certificate. In this design, the default self-
signed certificate is replaced with a Certificate Authority (CA) signed certificate to provide
secure access to the vRealize Operations Manager user interface.
5.6.3.10 Monitoring and Alerting
vRealize Operations Manager can monitor itself and display the following administrative alerts:
System alert. A component of the vRealize Operations Manager application has failed.
Environment alert. vRealize Operations Manager has stopped receiving data from one or
more resources. Such an alert might indicate a problem with system resources or network
infrastructure.
Log Insight log event. The infrastructure on which vRealize Operations Manager is running
has low-level issues. You can also use the log events for root cause analysis.
Custom dashboard. vRealize Operations Manager can show super metrics for datacenter
monitoring, capacity trends, and a single-pane-of-glass overview.
In order to enable the aforementioned alerts and events, vRealize Operations Manager must be
configured for SMTP outbound alerts. An SMTP service is included in
the design, which utilizes the SoftLayer SMTP service if no client on-premises SMTP service is
provided. If the client has an existing SMTP service, then that service will be configured instead. Additionally, the
design incorporates deeper root cause analysis and infrastructure alerting by including the
management pack for vRealize Log Insight.
5.6.3.11 Management Packs
This design contains several VMware products for network, storage, and cloud management. In
order to monitor and perform diagnostics on all these items, the following management packs are
used:
Management Pack for VMware vCenter Server (installed by default)
Management Pack for NSX for vSphere
Management Pack for Storage Devices
Management Pack for vRealize Log Insight
Management Pack for vRealize Automation
5.6.4 Log Consolidation and Analysis
In each region of this design, a vRealize Log Insight cluster is configured with three nodes. This allows for
continued availability and increased log ingestion rates.
Figure 38 Logical Design of vRealize Log Insight
5.6.4.1 Sources of Log Data
vRealize Log Insight collects logs from the following virtual infrastructure and cloud management
components:
Management vCenter Server
o Platform Services Controller
o vCenter Server
Compute vCenter Server
o Platform Services Controller
o vCenter Server
Management, Edge and Compute ESXi hosts
NSX for vSphere for the management and for compute and edge clusters
o NSX Manager
o NSX Controller instances
o NSX Edge instances
vRealize Automation
o vRealize Orchestrator
o vRealize Automation components
vRealize Operations Manager
o Analytics cluster nodes
5.6.4.2 Cluster Nodes
The vRealize Log Insight cluster consists of one master node and two worker nodes. The
Integrated Load Balancer (ILB) on the cluster is configured so that vRealize Log Insight balances
incoming traffic fairly among the available nodes. vRealize Log Insight clients, using both
the Web user interface and ingestion through syslog or the Ingestion API, connect to vRealize
Log Insight at the ILB address.
5.6.4.3 Sizing
In this design, the vRealize Log Insight virtual appliance has 2 vCPUs, 4 GB of virtual memory,
and 144 GB of disk space provisioned. vRealize Log Insight uses 100 GB of the disk space to
store raw data, indexes, and metadata.
5.6.4.4 Sizing Nodes
To accommodate all of the log data from the products in this design, the vRealize Log Insight nodes
must be sized properly.
Table 63 Node Sizing
Attribute Specification
Appliance size Medium
Number of CPUs 8
Memory 16 GB
IOPS 1,000 IOPS
Amount of processed log data 38 GB/day
Number of processed log messages 7,500
Environment Up to 250 syslog connections per node
Disk size 450 GB (see below)
5.6.4.5 Sizing Storage
For this design, 7 days of data are retained. The disk space needs are calculated as follows, for
250 syslog sources at a rate of 150 MB of logs ingested per day per source over 7 days:
250 sources * 150 MB of log data ≈ 37 GB of log data per day
37 GB * 7 days ≈ 260 GB of log data per vRealize Log Insight node
260 GB * 1.7 index overhead ≈ 450 GB
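The same sizing arithmetic, expressed as a short sketch:

    sources = 250            # syslog sources per node
    mb_per_source_day = 150  # MB of log data per source per day
    retention_days = 7
    index_overhead = 1.7

    daily_gb = sources * mb_per_source_day / 1024   # ~37 GB of log data per day
    retained_gb = daily_gb * retention_days         # ~256 GB per node over 7 days
    sized_gb = retained_gb * index_overhead         # ~436 GB, rounded up to the 450 GB in Table 63
    print(round(daily_gb), round(retained_gb), round(sized_gb))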
Note: vRealize Log Insight supports virtual hard disks of up to 2 TB. If more capacity is needed,
add another virtual hard disk. Do not extend existing retention virtual disks.
5.6.4.6 Networking Design
In a multi-region deployment of the design, the vRealize Log Insight instances are connected to
both the vSphere management (gray in the network diagram) and the external management (blue
in the network diagram) networks. Each vRealize Log Insight instance is deployed within its own
application isolated network (gray boxes in the network diagram).
Figure 39 Networking Design for the vRealize Log Insight Deployment
5.6.4.7 Application Isolated Network Design
Each of the two instances of the vRealize Log Insight deployment is installed in its own isolated
application network. An NSX Edge appliance is configured at the front of each isolated
application network to provide the network isolation. NSX Edge services gateway is deployed in
front of the application isolated network to provide routing and load balancing. This networking
design has the following features:
Each application virtual network of vRealize Log Insight has connection to the application
virtual networks of vRealize Automation and vRealize Operations Manager through a
dedicated network called networkExchange. The role of networkExchange is to support
transit traffic and the exchange of routing tables.
All nodes have routed access to the vSphere management network through the Management
NSX Edge for the home region.
Routing to the vSphere management network and the external network is dynamic, and is
based on the Open Shortest Path First (OSPF) protocol.
The NSX Edge instances for the vRealize Log Insight are configured to use Source NAT
(SNAT) address translation when the vRealize Log Insight nodes access the public network.
The NSX Edge instances for the vRealize Log Insight provide access to vRealize Log Insight
from the public network over Destination NAT (DNAT).
Figure 40 Application Virtual Networks in the vRealize Log Insight Topology
5.6.4.8 IP Subnets
The following example subnets are allocated to the vRealize Log Insight deployment:
Table 64 IP Subnets in the Application Isolated Networks
vRealize Log Insight Cluster IP Subnet
Central Cloud 192.168.31.0/24
Cloud Region 192.168.32.0/24
5.6.4.9 DNS Names
vRealize Log Insight node name resolution uses a region-specific suffix, such as
central.tornado.local or regiona.tornado.local, including the load balancer virtual
IP addresses (VIPs). The Log Insight components in both regions have the following node names:
Table 65 Example DNS names of Log Insight nodes
DNS Name Role Region
vrli-cluster-01.central.tornado.local Log Insight ILB VIP A
vrli-mstr01.central.tornado.local Master node A
vrli-wrkr01.central.tornado.local Worker node A
vrli-wrkr02.central.tornado.local Worker node A
vrli-cluster-51.regiona.tornado.local Log Insight ILB VIP B
vrli-mstr51.regiona.tornado.local Master node B
vrli-wrkr51.regiona.tornado.local Worker node B
vrli-wrkr52.regiona.tornado.local Worker node B
5.6.4.10 Retention and Archiving
In vRealize Log Insight, configure log retention for one week and archiving on storage sized for
90 days according to the vRealize Log Insight Design document.
5.6.4.10.1 Retention
vRealize Log Insight virtual appliances contain three default virtual disks and can use additional
virtual disks for storage, for example, hard disk 4.
Table 66 Virtual Disk Configuration in the vRealize Log Insight Virtual Appliance
Hard Disk | Size | Usage
Hard disk 1 | 12.125 GB | Root file system
Hard disk 2 | 270 GB for medium-size deployment | Contains two partitions: /storage/var (system logs) and /storage/core (storage for collected logs)
Hard disk 3 | 256 MB | First boot only
Hard disk 4 (additional virtual disk) | 190 GB | Storage for collected logs. The capacity from this disk is added to /storage/core.
Calculate the storage space that is available for log data in the following way:
/storage/core = hard disk 2 space + hard disk 4 space - system logs space on hard disk 2
Retention = /storage/core - 3% * /storage/core
For example, if /storage/core is 425 GB, vRealize Log Insight can use approximately 413 GB for retention.
Based on the size of the default and additional virtual disks in this design:
/storage/core = 270 GB + 190 GB - 20 GB = 440 GB
Retention = 440 GB - 3% * 440 GB ≈ 427 GB
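The same retention calculation as a short sketch:

    hard_disk_2_gb = 270   # medium-size deployment
    hard_disk_4_gb = 190   # additional virtual disk
    system_logs_gb = 20    # /storage/var on hard disk 2

    storage_core_gb = hard_disk_2_gb + hard_disk_4_gb - system_logs_gb   # 440 GB
    retention_gb = storage_core_gb * (1 - 0.03)                          # ~427 GB
    print(storage_core_gb, round(retention_gb))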
Configure a retention period of 7 days for the medium-size vRealize Log Insight appliance.
5.6.4.10.2 Archiving
vRealize Log Insight archives log messages as soon as possible. At the same time, they remain
on the virtual appliance until the free local space is nearly exhausted. Data therefore exists on both the
vRealize Log Insight appliance and the archive location for most of the retention period. The
archiving period must be longer than the retention period.
The archive location must be on an NFS version 3 shared storage. The NFS export is mounted
directly to the vRealize Log Insight nodes, configured via the management web interface.
This design will apply an archive policy of 90 days for the medium-size vRealize Log Insight
appliance, which takes up approximately 1 TB of shared storage.
5.6.4.11 Alerting
vRealize Log Insight supports alerts that trigger notifications about its health. The following types
of alerts exist in vRealize Log Insight:
System Alerts. vRealize Log Insight generates notifications when an important system
event occurs, for example when the disk space is almost exhausted and vRealize Log
Insight must start deleting or archiving old log files.
Content Pack Alerts. Content packs contain default alerts that can be configured to send
notifications; these alerts are specific to the content pack and are disabled by default.
User-Defined Alerts. Administrators and users can define their own alerts based on data
ingested by vRealize Log Insight.
vRealize Log Insight handles alerts in two ways:
Send an e-mail over SMTP
Send to vRealize Operations Manager
5.6.4.12 SMTP Notification
E-mail notifications are enabled for alerts in vRealize Log Insight and point to the SMTP relay
service residing on the Active Directory server VMs.
5.6.4.13 Integration with vRealize Operations Manager
vRealize Log Insight integrates with vRealize Operations Manager to provide a central location
for monitoring and diagnostics.
vRealize Log Insight integrates with vRealize Operations Manager in the following ways:
Notification Events. Forward notification events from vRealize Log Insight to vRealize
Operations Manager.
Launch in Context. Launch vRealize Log Insight from the vRealize Operations Manager user
interface. This requires that the vRealize Log Insight management pack be installed in vRealize
Operations Manager.
5.6.4.14 Security and Authentication
The vRealize Log Insight deployment utilizes centralized role-based authentication, using Microsoft
Active Directory as the authority.
5.6.4.15 Authentication
Role-based access control is enabled in vRealize Log Insight by using the existing tornado.local
Active Directory domain.
5.6.4.16 Encryption
The default self-signed certificates are replaced with a CA-signed certificate, generated by the design's
Active Directory certificate authority, to provide secure access to the vRealize Log Insight Web user interface.
5.6.4.17 Configuration for Collecting Logs
Client applications send logs to vRealize Log Insight in one of the following ways:
Directly to vRealize Log Insight over the syslog protocol.
By using vRealize Log Insight agents.
Both of these are supported in this design for the different applications that provide the cloud
management platform.
5.6.4.18 Time Synchronization
Time synchronization is critical for the core functionality of vRealize Log Insight. vRealize Log
Insight synchronizes time with the SoftLayer NTP service.
Consistent NTP sources are configured on all systems that send log data (vCenter Server, ESXi,
vRealize Operations Manager). See Time Synchronization under Common Services.
5.6.4.19 Connectivity in the Cluster
This design requires that all vRealize Log Insight cluster nodes within a region are connected to
the same LAN with no firewall or NAT between the nodes.
5.6.4.20 External Communication
vRealize Log Insight receives log data over the syslog TCP, syslog TLS/SSL, or syslog UDP
protocol. The default syslog UDP protocol is used in this design, because security is already
designed at the level of the management network.
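As an illustration, a client that forwards a log message to the cluster over syslog UDP can use the Python standard library; the host name below is the Integrated Load Balancer VIP from Table 65, and port 514 is the default syslog port.

    import logging
    import logging.handlers

    # SysLogHandler uses UDP by default, which matches the protocol chosen in this design.
    handler = logging.handlers.SysLogHandler(
        address=("vrli-cluster-01.central.tornado.local", 514))

    logger = logging.getLogger("sddc-example")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)

    logger.info("test message from the cloud management platform")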
5.6.4.21 Event Forwarding between Regions
vRealize Log Insight supports event forwarding to other clusters and standalone instances. While
forwarding events, the vRealize Log Insight instance still ingests, stores and archives events
locally.
5.6.4.22 Event Forwarding Protocol
Forwarding of syslog data in vRealize Log Insight is achieved by using the Ingestion API or a
native syslog implementation.
The vRealize Log Insight Ingestion API uses TCP communication. In contrast to syslog, the
forwarding module supports the following features for the Ingestion API:
Forwarding to other vRealize Log Insight instances
Both structured and unstructured data, that is, multi-line messages.
Metadata in the form of tags
Client-side compression
Configurable disk-backed queue to save events until the server acknowledges the ingestion.
5.6.4.23 Disaster Recovery
Each region is configured to forward log information to the vRealize Log Insight instance in the
other region. Configuration of failover is not required.
5.6.5 Patching
VMware vSphere Update Manager (vUM) automates patch management and eliminates manual tracking
and patching of vSphere hosts and virtual machines. It compares the state of vSphere hosts with baselines,
then applies updates and patches to enforce compliance. Although vSphere Update Manager can be installed on the
same server as a vCenter Server, this design utilizes a vUM server installed on its own Windows Server
virtual machine due to the size of the environment and the use of the vCenter Server Appliance. Additionally,
this design deploys two unique vUM servers in each region, each associated and registered with a unique
vCenter Server. This means that one vUM server is registered to the vCenter Server managing the
management clusters and another vUM server is registered to the vCenter Server managing the capacity and
edge clusters.
In addition to being associated with a unique vCenter Server, vUM requires the use of
a dedicated database. In this design, the vUM associated with updating the capacity and edge clusters
connects to an instance of the Microsoft SQL Server 2012 database running on the same virtual machine.
The vUM associated with the management cluster uses the Microsoft SQL Server 2012 database that is
installed on the same machine as Update Manager.
5.6.5.1 vUM for vCenter Managing the Management Cluster
A vUM instance is linked to a vCenter Server instance. The following section describes the
configuration of the vUM instance for the vCenter Server that manages the Management Cluster.
5.6.5.1.1 Compute and Storage Design
In this design, vUM is installed on a virtual machine running Windows 2012 Server with an
instance of Microsoft SQL Server 2012 installed. The virtual machine resides on the management
cluster and utilizes a VSAN-backed datastore for storage. The resources for this virtual machine are
as follows in Table 67 Compute Resources for vUM vCenter Managing the Management Cluster.
Note that disk space was calculated using the vSphere Update Manager Sizing Estimator and
includes a 1.5 GB monthly utilization rate.
Table 67 Compute Resources for vUM vCenter Managing the Management Cluster
vCPU Memory Disk Space Disk Type
2 4 GB 60 GB Thin
5.6.5.1.2 Network Design
In this design, vUM is placed on the management network which will provide appropriate access
for it to upgrade and patch ESXi hosts, install and update third-party software on hosts, and
upgrade virtual machine hardware, VMware tools, and virtual appliances.
5.6.5.2 vUM for vCenter Managing Compute and Edge Clusters
A vUM instance is linked to a vCenter Server instance. The following section describes the
configuration of the vUM instance for the vCenter Server that manages the Edge and Compute Clusters.
5.6.5.2.1 Compute and Storage Design
In this design, vUM is installed on a virtual machine running Windows 2012 Server and is
connected to an instance of Microsoft SQL Server 2012 installed on the same VM. The virtual
machine resides on the management cluster and utilizes a VSAN-backed datastore for storage. The
resources for this virtual machine are as follows in Table 68 Compute Resources for vUM vCenter
Managing the Compute and Edge Clusters. Note that disk space was calculated using the vSphere
Update Manager Sizing Estimator and includes a 1.5 GB monthly utilization rate.
Table 68 Compute Resources for vUM vCenter Managing the Compute and Edge Clusters
vCPU Memory Disk Space Disk Type
2 4 GB 60 GB Thin
5.6.5.2.2 Network Design
In this design, vUM is placed on the management network, which provides appropriate access for
it to upgrade and patch ESXi hosts, install and update third-party software on hosts, and upgrade
virtual machine hardware, VMware Tools, and virtual appliances.
5.7 Business Services
This design uses vRealize Business Standard Edition to provide metering and chargeback functionality for
the cloud. vRealize Business gives the director of cloud operations greater visibility into the
financial aspects of IaaS service delivery, enabling these operations to be optimized and improved.
5.7.1.1 vRealize Business Standard Design
The following figure presents the design of vRealize Business Standard:
Figure 41 vRealize Business Logical Design
Data Collector
This component is responsible for connecting to vCenter Server instances and retrieving both
inventory information (servers, virtual machines, clusters, storage devices, and associations
between them) and usage (CPU and memory) statistics.
Reference Database
This component is responsible for providing default, out-of-the-box costs for each of the
supported cost drivers. Reference values are updated periodically. For this solution, the reference
database is downloaded and installed manually, and the new values then affect cost calculation. The
reference data that is used depends on the currency selected during installation; USD is used by default.
Communication between Data Collector and the Server
The data collector and the server communicate through a database: the data collector writes to the
database, and the server reads the data. The data collector time-stamps inventory information,
so it is possible to retrieve and view the inventory as it existed at an earlier point in time. The architecture of the data collector
tables is flexible and stores properties retrieved from vCenter Server as key-value pairs (an illustrative record shape follows).
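The key-value storage pattern described above can be illustrated with the shape of a single time-stamped inventory record. The field names here are assumptions for illustration; the actual vRealize Business table layout is not documented in this design.

```python
# Illustrative shape of a time-stamped key/value inventory record of the kind
# the data collector writes and the server reads. Field names are assumptions.
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class InventoryProperty:
    object_id: str          # identifier of the vCenter object (for example, a VM)
    key: str                # property name retrieved from vCenter Server
    value: str              # property value stored as text
    collected_at: datetime  # timestamp enabling views of the inventory back in time


record = InventoryProperty(
    object_id="vm-1001",
    key="config.hardware.memoryMB",
    value="4096",
    collected_at=datetime.now(timezone.utc),
)
print(record)
```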
5.7.1.2 vRealize Business Scalability
vRealize Business Standard Edition scales up to 20,000 virtual machines across four VMware
vCenter Server instances. This design implements one vRealize Business appliance for each
deployed vCenter.
vRealize Business Standard is deployed as a single virtual appliance hosted in the management
network of vRA. The appliance is configured with 2 vCPU, 4 GB of RAM and 50 GB disk.
5.7.1.3 vRealize Business Standard Integration
vRealize Business Standard Edition is integrated with vCenter Server and extracts the inventory
list from it. The inventory list contains virtual machine configuration information, ESXi host and
cluster capacity, storage profiles and capacity, and vCenter Server attributes and tags.
vRealize Business Standard Edition is tightly integrated with vRealize Automation. It uses
common services of the vRealize Automation framework, such as SSO authentication and authorization.
The Infrastructure as a Service (IaaS) component of vRealize Automation consumes the base rate
APIs of vRealize Business Standard Edition to compute the blueprint price of a virtual machine (sketched below).
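The base-rate pricing consumed by the IaaS component can be illustrated with a simple arithmetic sketch. The rates and the linear model below are assumptions for illustration; they are not the actual vRealize Business API or rate card.

```python
# Hedged sketch of base-rate pricing for a blueprint. Rates are hypothetical.
BASE_RATES = {
    "vcpu": 25.00,       # assumed monthly rate per vCPU
    "memory_gb": 5.00,   # assumed monthly rate per GB of RAM
    "storage_gb": 0.10,  # assumed monthly rate per GB of disk
}


def blueprint_price(vcpu, memory_gb, storage_gb):
    """Estimate a monthly blueprint price from per-unit base rates."""
    return (
        vcpu * BASE_RATES["vcpu"]
        + memory_gb * BASE_RATES["memory_gb"]
        + storage_gb * BASE_RATES["storage_gb"]
    )


# Example: a 2 vCPU / 4 GB RAM / 60 GB disk machine.
print(f"Estimated price: {blueprint_price(2, 4, 60):.2f} USD/month")  # 76.00
```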
Appendix A – Bare Metal Summary
The bare metal servers used in this design consist of the components listed in the following tables for the
management, compute, and edge clusters.
Management Cluster Nodes
The following table shows the bill of materials for the SoftLayer bare metal servers in the management
cluster.
Table 69 Management - Bare Metal Bill of Materials
Component Manufacturer / Model
Chassis SuperMicro PIO-628U-TR4T+-ST031
Motherboard SuperMicro X10DRU-i+_R1.02b
Processor 2 x 2.6GHz Intel Xeon-Haswell (E5-2690-V3-DodecaCore)
Memory 16 x 16GB
Network Interface Card 4 x Intel Ethernet Controller 10 Gigabit X540-AT2
Disk Controller Avago Technologies MegaRAID SAS 9361-8i
Disks 2 x 1 TB Seagate Constellation ES
2 x 1.2 TB Intel S3710
8 x 2 TB Western Digital
Compute Cluster Nodes
The following table shows the bill of materials for the SoftLayer bare metal servers in the compute cluster.
Table 70 Compute - Bare Metal Bill of Materials
Component Manufacturer / Model
Chassis SuperMicro PIO-628U-TR4T+-ST031
Motherboard SuperMicro X10DRU-i+_R1.02b
Processor 2 x 2.6GHz Intel Xeon-Haswell (E5-2690-V3-DodecaCore)
Memory 16 x 32GB
Network Interface Card 4 x Intel Ethernet Controller 10 Gigabit X540-AT2
Disk Controller Avago Technologies MegaRAID SAS 9361-8i
Disks 2 x 1 TB Seagate Constellation ES
2 x 1.2 TB Intel S3710
8 x 2 TB Western Digital
Edge Cluster Nodes
The following table shows the bill of materials for the SoftLayer bare metal servers in the Edge cluster.
Table 71 Edge - Bare Metal Bill of Materials
Component Manufacturer / Model
Chassis SuperMicro PIO-628U-TR4T+-ST031
Motherboard SuperMicro X10DRU-i+_R1.02b
Processor 2 x 2.6GHz Intel Xeon-Haswell (E5-2690-V3-DodecaCore)
Memory 8 x 16GB
Network Interface Card 4 x Intel Ethernet Controller 10 Gigabit X540-AT2
Disk Controller Avago Technologies MegaRAID SAS 9361-8i
Disks 2 x 1 TB Seagate Constellation ES
1 x 1.2 TB Intel S3710
4 x 2 TB Western Digital
Appendix B – Software Bill of Materials
The following software products and versions are used in this design for the cloud management platform.
This does not include any software products that are deployed by users when utilizing the cloud
management platform.
Table 72 Software Bill of Materials
Cloud Component Product Item Version
Virtual Infrastructure ESXi 6.0 U1b
vCenter Server Appliance (VIMISO) 6.0 U1
Virtual SAN 6.0 U1
vSphere Replication 6.1
VMware vCenter Site Recovery Manager 6.1
NSX for vSphere 6.2.1
Cloud Management vRealize Automation Appliance 6.2.3
vRealize Automation Identity Appliance 6.2.3
vRealize Orchestrator 6.0.3
vRealize Orchestrator Plug-in for NSX 1.0.2
Service Management vRealize Operations Manager Appliance 6.1.0
Management Pack for NSX for vSphere 2.0
Management Pack for vRealize Log Insight 1.0
Management Pack for vRealize Automation 1.0
Management Pack for Storage Devices 1.0
vRealize Log Insight 3.0
Business Continuity vSphere Data Protection 6.1
Business Management vRealize Business Standard 6.2.3
Patching vSphere Update Manager 6.0U1b
Infrastructure Microsoft SQL Server 2012 R2
Microsoft Windows Server 2012 R2
Ubuntu Server 14.04 LTS
Software Orchestration Chef Server 12
Salt Stack 2015.8.1
Appendix C – Management Virtual Machine Summary
The following virtual machines are configured in the management cluster for the cloud management
platform by default.
Table 73 List of Management Cluster Virtual Machines and Sizes
Function vCPU vRAM (GB) vDisk (GB)
Analytics Edge #1 6 8 4.5
Analytics Edge #2 6 8 4.5
Analytics Master 8 32 1024
Analytics Replica 8 32 1024
Certificate Authority Server - Master CA 2 4 60
Certificate Authority Server - Subordinate 2 4 60
Chef Server and Software Binaries 2 4 300
Collector Edge #1 6 8 4.5
Collector Edge #2 6 8 4.5
Data Node #1 8 32 1024
Data Node #2 8 32 1024
DEM Worker #1 4 8 60
DEM Worker #2 4 8 60
IaaS Web Server Appliance #1 4 4 60
IaaS Web Server Appliance #2 4 4 60
Identity Appliance 1 2 10
Log Insight Edge #1 4 1 0.5
Log Insight Edge #2 4 1 0.5
Log Insight - Master 8 16 450
Log Insight - Slave #1 8 16 450
Log Insight - Slave #2 8 16 450
Management North-South Edge #1 6 8 4.5
Management North-South Edge #2 6 8 4.5
Model Manager and DEM Orchestrator 2 4 60
MS SQL Server 8 16 80
Primary AD, Master DNS and Master NTP 4 8 80
RDS Edge #1 4 1 0.5
RDS Edge #2 4 1 0.5
Remote Collector #1 4 16 250
Remote Collector #2 4 16 250
Salt Stack Master 2 4 50
Secondary AD, Master DNS and Master NTP 4 8 80
Site Recovery Manager 4 8 60
vCenter - Compute and Edge 16 32 295
vCenter - Management 4 16 136
VMware Data Protection Appliance 4 12 8
vRealize Business Service 2 4 50
vRealize Automation Appliance #1 4 16 30
vRealize Automation Appliance #2 4 16 30
vRealize Automation Proxy Agent #1 2 4 60
vRealize Automation Proxy Agent #2 2 4 60
vRealize Edge #1 6 8 4.5
vRealize Edge #2 6 8 4.5
vRealize Orchestrator #1 2 4 16
vRealize Orchestrator #2 2 4 16
vSphere Disk Replicator 2 4 18
VMware Update Manager Edge #1 4 1 0.5
VMware Update Manager Edge #2 4 1 0.5
VMware Update Manager – Management 2 4 60
VMware Update Manager – Compute & Edge 2 4 60
NSX Manager - Management 4 16 60
NSX Controller #1 – Management 4 4 20
NSX Controller #2 - Management 4 4 20
NSX Controller #3 - Management 4 4 20
NSX Manager – Compute and Edge 4 16 60
PSC – Management 2 2 30
PSC – Compute and Edge 2 2 30
TOTAL 255 536 8144
The following virtual machines are configured in the Edge cluster by default:
Table 74 List of Default Edge Cluster Virtual Machines
Function vCPU vRAM (GB) vDisk (GB)
Edge North-South Compute #1 6 8 4.5
Edge North-South Compute #2 6 8 4.5
Edge East-West #1 4 1 0.5
Edge East-West #2 4 1 0.5
NSX Controller #1 Compute 4 4 20
NSX Controller #2 Compute 4 4 20
NSX Controller #3 Compute 4 4 20
Grand Total 32 30 70
Appendix D – Maximum Configurations
The Advanced VMware SDDC on IBM Cloud solution supports the following maximums (an illustrative validation sketch follows the list):
One Central Cloud
One business entity (although multiple tenants are supported, the design is not intended for resale
purposes)
Up to four Cloud Regions
A Central Cloud of up to 10,000 VMs or 1,000 compute nodes
Each Cloud Region up to 10,000 VMs or 1,000 compute nodes
Up to a total of 50,000 VMs can be supported by a single central cloud portal. This includes any
virtual machines on a user's premises and management VMs.
Up to 100 concurrent transactions
Up to 2,500 catalog items on the self service portal across all users and groups
Up to 48 nodes per compute cluster per central cloud or cloud region
Up to 3 compute clusters per central cloud or cloud region
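The limits above can be checked mechanically against a proposed deployment. The sketch below simply mirrors the documented maximums; the proposed values are examples only.

```python
# Illustrative validation of a proposed layout against the maximums above.
LIMITS = {
    "vms_per_region": 10_000,
    "vms_total_portal": 50_000,
    "cloud_regions": 4,
    "concurrent_transactions": 100,
    "catalog_items": 2_500,
    "nodes_per_compute_cluster": 48,
    "compute_clusters_per_region": 3,
}


def violations(proposed):
    """Return a list of documented limits that the proposed values exceed."""
    return [
        f"{name}: {proposed[name]} exceeds the documented maximum of {limit}"
        for name, limit in LIMITS.items()
        if proposed.get(name, 0) > limit
    ]


proposed = {
    "vms_per_region": 8_000,
    "cloud_regions": 2,
    "nodes_per_compute_cluster": 48,
    "compute_clusters_per_region": 3,
}
print(violations(proposed) or "Within the documented maximums")
```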
Appendix E – Compatibility Guide
Browsers
The following web browsers are supported by the design:
Internet Explorer 10
Internet Explorer 11
Google Chrome
Mozilla Firefox
Guest Operating Systems
The following operating systems are supported as provisioned virtual machines:
Windows 7
Windows 8
Windows 8.1
Windows Server 2008 R2
Windows Server 2012
Windows Server 2012 R2
RHEL 5.9
RHEL 5.10
RHEL 6.1 Server
RHEL 6.4 Server
RHEL 6.5 Server
RHEL 7.0 Server
SLES 11 SP 2
SLES 11 SP 3
CentOS 5.10
CentOS 6.4
CentOS 6.5
CentOS 7.0
Ubuntu 12.04 LTS
Ubuntu 13.10
ESX 4.1 Update 2
ESXi 4.1 Update 2
ESXi 5.1 and updates
ESXi 5.5 and updates
ESXi 6.0
Appendix F – VMware VSAN Supported Configuration
VMware has validated the VSAN configuration described in this architecture and has provided the
following statement of support as RPQ1107:
Supported Configuration
VMware will provide support for the following supported configuration:
Support for Virtual SAN deployment using the Avago (LSI) 9361-8i IO controller, with the recommendations
given below:
ESXi Version: VMware ESXi 6.0.0 Update 2 (bld#3800324)
Driver version: megaraid_sas Version 6.610.15.00 (bld# 2494585)
Firmware version: 4.650.00-6383
SSD Model: INTEL SSDSC2BA012T4
HDD Model: WDC WD2000FYYZ-01UL1B2
Mode supported: RAID-0 (refer to KB article http://kb.vmware.com/kb/2111266 for creating RAID-0)
Checksum must remain enabled (enabled by default in VSAN 6.2)
Refer to the VSAN IO timeout settings mentioned in KB article:
http://kb.vmware.com/kb/2144936
To avoid high latencies on HDDs, keep the working set small enough to fit in the caching device
Supported with Write cache turned Off on HDD
Supported with no VMs on VMFS
Maximum number of drives tested with: 10
Hot-Plug feature has not been tested
Restrictions
The VMware support policy purchased by SoftLayer will now cover the configuration described above.
Deviations from this configuration or any other VMware supported configuration will be considered
unsupported.
Please see https://www.vmware.com/support/policies/policy_index.html for VMware support policy
details.
Installation and Upgrade Support
No upgrade or patching of the recommended configuration is supported without VMware consent or
recommendation.
Support Duration
Support shall commence on 26-Jul-2016 and continue for a period of 1 year. If additional
support is desired for this RPQ, a request stating the desired support period must be submitted to
VMware for consideration.
Transition from one-off (RPQ) to general support
Currently there are no plans to include this configuration in any GA release. However, if the above functionality is
included in a future GA VMware build, VMware requires that SoftLayer transition to this new release in
order to obtain ongoing support. VMware will support SoftLayer in the use of the “supported
configuration” for a period of no more than 90 days following the GA release. This grace period is
provided to facilitate transition to the new release. Once the transition has occurred, SoftLayer may obtain
support through the standard support processes and policies. At that time, or at the conclusion of the 90-day
grace period, whichever comes first, support under this RPQ will terminate.