VMware SDDC on IBM Cloud - Advanced
Detailed Design
Date: 16th March 2016
Version: 1.1
Table of Contents
1 Introduction ............................................................................................................................... 8
1.1 Pre-requisites...................................................................................................................... 8
1.2 Summary of Changes ......................................................................................................... 9
2 System Context ...................................................................................................................... 10
2.1 Actors ................................................................................................................................ 10
2.2 Systems ............................................................................................................................ 11
3 Architecture Overview ............................................................................................................ 12
3.1 Physical Infrastructure ...................................................................................................... 12
3.2 Virtual Infrastructure ......................................................................................................... 13
3.3 Infrastructure Management .............................................................................................. 13
3.4 Common Services ............................................................................................................ 13
3.5 Cloud Management Services ........................................................................................... 13
3.6 Operational Services ........................................................................................................ 13
3.7 Business Services ............................................................................................................ 14
4 Logical Operational Model ...................................................................................................... 15
4.1 Logical Operational Model Structure ................................................................................ 15
4.2 Central Cloud .................................................................................................................... 17
4.3 Physical Infrastructure ...................................................................................................... 17
4.3.1 Cluster Architecture ................................................................................................. 17
4.3.2 Physical Network ..................................................................................................... 19
4.3.3 Physical Storage ...................................................................................................... 19
4.4 Virtual Infrastructure ......................................................................................................... 20
4.4.1 Compute Virtualization ............................................................................................. 20
4.4.2 Storage Virtualization ............................................................................................... 21
4.4.3 Network Virtualization .............................................................................................. 21
4.5 Infrastructure Management .............................................................................................. 26
4.5.1 Compute Management ............................................................................................ 26
4.5.2 Storage Management .............................................................................................. 26
4.5.3 Network Management .............................................................................................. 27
4.6 Common Services ............................................................................................................ 27
4.6.1 Identity and Access Services ................................................................................... 27
4.6.2 Domain Name Services ........................................................................................... 28
4.6.3 NTP Services ........................................................................................................... 28
4.6.4 SMTP Services ........................................................................................................ 28
4.6.5 Certificate Authority Services ................................................................................... 28
4.7 Cloud Management Services ........................................................................................... 28
4.7.1 Service Catalog ........................................................................................................ 28
4.7.2 Self-Service Portal ................................................................................................... 28
4.7.3 Infrastructure and Process Orchestration ................................................................ 29
4.7.4 Software Orchestration ............................................................................................ 29
4.8 Operational Services ........................................................................................................ 29
4.8.1 Backup and Restore ................................................................................................ 29
4.8.2 Disaster Recovery .................................................................................................... 30
4.8.3 Monitoring ................................................................................................................ 32
4.8.4 Log Consolidation and Analysis ............................................................................... 33
4.8.5 Patching ................................................................................................................... 34
4.9 Business Services ............................................................................................................ 34
4.9.1 Business Management ............................................................................................ 34
4.9.2 IT Financials ............................................................................................................. 34
4.9.3 IT Benchmarking ...................................................................................................... 34
4.10 Cloud Region ................................................................................................................ 35
5 Physical Operational Model .................................................................................................... 36
5.1 Physical Layer .................................................................................................................. 36
5.1.1 Compute .................................................................................................................. 36
5.1.2 Storage .................................................................................................................... 38
5.1.3 Network .................................................................................................................... 38
5.2 Virtual Infrastructure ......................................................................................................... 39
5.2.1 Compute Virtualization ............................................................................................. 39
5.2.2 Storage Virtualization ............................................................................................... 39
5.2.3 Network Virtualization .............................................................................................. 42
5.3 Infrastructure Management .............................................................................................. 63
5.3.1 vCenter Server Instances ........................................................................................ 63
5.4 Common Services ............................................................................................................ 66
5.4.1 Identity and Access Services ................................................................................... 66
5.4.2 Domain Name Services ........................................................................................... 68
5.4.2.2 DNS Configuration Requirements ....................................................................... 69
5.4.3 NTP Services ........................................................................................................... 70
5.4.4 SMTP Services ........................................................................................................ 70
5.4.5 Certificate Authority Services ................................................................................... 71
5.5 Cloud Management Services ........................................................................................... 72
5.5.1 Cloud Management Physical Design ....................................................................... 74
5.5.2 vRealize Automation Supporting Infrastructure ....................................................... 80
5.5.3 vRealize Automation Cloud Tenant Design ............................................................. 80
5.5.4 vRealize Automation vSphere Integration Design ................................................... 85
5.5.5 Infrastructure Source Endpoints .............................................................................. 89
5.5.6 Virtualization Compute Resources .......................................................................... 89
5.5.7 Process Orchestration ............................................................................................. 90
5.5.8 Software Orchestration ............................................................................................ 94
5.5.9 Infrastructure Orchestration ..................................................................................... 96
5.6 Operational Services ........................................................................................................ 96
5.6.1 Backup and Restore ................................................................................................ 96
5.6.2 Disaster Recovery .................................................................................................. 101
5.6.3 Monitoring .............................................................................................................. 106
5.6.4 Log Consolidation and Analysis ............................................................................. 114
5.6.5 Patching ................................................................................................................. 121
5.7 Business Services .......................................................................................................... 122
Appendix A – Bare Metal Summary ............................................................................................ 124
Management Cluster Nodes .................................................................................................... 124
Compute Cluster Nodes ........................................................................................................... 124
Edge Cluster Nodes ................................................................................................................. 125
Appendix B – Software Bill of Materials ...................................................................................... 126
Appendix C – Management Virtual Machine Summary............................................................... 128
Appendix D – Maximum Configurations ...................................................................................... 131
Appendix E – Compatibility Guide ............................................................................................... 132
Browsers .................................................................................................................................. 132
Guest Operating Systems ........................................................................................................ 132
Table of Figures
Figure 1 VMware SDDC on IBM Cloud Introduction 8
Figure 2 VMware SDDC on IBM Cloud System Context 10
Figure 3 VMware SDDC on IBM Cloud Architecture Overview 12
Figure 4 Logical Structure View 15
Figure 5 Component Interaction Diagram 16
Figure 6 Logical Operational Model 17
Figure 7 Logical Cluster Structure 18
Figure 8 Network Virtualization 22
Figure 9 Dual-Region Data Protection Architecture 30
Figure 10 Disaster Recovery Architecture 31
Figure 11 vRealize Operations Manager Architecture 32
Figure 12 Physical Operational Model - Virtual Servers, Networking and Clusters 36
Figure 13 Network connections per physical node 39
Figure 14 VSAN concept 40
Figure 15 Network Switch Design for Management Hosts 44
Figure 16 Network Switch Design for Edge Hosts 47
Figure 17 Network Switch Design for Compute Hosts 49
Figure 18 Network Virtualization Conceptual Design 54
Figure 19 Cluster Design for NSX for vSphere 55
Figure 20 Virtual Application Network Components and Design 59
Figure 21 vRA Virtual Network Design 60
Figure 22 Virtual Application Network Configuration in Central Cloud and Cloud Region 62
Figure 23 vCenter Server and PSC Deployment Model 64
Figure 24 vRealize Automation Conceptual Design 72
Figure 25 vRealize Automation Design Overview for Central Cloud 75
Figure 26 vRealize Automation Design Overview for Additional Cloud Regions 76
Figure 27 Tenant Design for Single Region 81
Figure 28 Tenant Design for Two Regions 82
Figure 29 vRealize Automation Integration with vSphere Endpoint – Central Cloud 87
Figure 30 vRealize Automation Integration with vSphere Endpoint – Central Cloud and a Cloud
Region (Region A) 88
Figure 31 Template Synchronization 90
Figure 32 Software Orchestration Logical Design 95
Figure 33 vSphere Data Protection Logical Design 97
Figure 34 Logical Network Design for Cross-Region Deployment with Management Application
Network Container 102
Figure 35 Logical Design of vRealize Operations Manager Central Cloud and a Cloud Region
(Region A) Deployment 107
Figure 36 Networking Design of the vRealize Operations Manager Deployment 110
Figure 37 Application Virtual Networks in the vRealize Operations Manager Topology 111
Figure 38 Logical Design of vRealize Log Insight 114
Figure 39 Networking Design for the vRealize Log Insight Deployment 116
Figure 40 Application Virtual Networks in the vRealize Log Insight Topology 117
Figure 41 vRealize Business Logical Design 122
List of Tables
Table 1 VMware SDDC on IBM Cloud Interfaced Actors 10
Table 2 VMware SDDC on IBM Cloud Interfaced Systems 11
Table 3 vRealize Operations Manager Logical Node Architecture 33
Table 4 NFS Configuration for vSphere Data Protection and vRealize Log Insight 38
Table 5 VSAN disk table 40
Table 6 VSAN policies 41
Table 7 VSAN object policy defaults 42
Table 8 VLAN Mapping to Traffic Types 42
Table 9 Management Cluster Distributed Switch 43
Table 10 Management Cluster Distributed Switch Port Group Configuration Settings 43
Table 11 Management Virtual Switch Port Groups and VLANs 45
Table 12 Management VMkernel Adapter 45
Table 13 Edge Cluster Distributed Switch 46
Table 14 Management Cluster Distributed Switch Port Group Configuration Settings 46
Table 15 Edge Virtual Switch Port Groups and VLANs 48
Table 16 Edge VMkernel Adapter 48
Table 17 Compute Cluster Distributed Switch 48
Table 18 Compute Cluster Distributed Switch Port Group Configuration Settings 49
Table 19 Compute Virtual Switch Port Groups and VLANs 50
Table 20 Compute VMkernel Adapter 50
Table 21 NSX Components Sizing 53
Table 22 Load Balancer Features 57
Table 23 Management Applications IP Addressing 61
Table 24 OSPF Area ID 62
Table 25 Specifications for Management vCenter Server Appliance 65
Table 26 Specifications for Platform Service Controller for Management Cluster 65
Table 27 Specifications for Compute and Edge vCenter Server Appliance 65
Table 28 Specifications for Platform Service Controller for Management Cluster 65
Table 29 Requirements for Active Directory Service 67
Table 30 Authentication types used 68
Table 31 Server Sizing 68
Table 32 Domain Naming Example 69
Table 33 SoftLayer DNS servers 70
Table 34 Time sources 70
Table 35 Root CA and Subordinate CA sizing 72
Table 36 Cloud Management Services Components 73
Table 37 Load Balancer Application Profile 78
Table 38 Load Balancer Service Monitoring Configuration 79
Table 39 Load Balancer Pool Specifications 79
Table 40 Virtual Server Characteristics 79
Table 41 Base Windows Server Blueprint 83
Table 42 Base Windows Blueprint Sizing 84
Table 43 Base Linux Server Blueprint 84
Table 44 Base Linux Blueprint Sizing 84
Table 45 vRealize Integration with vSphere 85
Table 46 vRealize Orchestrator Default Configuration Ports 91
Table 47 vRealize Orchestrator Default External Communication Ports 91
Table 48 vRO Service Monitor Specifications 93
Table 49 vRO Service Pool Characteristics 93
Table 50 vRO Virtual Server Characteristics 93
Table 51 Software Orchestration Components Sizing 96
Table 52 VMware vSphere Data Protection Performance 97
Table 53 Backup Jobs in Central Cloud 99
Table 54 Backup Jobs in Additional Cloud Region 101
Table 55 SRM Windows server sizing 105
Table 56 vSphere Replication Appliance 106
Table 57 Analytics Cluster Node Configurations 107
Table 58 DRS Cluster Anti-Affinity Rule for vRealize Operations Manager Nodes 108
Table 59 Remote Collector Node Sizes 108
Table 60 DRS Cluster Anti-Affinity Rule for vRealize Operations Remote Collector Nodes 109
Table 61 IP Subnets in the Application Virtual Network of vRealize Operations Manager 111
Table 62 DNS Names for the Application Virtual Networks 112
Table 63 Node Sizing 115
Table 64 IP Subnets in the Application Isolated Networks 117
Table 65 Example DNS names of Log Insight nodes 117
Table 66 Virtual Disk Configuration in the vRealize Log Insight Virtual Appliance 118
Table 67 Compute Resources for vUM vCenter Managing the Management Cluster 121
Table 68 Compute Resources for vUM vCenter Managing the Compute and Edge Clusters 121
Table 69 Management - Bare Metal Bill of Materials 124
Table 70 Compute - Bare Metal Bill of Materials 124
Table 71 Edge - Bare Metal Bill of Materials 125
Table 72 Software Bill of Materials 126
Table 73 List of Management Cluster Virtual Machines and Sizes 128
Table 74 List of Default Edge Cluster Virtual Machines 130
1 Introduction
VMware Software Defined Data Center (SDDC) on IBM Cloud allows existing VMware virtualized datacenter clients to extend into the IBM Cloud. This permits uses such as capacity expansion into the cloud (and contraction when no longer needed), migration to the cloud, disaster recovery to the cloud, backup into the cloud, and the ability to stand up a dedicated cloud environment for development, test, training or lab purposes.
This document details the design of the Advanced version of VMware SDDC on IBM Cloud, which targets designs requiring high levels of scalability and multiple regions.
Figure 1 VMware SDDC on IBM Cloud Introduction
1.1 Pre-requisites
The design requires the following pre-requisites:
- Client is required to acquire all necessary software licenses and/or keys for all products used in this design prior to commencement of implementation
- Client is required to provide a SoftLayer account
- Client is responsible for SoftLayer related charges incurred as a result of this design's implementation
- Client is responsible for connectivity from this design to any on premises environment or systems
- Client is responsible for connectivity into this design for access by administrators and end users
- Client is responsible for acquiring and providing a domain name
- Client is responsible for providing hostname prefixes for the SoftLayer bare metal devices provisioned through this design
- Client is responsible for providing connection details and necessary credentials for any external systems that are to be integrated with this design (refer to the system context for options)
- Client is responsible for licensing of any software products provisioned with the design
1.2 Summary of Changes
This section records the history of significant changes to this document. Only the most significant changes
are described here.
Version 1.0, 16th Feb 2016
Authors: Simon Kofkin-Hansen, Richard Ehrhardt, Razvan Ionescu, Daniel de Araujo, Frank Chodacki, Bob Kellenberger, Bryan Buckland, Christopher Moss, Daniel Arrieta Alvarez
Description of change: Initial Release of Document

Version 1.1, 16th March 2016
Description of change: Minor reported spelling and grammar corrections.

Version 1.2, 30th Sept 2016
Description of change: Corrected NIC model number in Appendix A; added Appendix F.
2 System Context
When the VMware SDDC on IBM Cloud design is depicted as a single object, the following external actors and systems interface with the design.
Figure 2 VMware SDDC on IBM Cloud System Context
2.1 Actors
The actors that interface with the design are described in the following table. There is not a direct
correlation between actors and persons. An actor role may be performed by one or more persons.
Alternatively, one person may perform more than one actor role.
Table 1 VMware SDDC on IBM Cloud Interfaced Actors
Cloud Admin: The cloud admin or administrator is responsible for maintaining the cloud services. This includes:
- Assigning virtual resources to groups
- Maintaining the cloud software platform
- System administrator roles

Service Provider: Manages the cloud services that are provided to the client users. This includes:
- Service catalog configuration
- Defining roles
- Defining groups
- Configuring user access
- Tenant administrator roles

User: Consumes the services that the cloud admin allows access to. This typically includes:
- Provisioning VMs
- De-provisioning VMs
- Provisioning patterns
- De-provisioning patterns
- Starting, stopping and restarting VMs and patterns
2.2 Systems
The systems that interface with the design are described in the following table.
Table 2 VMware SDDC on IBM Cloud Interfaced Systems
SoftLayer: SoftLayer provides the bare metal, physical networking and NFS storage, in addition to the automation to build the design when ordered.

Client On Premises vSphere: The design is able to connect to an existing vSphere environment on a client premises to enable hybrid capabilities.

Client SMTP Relay: The design connects its SMTP server to a client's SMTP relay service to provide notifications on aspects such as the process orchestration.

Client Authentication: The design is able to connect to an existing client authentication system to establish a trust relationship which extends the client's authentication system into the cloud for use by the cloud management platform.

Client DNS: The design is able to connect to a client's domain name service (DNS) to extend the domain service into the cloud for use by the cloud management platform.

NTP Service: The design requires an external NTP service to provide time synchronization services for use by the cloud management platform.

Patch Repo: There are a number of internet based patch repositories that the cloud management platform applications need to connect to in order to maintain the security and stability of the cloud environment.
3 Architecture Overview
VMware SDDC on IBM Cloud provides VMware automation technology on SoftLayer. This includes
virtual networking, virtual storage, process orchestration, infrastructure orchestration and software
orchestration. It also provides the tools for management of the services providing these functions. The
architecture consists of at least one central cloud region built on SoftLayer, which provides the main portal
for users and administration, plus it can include one or more cloud regions which are managed by the
central cloud and provide additional functionality for remote locations. The architecture is scaled out within
a region, or by adding regions.
Figure 3 VMware SDDC on IBM Cloud Architecture Overview
3.1 Physical Infrastructure
The physical infrastructure consists of three main components: physical compute, physical network and
physical storage. The physical compute provides the physical processing and memory that is used by the
virtualization infrastructure. The physical network provides the network connectivity into the environment
that is then consumed by the network virtualization. The physical storage provides the raw storage capacity
consumed by the virtualization infrastructure. For this design the physical infrastructure components are
provided by SoftLayer bare metal and all components are supported on the VMware Hardware
Compatibility Guide (HCG).
3.2 Virtual Infrastructure
The physical infrastructure is consumed by the virtual infrastructure. The virtual infrastructure mirrors the physical infrastructure with three components: compute virtualization, storage virtualization and network virtualization. Each of these interfaces with the respective component in the physical infrastructure. The
virtual infrastructure is installed on each physical device to form a node, for example a compute node. All
the virtual resources interface to the virtual infrastructure for access to the physical infrastructure. The
virtual infrastructure is accessed by either the cloud admin or the infrastructure management component.
3.3 Infrastructure Management
Infrastructure management provides the logic to ensure that the maximum benefit is derived from the virtual infrastructure. This includes functions such as pooling virtual infrastructure and moving virtual resources off a node for maintenance or in the case of node failure. It controls placement of virtual resources on nodes to balance load and to satisfy business rules. It is accessed directly only by the cloud admin; all other access to this component is through APIs called from other components.
3.4 Common Services
Common services provide the services which are consumed by the other cloud management services. These include identity and access services, SMTP services, NTP services, domain name services and certificate
authority. This component is also the primary interface to external systems. Common services can connect
to the client’s DNS for requests outside the domain managed by the cloud services. It connects to the
external NTP service to synchronize its NTP service with an outside stratum. A trust relationship can be
established between the common services and the client’s authentication service for common authentication
to the cloud services.
3.5 Cloud Management Services
The cloud management services provide the primary interface to the users to consume cloud services in
addition to the orchestration engines to process the service requests. The self-service portal is used as the
primary interface to view the available cloud services (the service catalog) as well as to obtain a view of
existing cloud resources that are deployed. The service catalog is the list of available services that are
managed by the service provider. The service provider is able to determine which services are available to
specific users or groups. The process orchestration engine controls the steps required to perform a service.
This includes actions such as obtaining an approval or connecting to an operational service system as part
of the process. The process orchestration engine calls the infrastructure orchestration engine to orchestrate
the build of the virtual resources for a service. The software orchestration engine builds the software that
runs on the virtual resources.
3.6 Operational Services
Operational services provide monitoring, patching, log consolidation, log analysis, disaster recovery and
backup services for the cloud management platform. The monitoring looks for issues with the cloud
management platform and notifies the cloud admin via alerts on the operations console as well as emails via
the external SMTP relay. The patching connects to the external patch repository in order to obtain update
information in support of the security or stability of the cloud management platform. Log consolidation
collects the logs from the cloud management platform into a central repository which the log analysis
service then operates on to provide the cloud admin with diagnostic information. The backup service keeps
copies of the cloud management platform outside of the virtual infrastructure so it can be restored in the
event of failure or corruption. The disaster recovery service requires at least one cloud region, separate from the central cloud, to which the cloud management platform is replicated. In the event of failure at the primary site, the cloud management platform is restarted in that cloud region.
3.7 Business Services
The business services component provides the service provider with analytics on IT financials, business
management and benchmarking aspects of the cloud. The IT financials provides the service provider with
details of the total cost of cloud ownership. The business management functions provide metering and
chargeback capabilities for the service provider by user or group. The benchmarking functions provide the
ability for service providers to analyze where the IT spend for the cloud is going, where it could be in the
future and paths that need to be taken to improve.
4 Logical Operational Model
The logical operational model provides guidance as to the design elements required to meet the functional
requirements.
4.1 Logical Operational Model Structure
The design consists of two distinct elements: a central cloud, through which the user and service provider manage the entire cloud, and, optionally, one or more associated cloud regions. Only the central cloud
contains the self-service portal. Additional regions (cloud regions) are added to provide remote sites, or
additional capacity beyond that of a single central cloud within the same site. Each cloud region is
configured into the central cloud for management. On premises vSphere environments are connected to
SoftLayer via either a VPN connection over the internet or dedicated links to form additional cloud regions.
The design of on premises vSphere environments is outside the scope of this document.
Figure 4 Logical Structure View
Within a central cloud, the components interact with each other as follows:
Figure 5 Component Interaction Diagram
Both the central cloud and any additional cloud regions are built on SoftLayer.
4.2 Central Cloud
The central cloud hosts the primary portal through which users access the cloud services. It has connections
to all remote regions.
The functions in a central cloud map to the following software products described in more detail in this
section.
[Figure 6 shows the mapping of logical functions to products: SoftLayer provides the physical infrastructure; ESXi, VSAN and NSX provide compute, storage and network virtualization; vCenter provides infrastructure management; Microsoft Active Directory and related services provide the common services; and the VMware vRealize Suite provides the cloud management, operational and business services, including vRealize Log Insight (log consolidation and analysis), vRealize Operations Manager (monitoring), vSphere Update Manager (patching), Site Recovery Manager (disaster recovery), vSphere Data Protection (backup and restore) and vRealize Business (IT financials and business management).]
Figure 6 Logical Operational Model
4.3 Physical Infrastructure
The physical infrastructure is broken up into compute, storage and network. The compute and storage areas are combined in the cluster architecture. The network area is described in the physical network section.
4.3.1 Cluster Architecture
This design splits the physical layer into clusters. A cluster represents the aggregate of the compute and
memory resources of all the hosts in the cluster. All hosts in the cluster share network and storage
resources. The use of clusters allows workloads (user or management) to be placed onto specific hardware.
Each cluster is managed as a single entity, so user workloads can be managed separately from management
workloads.
The design differentiates between the following types of clusters:
Compute cluster (one or more)
Management cluster
Edge cluster
Storage cluster
Figure 7 Logical Cluster Structure
4.3.1.1.1 Compute Clusters
Compute clusters host the VMware SDDC on IBM Cloud users’ virtual machines (sometimes
referred to as workloads or payloads). Each compute cluster is built using SoftLayer bare metal.
The environment is scaled by adding nodes to the initial compute cluster up to the maximum number
of nodes per cluster (refer to the physical operational model for details). Once the maximum has
been reached, additional compute clusters are added to the environment.
4.3.1.1.2 Management Cluster
The management cluster houses the virtual machines that manage the cloud. Like the compute
clusters, the management cluster is built using SoftLayer bare metal. These servers host vCenter
Server, NSX Manager, NSX Controller, vRealize Operations Manager, vRealize Log Insight,
vRealize Automation, and other shared management components.
4.3.1.1.3 Edge Cluster
Edge clusters connect the virtual networks (overlay networks) provided by NSX for vSphere and
the external networks. This includes both north-south (into the environment from outside) and
east-west (between management and compute clusters) communications.
Edge clusters provide the following main functions:
Support on-ramp and off-ramp connectivity to physical networks
Connect to client on premises environments
4.3.1.1.4 Storage Cluster
A storage cluster provides network-accessible storage via NFS. This is used for backup and log
archive purposes. The compute, management and edge clusters utilize Virtual Storage Area
Network (VSAN) which aggregates disks located in each node of the clusters.
4.3.2 Physical Network
The physical and layer 2 networking is handled by SoftLayer. The SoftLayer physical fabric provides a
robust IP transport layer with the following characteristics:
Simplicity
Scalability
High bandwidth
Fault-tolerant transport
4.3.2.1.1 Simplicity
The network infrastructure at SoftLayer is simplified and standardized on three physical networks
containing public, private, and out of band management (IPMI) traffic. Both the private and public
networks are deployed to utilize up to 20Gbps bandwidth per physical host. The out of band
management network is connected with a 1Gbps link per host.
Upon ordering infrastructure components within SoftLayer, VLANs are provisioned for each of
the three networks mentioned. If VLANs already exist within an environment and there is enough space for a bare metal device to be placed in the same pod, SoftLayer automatically assigns the new device an IP address on the same VLAN. The design incorporates SoftLayer portable subnets to
provide IP addressing for virtual machines as well as IP addresses for the bare metal hosts.
SoftLayer has also standardized the networking infrastructure using best-of-breed networking
vendors. As a result, SoftLayer is able to implement and reuse automation patterns to setup,
configure, and monitor the network infrastructure. Some of this automation has been exposed via
API and is used by this design to simplify management tasks.
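As an illustration of the exposed API, and not part of the automation delivered with this design, the following Python sketch uses the publicly available SoftLayer Python client to list the bare metal servers on an account. The credentials shown are placeholders.

```python
# Minimal sketch: query SoftLayer bare metal inventory with the official
# "softlayer" Python client. Credentials are placeholders only.
import SoftLayer

client = SoftLayer.create_client_from_env(username='SL_API_USER',
                                          api_key='SL_API_KEY')
hw_mgr = SoftLayer.HardwareManager(client)

# List the bare metal servers on the account with their private IP addresses.
for server in hw_mgr.list_hardware():
    print(server.get('hostname'), server.get('primaryBackendIpAddress'))
```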
4.3.2.1.2 Scalability
The SoftLayer network is designed in a multi-tier model. Each rack in a SoftLayer datacenter
contains 2 frontend customer switches (FCS) and 2 backend customer switches (BCS) connected
to the public and private networks, respectively. These customer switches then connect to separate,
peered aggregation switches; the aggregation switches are then attached to a pair of separate routers
for L3 networking. This multi-tier design allows the network to scale across racks, rows, and pods
within the SoftLayer datacenter.
4.3.2.1.3 High Bandwidth
Every upstream network port in the SoftLayer datacenter has multiple 10Gbps or 40Gbps
connections. Every rack is terminated with multiple 10Gbps or 40Gbps connections to the public
Internet and multiple 10Gbps or 40Gbps connections to the private network.
4.3.2.1.4 Fault-tolerant transport
Redundancy is provided at the server level using 2x10Gbps NICs. Additionally, the backend
server, frontend server, aggregation switches and routers are redundantly connected.
4.3.3 Physical Storage
There are two types of storage used within this design: VSAN and NFS.
operating system and any applications running on it could be impacted. By using vSphere
virtualization, virtual machines can be re-started on remaining hosts in a cluster in the event of the
catastrophic failure of a host. This can also be used by the cloud admin to take a host offline for
maintenance without affecting workloads on the cluster.
4.4.1.4 Performance
The amount of physical resources available to virtual machines can be controlled through vCenter
resource pools. This allows for different resource pools which can have higher or lower priority of
physical resources based on share allocation.
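As a hedged illustration of this mechanism, the following pyVmomi sketch creates a high-priority resource pool under a cluster's root resource pool. The vCenter hostname, credentials and cluster name are placeholders, not values mandated by this design.

```python
# Illustrative pyVmomi sketch: create a resource pool with high CPU and memory
# shares under an assumed cluster named 'Compute-Cluster-01'.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()   # lab use only; use valid certificates in production
si = SmartConnect(host='vcenter.example.local', user='administrator@vsphere.local',
                  pwd='password', sslContext=ctx)
content = si.RetrieveContent()

# Locate the cluster by name.
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.ClusterComputeResource], True)
cluster = next(c for c in view.view if c.name == 'Compute-Cluster-01')
view.Destroy()

def alloc(shares_level):
    # Expandable reservation, no hard limit; priority is expressed through shares.
    return vim.ResourceAllocationInfo(
        reservation=0, expandableReservation=True, limit=-1,
        shares=vim.SharesInfo(level=shares_level))

spec = vim.ResourceConfigSpec(cpuAllocation=alloc('high'),
                              memoryAllocation=alloc('high'))
cluster.resourcePool.CreateResourcePool(name='gold-workloads', spec=spec)
Disconnect(si)
```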
4.4.1.5 Workload Movement
By linking a central cloud with one or more cloud regions, vCenter Server allows a virtual
machine to be migrated to a remote cloud region or from cloud region to cloud region. This is also
possible from an on premises installation attached to the same environment. This enables
workloads to be migrated from client premises into the cloud and back again.
4.4.2 Storage Virtualization
Storage virtualization provides two levels of virtualization. The first is the virtualization of the storage
arrays and the second is the virtualization of the block storage used by virtual machines.
4.4.2.1 Virtual Storage Area Network (VSAN)
Virtual Storage Area Networking (VSAN) emulates a physical storage area network entirely
within the virtualization layer. Each host in the cluster contains local drives that are combined in
software to behave as a single disk array that is shared between all the hosts in the cluster as a
shared datastore.
Since there is no physical storage area network, VSAN has the advantage of fewer components
(no external drive array, fiber cabling, etc.). It allows ease of scaling when adding new compute nodes and reduces administration, because tasks such as LUN allocation are no longer necessary. In addition, VSAN provides high performance, since local disk is used and disk I/O is spread
out across all hosts within a cluster.
Storage policies are used to define storage attributes such as performance and protection levels.
The policy is set per virtual machine allowing great flexibility with the service levels available.
4.4.2.2 Virtual Machine Disks (VMDK)
Each virtual machine has at least one virtual machine disk (VMDK). Additional disks can be
added to a virtual machine. The virtual disks are provisioned on to the datastores provided by
VSAN. All virtual disks are thin provisioned, so unused disk space within a single virtual disk
does not take up datastore disk capacity.
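The following pyVmomi sketch illustrates how a thin-provisioned virtual disk can be added to a virtual machine. It assumes an already-connected session (see the resource pool example above) and an existing VM object with a SCSI controller; the size and unit number are arbitrary examples.

```python
# Illustrative pyVmomi sketch: add a thin-provisioned VMDK to an existing VM.
from pyVmomi import vim

def add_thin_disk(vm, size_gb, unit_number):
    # Reuse the VM's existing SCSI controller for the new disk.
    controller = next(dev for dev in vm.config.hardware.device
                      if isinstance(dev, vim.vm.device.VirtualSCSIController))

    disk = vim.vm.device.VirtualDisk(
        capacityInKB=size_gb * 1024 * 1024,
        controllerKey=controller.key,
        unitNumber=unit_number,
        backing=vim.vm.device.VirtualDisk.FlatVer2BackingInfo(
            diskMode='persistent',
            thinProvisioned=True))      # unused space does not consume datastore capacity

    change = vim.vm.device.VirtualDeviceSpec(
        operation=vim.vm.device.VirtualDeviceSpec.Operation.add,
        fileOperation=vim.vm.device.VirtualDeviceSpec.FileOperation.create,
        device=disk)

    return vm.ReconfigVM_Task(spec=vim.vm.ConfigSpec(deviceChange=[change]))
```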
4.4.3 Network Virtualization
Network virtualization provides a network overlay that exists within the virtual layer. As a result, it can provide more rapid provisioning, deployment, re-configuration and tear-down than is possible with physical devices.
4.4.3.1 Network Virtualization Components
The network virtualization architecture of this design utilizes VMware NSX for vSphere and
vSphere Distributed Switches (vDS). The virtualized network is organized hierarchically, with the
following components from bottom to top:
Data plane with the NSX vSwitch and additional components
Control plane with the NSX Controller
Management plane with the NSX Manager
Consumption plane with a Cloud management portal
Figure 8 Network Virtualization
4.4.3.1.1 Distributed Virtual Switches
This design implements vSphere distributed switches. vSphere Distributed Switch (vDS) offers
several enhancements over standard virtual switches.
Centralized management. Because distributed switches are created and managed
centrally on a vCenter Server system, they make the switch configuration more consistent
across ESXi hosts. Centralized management saves time, reduces mistakes, and lowers
operational costs.
Additional features. Distributed switches offer features that are not available on
standard virtual switches. Some of these features can be useful to the applications and
services that are running in the organization’s infrastructure. For example, NetFlow and
port mirroring provide monitoring and troubleshooting capabilities to the virtual
infrastructure.
The distributed virtual switch implements health checks. The health check service helps identify and
troubleshoot configuration errors in vSphere distributed switches.
Health check helps identify the following common configuration errors:
Mismatched VLAN trunks between a vSphere distributed switch and physical switch.
Mismatched MTU settings between physical network adapters, distributed switches, and
physical switch ports.
Mismatched virtual switch teaming policies for the physical switch port-channel settings.
Health check monitors VLAN, MTU, and teaming policies:
VLANs. Checks whether the VLAN settings on the distributed switch match the trunk
port configuration on the connected physical switch ports.
MTU. For each VLAN, checks whether the physical access switch port MTU jumbo
frame setting matches the distributed switch MTU setting.
Teaming policies. Checks whether the connected access ports of the physical switch that
participate in an EtherChannel are paired with distributed ports whose teaming policy is
IP hash.
Health check is limited to the access switch port to which the distributed switch uplink connects.
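For illustration only, the following pyVmomi sketch enables the VLAN/MTU and teaming health checks on a distributed switch object; the one-minute interval is an example value, not a recommendation from this design.

```python
# Illustrative pyVmomi sketch: enable vDS health checks. "dvs" is assumed to be
# a vim.dvs.VmwareDistributedVirtualSwitch object looked up beforehand.
from pyVmomi import vim

def enable_dvs_health_check(dvs, interval_minutes=1):
    config = [
        vim.dvs.VmwareDistributedVirtualSwitch.VlanMtuHealthCheckConfig(
            enable=True, interval=interval_minutes),
        vim.dvs.VmwareDistributedVirtualSwitch.TeamingHealthCheckConfig(
            enable=True, interval=interval_minutes),
    ]
    return dvs.UpdateDVSHealthCheckConfig_Task(healthCheckConfig=config)
```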
With network I/O control, the distributed switch allocates bandwidth for the following system
traffic types:
vSphere vMotion traffic
Management traffic
VMware vSphere Replication traffic
NFS traffic
VMware Virtual SAN traffic
vSphere Data Protection backup traffic
Virtual machine traffic
Fault tolerance traffic
iSCSI traffic
Network I/O control details
The bandwidth for each network resource pool is controlled by setting the physical adapter shares
and host limits. The bandwidth for virtual machines is controlled by bandwidth reservation for an
individual VM, similar to the way memory and CPU reservation is used.
The physical adapter shares assigned to a network resource pool determine the share of the total
available bandwidth guaranteed to the traffic that is associated with that network resource
pool. The share of transmit bandwidth that is available to a network resource pool is determined
by these factors:
The network resource pool's shares.
Other network resource pools that are actively transmitting.
4.4.3.1.2 Data Plane
The NSX data plane consists of the NSX vSwitch, which is based on the vSphere Distributed
Switch (vDS) and includes additional components. These components include kernel modules
(VIBs), which run within the ESXi kernel and provide services such as virtual distributed
router (VDR) and distributed firewall (DFW). The NSX kernel modules also enable Virtual
Extensible LAN (VXLAN) capabilities.
The NSX vSwitch abstracts the physical network and provides access-level switching in the
hypervisor. It is central to network virtualization because it enables logical networks that
are independent of physical constructs such as VLAN. The NSX vSwitch provides multiple
benefits.
Three types of overlay networking capabilities:
Creation of a flexible logical Layer 2 overlay over existing IP networks on
existing physical infrastructure.
Support for east/west and north/south communication while maintaining
isolation between tenants.
Support for application workloads and virtual machines that operate as if they
were connected to a physical Layer 2 network.
Support for VXLAN and centralized network configuration.
A comprehensive toolkit for traffic management, monitoring and troubleshooting within a
virtual network which includes port mirroring, NetFlow/IPFIX, configuration backup and
restore, network health check, Quality of Service (QoS), and Link Aggregation Control
Protocol (LACP)
In addition to the NSX vSwitch, the data plane also includes gateway devices (NSX Edge
gateways), which can provide Layer 2 bridging from the logical networking space (VXLAN) to
the physical network (VLAN). NSX Edge gateway devices offer Layer 2, Layer 3, perimeter firewall, load-balancing and other services such as Secure Sockets Layer virtual private network (SSL VPN) and Dynamic Host Configuration Protocol (DHCP).
4.4.3.1.3 Control Plane
The NSX control plane runs in the NSX Controller, which enables unicast VXLAN and control-
plane programming of elements such as VDR (virtual distributed router). Unicast support is
necessary because the multicast IP range per VLAN is limited within SoftLayer. The number of
multicast or unicast IPs determines the number of VXLANs that can be provisioned.
In all cases the controller is part of the control plane and does not have any data plane
traffic passing through it. The controller nodes are deployed in a cluster per NSX Manager to
enable high availability and scalability. A failure of one or all controller nodes does not impact
data plane traffic.
4.4.3.1.4 Management Plane
The NSX management plane consists of the NSX Manager, which is the single point
of configuration, and the REST API entry-points. NSX Manager integrates with vCenter. There is
one NSX Manager per vCenter Server.
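As a simple illustration of the REST API entry point (the hostname and credentials are placeholders), the following Python sketch queries the NSX Manager for its deployed controller nodes.

```python
# Illustrative sketch of the NSX for vSphere REST API: list NSX Controller nodes.
# Certificate verification is disabled here for brevity only.
import requests

NSX_MANAGER = 'https://nsxmanager.example.local'
resp = requests.get(NSX_MANAGER + '/api/2.0/vdn/controller',
                    auth=('admin', 'password'), verify=False)
resp.raise_for_status()
print(resp.text)   # XML document describing each controller node
```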
4.4.3.1.5 Consumption Plane
Different actors interact with NSX for vSphere to access and manage the associated services in
different ways:
Cloud admin can manage the NSX environment from the vSphere Web Client.
Users can consume the network virtualization capabilities of NSX for vSphere through the
CMP (vRealize Automation) UI when deploying applications.
4.4.3.2 Network Virtualization Services
Network virtualization services include logical switches, logical routers, logical firewall, and other
components of NSX for vSphere.
4.4.3.2.1 Logical Switches
Cloud deployments have a variety of applications that are used across multiple tenants. These
applications and tenants require isolation from each other for security, fault isolation, and
overlapping IP addresses. The NSX for vSphere logical switch creates logical broadcast domains
or segments to which an application or tenant virtual machine can be logically wired. This allows
for flexibility and speed of deployment while still providing all the characteristics of a physical
network's broadcast domains (VLANs) without physical Layer 2 sprawl or spanning tree issues.
A logical switch is distributed and can span arbitrarily large compute clusters. This allows for
virtual machine mobility (migration with vMotion) within a region and between regions, without
limitations of the physical Layer 2 (VLAN) boundary.
4.4.3.2.2 Logical Routers
Dynamic routing provides the necessary forwarding information between Layer 2 broadcast
domains, thereby allowing the cloud admin to decrease the size of Layer 2 broadcast domains and
improve network efficiency and scale. NSX for vSphere extends this intelligence to where the
workloads reside for east/west routing. This allows more direct VM-to-VM communication
without the costly need to extend hops. At the same time, logical routers provide north/south
connectivity, thereby enabling users to access public networks.
4.4.3.2.3 Logical Firewall
NSX for vSphere Logical Firewall provides security mechanisms for dynamic virtual datacenters.
The Distributed Firewall component of Logical Firewall allows a cloud admin to segment
virtual datacenter entities like virtual machines based on VM names and attributes, user
identity, vCenter objects like datacenters, and hosts, or based on traditional networking
attributes like IP addresses, port groups, and so on.
The Edge Firewall component helps a cloud admin to meet key perimeter security
requirements, such as building DMZs based on IP/VLAN constructs, tenant-to-tenant
isolation in multi-tenant virtual datacenters, Network Address Translation (NAT), partner
(extranet) VPNs, and user-based SSL VPNs.
The Flow Monitoring feature displays network activity between virtual machines at the
application protocol level. The cloud admin can use this information to audit network traffic,
define and refine firewall policies, and identify threats to a client’s network.
4.4.3.2.4 Logical Virtual Private Networks (VPNs)
SSL VPN-Plus allows remote users to access private corporate applications. IPSec VPN offers
site-to-site connectivity between an NSX Edge instance and remote sites. L2 VPN allows users to
extend their datacenter by allowing virtual machines to retain network identity across geographical
boundaries.
4.4.3.2.5 Logical Load Balancers
The NSX Edge load balancer enables network traffic to follow multiple paths to a specific
destination. It distributes incoming service requests evenly among multiple servers in such a way
that the load distribution is transparent to users. Load balancing thus helps in achieving optimal
resource utilization, maximizing throughput, minimizing response time, and avoiding overload.
NSX Edge provides load balancing up to Layer 7.
4.4.3.2.6 Service Composer
Service Composer helps provision and assign network and security services to applications in a
virtual infrastructure. The service provider maps these services to a security group, and the
services are applied to the virtual machines in the security group.
4.5 Infrastructure Management
The infrastructure management element manages the compute, network and storage virtual resources
provided by the lower layer. It also provides consolidation services to the upper layers for operational
services. These functions are provided by VMware vCenter Server.
4.5.1 Compute Management
In this design, VMware vCenter is employed to centralize the management of the compute resources within
each ESXi host. While the ESXi hosts can be managed individually, placing them under vCenter control
enables the following capabilities:
Centralized control and visibility of all aspects within managed ESXi hosts and virtual machines.
Provides the single pane of glass interface view via the vCenter web client for compute, network
and storage management.
Proactive Optimization. Enables allocation and optimization of resources for maximum efficiency across the ESXi hosts. See section 4.4.1 Compute Virtualization for optimization features enabled by vCenter Server.
Extended management function for other integrated products and services such as VMware NSX, vSphere Data Protection, vSphere Update Manager and others as “snap-ins” extending the vCenter web interface.
Monitoring, alerting, scheduling. Cloud admins can view events and alerts within the vCenter web client, and configure scheduled actions.
Automation engine. VMware vCenter is the engine which performs the tasks given to it via the
vSphere API web interface. VMware vRealize Automation and vRealize Orchestrator are examples of applications that drive vCenter actions via the API.
4.5.2 Storage Management
VMware vCenter enables centralized storage management within this design which allows for
configuration and management of the following storage types:
Local disk storage. Local hard disk drives (HDD) or solid state drives (SSD) that are attached to
the local ESXi hosts.
Storage area network (SAN) attached storage. Remote block storage that is attached to the ESXi host via Fibre Channel or TCP/IP protocols.
Network attached storage (NAS). File-based storage that is attached to the ESXi hosts via the NFS protocol.
Virtual SAN storage. Configured within the cluster object in vCenter, this enables the aggregation of local disk storage across all ESXi hosts within a given cluster into a shared pool of storage. Once configured, an outage of the vCenter server does not affect the availability of VSAN storage to the cluster.
Within this design, vCenter management of storage is primarily focused on NAS and VSAN storage, as SAN storage is not employed. Only the ESXi host OS and swap space use local non-VSAN disk storage.
4.5.2.1 NFS Storage management
vCenter is responsible for configuring the mounting of NFS datastores on each ESXi host within a cluster. This ensures that the datastore remains accessible to any virtual machine with virtual disk files (VMDK) residing on the NFS-based datastore, should a vMotion of the virtual machine from one ESXi host to another occur within the cluster.
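The following pyVmomi sketch illustrates this configuration by mounting the same NFS export on every host in a cluster; the NFS server, export path and datastore name are placeholders rather than values defined by this design.

```python
# Illustrative pyVmomi sketch: mount one NFS export on every host in a cluster
# so that VMs with VMDKs on that datastore can vMotion freely between hosts.
from pyVmomi import vim

def mount_nfs_on_cluster(cluster, remote_host, remote_path, datastore_name):
    spec = vim.host.NasVolume.Specification(
        remoteHost=remote_host,       # placeholder NFS endpoint
        remotePath=remote_path,       # placeholder export path
        localPath=datastore_name,
        accessMode='readWrite',
        type='NFS')
    for host in cluster.host:
        host.configManager.datastoreSystem.CreateNasDatastore(spec)

# Example (placeholder values):
# mount_nfs_on_cluster(cluster, 'nfs.example.local', '/export/backup01', 'nfs-backup01')
```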
4.5.2.2 VSAN Storage management
The vCenter interface or web API is used to configure VSAN datastores for a particular cluster at the cluster object level. Configuring VSAN within vCenter involves the following areas of configuration:
Licensing. Prior to enabling VSAN, a valid license within the vCenter licensing section is
required.
VSAN network. Used to configure the network VSAN will use for its backplane network. Virtual machines are made storage fault tolerant across the ESXi hosts' local disks on this network.
Disk group configuration. On each ESXi host that contributes its local disks to a Virtual
SAN cluster, disks are organized into disk groups. A disk group is a main unit of storage
on a host. Each disk group includes one SSD and one or multiple HDDs.
VSAN Policies. Storage policies define the virtual machine storage characteristics.
Storage characteristics specify different levels of service for different virtual machines.
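For illustration only, the following pyVmomi sketch enables VSAN on a cluster object with automatic disk claiming; storage policies themselves are defined separately through the storage policy (SPBM) service.

```python
# Illustrative pyVmomi sketch: enable VSAN with automatic disk claiming.
# "cluster" is assumed to be a vim.ClusterComputeResource looked up beforehand.
from pyVmomi import vim

def enable_vsan(cluster):
    vsan_config = vim.vsan.cluster.ConfigInfo(
        enabled=True,
        defaultConfig=vim.vsan.cluster.ConfigInfo.HostDefaultInfo(
            autoClaimStorage=True))   # hosts contribute their eligible local SSDs/HDDs
    spec = vim.cluster.ConfigSpecEx(vsanConfig=vsan_config)
    return cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)
```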
4.5.3 Network Management
vCenter Server is used to create standard and distributed virtual switches. The virtual switches connect virtual machine (VM) network interfaces to portgroups, allowing for communication between VMs hosted on the same host or on different hosts. To establish communication between hosts, virtual switches need to be connected to physical uplinks, which are the network interfaces of the ESXi hosts. VMs connected to the same virtual switch and hosted on the same host can communicate directly without the need for an external uplink.
vCenter Server enables creation of distributed portgroups for virtual machines (aggregated virtual ports
with a particular set of specifications).
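As a hedged example of this capability, the following pyVmomi sketch creates a distributed port group with a VLAN tag on an existing distributed switch object; the port group name and VLAN ID are placeholders.

```python
# Illustrative pyVmomi sketch: create an early-binding distributed port group
# with a VLAN tag. "dvs" is assumed to be an existing distributed switch object.
from pyVmomi import vim

def create_portgroup(dvs, name='dvPG-Example', vlan_id=1101, ports=128):
    pg_spec = vim.dvs.DistributedVirtualPortgroup.ConfigSpec(
        name=name,
        type='earlyBinding',          # static port binding
        numPorts=ports,
        defaultPortConfig=vim.dvs.VmwareDistributedVirtualSwitch.VmwarePortConfigPolicy(
            vlan=vim.dvs.VmwareDistributedVirtualSwitch.VlanIdSpec(
                vlanId=vlan_id, inherited=False)))
    return dvs.AddDVPortgroup_Task(spec=[pg_spec])
```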
4.6 Common Services
Common services provide the services used by other services in the cloud management platform. This
includes identity and access services, domain name services, NTP services, SMTP services and Certificate
Authority Services.
4.6.1 Identity and Access Services
In this design, Microsoft (MS) Active Directory (AD) is employed to provide the authentication and directory services back end for the VMware Platform Services Controller (PSC) and the VMware Identity Appliance. Within this design the VMware software components authenticate against the identity appliance, which in
turn authenticates against the MS AD service. The AD in this design can be extended to other regions by
adding an additional AD server for that particular region’s subdomain.
4.6.2 Domain Name Services
Domain Name Services (DNS) within this design are for the cloud management and infrastructure components only. DNS provides host-name-to-IP resolution for the cloud management platform and service resolution for the AD components. When an instance of this design is tied to a customer's on-premises solution, this design's DNS servers are referenced by the on-premises DNS infrastructure and also act as a proxy for the customer's DNS infrastructure.
4.6.3 NTP Services
This design's NTP servers synchronize to the SoftLayer infrastructure NTP server, one stratum below it. They ensure that all physical and virtual components are synchronized in time for the needs of authentication, replication, clustering, log synchronization and certificate services.
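For illustration only, the following pyVmomi-based sketch shows how an ESXi host could be pointed at the design's NTP source and the NTP daemon restarted. The host name and NTP server addresses are hypothetical placeholders.

    # Illustrative sketch only: set the NTP server list on an ESXi host and restart ntpd.
    # Assumes a pyVmomi ServiceInstance 'si'; names and addresses are hypothetical placeholders.
    from pyVmomi import vim

    def configure_ntp(si, host_name, ntp_servers):
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(content.rootFolder, [vim.HostSystem], True)
        host = next(h for h in view.view if h.name == host_name)
        view.Destroy()
        dt_cfg = vim.HostDateTimeConfig(ntpConfig=vim.HostNtpConfig(server=ntp_servers))
        host.configManager.dateTimeSystem.UpdateDateTimeConfig(config=dt_cfg)
        host.configManager.serviceSystem.RestartService(id='ntpd')   # apply the new servers

    # Example: configure_ntp(si, 'esxi01.example.local', ['ntp0.example.local', 'ntp1.example.local'])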
4.6.4 SMTP Services
Simple Mail Transfer Protocol (SMTP) is utilized within this design by various components for outbound notification only. For inbound email requirements (vRealize Automation, vRealize Business), the customer's email servers must be configured.
4.6.5 Certificate Authority Services
An enterprise Certificate Authority (CA), based on the Microsoft (MS) CA services built into MS Windows, is employed in this solution to replace the self-signed certificates on the web interfaces within this design.
4.7 Cloud Management Services
The cloud management services provide the service catalog, self-service portal and orchestration. This is
provided by VMware vRealize Automation, vRealize Orchestrator and Rapid Deployment Services (RDS)
pattern automation.
4.7.1 Service Catalog
The service catalog is published through the self-service portal and allows users to request the provided services. These can include provisioning new virtual machines from templates, provisioning new environments consisting of one or more virtual machines with software products as blueprints (also known as patterns), or managing existing deployed resources. Advanced services are also available through the service catalog by calling the orchestration component for process orchestration.
The service provider role is able to customize the services available to users as well as publish additional
services.
4.7.2 Self-Service Portal
The self-service portal provides a single point of access for users to the VMware SDDC on IBM Cloud solution. Authentication to the portal is performed against the Active Directory service.
4.7.3 Infrastructure and Process Orchestration
Orchestration is provided by vRealize Orchestrator. It allows tasks and remediation actions to be automated, including integration with third-party IT operations software.
vRealize Orchestrator consists of:
Workflow designer which incorporates an easy-to-use drag and drop interface to assemble
workflows. The designer runs on Windows, Linux and Mac OS desktops.
Scripting designer which allows for new building blocks to be created or imported for the vRealize
Orchestrator platform.
Orchestration engine which runs the workflows and associated scripts.
The default implementation includes a built-in workflow library with common tasks. Workflows can be versioned and packaged to assist with change management.
4.7.4 Software Orchestration
Software Orchestration is provided by a Rapid Deployment Services (RDS) solution with IBM Open
Patterns. RDS implements a distributed file repository and the configuration management tools to deliver
IBM Open Patterns on deployed workloads. IBM Open Patterns describe the pre-defined architecture of
an application. For each component of the application (for example, a database or web server), the pattern defines:
Pre-installation on an operating system
Pre-integration across components
Pre-configured and tuned settings
Pre-configured monitoring
Pre-configured security
Lifecycle management
4.8 Operational Services
Operational services provide management of the cloud services. This includes backup & restore, disaster
recovery, monitoring, log consolidation & analysis and patching functions.
4.8.1 Backup and Restore
The data protection service protects the infrastructure that provides the virtualization, operations, security
and cloud services. It does not protect any deployed user virtual machines.
Data protection solutions provide the following functions in the design:
Back up and restore virtual machines and database applications.
Store data according to company retention policies.
Inform administrators about backup and restore activities through reports.
vSphere Data Protection provides the data protection service in each region. This is separate from disaster recovery and applies even if only the central cloud exists.
An FTP server is used to back up NSX Manager. The FTP server supports the SFTP and FTP protocols.
Figure 9 Dual-Region Data Protection Architecture
4.8.2 Disaster Recovery
The disaster recovery service adds to the data protection service by protecting the management services in
the case of a complete site failure. It is an optional service to provide additional protection.
Since this requires more than one site, it is only applicable where a central cloud and at least one cloud
region has been included.
VMware Site Recovery Manager (SRM) and vSphere Replication are used to provide this service, together with maintaining the same IP addressing for the cloud management services at both sites.
Note: Each central cloud or cloud region in this design is equivalent to the site construct in Site Recovery
Manager.
Since the central cloud contains the portal and manages the services in all the regions, the following
applications are in scope of disaster recovery protection:
vRealize Automation together with VMware vRealize Orchestrator
Analytics cluster of vRealize Operations Manager
The services that support the services at each site do not require disaster recovery protection. This includes:
vSphere, NSX and vCenter services, which manage the services at the local site only.
Authentication, DNS and NTP, which are distributed to the cloud regions anyway.
vRealize Log Insight and Software Orchestration, which are replicated to all cloud regions.
Figure 10 Disaster Recovery Architecture
4.8.3 Monitoring
vRealize Operations Manager is used to track and analyze the operation of multiple data sources within the design by using specialized analytics algorithms. These algorithms help vRealize Operations Manager learn and predict the behavior of every object it monitors. Users access this information by using views, reports, and dashboards.
vRealize Operations Manager contains functional elements that collaborate for data analysis and storage,
and support creating clusters of nodes with different roles.
Figure 11 vRealize Operations Manager Architecture
For high availability and scalability, several vRealize Operations Manager instances are deployed in the
management cluster where they have the following roles:
Master Node. Required initial node in the cluster. In large-scale environments the master node
manages all other nodes. In small-scale environments, the master node is the single standalone
vRealize Operations Manager node.
Master Replica Node. Enables high availability of the master node.
Data Node. Enables scale-out of vRealize Operations Manager in larger environments. Data
nodes have adapters installed to perform collection and analysis. Data nodes also host vRealize
Operations Manager management packs.
Remote Collector Node. Enables navigation through firewalls, interfaces with a remote data
source, reduces bandwidth across regions, or reduces the load on the vRealize Operations
Manager analytics cluster. Remote collector nodes only gather objects for the inventory and
forward collected data to the data nodes. Remote collector nodes do not store data or perform
analysis. In addition, they can be installed on a different operating system than the rest of the
cluster nodes.
The master and master replica nodes are data nodes with extended capabilities.
vRealize Operations Manager forms two types of cluster according to the nodes that participate:
Analytics cluster. Tracks, analyzes, and predicts the operation of monitored systems. Consists of a master node, data nodes, and a master replica node.
Remote collector cluster. Only collects diagnostics data without storage or analysis. Consists only of remote collector nodes.
The functional components of a vRealize Operations Manager instance interact to provide analysis
of diagnostics data from the datacenter and visualize the result in the Web user interface.
Table 3 vRealize Operations Manager Logical Node Architecture
Architecture Component Description
Admin / Product UI server. The UI server is a Web
application that serves as both user and administration
interface.
REST API / Collector. The Collector collects data
from all components in the datacenter.
Controller. The Controller handles the data flow between the UI server, the Collector, and the analytics engine.
Analytics. The Analytics engine creates all
associations and correlations between various data sets,
handles all super metric calculations, performs all
capacity planning functions, and is responsible for
triggering alerts.
Persistence. The persistence layer handles the read and
write operations on the underlying databases across all
nodes.
FSDB. The File System Database (FSDB) stores
collected metrics in raw format. FSDB is available in
all the nodes.
xDB (HIS). The xDB stores data from the Historical
Inventory Service (HIS). This component is available
only on the master and master replica nodes.
Global xDB. The Global xDB stores user preferences, alerts, alarms, and customization related to vRealize Operations Manager. This component is available only on the master and master replica nodes.
4.8.4 Log Consolidation and Analysis
Log consolidation and analysis provides consolidation of the logs that are produced by each of the cloud
services together with analysis of those logs. For this design, this function is provided by vRealize Log
Insight.
vRealize Log Insight provides real-time log management and log analysis with machine learning-based
intelligent grouping, high-performance searching, and troubleshooting across physical, virtual, and cloud
environments.
vRealize Log Insight collects data from ESXi hosts using the syslog protocol. It connects to vCenter Server
to collect events, tasks, and alarms data, and integrates with vRealize Operations Manager to send
notification events and enable launch in context.
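For illustration only, the following pyVmomi-based sketch shows how an ESXi host's syslog target could be pointed at a vRealize Log Insight instance by setting the standard Syslog.global.logHost advanced option. The host name and Log Insight address are hypothetical placeholders.

    # Illustrative sketch only: forward an ESXi host's syslog to vRealize Log Insight.
    # Assumes a pyVmomi ServiceInstance 'si'; names and addresses are hypothetical placeholders.
    from pyVmomi import vim

    def set_syslog_target(si, host_name, syslog_url):
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(content.rootFolder, [vim.HostSystem], True)
        host = next(h for h in view.view if h.name == host_name)
        view.Destroy()
        opt = vim.option.OptionValue(key='Syslog.global.logHost', value=syslog_url)
        host.configManager.advancedOption.UpdateOptions(changedValue=[opt])

    # Example: set_syslog_target(si, 'esxi01.example.local', 'udp://loginsight.example.local:514')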
4.8.5 Patching
Patching of the VMware software components is achieved with VMware Update Manager. This includes
the VMware ESXi hosts, virtual appliances and management tooling in the design. It connects to the
internet to obtain the latest vulnerability patches and automatically applies user-defined patches to the
relevant components to eliminate the vulnerabilities.
4.9 Business Services
Business services are those services that provide business functions. This includes business management, IT financials and IT benchmarking. vRealize Business (VRB) is configured to provide financial information, reporting and modeling. VRB integrates with vRealize Automation.
4.9.1 Business Management
vRealize Business provides the following business management capabilities:
Automatic private cloud metering
Costing and pricing
4.9.2 IT Financials
vRealize Business provides the following capability for financial management:
Automatic service catalog pricing (Integrated with vRealize Automation)
Private cloud consumption analysis
Out-of-the-box reporting (Exportable data set)
4.9.3 IT Benchmarking
Additionally, vRealize Business can assist in modeling cost projections across cloud environments, including:
Private cloud and public cloud cost comparison
4.10 Cloud Region
The cloud region is a child instance of the design. It is not standalone and requires a central cloud to
provide the cloud management services. Provisioning and management of virtual resources is done through
the central cloud.
The cloud management services do not exist in a cloud region; they are provided by the central cloud. The operational management services in a cloud region contain collectors and relays that pass information back to the central cloud.
5 Physical Operational Model
The physical operational model elaborates on the logical operational model by applying the non-functional requirements to it.
Figure 12 Physical Operational Model - Virtual Servers, Networking and Clusters
5.1 Physical Layer
5.1.1 Compute
The design leverages SoftLayer to provide the compute. This allows for flexibility in provisioning bare metal: compute nodes can be deployed rapidly without waiting for orders to be delivered, and the same nodes can be decommissioned without waiting for depreciation schedules or reselling.
SoftLayer offers a variety of bare metal Intel-based hardware, from 1U to 4U chassis sizes, from 2GB to 3TB of memory, and from 4 to 48 CPU cores. For this design, a 2U server has been selected to allow for the lowest cost of entry, while still allowing for scaling up to 10,000 deployed VMs in a single central cloud or cloud region.
For security, and to isolate management, network and user workloads (resources), this design uses three vSphere cluster types with the following functions:
Management Cluster
Edge Cluster
Compute Clusters
This allows each function to be scaled independently of the others as the deployment is scaled out, making for a more effective use of resources.
5.1.1.1 Management Cluster
The management cluster is the heart of operation and control of the design. It is sized from the onset of deployment to allow for Compute cluster expansion and additional feature expansion without requiring additional server nodes.
It consists of 4 nodes of the following specification:
2 x 12 core CPUs (24 cores total), plus Hyperthreading
256GB RAM
~6.3TB Usable VSAN storage
~1TB Usable Local Disk for operating system
Refer to Appendix A – Bare Metal Summary for hardware details.
5.1.1.2 Edge Cluster
VMware NSX is an integral part of the design. “Edge” virtual appliances are used where VPN end points or load balancing is required. Edge virtual appliances are either dynamically provisioned as user applications are “spun up” with patterns, or preconfigured to support management functions. In either case they are deployed to the Edge cluster. This ensures that network connectivity and performance are not affected by varying workloads in the other clusters.
The Edge cluster is sized from the onset of deployment to allow for Compute cluster expansion without requiring additional servers. As the Edge virtual appliances are small, VSAN storage requirements are held to a minimum while maintaining redundancy and performance.
It consists of 4 nodes of the following specification:
2 x 6 core CPUs (12 cores total), plus Hyperthreading
128GB RAM
~ 3.2TB Usable VSAN storage
~ 1TB Usable Local Disk for operating system
Refer to Appendix A – Bare Metal Summary for hardware details.
5.1.1.3 Compute Cluster
As users or administrators within the customer's organization deploy the applications they require via vRealize Automation, the requested compute workloads are deployed to this cluster. With “scale up” as well as “scale out” in mind, resource-intensive applications and other mixed workloads can be absorbed. Additional clusters are provisioned when the capacity of each cluster is reached.
It consists of 4 nodes of the following specification:
2 x 12 core CPUs (24 cores total), plus Hyperthreading
512GB RAM
~6.3TB Usable VSAN storage
~1TB Usable Local Disk for operating system
Refer to Appendix A – Bare Metal Summary for hardware details.
Each node supports 80 VMs of 2 vCPU, 8GB RAM and 70GB disk, taking into account the following:
CPU over-commit of 7
90% CPU usage limit on ESXi
Memory over-commit of 1.6
80% memory usage limit on ESXi
No disk over-commit
A maximum of 48 nodes per cluster and 3 clusters supports 10,000 VMs.
Cluster sizing is based on reserving one node per 15 nodes in case of failure. Up to 15 nodes, a single node is reserved; between 16 and 30 nodes, two nodes are kept in reserve; and so on up to 45 nodes, where three nodes are in reserve. A worked example of the per-node capacity arithmetic follows.
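The per-node figure can be sanity-checked with simple arithmetic. The short Python sketch below applies the over-commit ratios and usage limits above to the compute node specification. The assumption that the CPU over-commit ratio is applied to logical (hyperthreaded) threads is ours, made only for illustration; memory is the binding constraint either way.

    # Rough capacity check for a compute node (illustrative only).
    # Assumption (ours, for illustration): the CPU over-commit ratio is applied to logical
    # threads (cores x 2 with Hyperthreading). Memory figures come straight from the design.
    import math

    cores, threads_per_core = 24, 2          # 2 x 12-core CPUs with Hyperthreading
    ram_gb = 512
    cpu_overcommit, cpu_limit = 7, 0.90      # over-commit of 7, 90% CPU usage limit
    mem_overcommit, mem_limit = 1.6, 0.80    # over-commit of 1.6, 80% memory usage limit
    vm_vcpu, vm_ram_gb = 2, 8                # reference VM: 2 vCPU, 8GB RAM

    vcpu_capacity = cores * threads_per_core * cpu_overcommit * cpu_limit   # ~302 vCPUs
    ram_capacity = ram_gb * mem_overcommit * mem_limit                      # ~655 GB

    vms_by_cpu = int(vcpu_capacity // vm_vcpu)    # ~151 VMs by CPU
    vms_by_ram = int(ram_capacity // vm_ram_gb)   # ~81 VMs by memory (the binding constraint)
    print(min(vms_by_cpu, vms_by_ram))            # ~81, consistent with the design figure of 80 VMs per node

    # Failure reserve: one node held back per 15 nodes in a cluster.
    def reserved_nodes(total_nodes):
        return math.ceil(total_nodes / 15)

    print(reserved_nodes(15), reserved_nodes(16), reserved_nodes(45))   # 1 2 3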
5.1.2 Storage
Network File System (NFS) is a file system protocol that allows a user on a client computer to access files
over a network much like local storage is accessed. In this case, the client computer is an ESXi host, and
the storage is provided by a NFS-capable external storage array.
The management cluster uses VMware Virtual SAN for primary storage and NFS for secondary storage.
For compute clusters, the decision on which technology to use is based on the performance, capacity, and
capabilities (replication, deduplication, compression, etc.) required by the workloads that are running in the
clusters.
For this design, additional storage for the ESXi management cluster is connected to SoftLayer File Storage
on Performance Storage via 10 Gbps links with Jumbo Frames (MTU 9000) enabled. For the virtual
machines that provide the backup service and log archiving (i.e., vSphere Data Protection and vRealize Log
Insight) the NFS datastores are configured in the following manner:
Table 4 NFS Configuration for vSphere Data Protection and vRealize Log Insight
Product Configuration
vSphere Data Protection vSphere Data Protection is I/O intensive and is placed on its own,
unique NFS datastore sized at 4TB
vRealize Log Insight vRealize Log Insight uses NFS datastores sized at 1TB for archive
storage and can be shared with other virtual machines.
5.1.3 Network
The design leverages SoftLayer physical networking and this is broken up into server connections, VLANs
and MTU packet sizing.
5.1.3.1 Server connections
Each compute node (physical server) within the design has two 10Gb Ethernet connections into each SoftLayer Top of Rack (ToR) switch (public and private), set up as individual (un-bonded) connections for a total of 4 x 10Gbps connections. This allows each network interface card (NIC) connection to work independently of the others.
Figure 13 Network connections per physical node
5.1.3.2 VLANs
Four VLANs are included in the design, including the public network. They are as follows:
Private VLAN 1 (default / untagged)
Private VLAN 2 (trunked / tagged)
Private VLAN 3 (trunked / tagged )
Public VLAN 4 (default / untagged)
5.1.3.3 MTU Sizing
The private network connections are configured to use a jumbo frame MTU size of 9000. This is the maximum MTU allowed by both VMware and SoftLayer and improves performance for large data transfers such as storage and vMotion. The public network connections use the standard Ethernet MTU frame size of 1500. This must be maintained, as any change may cause packet fragmentation over the internet.
5.2 Virtual Infrastructure
Virtual infrastructure consists of compute, storage and network virtualization.
5.2.1 Compute Virtualization
This design uses VMware vSphere ESXi version 6.0 u1 to virtualize the management, compute, and edge
servers. The ESXi hypervisor is installed on the 2x1TB RAID-1 disk array contained on each server.
RAID-1 is used in this design to provide redundancy for the vSphere hypervisor.
5.2.2 Storage Virtualization
For this design, VMware Virtual Storage Area Network (VSAN) storage is employed for all storage needs
with the exception of vRealize Log Insight log archival storage and VMware Data Protection backup
storage. VSAN allows for the local storage across multiple ESXi hosts within a vSphere cluster to be
represented as a single virtual machine datastore. VSAN supports only SATA, SAS HDD, and PCIe
storage. In each node, regardless of which cluster it belongs to, two 1TB SATA drives are excluded from VSAN and used to house the ESXi installation.
Figure 14 VSAN concept
5.2.2.1 RAID Controller
As of this version of the design, only the Avago MegaRAID 9361-8i RAID controller within SoftLayer hardware is supported by VMware VSAN. Disk caching is disabled and the controller is set to JBOD mode for all VSAN drives.
Disks and disk groups
Depending on the cluster function (Management, Edge, Compute), the number and sizes of the disks change. Each VSAN disk group requires a solid state disk (SSD) for the cache layer. Within all clusters, a 2U server type is employed with a maximum capacity of 12 drive slots. Excluding the ESXi OS drives, the following is the drive layout for each cluster type:
Table 5 VSAN disk table
Cluster type | VSAN disk groups | SSDs | SSD + HDD per disk group | SSD size | HDDs | HDD size
Management | 2 | 2 | 1 + 4 | 1,200GB | 8 | 2,000GB
Edge | 1 | 1 | 1 + 4 | 1,200GB | 4 | 2,000GB
Compute | 2 | 2 | 1 + 4 | 1,200GB | 8 | 2,000GB
5.2.2.2 Virtual network setup
For this design the VSAN traffic traverses between ESXi hosts on a dedicated private VLAN; no other traffic occupies the VSAN VLAN. The two network adapters attached to the private network switch are configured within vSphere as a virtual distributed switch (vDS) with both network adapters as uplinks. A dedicated VSAN kernel port configured for the VSAN VLAN resides within the vDS. Jumbo frames are enabled for the private vDS.
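For illustration only, the following pyVmomi-based sketch shows how the MTU of the private vDS could be raised to 9000 for jumbo frames. The switch name is a hypothetical placeholder.

    # Illustrative sketch only: raise the MTU of a distributed switch to 9000 (jumbo frames).
    # Assumes a pyVmomi ServiceInstance 'si'; the switch name is a hypothetical placeholder.
    from pyVmomi import vim

    def set_dvs_mtu(si, dvs_name, mtu=9000):
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.dvs.VmwareDistributedVirtualSwitch], True)
        dvs = next(d for d in view.view if d.name == dvs_name)
        view.Destroy()
        spec = vim.dvs.VmwareDistributedVirtualSwitch.ConfigSpec(
            configVersion=dvs.config.configVersion,   # required so the reconfigure is not rejected
            maxMtu=mtu)
        return dvs.ReconfigureDvs_Task(spec)

    # Example: set_dvs_mtu(si, 'vDS-Mgmt-Priv')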
5.2.2.3 Virtual SAN Policy Design
Once VMware Virtual SAN is enabled and configured, create storage policies that define the
virtual machine storage characteristics. Storage characteristics specify different levels of service
for different virtual machines. The default storage policy tolerates a single failure and has a single
disk stripe. Use the default unless a client’s environment requires policies with non-default
behavior. If a custom policy is configured, Virtual SAN will guarantee it; however, if Virtual SAN
cannot guarantee a policy, it is not possible to provision a virtual machine that uses the policy
unless it is enabled to force provisioning. This design will use the default policy.
Table 6 VSAN policies
Capability: Number of failures to tolerate
Use case: Redundancy
Value: Default 1, Max 3
Comments: A standard RAID 1 mirrored configuration that provides redundancy for a virtual machine disk. The higher the value, the more failures can be tolerated. For n failures tolerated, n+1 copies of the disk are created, and 2n+1 hosts contributing storage are required. A higher n value indicates that more replicas of virtual machines are made, which can consume more disk space than expected.

Capability: Number of disk stripes per object
Use case: Performance
Value: Default 1, Max 12
Comments: A standard RAID 0 stripe configuration used to increase performance for a virtual machine disk. This setting defines the number of HDDs on which each replica of a storage object is striped. If the value is higher than 1, increased performance can result. However, an increase in system resource usage might also result.

Capability: Flash read cache reservation (%)
Use case: Performance
Value: Default 0, Max 100%
Comments: Flash capacity reserved as read cache for the storage object, as a percentage of the logical object size that will be reserved for that object. Only use this setting for workloads if a client must address read performance issues. The downside of this setting is that other objects cannot use a reserved cache. VMware recommends not using these reservations unless it is absolutely necessary because unreserved flash is shared fairly among all objects.

Capability: Object space reservation (%)
Use case: Thick provisioning
Value: Default 0, Max 100%
Comments: The percentage of the storage object that will be thick provisioned upon VM creation. The remainder of the storage will be thin provisioned. This setting is useful if a predictable amount of storage will always be filled by an object, cutting back on repeatable disk growth operations for all but new or non-predictable storage use.

Capability: Force provisioning
Use case: Override policy
Value: Default No
Comments: Force provisioning allows for provisioning to occur even if the currently available cluster resources cannot satisfy the current policy. Force provisioning is useful in case of a planned expansion of the Virtual SAN cluster, during which provisioning of VMs must continue. Virtual SAN automatically tries to bring the object into compliance as resources become available.
By default, policies are configured based on application requirements. However, they are applied
differently depending on the object.
Table 7 VSAN object policy defaults
Object | Policy | Comments
Virtual machine namespace | Failures-to-Tolerate: 1 | Configurable. Changes are not recommended.
Swap | Failures-to-Tolerate: 1 | Configurable. Changes are not recommended.
Virtual disk(s) | User-Configured Storage Policy | Can be any storage policy configured on the system.
Virtual disk snapshot(s) | Uses virtual disk policy | Same as virtual disk policy by default. Changes are not recommended.
If a user-configured policy is not specified, the default system policy of 1 failure to tolerate and 1
disk stripe is used for virtual disk(s) and virtual disk snapshot(s). Policy defaults for the VM
namespace and swap are set statically and are not configurable to ensure appropriate protection for
these critical virtual machine components. Policies must be configured based on the application’s
business requirements. Policies give Virtual SAN its power because it can adjust how a disk
performs on the fly based on the policies configured.
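The failures-to-tolerate rule above drives both the minimum host count and the raw capacity consumed. The small worked example below applies only the n+1 copies and 2n+1 hosts relationship stated in Table 6.

    # Worked example of the VSAN failures-to-tolerate (FTT) rule from Table 6 (illustrative only).
    def vsan_ftt(vmdk_gb, ftt):
        copies = ftt + 1           # n + 1 replicas of the object
        hosts = 2 * ftt + 1        # at least 2n + 1 hosts contributing storage
        raw_gb = vmdk_gb * copies  # raw capacity consumed by the replicas, ignoring other overhead
        return copies, hosts, raw_gb

    # A 70GB virtual disk with the default policy (FTT = 1):
    print(vsan_ftt(70, 1))   # (2, 3, 140) -> 2 copies, at least 3 hosts, ~140GB raw capacity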
5.2.3 Network Virtualization
This design uses vSphere virtual distributed switches and VMware NSX for vSphere to implement virtual
networking. By using NSX, this design implements software-defined networking.
5.2.3.1 Virtual Distributed Switch Design
The design uses a minimum number of switches. Clusters connected to the public network are configured with two distributed virtual switches. Clusters restricted to the private network have only one distributed virtual switch.
Separating different types of traffic is required to reduce contention and latency. Separate networks are also required for access security. VLANs are used to segment physical network functions. This design uses four (4) VLANs: three (3) for private network traffic and one (1) for public network traffic. Traffic separation is detailed in the section below.
Table 8 VLAN Mapping to Traffic Types
VLAN Traffic Type
VLAN1 ESXi Management, VXLAN (VTEP)
VLAN2 VSAN
VLAN3 vMotion, NFS, vSphere Replication
VLAN4 All – Internet access
Traffic from workloads will travel on VXLAN backed logical switches.
5.2.3.1.1 Management Cluster Distributed Switches
The management cluster uses a dual vSphere Distributed Switch with the configuration settings
shown in this section.
Table 9 Management Cluster Distributed Switch
vSphere Distributed Switch Name: vDS-Mgmt-Priv
Function: ESXi management, Network IP Storage (NFS), Virtual SAN, vSphere vMotion, VXLAN Tunnel Endpoint (VTEP), vSphere Replication / vSphere Replication NFC
Network I/O Control: Enabled
Load Balancing Mode: Based on source MAC hash
Number of Physical NIC Ports: 2
MTU: 9,000 (Jumbo Frame)

vSphere Distributed Switch Name: vDS-Mgmt-Pub
Function: External management traffic (North-South)
Network I/O Control: Enabled
Load Balancing Mode: Based on source MAC hash
Number of Physical NIC Ports: 2
MTU: 1,500 (default)
Table 10 Management Cluster Distributed Switch Port Group Configuration Settings
Parameter Setting
Load balancing Route based on the source MAC hash
Failover detection Link status only
Notify switches Enabled
Failback No
Failover order Active uplinks: Uplink1, Uplink2
Management cluster hosts are connected to both private and public networks. There are 2
distributed virtual switches: one for private network and one for public network. Each switch has a
dedicated pair of 10Gbps network adapters.
Figure 15 Network Switch Design for Management Hosts
Table 11 Management Virtual Switch Port Groups and VLANs
vSphere Distributed Switch | Port Group Name | Teaming | Uplinks | VLAN ID
vDS-Mgmt-Priv | vDS-Mgmt-Priv-Management | Source MAC hash | Active: 0, 1 | VLAN1
vDS-Mgmt-Priv | vDS-Mgmt-Priv-vMotion | Source MAC hash | Active: 0, 1 | VLAN3
vDS-Mgmt-Priv | vDS-Mgmt-Priv-VSAN | Source MAC hash | Active: 0, 1 | VLAN2
vDS-Mgmt-Priv | vDS-Mgmt-Priv-VTEP | Source MAC hash | Active: 0, 1 | VLAN1
5.2.3.1.2 Edge Cluster Distributed Switches
The edge cluster uses a dual vSphere Distributed Switch with the configuration settings shown in
this section.
Table 13 Edge Cluster Distributed Switch
vSphere Distributed Switch Name: vDS-Edge-Priv
Function: ESXi management, Virtual SAN, vSphere vMotion, VXLAN Tunnel Endpoint (VTEP)
Network I/O Control: Enabled
Load Balancing Mode: Based on source MAC hash
Number of Physical NIC Ports: 2
MTU: 9,000 (Jumbo Frame)

vSphere Distributed Switch Name: vDS-Edge-Pub
Function: External user traffic (North-South)
Network I/O Control: Enabled
Load Balancing Mode: Based on source MAC hash
Number of Physical NIC Ports: 2
MTU: 1,500 (default)
Table 14 Edge Cluster Distributed Switch Port Group Configuration Settings
Parameter Setting
Load balancing Route based on the source MAC hash
Failover detection Link status only
Notify switches Enabled
Failback No
Failover order Active uplinks: Uplink1, Uplink2
Edge cluster hosts are connected to both private and public networks. There are 2 distributed
virtual switches: one for private network and one for public network. Each switch has a dedicated
pair of 10Gbps network adapters.
Figure 16 Network Switch Design for Edge Hosts
Table 15 Edge Virtual Switch Port Groups and VLANs
vSphere Distributed Switch | Port Group Name | Teaming | Uplinks | VLAN ID
vDS-Edge-Priv | vDS-Edge-Priv-Management | Source MAC hash | Active: 0, 1 | VLAN1
vDS-Edge-Priv | vDS-Edge-Priv-vMotion | Source MAC hash | Active: 0, 1 | VLAN3
vDS-Edge-Priv | vDS-Edge-Priv-VSAN | Source MAC hash | Active: 0, 1 | VLAN2
vDS-Edge-Priv | vDS-Edge-Priv-VTEP | Source MAC hash | Active: 0, 1 | VLAN1
vDS-Edge-Pub | vDS-Edge-Pub-External | Source MAC hash | Active: 0, 1 | VLAN4
Table 16 Edge VMkernel Adapter
vSphere Distributed Switch | Network Label | Connected Port Group | Enabled Services | MTU
vDS-Edge-Priv | Management | vDS-Edge-Priv-Management | Management Traffic | 1,500 (default)
vDS-Edge-Priv | vMotion | vDS-Edge-Priv-vMotion | vMotion Traffic | 9,000
vDS-Edge-Priv | VTEP | vDS-Edge-Priv-VTEP | - | 9,000
vDS-Edge-Priv | VSAN | vDS-Edge-Priv-VSAN | VSAN | 9,000
5.2.3.1.3 Compute Cluster Distributed Switches
The compute cluster uses a single vSphere Distributed Switch with the configuration settings
shown in this section.
Table 17 Compute Cluster Distributed Switch
vSphere Distributed Switch Name: vDS-Compute-Priv
Function: ESXi management, Virtual SAN, vSphere vMotion, VXLAN Tunnel Endpoint (VTEP)
Network I/O Control: Enabled
Load Balancing Mode: Based on source MAC hash
Number of Physical NIC Ports: 2
MTU: 9,000 (Jumbo Frame)
Table 18 Compute Cluster Distributed Switch Port Group Configuration Settings
Parameter Setting
Load balancing Route based on the source MAC hash
Failover detection Link status only
Notify switches Enabled
Failback No
Failover order Active uplinks: Uplink1, Uplink2
Compute cluster hosts are connected to the private network. The switch has a dedicated pair of
10Gbps network adapters.
Figure 17 Network Switch Design for Compute Hosts
Table 19 Compute Virtual Switch Port Groups and VLANs
vSphere Distributed Switch | Port Group Name | Teaming | Uplinks | VLAN ID
vDS-Compute-Priv | vDS-Compute-Priv-Management | Source MAC hash | Active: 0, 1 | VLAN1
vDS-Compute-Priv | vDS-Compute-Priv-vMotion | Source MAC hash | Active: 0, 1 | VLAN3
vDS-Compute-Priv | vDS-Compute-Priv-VSAN | Source MAC hash | Active: 0, 1 | VLAN2
vDS-Compute-Priv | vDS-Compute-Priv-VTEP | Source MAC hash | Active: 0, 1 | VLAN1
Table 20 Compute VMkernel Adapter
vSphere Distributed Switch | Network Label | Connected Port Group | Enabled Services | MTU
vDS-Compute-Priv | Management | vDS-Compute-Priv-Management | Management Traffic | 1,500 (default)
vDS-Compute-Priv | vMotion | vDS-Compute-Priv-vMotion | vMotion Traffic | 9,000
vDS-Compute-Priv | VTEP | vDS-Compute-Priv-VTEP | - | 9,000
vDS-Compute-Priv | VSAN | vDS-Compute-Priv-VSAN | VSAN | 9,000
5.2.3.1.4 NIC Teaming
With the exception of VSAN, this design uses NIC teaming to avoid a single point of failure and to provide load balancing. To accomplish this, the design uses an active-active configuration with teaming based on the source MAC hash.
5.2.3.1.5 Network I/O Control
This design uses Network I/O control enabled on all distributed switches with default shares.
Network I/O control increases resiliency and performance of the network.
Utilizing network I/O control, this design uses the distributed switch to allocate bandwidth for the
following system traffic types:
vSphere vMotion traffic
Management traffic
VMware vSphere Replication traffic
NFS traffic
VMware Virtual SAN traffic
vSphere Data Protection backup traffic
Virtual machine traffic
5.2.3.1.6 VXLAN
This design uses VXLAN to create isolated, multi-tenant broadcast domains across datacenter
fabrics and to enable customers to create elastic, logical networks that span physical network
boundaries.
VXLAN works by creating Layer 2 logical networks that are encapsulated in standard Layer 3 IP
packets. A Segment ID in every frame differentiates the VXLAN logical networks from each other
without any need for VLAN tags. As a result, large numbers of isolated Layer 2 VXLAN
networks can coexist on a common Layer 3 infrastructure.
5.2.3.2 Software-Defined Network Design
NSX offers the following Software-Defined Network (SDN) capabilities crucial to support the
cloud management platform operations.
load balancing
firewalls
routing
logical switches
VPN access
Because NSX for vSphere is tied to a vCenter Server domain, this design uses two separate NSX
instances. One instance is tied to the management vCenter Server, and the other instance is tied to
the compute and edge vCenter Server.
SDN capabilities are consumed via the cloud management platform, the vSphere Web Client and the API. The design uses APIs to automate the deployment and configuration of NSX components by the user and cloud admin actors.
5.2.3.3 NSX for vSphere Components
This section describes the NSX for vSphere component configuration.
5.2.3.3.1 NSX Manager
NSX Manager provides the centralized management plane for NSX for vSphere and has a one-to-
one mapping to a vCenter Server instance. This design uses two NSX managers (one for each
vCenter Server in the design).
NSX Manager performs the following functions.
Provides the single point of configuration and the REST API entry-points in a vSphere
environment for NSX for vSphere.
Is responsible for deploying NSX Controller clusters, Edge distributed routers, and Edge
service gateways in the form of OVF appliances, guest introspection services, and so on.
Is responsible for preparing ESXi hosts for NSX for vSphere by installing VXLAN,
distributed routing and firewall kernel modules, and the User World Agent (UWA).
Communicates with NSX Controller clusters via REST and with hosts via the RabbitMQ
message bus. The internal message bus is specific to NSX for vSphere and does not
require setup of additional services.
Generates certificates for the NSX Controller instances and ESXi hosts to secure control
plane communications with mutual authentication.
5.2.3.3.2 NSX Controller
The NSX Controllers perform the following functions:
Provide the control plane to distribute VXLAN and logical routing information to ESXi
hosts.
Include nodes that are clustered for scale-out and high availability.
Slice network information across cluster nodes for redundancy.
Remove requirement of VXLAN Layer 3 multicast in the physical network.
Provide ARP suppression of broadcast traffic in VXLAN networks.
NSX for vSphere control plane communication occurs over the management network.
This design implements a cluster of 3 NSX controllers for each NSX manager enabling high
availability for the controllers.
5.2.3.3.3 NSX vSwitch
The NSX for vSphere data plane consists of the NSX vSwitch. This vSwitch is based on the
vSphere Distributed Switch (vDS) with additional components for add-on services. The add-on
NSX for vSphere components include kernel modules which run within the hypervisor kernel and
provide services such as distributed logical router (DLR) and distributed firewall (DFW), and
enable VXLAN capabilities.
This design uses NSX vSwitch.
5.2.3.3.4 NSX Logical Switching
NSX for vSphere logical switches create logically abstracted segments to which tenant virtual
machines can be connected. A single logical switch is mapped to a unique VXLAN segment and is
distributed across the ESXi hypervisors within a transport zone. It allows line-rate switching in the
hypervisor without the constraints of VLAN sprawl or spanning tree issues.
This design uses NSX Logical Switching for handling compute workloads and connectivity
between different network zones.
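For illustration only, the sketch below shows how a logical switch (virtual wire) might be created against the NSX for vSphere REST API from Python. The NSX Manager address, transport zone (scope) ID and credentials are hypothetical placeholders, and the endpoint path is an assumption based on the published NSX-v API that should be verified against the API guide. Unicast control plane mode matches the design decision described later in this section.

    # Illustrative sketch only: create a VXLAN-backed logical switch via the NSX-v REST API.
    # The URL, scope ID and credentials are hypothetical placeholders; the endpoint path is an
    # assumption based on the NSX for vSphere API and should be verified against the API guide.
    import requests

    NSX_MANAGER = 'https://nsxmgr.example.local'
    TRANSPORT_ZONE_ID = 'vdnscope-1'   # placeholder transport zone (scope) ID

    body = """<virtualWireCreateSpec>
      <name>vxw-mgmt-vra-app</name>
      <description>Virtual application network (example)</description>
      <tenantId>management</tenantId>
      <controlPlaneMode>UNICAST_MODE</controlPlaneMode>
    </virtualWireCreateSpec>"""

    resp = requests.post(
        '{0}/api/2.0/vdn/scopes/{1}/virtualwires'.format(NSX_MANAGER, TRANSPORT_ZONE_ID),
        data=body,
        headers={'Content-Type': 'application/xml'},
        auth=('admin', '********'),
        verify=False)           # lab-only: skip certificate verification
    resp.raise_for_status()
    print('Created logical switch:', resp.text)   # the response body contains the new virtualwire ID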
5.2.3.3.5 Distributed Logical Router
The NSX for vSphere Distributed Logical Router (DLR) is optimized for forwarding in the
virtualized space, that is, forwarding between VMs on VXLAN- or VLAN-backed port groups.
This design does not use distributed logical routers.
5.2.3.3.6 User World Agent
The User World Agent (UWA) is a TCP (SSL) client that facilitates communication between the
ESXi hosts and the NSX Controller instances as well as the retrieval of information from the NSX
Manager via interaction with the message bus agent. UWA is installed on each ESXi host.
5.2.3.3.7 VXLAN Tunnel Endpoint
VXLAN Tunnel Endpoints (VTEPs) are responsible for encapsulating VXLAN traffic as frames
in UDP packets and for the corresponding de-encapsulation. VTEPs take the form of VMkernel
ports with IP addresses and are used to exchange packets with other VTEPs.
This design uses a single VTEP per host.
5.2.3.3.8 Edge Services Gateway
The primary function of the NSX for vSphere Edge services gateway is north-south
communication, but it also offers support for Layer 2, Layer 3, perimeter firewall, load balancing
and other services such as SSL-VPN and DHCP-relay.
In this design, the edges also ensure east-west communication.
5.2.3.3.9 Distributed Firewall
NSX for vSphere includes a distributed kernel-level firewall known as a distributed firewall.
Security enforcement is done at the kernel and VM network adapter level. This enables firewall
rule enforcement in a highly scalable manner without creating bottlenecks on physical appliances.
The distributed firewall has minimal CPU overhead and can perform at line rate.
This design does not automatically implement distributed firewall. The cloud admin actor is able
to enable this feature post implementation if required.
5.2.3.3.10 Logical Load Balancer
The NSX for vSphere logical load balancer provides load balancing services up to Layer 7
(application), allowing distribution of traffic across multiple servers to achieve optimal resource
utilization and availability. The logical load balancer is a service provided by the NSX Edge
services gateway.
This design implements load balancing for management virtual machines.
5.2.3.4 NSX for vSphere Physical Network Requirements
VXLAN packets cannot be fragmented. Since VXLAN adds its own header information, the MTU needs to be at least 1,600.
SoftLayer has a limitation of 136 VXLAN addresses. As such, the VXLAN control plane in this
design uses unicast mode to circumvent this limitation.
The NSX Manager synchronizes with the same NTP server as the rest of the vSphere
environment. This avoids time drift which can cause problems with authentication. The NSX
Manager must be in sync with the vCenter Single Sign-On server.
5.2.3.5 NSX for vSphere Specifications
The following table lists the components involved in the NSX for vSphere solution and the
requirements for installing and running them. The compute and storage requirements have been
taken into account when sizing resources to support the NSX for vSphere solution.
Table 21 NSX Components Sizing
VM | vCPU | Memory | Storage | Quantity per Deployment by Type
NSX Manager | 4 | 12 GB | 60 GB | 1
NSX Controller | 4 | 4 GB | 20 GB | 3
NSX Edge services gateway - Quad Large | 4 | 1 GB | 512 MB | 2
NSX Edge services gateway - X-Large | 6 | 8 GB | 4.5 GB (+4 GB for SWAP) | 2
The Quad Large model is utilized for high performance firewall. X-Large is utilized for both high
performance load balancing and routing.
5.2.3.6 Network Virtualization Conceptual Design
The following diagram depicts the conceptual tenant architecture components and their
relationship.
Figure 18 Network Virtualization Conceptual Design
The conceptual design has the following key components.
External Networks. Connectivity to and from external networks is through a perimeter
firewall. The main external network is the Internet.
Perimeter Firewall. The logical firewall exists at the perimeter of the datacenter. Each
tenant receives either a full instance or partition of an instance to filter external traffic.
This is the primary access method for user data.
Provider Logical Router (PLR). The PLR exists behind the perimeter firewall and
handles north/south traffic that is entering and leaving a tenant.
Tenant Edge services gateways. Edge services gateways that provide routing and
firewalling capabilities.
Internal Non-Tenant Networks. A single management network, which sits behind a
perimeter firewall but not behind the PLR. Enables cloud admin to manage the cloud
environment.
Internal Tenant Networks. Connectivity for the main tenant workload.
5.2.3.7 Cluster Design for NSX for vSphere
Management Stack
In the management stack, the underlying hosts are prepared for NSX for vSphere. The
management stack has these components:
NSX Manager instances for both stacks (management stack and compute/edge stack),
NSX Controller cluster for the management stack,
NSX Edge service gateways for the management stack.
Compute/Edge Stack
In the compute/edge stack, the underlying hosts are prepared for NSX for vSphere. The
compute/edge stack has these components:
NSX Controller cluster for the compute stack
All NSX Edge service gateways of the compute stack that are dedicated to handling the
north/south traffic in the datacenter. A separate edge stack helps prevent VLAN sprawl
because any external VLANs need only be trunked to the hosts in this cluster.
Multiple compute clusters that run the tenant workloads and have the underlying hosts
prepared for NSX for vSphere.
Figure 19 Cluster Design for NSX for vSphere
5.2.3.8 High Availability of NSX for vSphere Components
The NSX Manager instances of both stacks run on the management cluster. vSphere HA protects
the NSX Manager instances by ensuring that the VM is restarted on a different host in the event of
primary host failure.
The NSX Controller nodes of the management stack run on the management cluster and the NSX
for vSphere Controller nodes of the compute stack run on the edge cluster. In both clusters,
vSphere Distributed Resource Scheduler (DRS) rules ensure that NSX for vSphere Controller
nodes do not run on the same host.
The data plane remains active during outages in the management and control planes although the
provisioning and modification of virtual networks is impaired until those planes become available
again.
The NSX Edge services gateways and distributed logical router control VMs of the compute stack
are deployed on the edge cluster. The NSX Edge service gateways and distributed logical router
control VMs of the management stack run on the management cluster.
All NSX Edge components are deployed in NSX for vSphere HA pairs. NSX for vSphere HA
provides better availability than vSphere HA. By default, the VMs fail over within 15 seconds
versus a potential 5 minutes for a restart on another host under vSphere HA.
5.2.3.9 Logical Switch Control Plane Mode Design
The control plane decouples NSX for vSphere from the physical network and handles the
broadcast, unknown unicast, and multicast (BUM) traffic within the logical switches. The control
plane is on top of the transport zone and is inherited by all logical switches that are created within
it, although it is possible to override aspects of the control plane. The following options are
available:
Multicast Mode. The control plane uses multicast IP addresses on the physical network.
Use multicast mode only when upgrading from existing VXLAN deployments. In this
mode, PIM/IGMP must be configured on the physical network.
Unicast Mode. The control plane is handled by the NSX Controllers and all replication
occurs locally on the host. This mode does not require multicast IP addresses or physical
network configuration.
Hybrid Mode. This mode is an optimized version of the unicast mode where local traffic
replication for the subnet is offloaded to the physical network. Hybrid mode requires
IGMP snooping on the first-hop switch and access to an IGMP querier in each VTEP
subnet. Hybrid mode does not require PIM.
Due to SoftLayer constraints on the number of available multicast addresses, this design uses
unicast mode.
5.2.3.10 Transport Zone Design
A transport zone is used to define the scope of a VXLAN overlay network which can span one or
more clusters within a vCenter Server domain. One or more transport zones can be configured in
an NSX for vSphere solution. A transport zone is not meant to delineate a security boundary.
The design implements a single transport zone per NSX instance. Each zone will be scaled to
respect the limit of 100 DLRs per ESXi host. A single transport zone per NSX instance supports
extending networks across resource stacks.
5.2.3.11 Routing Logical Design
The routing logical design has to consider different levels of routing in the environment:
North-south. The Provider Logical Router (PLR) handles the north/south traffic to and
from a tenant
East-west. Internal east-west routing at the layer beneath the PLR deals with the
application workloads.
An NSX Edge services gateway is deployed for each management application to provide routing for the application. The virtual application network is attached directly to the Edge services gateway; no distributed logical routers are deployed beneath it.
5.2.3.12 Firewall Logical Design
In this design, firewall functions are controlled at the NSX Edge services gateway firewall. The Edge services gateway virtual machines act as the entry point to a tenant's virtual network space. NSX Edge services gateways are configured with the firewall enabled to support the functionality required by virtual application networks. External access is via load balancer virtual IPs when public services are provided by redundant management application VMs, and via a DNAT rule when the service is provided by a single VM.
5.2.3.13 Load Balancer Design
The Edge services gateway implements load balancing within NSX for vSphere; it has both a
Layer 4 and a Layer 7 engine that offer different features, summarized in the following table.
Table 22 Load Balancer Features
Feature: Protocols
Layer 4 Engine: TCP
Layer 7 Engine: TCP, HTTP, HTTPS (SSL Pass-through), HTTPS (SSL Offload)

Feature: Load Balancing Method
Layer 4 Engine: Round Robin, Source IP Hash, Least Connection
Layer 7 Engine: Round Robin, Source IP Hash, Least Connection, URI

Feature: Health Checks
Layer 4 Engine: TCP
Layer 7 Engine: TCP, HTTP (GET, OPTION, POST), HTTPS (GET, OPTION, POST)

Feature: Persistence (keeping client connections to the same back-end server)
Layer 4 Engine: TCP: SourceIP
Layer 7 Engine: TCP: SourceIP, MSRDP; HTTP: SourceIP, Cookie; HTTPS: SourceIP, Cookie, ssl_session_id

Feature: Connection Throttling
Layer 4 Engine: No
Layer 7 Engine: Client Side: Maximum concurrent connections, Maximum new connections per second; Server Side: Maximum concurrent connections
The management applications follow a 3-tier client/server architecture with a presentation tier (user interface), a functional process logic tier, and a data tier. This architecture requires a load balancer for presenting end-user facing services. This design implements each of these management applications as its own trust zone and isolates management applications from each other.
Management applications are placed on isolated, VXLAN-backed networks. Each virtual application network is fronted by an NSX Edge services gateway to keep applications isolated. Load balancer interfaces are required for inbound access, and SNAT is required for outbound access. Direct access to virtual application networks is achieved by connecting through Windows machines that are attached directly to the management networks.
Unique addressing is required for all management applications. This approach to network
virtualization service design improves security and mobility of the management applications, and
reduces the integration effort with existing customer networks.
From a software-defined networking design perspective, each management application is treated
as a separate management tenant that requires one or more logical networks via VXLAN segment.
The VXLAN segments are connected to the outside world by using a pair of NSX Edge services
gateways.
Figure 20 Virtual Application Network Components and Design
The NSX Edge services gateway associated with a management application is connected via an uplink connection to a publicly accessible network and has at least one IPv4 address on this network.
As a result, this device offers the following capabilities.
If the IPv4 range that is used for the application-internal network does not overlap with
any other existing IPv4 range, the central DNS service can create DNS entries for nodes
on this internal network. Split DNS is not necessary.
Inbound access to services, such as Web UI, is supported by the load balancer capabilities
of the NSX Edge services gateway.
Application nodes can access the corporate network or the Internet via Source NAT
(SNAT) on the NSX Edge services gateway or via the vSphere management network.
Routed (static or dynamic) access to the vSphere management network is available for
access to the vCenter Server instances.
5.2.3.17 Virtual Network Design Example
The following figure presents the design for vRealize Automation that is implemented by the
architecture. The same design is utilized for other management applications and can be implemented for workloads deployed on the compute clusters.
Figure 21 vRA Virtual Network Design
The design is set up as follows:
vRealize Automation is deployed onto a single Layer 2 segment, which is provided by a
VXLAN virtual wire. Micro segmentation between NSX components is not required and
therefore not used.
The network used by vRealize Automation connects to external networks through NSX
for vSphere. NSX Edge services gateways route traffic between management application
virtual networks and the public network.
Access to the isolated vSphere-Mgmt network is available through the MgmtCentral-
Edge services gateway that provides region-specific routing.
All Edge services gateways are connected over the networkExchange network that acts as
a transit network and as an interface to exchange routing information between the Edge
services gateways. To provide easy mobility of the network used by vRealize
Automation during recovery in another region, this network uses an RFC 1918 isolated
IPv4 subnet and uses Source NAT (SNAT) to access external networks such as the
Internet.
Services such as a Web GUI, which must be available to the users of vRealize
Automation, are accessible via the NSX Edge load balancer on the IPv4 address residing
on the external network.
Each application must use a unique IPv4 range for the application internal network(s). The unique
IPv4 range supports use of the central DNS service for creating DNS entries for nodes on this
internal network. The following table shows an example of how a mapping from management
applications to IPv4 subnets might look.
Table 23 Management Applications IP Addressing
Management Application Internal IPv4 Subnet
vRealize Automation (includes vRealize Orchestrator) 192.168.11.0/24
vRealize Automation Proxy Agents 192.168.12.0/24
192.168.13.0/24
vRealize Operations Manager 192.168.21.0/24
vRealize Operations Remote Collector 192.168.22.0/24
192.168.23.0/24
vRealize Log Insight 192.168.31.0/24
192.168.32.0/24
The management applications vRealize Automation, vRealize Operations Manager, and vRealize
Log Insight divert from the above described setup slightly:
vRealize Automation uses three network containers.
o One container is for the main vRealize Automation application cluster that can
be failed over by using Site Recovery Manager to a secondary region.
o Two additional network containers - one in each region - hold vRealize
Automation proxy agents.
vRealize Operations Manager uses three network containers.
o One container is for the main vRealize Operations Manager analytics cluster that can be failed over to a secondary region by using Site Recovery Manager.
o Two additional network containers - one in each region - are for connecting remote collectors.
vRealize Log Insight does not use Site Recovery Manager to fail over between regions. Instead, a dedicated instance of Log Insight is deployed in each region. To support this configuration, an additional cloud region is required.
5.2.3.18 Routing and Region Connectivity Design
This multi-region design is an extension of the single-region design that takes into account
management component failover. The figure below is an example of how virtual application
networks are built for vRealize Automation in the central cloud and cloud regions. The same
virtual application network configuration is valid for all the other virtual application networks.
Figure 22 Virtual Application Network Configuration in Central Cloud and Cloud Region
5.2.3.18.1 Dynamic Routing
Routing is handled by NSX Edge. The management network Edge services gateways, MgmtCentral-Edge and MgmtRegionA-Edge, work in conjunction with the Edge services gateways that are configured to create a virtual application network for each central cloud or cloud region management component. A dedicated network, networkExchange, facilitates the exchange of transit network traffic and the exchange of routing tables. All NSX Edge services gateways that are attached to the networkExchange network segment run OSPF to exchange routing information.
OSPF route information exchange depends on the area definition, which is based on the local
vSphere-Mgmt network:
Table 24 OSPF Area ID
Network | Region | IP Addressing | Area ID
vSphere-Mgmt | Central Cloud | 172.16.11.0/24 | 16
vSphere-Mgmt | Region A (Cloud Region) | 172.17.11.0/24 | 17
The virtual application network Edge services gateways receive an OSPF default route from the Management Edge services gateway, which has access to the public network. Components directly attached to the vSphere-Mgmt network use the Management Edge services gateway as the default gateway to reach all components in the environment.
All VMs within virtual application networks can route to VMs in other virtual application networks. OSPF Area IDs are based on the second octet of the vSphere management network (a short derivation example appears at the end of this subsection), and MD5 is used for authentication.
NSX Edge appliance size is X-Large and is deployed in High Availability (HA) mode.
A dedicated network for exchange of routing information and traffic between different gateways is
configured. Access to public networks uses SNAT.
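As an illustration of that convention, the area ID can be derived mechanically from the management subnet; the short snippet below reproduces the values in Table 24.

    # Derive the OSPF area ID from the second octet of the vSphere management network.
    import ipaddress

    def ospf_area(mgmt_subnet):
        net = ipaddress.ip_network(mgmt_subnet)
        return int(str(net.network_address).split('.')[1])

    print(ospf_area('172.16.11.0/24'))   # 16 -> Central Cloud
    print(ospf_area('172.17.11.0/24'))   # 17 -> Region A (Cloud Region)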
5.2.3.18.2 Region Connectivity and Dynamic Routing
There is one Management Edge services gateway in each region. The role of the Management
Edge services gateway is to provide connectivity between regions and to exchange region-specific
routes with other connected Management Edge services gateways. The Management Edge services
gateway is configured to exchange OSPF routing data with all of the region-specific virtual application network Edge services gateways, which are attached to their respective networkExchange networks. These routes are consolidated and exchanged with all connected Management Edge
services gateways using iBGP. All Management Edge services gateways belong to the same AS
(Autonomous System) with regards to BGP routing.
In this design, connectivity between two regions uses the direct backend connectivity available
between sites.
5.2.3.18.3 Virtual Application Networks and Dynamic Routing Across Regions
For the components that fail over to another region in the event of a disaster, route redistribution is
disabled in NSX Edge. This prevents the same routes from being advertised in both regions for
their virtual application networks. This makes it possible for the virtual application network Edge
services gateway to participate in region-specific OSPF route exchanges without announcing its
virtual application network route.
To facilitate failover of the virtual application network from one region to the other for some
components, this design creates a shadow virtual application network in the recovery region, an
additional cloud region in this design. This shadow virtual application network is configured
identically, with the same SNAT and DNAT rules and load balancer VIP as needed. The only
difference is the IP addressing used on the networkExchange network and the optional public
interface.
Component virtual application networks can be moved between regions either for testing or during
failover. When a component is moved, the location of that component’s virtual application
network needs to be changed in the VPN configuration. The virtual application network Edge
services gateway also needs to be reconfigured to either start or stop redistributing connected
virtual application networks. The decision to start or stop redistribution depends on whether
the virtual application network Edge services gateway is now the active Edge services gateway or
the failed-over Edge services gateway.
5.3 Infrastructure Management
Infrastructure management is provided by VMware vCenter Server. The configuration of vCenter Server is
described in the following sections.
5.3.1 vCenter Server Instances
The vCenter Server design includes both the vCenter Server instances and the VMware Platform Services
Controller instances.
A VMware Platform Services Controller (PSC) groups a set of infrastructure services including vCenter
Single Sign-On, License Service, Lookup Service, and VMware Certificate Authority. Although the
Platform Services Controller and the associated vCenter Server system can be deployed on the same virtual
machine (embedded Platform Services Controller), this design uses a separate PSC and vCenter Server.
This design includes two vCenter Server instances per central cloud or cloud region; one managing the
management cluster, the other managing the compute and edge clusters. This separation of the management
and compute stack not only provides additional security, but also supports scalability.
The vCenter Server is deployed as the Linux-based vCenter Server Appliance which is functionally
equivalent to the Windows based application, but easier to deploy and maintain.
The two external Platform Services Controller (PSC) instances and two vCenter Server instances are both
housed in the management cluster. The external PSC instances are used to enable replication of data
between PSC instances. To achieve redundancy, the design joins the two Platform Services Controller
instances to the same vCenter Single Sign-On domain, and connects each vCenter Server instance to their
respective Platform Services Controller instance. Joining all Platform Services Controller instances into a
single vCenter Single-Sign-On domain provides this design the capability to share authentication and
license data across all components and regions. Additionally, the vCenter Servers in each region are
configured to utilize Enhanced Linked Mode so that all inventories and actions are available from each
vCenter Server in the region.
In terms of configuration, each vCenter Server system is configured with a static IP address and host name
from the management private portable SoftLayer subnet. The IP address will have a valid (internal) DNS
registration including reverse name resolution. The vCenter Server systems will also maintain network
connections to the following components:
All VMware vSphere Client and vSphere Web Client user interfaces.
Systems running vCenter Server add-on modules, such as NSX or vUM.
Each ESXi host.
Figure 23 vCenter Server and PSC Deployment Model
5.3.1.1 vCenter Server Resiliency
Protecting the vCenter Server system is important because it is the central point of management
and monitoring for the environment. In this design, the vCenter Servers and PSCs are protected
via vSphere HA.
5.3.1.2 Size of vCenter Server Appliances
The size of the vCenter Server appliances is described in the following sections.
5.3.1.2.1 vCenter Server Appliance for Management Cluster
The number of hosts managed by the Management vCenter Server is less than 100 and the
number of virtual machines to be managed is less than 1,000. As a result, this design includes a
“small” vCenter Server configured with the following specifications, which still allows cloud
admins to include additional services within the cluster:
Table 25 Specifications for Management vCenter Server Appliance
vCPUS Memory Disk Space Disk Type
4 16 GB 136 GB Thin
5.3.1.2.2 Platform Services Controller for Management Cluster
The Platform Services controller is deployed onto the management cluster with the following
configuration:
Table 26 Specifications for Platform Service Controller for Management Cluster
vCPUS Memory Disk Space Disk Type
2 2 GB 30 GB Thin
Additionally, the PSC connects to the Active Directory server within this design for common
authentication.
5.3.1.2.3 vCenter Server Appliance for Compute and Edge Clusters
The Compute and Edge vCenter Server in this design manages up to 1,000 hosts or 10,000 virtual
machines, whichever comes first. As a result, this design includes a “large” vCenter Server
configured with the following specifications:
Table 27 Specifications for Compute and Edge vCenter Server Appliance
vCPUS Memory Disk Space Disk Type
16 32 GB 295 GB Thin
5.3.1.2.4 Platform Services Controller for Compute and Edge Clusters
The Platform Services Controller for the vCenter Server managing the Compute and Edge clusters is
deployed onto the management cluster with the following configuration:
Table 28 Specifications for Platform Service Controller for Compute and Edge Clusters
vCPUS Memory Disk Space Disk Type
2 2 GB 30 GB Thin
Additionally, the PSC connects to the Active Directory server within this design for common
authentication.
5.3.1.3 vCenter Database Design
A vCenter Server Appliance can use either a built-in local PostgreSQL database or an external
Oracle database. Both configurations support up to 1,000 hosts or 10,000 virtual machines. This
design will utilize the built-in PostgreSQL to reduce both overhead and Microsoft or Oracle
licensing costs. This also avoids problems with upgrades and support since the ability to attach to
an external database is deprecated for vCenter Server Appliance in the next release.
5.3.1.4 vCenter Networking Design
The vCenter Servers and Platform Services Controllers in each region will be placed onto the
management network to access physical ESXi hosts within the region.
5.3.1.5 Cluster Configuration
Each of the clusters will be configured per the following sections.
5.3.1.5.1 vSphere DRS
This design will utilize vSphere Distributed Resource Scheduling (DRS) in each cluster to initially
place and dynamically migrate virtual machines to achieve balanced compute clusters. The
automation level is set to fully automated so that initial placement and migration recommendations
are executed automatically by vSphere. Note that power management via the Distributed Power
Management feature is not used in this design. Additionally, the migration threshold is set to
partially aggressive so that vCenter will apply priority 1, 2, 3, and 4 recommendations to achieve
at least a moderate improvement in the cluster’s load balance.
5.3.1.5.2 Affinity Rules
In this design, affinity and anti-affinity rules are used to ensure that virtual machines which are
members of a high availability cluster are not placed on the same ESXi host.
5.3.1.5.3 vSphere HA
This design will use vSphere HA in each cluster to detect compute failures and recover virtual
machines running within a cluster. The vSphere HA feature in this design is configured with both
host monitoring and admission control enabled within the cluster. Additionally, each cluster will
reserve 25% of cluster resources as spare capacity for the admission control policy.
By default, the VM restart priority is set to medium and the host isolation response is set to “leave
powered on”. Additionally, VM monitoring is disabled and the datastore heartbeating feature is
configured to include any of the cluster datastores. In this design, the datastore is a VSAN-backed
datastore.
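The 25% admission control reservation can be reasoned about with simple arithmetic. The sketch below, using hypothetical host counts and per-host capacities, shows the usable capacity that remains for virtual machines after the spare capacity is set aside.

```python
def usable_cluster_capacity(hosts: int, cpu_ghz_per_host: float, mem_gb_per_host: float,
                            reserved_fraction: float = 0.25):
    """Return (usable CPU GHz, usable memory GB) after the vSphere HA admission
    control policy reserves a percentage of total cluster resources."""
    total_cpu = hosts * cpu_ghz_per_host
    total_mem = hosts * mem_gb_per_host
    return total_cpu * (1 - reserved_fraction), total_mem * (1 - reserved_fraction)

# Hypothetical 4-host cluster with 40 GHz of CPU and 512 GB of RAM per host
cpu, mem = usable_cluster_capacity(4, 40.0, 512.0)
print(f"Usable capacity: {cpu:.0f} GHz CPU, {mem:.0f} GB RAM")  # 120 GHz, 1536 GB
```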
5.3.1.5.4 vSphere vMotion
vSphere vMotion is enabled in this design by the use of shared storage and network configuration.
5.3.1.5.5 vSphere FT
vSphere FT is not utilized or available in this design.
5.4 Common Services
5.4.1 Identity and Access Services
Microsoft Active Directory (AD) is used as the identity provider (IDP) in this design. In a single region
deployment of this design, two MS Windows 2012 R2 Active Directory server VMs are configured for a
single AD forest. Naming of the domain is derived from input data prior to design provisioning. Each
server has the MS Windows DNS service installed and configured with all forward and reverse lookup
zones as AD-integrated, multi-master enabled zones. Subsequent regions contain only one AD / DNS server
per region, set up as a peer multi-master within its own MS AD site configured within the AD Sites
management console.
5.4.1.1 Forest and subdomains
This design uses Microsoft Active Directory for authentication and authorization to resources
within the tornado.local domain. For a multi-region deployment, the design utilizes a
domain and forest structure to store and manage Active Directory objects per region.
Table 29 Requirements for Active Directory Service
Requirement | Domain Instance | Domain Name | Description
Active Directory configuration | Parent Active Directory | tornado.local | Contains Domain Name System (DNS) server, time server, and universal groups that contain global groups from the child domains and are members of local groups in the child domains.
Active Directory configuration | Central child Active Directory | central.tornado.local | Contains DNS records which replicate to all DNS servers in the forest. This child domain contains all design management users, and global and local groups.
Active Directory configuration | Region-A child Active Directory | regiona.tornado.local | Contains DNS records which replicate to all DNS servers in the forest. This child domain contains all design management users, and global and local groups.
Active Directory users and groups | - | - | The default AD internal LDAP directory structure will be used for users and computer objects: ou=computers, DC=central, DC=tornado, DC=local; ou=users, DC=central, DC=tornado, DC=local
5.4.1.2 AD Functional level
This design’s AD functional level will be set at “Windows 2008”, which is the lowest level
allowed by the other components in this design.
5.4.1.3 AD Trusts
External trusts to customer AD environments are configured as required.
5.4.1.4 Authentication
Within this design, authentication is provided by a mix of the following:
direct AD integration
SAML authentication via the VMware Platform Services Controller (PSC), backed by
AD
local user accounts within the application.
The VMware Platform Services Controller is configured to use the AD servers within this design
for its identity management directory. The following table describes which authentication
method is used for user / administrator access for each component.
Table 30 Authentication types used
Component | Authentication Context | Authentication Authority
vRA | Administration | Active Directory
vRA | Tenant | Active Directory
vROps | Administration / Roles | Active Directory
vCenter Server | Administration / Roles | Active Directory
vRO | Administration / Roles | Active Directory
ESXi | Administration | Active Directory
NSX | Administration | Active Directory
vRI | Administration | Active Directory
vRB | Administration | Active Directory
vRB | Tenant | Active Directory
vUM | Administration | Active Directory
NSX | Administration | Active Directory
NSX | VPN | Active Directory
vDP | Administration | Active Directory
5.4.1.5 AD VM server sizing
Table 31 Server Sizing
Attribute | Specification
Number of CPUs | 4
Memory | 8 GB
Disk size | 80 GB
Number of servers in central cloud | 4
Number of servers per additional cloud region | 2
5.4.2 Domain Name Services
Domain name services (DNS) is a fundamental function of any multi-component network
infrastructure design. DNS not only stores host name to IP address information but also service
type information required by Microsoft Active Directory and other applications. Dynamic DNS
capability is also required for MS AD deployments for the registration of the AD service record
types and the associated subzone creation. The AD-integrated MS DNS server is utilized for this design.
5.4.2.1 DNS Design
In a single region deployment of this design, four MS Windows 2012 R2 Active Directory server
VMs are configured: two for the root of the forest domain and two for the region subdomain. Each
server has the MS Windows DNS service installed and configured with all forward and reverse
lookup zones as AD-integrated, multi-master enabled zones. Subsequent regions contain only one
AD / DNS server per region, set up as a peer multi-master within its own MS AD site
configured within the AD Sites management console. The following applies to all services and
hosts within this design.
All nodes must use static IP addresses.
As a DNS best practice, the IP address, FQDNs and short names of all nodes must be
forward and reverse resolvable.
All nodes must be accessible from the vCenter Server instances and from the machine
that hosts the vSphere Client (if vSphere Client is used instead of vSphere Web Client).
All nodes in the vRealize Operations Manager analytics cluster must have unique host
names.
An NTP source must be used for all cluster nodes.
DNS is an important component for the operation of this design. For a multi-region deployment,
other regions must provide a root and child domains which contain separate DNS records.
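A quick way to validate the forward and reverse resolution requirement listed above is a standard-library check such as the sketch below; the node name is a hypothetical example in this design's naming scheme.

```python
import socket

def check_forward_and_reverse(fqdn: str) -> bool:
    """Verify that a node's FQDN resolves to an IP address and that the
    address resolves back to the same FQDN (reverse lookup)."""
    ip_address = socket.gethostbyname(fqdn)
    reverse_name, _aliases, _addresses = socket.gethostbyaddr(ip_address)
    return reverse_name.lower().rstrip(".") == fqdn.lower().rstrip(".")

# Hypothetical management node in the central cloud child domain
print(check_forward_and_reverse("vcenter-mgmt.central.tornado.local"))
```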
5.4.2.2 DNS Configuration Requirements
The following table shows an example of the domain naming that can be implemented by the
design. The specific domain naming is determined by the client requirements.
Table 32 Domain Naming Example
Requirement | Domain Instance | Description
DNS host entries | tornado.local | Resides in the tornado.local domain.
DNS host entries | central.tornado.local and regiona.tornado.local | DNS servers reside in the central.tornado.local and regiona.tornado.local domains. Configure both DNS servers with the following settings: dynamic updates for the domain set to Nonsecure and secure; zone replication scope for the domain set to All DNS servers in this forest; create all hosts listed in the DNS Names documentation.
Once properly configured, all nodes within this design are resolvable by FQDN.
5.4.2.3 DNS Forwarding
For name resolution outside of the design, the forest root DNS domain servers will have
forwarders configured to the internal SoftLayer DNS servers if no client on-premises DNS is
available. If a client on-premises DNS is available, then that becomes the DNS forwarding address
or addresses.
Table 33 SoftLayer DNS servers
DNS hostname IP v4 address
rs1.service.softlayer.com 10.0.80.11
rs2.service.softlayer.com 10.0.80.12
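A simple way to confirm that the forwarders in Table 33 answer queries is to point a resolver at them directly, for example with the dnspython package (version 2.x assumed); the query name used here is only an example.

```python
import dns.resolver  # pip install dnspython (2.x)

resolver = dns.resolver.Resolver(configure=False)
resolver.nameservers = ["10.0.80.11", "10.0.80.12"]  # SoftLayer DNS servers from Table 33
resolver.timeout = 3
resolver.lifetime = 5

# Example external name; replace with a record the forwarders are expected to resolve
answer = resolver.resolve("www.ibm.com", "A")
for record in answer:
    print(record.address)
```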
5.4.3 NTP Services
Time synchronization is critical for the functionality of many of the services that comprise this design. For
this purpose, Microsoft Windows Active Directory (AD) servers within the solution are configured as NTP
sources for all services within the design with the exception of VMware Data Protection (vDP), vRealize
Log Insight (vRI) and the ESXi hosts. The Windows AD servers are configured to utilize the internal
SoftLayer time server “servertime.service.softlayer.com” as the authoritative time source. The vDP
application server is configured to utilize VMware Tools as its time source per the vDP documentation
recommendation. The ESXi hosts and vRI are configured to point to the SoftLayer NTP server.
Table 34 Time sources
Service / Application Time Source
Active Directory servers servertime.service.softlayer.com
vDP VMware tools
ESXi servertime.service.softlayer.com
vRealize Log Insight servertime.service.softlayer.com
All else AD Servers
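Time drift against the sources in Table 34 can be spot-checked from any management VM, for example with the third-party ntplib package as in this sketch; the acceptable offset threshold is an assumption for illustration only.

```python
import ntplib  # pip install ntplib

client = ntplib.NTPClient()
response = client.request("servertime.service.softlayer.com", version=3)

# Offset (seconds) between the local clock and the authoritative SoftLayer time source
print(f"Offset: {response.offset:+.3f}s, round-trip delay: {response.delay:.3f}s")
if abs(response.offset) > 1.0:  # illustrative threshold only
    print("WARNING: clock drift exceeds 1 second; check NTP configuration")
```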
5.4.4 SMTP Services
SMTP services are required for outbound notifications and may be used for inbound communications to
vRealize Automation. The following sections detail the configuration of SMTP in this design.
5.4.4.1 SMTP outbound
Within this design, email notifications are sent using SMTP for the following products:
vRealize Automation
vRealize Business
vRealize Operations
vRealize Log Insight
vRealize Orchestrator
vCenter Server
VMware Data Protection
An SMTP server service is configured on each Windows AD server, which is utilized as an SMTP
relay for all outbound SMTP notifications. This service is configured to use the default SMTP port
(25) where the connection to the customer’s email servers is over a direct or VPN connection. Because
SoftLayer blocks port 25 outbound, a secure SSL port (465 or 587) or a custom port, provided by the
customer based on the destination email provider, is configured on the SMTP server. This is only
required where a direct connection or VPN from the customer site into the management network is
not available.
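Because port 25 is blocked outbound by SoftLayer, any notification sent directly to an external provider must use a submission port. The sketch below shows a test message sent through a relay on port 587 with STARTTLS; host names, credentials, and addresses are placeholders.

```python
import smtplib
from email.message import EmailMessage

message = EmailMessage()
message["From"] = "notifications@tornado.local"        # placeholder sender
message["To"] = "cloud-admin@example.com"              # placeholder recipient
message["Subject"] = "SMTP relay test"
message.set_content("Test notification routed through the AD SMTP relay.")

# Port 587 (or 465 with SMTP_SSL) is used because SoftLayer blocks outbound port 25
with smtplib.SMTP("smtp-relay.example.com", 587, timeout=10) as relay:
    relay.starttls()
    relay.login("relay-user", "relay-password")        # placeholder credentials
    relay.send_message(message)
```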
5.4.4.2 SMTP inbound
For vRealize Automation, an inbound email server must be configured to handle inbound email
notifications, such as approval responses. This is typically the customer email server with
email accounts created for vRA. Only one global inbound email server, which appears as the
default for all tenants, is needed. The email server provides accounts that are able to be customized
for each user, providing separate email accounts, usernames, and passwords. Each tenant can
configure an override to change these settings. If service provider actors do not override these
settings before enabling notifications, vRealize Automation uses the globally configured email
server.
The connection to the customer email system requires that this email system be network reachable
by the vRA system.
5.4.5 Certificate Authority Services
By default, vSphere 6.0 uses TLS/SSL certificates that are signed by VMCA (VMware Certificate
Authority) residing on the VMware Platform Services Controller appliance (PSC). By default, these
certificates are not trusted by end-user devices or browsers. It is a security best practice to replace at least
user-facing certificates with certificates that are signed by a third-party or enterprise Certificate Authority
(CA). Certificates for machine-to-machine communication can remain as VMCA-signed certificates. For
this design, a Microsoft Active Directory Enterprise integrated certificate authority is used. A two-tier MS
certificate authority design is employed, with one Windows server as the offline root CA and an additional
Windows server as the online subordinate CA.
5.4.5.1 Offline root CA server VM
As per Microsoft best practice, the root CA Windows server does not participate in this design’s
AD domain. Set up the root CA per the following Microsoft information page:
https://technet.microsoft.com/library/hh831348.aspx
Note that while it is not recommended to connect this machine to a network, it will be network
connected until the root CA is generated. Once generated, transfer the root certificate files to the
subordinate CA server, remove the network access from the root CA server, and shut down the OS.
It is recommended that clients export this VM as an OVA to a secure location and remove it from
the vCenter Server’s inventory.
5.4.5.2 Online subordinate CA server VM
Set up the subordinate CA per the following MS link:
https://technet.microsoft.com/library/hh831348.aspx
Note that this system will also be configured for handling certificate requests and distributing
certificate revocation lists (CRLs).
The VMCA is configured with a certificate issued via the MS Active Directory integrated CA so
that all subsequent browser access to VMware services is validated by any clients that are
members of the design’s AD domain or have imported the domain’s trusted root certificate.
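Once the VMCA issues certificates under the enterprise subordinate CA, clients that hold the exported root certificate should be able to validate any VMware service endpoint. The sketch below performs that check with the Python standard library; the host name and root certificate path are placeholders.

```python
import socket
import ssl

HOSTNAME = "vcenter-mgmt.central.tornado.local"   # placeholder service endpoint
ROOT_CA_FILE = "enterprise-root-ca.pem"           # exported offline root CA certificate

# A context that trusts only the enterprise root; the handshake fails unless the
# service presents a certificate chain that terminates at that root.
context = ssl.create_default_context(cafile=ROOT_CA_FILE)

with socket.create_connection((HOSTNAME, 443), timeout=5) as raw_socket:
    with context.wrap_socket(raw_socket, server_hostname=HOSTNAME) as tls_socket:
        peer_cert = tls_socket.getpeercert()
        print("Certificate validated. Issuer:", peer_cert.get("issuer"))
```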
Table 35 Root CA and Subordinate CA sizing
Attribute Specification
OS Windows 2012 R2
Number of CPUs 2
Memory 4 GB
Disk size 60 GB
5.5 Cloud Management Services
The Cloud Management Services provide the management components for the cloud solution. This layer
includes the Service Catalog, which houses the facilities to be deployed, Orchestration which provides the
workflows to get the catalog items deployed, and the Self-Service Portal that empowers the users to take
full advantage of the Software Defined Datacenter. vRealize Automation provides the Portal and the
Catalog, and vRealize Orchestrator takes care of the process orchestration.
The conceptual design of the vRealize Automation Cloud Management Services is illustrated in the
following diagram. Key design components and their descriptions are also provided.
Figure 24 vRealize Automation Conceptual Design
The Cloud Management Services consist of the following elements and components:
Users:
Cloud administrators – Tenant, group, fabric, infrastructure, service, and other administrators as
defined by business policies and organizational structure.
Cloud (or tenant) users – Have direct access to virtual machines to perform operating system-
level operations provided by vRealize Automation IaaS services.
Cloud Management Portal:
vRealize Automation portal, Admin access – The default root tenant portal URL used to set up and
administer tenants and global configuration options.
vRealize Automation portal, Tenant access – Refers to a subtenant and is accessed using a unique
URL, with an appended tenant identifier.
It is also possible for a tenant portal to refer to the default tenant portal in some configurations. In
this case, the URLs are the same and the user interface is contextually controlled by the assigned
RBAC permissions of that user.
Tools and supporting infrastructure:
VM Templates and Blueprints - These are the templates used in authoring the blueprints that
tenants (users) use to provision their cloud workloads.
Provisioning infrastructure - the following are the on-premises and off-premises resources which together
form a hybrid cloud
Virtual – Supported hypervisors and associated management tools.
Cloud – Supported cloud providers and associated API interfaces.
In the above diagram illustrating the conceptual design of the Cloud Management Platform, these
resources are located in the Internal Virtual Resources and the External Cloud Resources
components.
The cloud management services deliver multi-platform and multi-vendor cloud services. The services
available include the following items.
Comprehensive and purpose-built capabilities that provide standardized resources to global
customers in a short time span.
Multi-platform and multi-vendor delivery methods that integrate with existing enterprise
management systems.
Central user-centric and business-aware governance for all physical, virtual, private, and public
cloud services.
Design that meets the customer and business needs and is extensible.
Table 36 Cloud Management Services Components
Component | Services Provided
vRealize Automation Identity Appliance | vCenter Single Sign-On
vRealize Automation virtual appliance | vRealize Automation Portal Web/App Server; vRealize Automation vPostgreSQL Database; vRealize Automation Service Catalog
vRealize Automation IaaS components | vRealize Automation IaaS Web Server; vRealize Automation IaaS Manager Services
Distributed execution components | vRealize Automation Distributed Execution Managers: Orchestrator and Workers
Integration components | vRealize Automation Agent machines
Provisioning infrastructure | vSphere environment; vRealize Orchestrator environment; other supported physical, virtual, or cloud environments
Supporting infrastructure | Microsoft SQL database environment; Active Directory environment; SMTP; NTP; DNS
5.5.1 Cloud Management Physical Design
This design uses NSX logical switches to abstract the vRealize Automation application and its supporting
services. This abstraction allows the application to be hosted in any given region regardless of the
underlying physical infrastructure such as network subnets, compute hardware, or storage types. This
design hosts the vRealize Automation application and its supporting services in the central cloud. The same
instance of the application manages both central cloud and any additional cloud regions.
Figure 25 vRealize Automation Design Overview for Central Cloud
Figure 26 vRealize Automation Design Overview for Additional Cloud Regions
The configuration of these elements is described in the following sections.
5.5.1.1 vRealize Identity Appliance
vRealize Identity Appliance provides an infrastructure-independent failover capability. A single
appliance is deployed in the solution. vSphere HA is used to ensure high availability for Identity
Appliance.
The appliance is configured with 1 vCPU and 2 GB of RAM.
5.5.1.2 vRealize Automation Appliance
The vRealize Automation virtual appliance includes the Web portal and database services.
The vRealize Automation portal allows self-service provisioning and management of cloud
services, as well as authoring blueprints, administration, and governance. The vRealize
Automation virtual appliance uses an embedded PostgreSQL database for catalog persistence and
database replication.
The solution deploys two instances of the vRealize Automation appliance to achieve redundancy.
The database is configured between two vRealize Automation appliances for high availability.
Data is replicated between the embedded PostgreSQL database instances. Database instances are
configured in an active/passive arrangement. In this configuration, manual failover between the two
database instances is required.
Each appliance is configured with 4 vCPU and 16 GB of RAM.
5.5.1.3 vRealize Automation IaaS Web Server
vRealize Automation IaaS Web server provides a user interface within the vRealize Automation
portal web site for the administration and consumption of IaaS components. Two vRealize
Automation IaaS web servers are installed on virtual machines. Each virtual machine runs
Microsoft Windows Server 2012 R2 and it performs Model Manager (Web) and IaaS Web
functions. Each virtual machine is sized to 4 vCPU, 4 GB of RAM and 60 GB HDD.
5.5.1.4 vRealize Automation IaaS Manager Service and DEM Orchestrator Server
The vRealize Automation IaaS Manager Service and Distributed Execution Management (DEM)
server are at the core of the vRealize Automation IaaS platform. The vRealize Automation IaaS
Manager Service and DEM server support several functions.
Manages the integration of vRealize Automation IaaS with external systems and
databases.
Provides multi-tenancy.
Provides business logic to the DEMs.
Manages business logic and execution policies.
Maintains all workflows and their supporting constructs.
A Distributed Execution Manager (DEM) runs the business logic of custom models, interacting
with the database and with external databases and systems as required. DEMs also manage cloud
and physical machines. The DEM Orchestrator monitors the status of the DEM workers. DEM
worker manages the scheduled workflows by creating new workflow instances at the scheduled
time and allows only one instance of a particular scheduled workflow to run at a given time. It also
preprocesses workflows before execution. Preprocessing includes checking preconditions for
workflows and creating the workflow's execution history.
The vRealize Automation IaaS Manager Service and DEM server are separate servers, but are
installed on the same virtual machine.
Two virtual machines are deployed to run both IaaS Manager Service and DEM Orchestrator. The
two servers share the same active/passive application model. Only one manager service can be
active at a time.
Each virtual machine runs Microsoft Windows Server 2012 R2. Each virtual machine is sized to 2
vCPU, 4 GB of RAM and 60 GB HDD.
5.5.1.5 vRealize Automation IaaS DEM Worker Server
vRealize Automation IaaS DEM workers are responsible for the provisioning and deprovisioning
tasks initiated by the vRealize Automation portal. DEM workers communicate with vRealize
Automation endpoints. In this instance, the endpoint is vCenter Server.
Each DEM Worker can process up to 15 concurrent workflows. Beyond this limit, workflows are
queued for execution. The current design implements 2 DEM workers for a total of 30 concurrent
workflows.
DEM Workers are installed on two virtual machines running Microsoft Windows Server 2012 R2.
Each virtual machine is sized to 4 vCPU, 8 GB of RAM and 60 GB HDD.
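The DEM worker count follows directly from the 15-concurrent-workflow limit per worker. A sizing helper such as the sketch below reproduces the arithmetic used here (2 workers for 30 concurrent workflows); the minimum of two workers reflects the redundancy deployed in this design.

```python
import math

WORKFLOWS_PER_DEM_WORKER = 15  # concurrent workflow limit per DEM worker

def dem_workers_required(target_concurrent_workflows: int, minimum_workers: int = 2) -> int:
    """Return the number of DEM workers needed for a target concurrency level,
    never dropping below the redundant minimum deployed in this design."""
    needed = math.ceil(target_concurrent_workflows / WORKFLOWS_PER_DEM_WORKER)
    return max(minimum_workers, needed)

print(dem_workers_required(30))  # 2, as deployed in this design
print(dem_workers_required(45))  # 3, if concurrency requirements grow
```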
5.5.1.6 vRealize Automation IaaS Proxy Agent
The vRealize Automation IaaS Proxy Agent is a Windows program that caches and forwards
information gathered from vCenter Server back to vRealize Automation. The IaaS Proxy Agent
server provides the following functions.
vRealize Automation IaaS Proxy Agent can interact with different types of hypervisors
and public cloud services, such as Hyper-V and AWS. For this design, only the vSphere
agent is used.
vRealize Automation does not virtualize resources by itself, but works with vCenter
Server to provision and manage the virtual machines. It uses vSphere agents to send
commands to and collect data from vCenter Server.
Two vRealize Automation vSphere Proxy Agent virtual machines are deployed in the current
architecture. The virtual machines are deployed on a dedicated virtual network to decouple them
from the main vRealize Automation infrastructure.
Each virtual machine runs Microsoft Windows Server 2012 R2. Each virtual machine is sized to 2
vCPU, 4 GB of RAM and 60 GB HDD.
5.5.1.7 Load Balancer
Session persistence of a load balancer allows the same server to serve all requests after a session is
established with that server. The session persistence is enabled on the load balancer to direct
subsequent requests from each unique session to the same vRealize Automation server in the load
balancer pool.
The load balancer also handles failover for the vRealize Automation Server (Manager Service).
Only one Manager Service is active at any one time. Manual failover of Manager Service is
necessary. Session persistence is not enabled because it is not a required component for the
Manager Service.
The following tables describe load balancer implementation for vRealize Automation components:
Table 37 Load Balancer Application Profile
Server Role | Type | Enable SSL Pass-through | Persistence | Expires in (Seconds)
vRealize Automation vPostgres | TCP | n/a | None | n/a
vRealize Automation | HTTPS (443) | Enabled | Source IP | 120
vRealize Automation IaaS Web | HTTPS (443) | Enabled | Source IP | 120
vRealize Automation IaaS Manager | HTTPS (443) | Enabled | Source IP | 120
vRealize Automation Orchestrator | HTTPS (443) | Enabled | Source IP | 120
Table 38 Load Balancer Service Monitoring Configuration
Monitor | Interval | Timeout | Retries | Type | Method | URL | Receive
vRealize Automation vPostgres | 3 | 9 | 3 | TCP | N/A | N/A | N/A
vRealize Automation | 3 | 9 | 3 | HTTPS (443) | GET | /vcac/services/api/status | REGISTERED
vRealize Automation IaaS Web | 3 | 9 | 3 | HTTPS (443) | GET | N/A | N/A
vRealize Automation IaaS Manager | 3 | 9 | 3 | HTTPS (443) | GET | /VMPS2 | BasicHttpBinding_VMPSProxyAgent_policy
vRealize Automation Orchestrator | 3 | 9 | 3 | HTTPS (443) | GET | /vco/api/status | REGISTERED
Table 39 Load Balancer Pool Specifications
Server Role | Algorithm | Monitors | Members | Port | Monitor Port
vRealize Automation vPostgres | Round Robin | <vRealize Automation vPostgres monitor> | vRealize Automation Postgres nodes | 5432 | 5432
vRealize Automation | Least Connection | <vRealize Automation monitor> | vRealize Automation nodes | 443 | 443
vRealize Automation IaaS Web | Least Connection | <vRealize Automation IaaS Web monitor> | IaaS web nodes | 443 | 443
vRealize Automation IaaS Manager | Least Connection | <vRealize Automation IaaS Manager monitor> | IaaS Manager nodes | 443 | 443
vRealize Automation Orchestrator | Least Connection | <vRealize Automation Orchestrator monitor> | vRealize Orchestrator nodes | 8281 | 8281
Table 40 Virtual Server Characteristics
Protocol | Port | Default Pool | Application Profile
TCP | 5432 | vRealize Automation vPostgres Pool | vRealize Automation vPostgres Profile
HTTPS | 443 | vRealize Automation Pool | vRealize Automation Profile
HTTPS | 443 | vRealize Automation IaaS Web Pool | vRealize Automation IaaS Web Profile
HTTPS | 443 | vRealize Automation IaaS Manager Pool | vRealize Automation IaaS Manager Profile
HTTPS | 8281 | vRealize Automation Orchestrator Pool | vRealize Automation Orchestrator Profile
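The monitors in Table 38 amount to HTTPS GET probes whose response body must contain an expected string. The sketch below reproduces two of those checks with the requests library; the virtual server host names are placeholders, and certificate verification is disabled only because such a probe may run before the enterprise CA chain is trusted.

```python
import requests

# (URL, expected substring) pairs mirroring the Table 38 monitors; host names are placeholders
HEALTH_CHECKS = {
    "vRealize Automation": ("https://vra.tornado.local/vcac/services/api/status", "REGISTERED"),
    "vRealize Orchestrator": ("https://vro.tornado.local:8281/vco/api/status", "REGISTERED"),
}

for component, (url, expected) in HEALTH_CHECKS.items():
    try:
        response = requests.get(url, timeout=9, verify=False)
        healthy = response.status_code == 200 and expected in response.text
    except requests.RequestException:
        healthy = False
    print(f"{component}: {'UP' if healthy else 'DOWN'}")
```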
5.5.2 vRealize Automation Supporting Infrastructure
A number of supporting elements are required for vRealize Automation. The following sections describe
their configuration.
5.5.2.1 Microsoft SQL Server Database
vRealize Automation uses a Microsoft SQL Server database to maintain the vRealize Automation
IaaS elements and the policies. The database also maintains information about the machines it
manages.
For simple failover of the entire vRealize Automation instance from one site to another, the
Microsoft SQL server is running in a virtual machine inside the vRealize Automation virtual
network. The virtual machine is running Microsoft Windows Server 2012 R2 and configured with
8 vCPU, 16 GB of RAM, 80 GB HDD. The Microsoft SQL Server Database version is SQL
Server 2012.
5.5.2.2 PostgreSQL Database
The vRealize Automation appliance uses a PostgreSQL database server to maintain the vRealize
Automation portal elements and services, and the information about the catalog items it manages.
Embedded PostgreSQL within each virtual appliance is utilized.
5.5.2.3 Notifications
System administrators configure default settings for both the outbound and inbound email servers
used to send system notifications. The current solution implements an outbound SMTP server only. The
automation creates a global outbound email server to process outbound email notifications. This
server appears as the default for all tenants.
5.5.3 vRealize Automation Cloud Tenant Design
A tenant is an organizational unit within a vRealize Automation deployment, and can represent a business
unit within an enterprise, or a company that subscribes to cloud services from a service provider. Each
tenant has its own dedicated configuration, although some system-level configuration is shared across
tenants.
5.5.3.1 Single-Tenant and Multi-Tenant Deployments
vRealize Automation supports deployments with a single tenant or multiple tenants. System-wide
configuration is always performed using the default tenant, and can then be applied to one or more
tenants. System-wide configuration specifies defaults for branding and notification providers.
Infrastructure configuration, including the infrastructure sources that are available for
provisioning, can be configured in any tenant and is shared among all tenants. The infrastructure
resources, such as cloud or virtual compute resources or physical machines, can be divided into
fabric groups managed by fabric administrators. The resources in each fabric group can be
allocated to business groups within each tenant by using reservations.
Single-Tenant Deployment — In a single-tenant deployment, all configuration occurs in
the default tenant. Service provider actors can manage users and groups, and configure
tenant-specific branding, notifications, business policies, and catalog offerings. All users
log in to the vRealize Automation console at the same URL, but the features available to
them are determined by their roles.
Multi-Tenant Deployment — In a multi-tenant deployment, the system administrator
creates new tenants for each organization that uses the same vRealize Automation
instance. Tenant users log in to the vRealize Automation console at a URL specific to
their tenant. Tenant-level configuration is segregated from other tenants and from the
default tenant, although users with system-wide roles can view and manage configuration
across multiple tenants.
5.5.3.2 Tenant Design
This design deploys a single tenant containing two business groups.
The first business group is designated for production workloads provisioning.
The second business group is designated for development workloads.
Service provider actors manage users and groups, configure tenant-specific branding,
notifications, business policies, and catalog offerings. All users log in to the vRealize Automation
console at the same URL, but the features available to them are determined by their roles.
The following diagram illustrates the single region tenant design.
Figure 27 Tenant Design for Single Region
The following diagram illustrates the dual-region tenant design.
Figure 28 Tenant Design for Two Regions
The tenant has two business groups. A separate fabric group for each region is created. Each
business group can consume resources in both regions. Access to the default tenant is allowed
only by the system administrator and for the purposes of managing tenants and modifying system-
wide configurations.
The solution automatically configures vRealize Automation based on the requested deployment
type – single region or dual region.
5.5.3.3 Service Design
The service catalog provides a common interface for consumers of IT services to use to request
and manage the services and resources they need.
A service provider actor or service architect can specify information about the service catalog,
such as the service hours, support team, and change window.
The solution implements a service catalog that provides the following services:
Central Cloud. Service catalog that is dedicated to the central cloud.
Additional Cloud Region. Service catalog that is dedicated to an additional cloud
region.
The solution is preinstalled with several catalog items. For a single site configuration, only the
central cloud service catalog is implemented.
5.5.3.4 Catalog Items
Users can browse the service catalog for catalog items they are entitled to request. Several generic
users will be automatically created and entitled to all items in the catalog. The users can be
disabled at a later stage or their permissions modified as appropriate.
For some catalog items, a request results in the provisioning of an item that the user can manage.
For example, the user can request a virtual machine with Windows 2012 preinstalled, and then
manage that virtual machine after it has been provisioned.
The service provider actor defines new catalog items and publishes them to the service catalog. The
service provider actor can then manage the presentation of catalog items to the consumer and
entitle new items to consumers. To make the catalog item available to users, a service provider
actor must entitle the item to the users and groups who should have access to it.
A catalog item is defined in a blueprint, which provides a complete specification of the resource to
be provisioned and the process to initiate when the item is requested. It also defines the options
available to a requester of the item, such as virtual machine specifications or lease duration, or any
additional information that the requester is prompted to provide when submitting the request. The
blueprint also specifies custom properties that are applied to the requested resource.
5.5.3.5 Machine Blueprints
A machine blueprint is the complete specification for a virtual machine. A machine blueprint
determines the machine's attributes, how it is provisioned, and its policy and management settings.
Machine blueprints are published as catalog items in the service catalog.
Machine blueprints can be specific to a business group or shared among groups within a tenant. In
this design the preloaded machine blueprints are shared among business groups. Service provider
actors create shared blueprints that can be entitled to users in any business group within the tenant.
Business group managers create group blueprints that can only be entitled to users within a
specific business group. A business group manager cannot modify or delete shared blueprints.
Service provider actors cannot view or modify group blueprints unless they also have the business
group manager role for the appropriate group.
If a service provider actor sets a shared blueprint's properties so that it can be copied, the business
group manager can also copy the shared blueprint for use as a starting point to create a new group
blueprint.
5.5.3.6 Blueprint Design
The following sections provide details of each service definition that has been included as part of
the current phase of cloud platform deployment.
Table 41 Base Windows Server Blueprint
Service Name | Description
Provisioning Method | When users select this blueprint, vRealize Automation clones a vSphere virtual machine template with preconfigured vCenter customizations.
Entitlement | Both Production and Development business group members.
Approval Process | No approval (pre-approval assumed based on approved access to platform).
Operating System and Version Details | Windows Server 2012 R2
Configuration | Disk: single disk drive
Lease and Archival Details | Lease: Production Blueprints - no expiration date; Development Blueprints - minimum 30 days, maximum 270 days. Archive: 15 days.
Pre- and Post-Deployment Requirements | Email sent to manager confirming service request (include description details).
Table 42 Base Windows Blueprint Sizing
vCPU Memory (GB) Storage (GB)
2 8 70
Table 43 Base Linux Server Blueprint
Service Name | Description
Provisioning Method | When users select this blueprint, vRealize Automation clones a vSphere virtual machine template with preconfigured vCenter customizations.
Entitlement | Both Production and Development business group members.
Approval Process | No approval (pre-approval assumed based on approved access to platform).
Operating System and Version Details | Red Hat Enterprise Server 6
Configuration | Disk: single disk drive
Lease and Archival Details | Lease: Production Blueprints - no expiration date; Development Blueprints - minimum 30 days, maximum 270 days. Archive: 15 days.
Pre- and Post-Deployment Requirements | Email sent to manager confirming service request (include description details).
Table 44 Base Linux Blueprint Sizing
vCPU Memory (GB) Storage (GB)
2 8 70
5.5.3.7 Branding
The solution branding is preconfigured. The cloud admin can change the appearance of the
vRealize Automation console to meet site-specific branding guidelines by changing the logo, the
background color, or information in the header and footer.
5.5.4 vRealize Automation vSphere Integration Design
The following terms apply to vRealize Automation integrated with vSphere. These terms and their meaning
may vary from the way they are used when referring only to vSphere.
Table 45 vRealize Integration with vSphere
Element | Description
vSphere (vCenter Server) endpoint | Provides information required by vRealize Automation IaaS to access vSphere compute resources. It requires the appropriate permissions for the vSphere proxy agent to manage the vCenter Server instance.
Compute resource | Virtual object within vRealize Automation that represents a vCenter Server cluster or resource pool, and datastores or datastore clusters. vRealize Automation provisions the virtual machines requested by business group members on the compute resource. Note: Compute resources are CPU, memory, storage and networks. Datastores and datastore clusters are part of the overall storage resources.
Fabric groups | vRealize Automation IaaS organizes compute resources into fabric groups.
Fabric administrators | Fabric administrators manage compute resources, which are organized into fabric groups.
Compute reservation | A share of compute resources (vSphere cluster, resource pool, datastores, or datastore clusters), such as CPU and memory reserved for use by a particular business group for provisioning virtual machines. Note: vRealize Automation uses the term reservation to define resources (be they memory, storage or networks) in a cluster. This is different than the use of reservation in vCenter Server, where a share is a percentage of total resources, and reservation is a fixed amount.
Storage reservation | Similar to compute reservation (see above), but pertaining only to a share of the available storage resources. In this context, a storage reservation in terms of gigabytes is specified from an existing LUN or Datastore.
Business groups | A collection of virtual machine consumers, usually corresponding to an organization's business units or departments. Only users in the business group can request virtual machines.
Reservation policy | vRealize Automation IaaS determines its reservation (also called virtual reservation) from which a particular virtual machine is provisioned. The reservation policy is a logical label or a pointer to the original reservation. Each virtual reservation can be added to one reservation policy.
Build profile | A set of user defined properties a user is able to apply to a virtual machine when it is provisioned. For example, the operating system used in a blueprint, or the available networks to use for connectivity at the time of provisioning the virtual machine. Build profile properties determine the specification of the virtual machine, the manner in which it is provisioned, operations to perform after it is provisioned, or management information maintained within vRealize Automation.
Blueprint | The complete specification for a virtual machine, determining the machine attributes, the manner in which it is provisioned, and its policy and management settings. Blueprint allows the users of a business group to create virtual machines on a virtual reservation (compute resource) based on the reservation policy, and using platform and cloning types. It also lets a user specify or add machine resources and build profiles.
The following figure shows the logical design constructs discussed in the previous section as they apply to
the deployment of vRealize Automation integrated with vSphere in a single region design.
Figure 29 vRealize Automation Integration with vSphere Endpoint – Central Cloud
The following figure shows the logical design constructs discussed in the previous section as they apply to
the deployment of vRealize Automation integrated with vSphere in a dual region design.
Figure 30 vRealize Automation Integration with vSphere Endpoint – Central Cloud and a Cloud Region (Region
A)
The solution automatically implements the design presented in the figure above for a dual-site
configuration. When a single-site deployment is requested, the automation deploys only the central cloud
configuration.
5.5.5 Infrastructure Source Endpoints
An infrastructure source endpoint is a connection to the infrastructure that provides a set (or multiple sets)
of resources, which can then be made available by IaaS administrators for consumption by users. vRealize
Automation IaaS regularly collects information about known endpoint resources and the virtual resources
provisioned therein. Endpoint resources are referred to as compute resources (or as compute pods— the
terms are often used interchangeably).
Infrastructure data is collected through proxy agents that manage and communicate with the endpoint
resources. This information about the compute resources on each infrastructure endpoint and the machines
provisioned on each compute resource is collected at regular intervals.
During solution deployment, the proxy agents and their associated endpoints are configured
automatically.
5.5.6 Virtualization Compute Resources
A virtualization compute resource is a vRealize Automation object that represents an ESXi host or a cluster
of ESXi hosts (vSphere cluster). When a group member requests a virtual machine, the virtual machine is
provisioned on these compute resources. vRealize Automation regularly collects information about known
compute resources and the virtual machines provisioned on them through the proxy agents. Each region has
one compute cluster. The compute cluster is selected automatically during deployment.
5.5.6.1 Fabric Groups
A fabric group is a logical container of several compute resources, and can be managed by fabric
administrators. A fabric group for each region is created and it includes all the compute resources
and edge resources in that region.
5.5.6.2 Business Groups
A Business group is a collection of machine consumers (users), often corresponding to a line of
business, department, or other organizational unit. To request machines, a vRealize Automation
user must belong to at least one Business group. Each group has access to a set of local blueprints
used to request machines.
Business groups have the following characteristics.
A group must have at least one business group manager, who maintains blueprints for the
group and approves machine requests.
Groups can contain support users, who can request and manage machines on behalf of
other group members.
A vRealize Automation user can be a member of more than one Business group, and can
have different roles in each group.
Two business groups are created, one for production users and one for development users.
5.5.6.3 Reservations
A reservation is a share of one compute resource's available memory, CPU and storage reserved
for use by a particular fabric group. Each reservation is for one fabric group only but the
relationship is many-to-many. A fabric group might have multiple reservations on one compute
resource or reservations on multiple compute resources or both. The solution implements only one
fabric group per region.
Each resource cluster has two reservations, one for production and one for development, allowing
both production and development workloads to be provisioned. An edge reservation in each region
is created and allows NSX to deploy edge services gateways on demand and place them on the
edge cluster.
5.5.6.4 Reservation Policies
Each virtual reservation is added to one reservation policy. The reservation from which a
particular virtual machine is provisioned is determined by vRealize Automation based on the
reservation policy specified in the blueprint (if any), the priorities and current usage of the fabric
group's reservations, and other custom properties.
Two reservation policies are configured in each region, one for production and the other for
development. One edge reservation in each region is created for placement of the edge service
gateway.
5.5.6.5 Template Synchronization
In the case of a single region, no template synchronization is performed. A dual-region deployment allows
provisioning workloads across regions from the same portal using the same single-machine
blueprints.
This design uses vSphere Content Library as the synchronization mechanism for templates across
regions.
Figure 31 Template Synchronization
5.5.7 Process Orchestration
VMware vRealize Orchestrator is a development and process automation and orchestration platform that
provides a library of extensible workflows to allow a cloud admin to create and run automated,
configurable processes to manage the VMware vSphere infrastructure as well as other VMware and third-
party technologies.
5.5.7.1 Directory Services
vRealize Orchestrator instances will use Active Directory LDAP authentication. The only
configuration supported for multi-domain Active Directory is domain tree. Forest and external
trusts are not supported for process orchestration. Multiple domains that have two-way trust, but
are not in the same tree, are not supported and do not work with vRealize Orchestrator.
5.5.7.2 Network Ports
vRealize Orchestrator uses specific network ports to communicate with other systems. The ports
are configured with a default value, which is set by the automation at build time. It is
recommended that these values remain unchanged to ensure supportability of the system in the
future. Firewall ports within the solution will be opened to ensure communication to and between the
components. Firewalls not deployed by the solution need to be configured
appropriately.
Table 46 vRealize Orchestrator Default Configuration Ports
Port | Number | Protocol | Source | Target | Description
HTTPS Server port | 8281 | TCP | End-user external system | vRealize Orchestrator server | The SSL secured HTTP protocol used to connect to the vRealize Orchestrator REST API.
Web configuration HTTPS access port | 8283 | TCP | End-user Web browser | vRealize Orchestrator configuration | The SSL access port for the Web UI for vRealize Orchestrator configuration.
Table 47 vRealize Orchestrator Default External Communication Ports
Port | Number | Protocol | Source | Target | Description
LDAP using SSL | 636 | TCP | vRealize Orchestrator server | LDAP server | Lookup port of the Active Directory server for secure LDAP authentication.
LDAP using Global Catalog | 3268 | TCP | vRealize Orchestrator server | Global Catalog server | Port to which Microsoft Global Catalog server queries are directed.
DNS | 53 | TCP | vRealize Orchestrator server | DNS server | Name resolution.
VMware vCenter Single Sign-On server (PSC) | 443 | TCP | vRealize Orchestrator server | vCenter Single Sign-On server | Port used to communicate with the vCenter Single Sign-On server.
SQL Server | 1433 | TCP | vRealize Orchestrator server | Microsoft SQL server | Port used to communicate with the Microsoft SQL Server or SQL Server Express instances that are configured as the vRealize Orchestrator database.
SMTP Server port | 25 | TCP | vRealize Orchestrator server | SMTP Server | Port used for notifications.
vCenter Server API port | 443 | TCP | vRealize Orchestrator server | VMware vCenter server | The vCenter Server API communication port used by vRealize Orchestrator to obtain virtual infrastructure and virtual machine information from the orchestrated vCenter Server instances.
vCenter Server | 80 | TCP | vRealize Orchestrator server | vCenter Server | Port used to tunnel HTTPS communication.
VMware ESXi | 443 | TCP | vRealize Orchestrator server | ESXi hosts | Workflows using the vCenter Guest Operations API need a direct connection between vRealize Orchestrator and the ESXi hosts the VM is running on.
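Where firewalls are not deployed by the solution, the external ports in Table 47 can be verified from the vRealize Orchestrator appliance with a basic TCP connectivity sketch like the one below; the target host names are placeholders for the design's actual servers.

```python
import socket

# (description, host, port) tuples drawn from Table 47; host names are placeholders
REQUIRED_PORTS = [
    ("LDAP over SSL", "dc01.central.tornado.local", 636),
    ("Global Catalog", "dc01.central.tornado.local", 3268),
    ("DNS", "dc01.central.tornado.local", 53),
    ("PSC / Single Sign-On", "psc01.central.tornado.local", 443),
    ("Microsoft SQL Server", "sql01.central.tornado.local", 1433),
]

for description, host, port in REQUIRED_PORTS:
    try:
        with socket.create_connection((host, port), timeout=3):
            status = "reachable"
    except OSError:
        status = "BLOCKED or unreachable"
    print(f"{description} ({host}:{port}): {status}")
```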
5.5.7.3 vRealize Orchestrator Deployment
Two vRealize Orchestrator appliance instances are required within this solution, each with 2 CPUs, 4
GB of memory, and 16 GB of hard disk. This solution uses the MSSQL database already installed
to support other components. In cluster mode, multiple vRealize Orchestrator instances with
identical server and plug-in configurations work together as a cluster, and share a single
database. The instances are installed behind a load balancer. Although there are 2 instances for
availability, the failover required in a disaster recovery scenario is manual. Please refer to the
vRealize user guide for the process to manually failover these components.
All vRealize Orchestrator server instances communicate with each other by exchanging heartbeats
at a certain time interval. Only active vRealize Orchestrator server instances respond
to client requests and run workflows. If an active vRealize Orchestrator server instance fails to
send heartbeats, it is considered to be non-responsive, and one of the inactive instances takes over
to resume all workflows from the point at which they were interrupted. The heartbeat is
implemented through the shared database, so there are no implications in the network design for a
vRealize Orchestrator cluster. If more than one vRealize Orchestrator node is active in a
cluster, concurrency problems can occur if different users use different vRealize Orchestrator
nodes to modify the same resource. This implementation uses an active-active cluster.
The following tables outline the characteristics for this vRealize Orchestrator active-active cluster
design.
Table 48 vRO Service Monitor Specifications
Monitor | Interval | Timeout | Retries | Type | Send String | Receive String
vco-https-8281 | 3 | 9 | 3 | HTTPS (443) | GET /vco/api/status\r\n | REGISTERED
Table 49 vRO Service Pool Characteristics
Pool Name | Algorithm | Monitors | Members | Port | Monitor Port
vco-pool | Leastconn | vco-https-8281 | vRealize Orchestrator nodes | 8281 | 8281
Table 50 vRO Virtual Server Characteristics
Name | Type | Service Port | Source Address Translation | Default Pool Name
vco-lb-8281 | Performance (Layer 4) | 8281 | Automap | vco-pool
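The vco-https-8281 monitor in Table 48 polls the vRealize Orchestrator REST API status resource and expects the string REGISTERED in the response. The following is a minimal sketch of the same check in Python, which could be used to validate a node before adding it to the pool; the node name is taken from the example host names used elsewhere in this design and is illustrative only.

    import requests

    # Illustrative node URL; substitute the FQDN of the vRealize Orchestrator node to check.
    VRO_NODE = "https://vra01vro01a.tornado.local:8281"

    # This design replaces the default certificates with CA-signed ones, so verify=True
    # assumes the issuing CA certificate is present in the local trust store.
    response = requests.get(VRO_NODE + "/vco/api/status", verify=True, timeout=10)
    response.raise_for_status()

    # The load balancer health monitor looks for the literal string REGISTERED.
    if "REGISTERED" in response.text:
        print("Node is registered and eligible for the vco-pool")
    else:
        print("Node responded but is not registered:", response.text[:200])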
5.5.7.4 SSL Certificates
The vRealize Orchestrator configuration interface uses a secure connection to communicate with
vCenter Server, relational database management systems (RDBMS), LDAP, vCenter Single Sign-
On, and other servers. The required SSL certificates are generated by the certification authority
deployed within the solution.
5.5.7.5 vRealize Orchestrator Plug-Ins
Plug-ins allow vRealize Orchestrator to access and control external technologies and applications.
Exposing an external technology in a vRealize Orchestrator plug-in allows incorporating objects
and functions in workflows that access the objects and functions of the external technology. The
external technologies that can be accessed using plug-ins include virtualization management tools,
email systems, databases, directory services, and remote control interfaces. vRealize Orchestrator
provides a set of standard plug-ins.
The following plug-ins are configured in this design:
vRealize Orchestrator NSX plug-in
vRealize Orchestrator vRealize Automation plug-in
vRealize Orchestrator vCenter Server plug-in
5.5.7.5.1 Multi-Node Plug-In
vRealize Orchestrator comes as a single-site topology product. The multi-node plug-in creates a
primary-secondary relation between vRealize Orchestrator servers that extends the package
management and workflow execution features. This is only enabled when deploying a multi-
region topology. The plug-in contains a set of standard workflows for hierarchical orchestration,
management of vRealize Orchestrator instances, and the scale-out of vRealize Orchestrator
activities.
5.5.7.5.2 vRealize Orchestrator Client
The vRealize Orchestrator client is a desktop application that lets users import packages, create,
run, and schedule workflows, and manage user permissions.
vRealize Orchestrator Client can be installed standalone on a desktop system. Download the
vRealize Orchestrator Client installation files from the vRealize Orchestrator appliance
page: https://vRO_hostname:8281. Alternatively, vRealize Orchestrator Client can be run
using Java WebStart directly from the homepage of the vRealize Orchestrator appliance console.
5.5.7.5.3 vRealize Orchestrator Scalability
A single vRealize Orchestrator instance allows up to 300 concurrent workflow instances in the
running state. Workflow instances that are in the waiting or waiting-event states do not count
toward that number. You can design long running workflows in a way that preserves resources by
using the wait elements of the workflow palette. A single vRealize Orchestrator instance supports
up to 35,000 managed virtual machines in its inventory.
This architecture depicts a clustered vRealize Orchestrator environment. In a clustered
environment, workflows cannot be changed while other vRealize Orchestrator instances are
running. Stop all other vRealize Orchestrator instances before connecting the vRealize
Orchestrator client and changing or developing a new workflow. Failure to do so will result in
inconsistencies within the environment.
This architecture scales out the vRealize Orchestrator environment by having multiple independent
vRealize Orchestrator instances (each with its own database instance). This allows the number of
managed inventory objects to increase.
This solution implements an active-active cluster with two nodes.
5.5.8 Software Orchestration
This solution provides a centralized repository for the software binaries and software orchestration templates
that are implemented on deployed resources. The main software orchestration engines installed are Chef Server
and SaltStack. Each region hosts its own dedicated repository server and software orchestration stack.
Software binaries are replicated between regions using the rsync tool, while the software orchestration
components use their own internal replication mechanisms.
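As an illustration only, the rsync replication from the central repository to a region repository could be driven by a small wrapper similar to the sketch below; the paths and the region host name are assumptions and not part of this design.

    import subprocess

    # Hypothetical source path on the central repository and target region repository host.
    SRC = "/repo/binaries/"
    DEST = "region-repo.regiona.tornado.local:/repo/binaries/"

    # -a preserves permissions and timestamps, -v is verbose, -z compresses in transit,
    # and --delete keeps the region copy an exact mirror of the central repository.
    subprocess.run(["rsync", "-avz", "--delete", SRC, DEST], check=True)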
The following diagram shows a high level view of the software orchestration components.
Figure 32 Software Orchestration Logical Design
A single-region design implements both a central orchestration component and a region orchestration component.
5.5.8.1 Central Software Orchestration
Central software orchestration has no direct interaction with the resources, but provides a central
management point for maintaining the latest binaries and templates by the cloud administrator.
It is responsible for:
Maintaining the latest versions of software binaries
Maintaining the latest versions of software orchestration templates
Keeping the regions up to date with the latest software versions and templates
Inputs and Outputs
The central component pushes software binaries to the region components.
The central component is accessed by the cloud administrator to update software and templates.
Design Rationale
In a hybrid cloud there may be many regions, each with its own software and template
requirements. In addition, to avoid delays during deployment, the software binaries need to be as close to
the resource cluster as possible. This drives the need for a distributed repository of software
binaries. Maintaining these repositories separately and keeping the software versions on each of them
up to date can add considerable support cost. A central repository can update each region on a regular
schedule or on change, without the need to update each one individually.
Implementation Approach
The central software orchestration components are co-located with the other central services to
minimize network access configuration for administrator access.
Other Functions
The central repository is also used as the backup location for the NSX Managers in the central
cloud.
5.5.8.2 Region Software Orchestration
Responsibilities
The region component provides the software orchestration engine that implements software on
deployed resources using defined software orchestration templates. The implementation includes
mounting the software installation media (software binaries), then installing and configuring the
software. The implementation may also involve other actions to configure the deployed resource
according to the software orchestration template requirements, such as setting password rules.
It is responsible for:
Being a repository for software binaries
Being a repository for software orchestration templates
Providing the software orchestration engine that implements software on deployed resources
Inputs and Outputs
A region pushes software and configurations to deployed resources.
A region is called from a cloud region to deploy the software.
Design Rationale
A cloud needs to be able to implement software on deployed resources. This requires software
binaries and software orchestration templates (patterns). The region component provides these functions.
It is kept separate from the cloud region in order to allow for different flavors without impacting the cloud
region design.
Implementation Approach
The region should be co-located with the other region services to minimize network traffic and
response times when deploying software to resources.
Other Functions
The region repository is also used as the backup location for the NSX Managers in the cloud
region.
5.5.8.3 Software Orchestration Components Sizing
The following table presents the sizing for the central and region software orchestration components.
Table 51 Software Orchestration Components Sizing
Server Role | vCPU | RAM (GB) | Disk (GB)
Central repository and Chef server | 2 | 8 | 300
Central Salt master | 2 | 8 | 100
Region repository and Chef server | 4 | 16 | 300
Region Salt master | 4 | 16 | 100
5.5.9 Infrastructure Orchestration
Infrastructure orchestration is handled by vRealize Orchestrator and vRealize Automation, which are
covered in the prior sections.
5.6 Operational Services
Operational services provide the services to enable the support and maintenance of the cloud management
platform. This design does not include operational services for user related virtual resources.
5.6.1 Backup and Restore
Data backup protects the data of this design against data loss, hardware failure, accidental deletion, or other
disaster for each region. For consistent image-level backups, this design uses backup software based upon
the VMware Virtual Disk Development Kit (VDDK), such as vSphere Data Protection (VDP). For this
design, VDP is used to back up the infrastructure VMs with the exception of the NSX components, which
have their own backup and recovery procedure documented in the networking section.
5.6.1.1 Logical Design
vSphere Data Protection protects the virtual infrastructure at the VMware vCenter Server layer.
Because vSphere Data Protection is connected to the Management vCenter Server, it can access all
management ESXi hosts, and can detect the virtual machines that require backups.
Figure 33 vSphere Data Protection Logical Design
5.6.1.2 Backup Datastore
vSphere Data Protection uses deduplication technology to back up virtual environments at data
block level, which enables efficient disk utilization. To optimize backups and leverage the
VMware vSphere Storage APIs, all ESXi hosts must have access to the NFS datastore. The
backup datastore stores all the data that is required to recover services according to a Recovery
Point Objective (RPO).
5.6.1.3 Performance
vSphere Data Protection generates a significant amount of I/O operations, especially when
performing multiple concurrent backups. The storage platform must be able to handle this I/O. If
the storage platform does not meet the performance requirements, it might miss backup windows.
Backup failures and error messages might occur. Run the vSphere Data Protection performance
analysis feature during virtual appliance deployment or after deployment to assess performance.
For this design a dedicated volume on performance storage in SoftLayer is used as a backup
target. The backup volume is NFS mounted to all ESXi hosts in the management cluster.
Table 52 VMware vSphere Data Protection Performance
Total Backup Size Avg Mbps in 4 hours
0.5 TB 306 Mbps
1 TB 611 Mbps
2 TB 1223 Mbps
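The figures in Table 52 correspond to moving the full backup size within the four-hour window. A quick sanity check of that arithmetic, treating 1 TB as 2^40 bytes, is sketched below; the values it prints match the table within rounding.

    def required_mbps(backup_tb, window_hours=4):
        # Average throughput in Mbps needed to move backup_tb terabytes within the window.
        bits = backup_tb * (2 ** 40) * 8            # terabytes -> bits
        return bits / (window_hours * 3600) / 1e6   # bits per second -> Mbps

    for size_tb in (0.5, 1, 2):
        print(size_tb, "TB ->", round(required_mbps(size_tb)), "Mbps")
    # Prints approximately 305, 611 and 1222 Mbps, in line with Table 52.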
5.6.1.4 Volume Sizing
vSphere Data Protection can dynamically expand the destination backup store from 2 TB to 8 TB.
Using an extended backup storage requires additional memory on the vSphere Data Protection
appliance. For this design, an initial fixed NFS datastore size will be utilized. Additional NFS
space can be provisioned via the SoftLayer customer portal as required. Note that an existing
SoftLayer Endurance NFS storage allocation cannot be expanded once provisioned. If additional
storage is required, then a new export must be requested and migrated to.
For this design, 4 TB of storage is initially sized, with 4 CPUs and 12 GB of memory for the vDP
appliance virtual machine in each region.
5.6.1.5 Other Considerations
vSphere Data Protection can protect virtual machines that reside on VMware Virtual SAN from
host failures. The virtual machine storage policy is not backed up with the virtual machine, but a
user is able to restore the storage policy after restoring the virtual machine.
Note: The default Virtual SAN storage policy is configured and includes Number Of
Failures To Tolerate = 1, which means that virtual machine data will be mirrored.
vSphere Data Protection is used to restore virtual machines that failed or need their data reverted
to a previous state.
The FTP server required for backing up NSX Manager will reside on the RDS central repository
server. Backups for NSX Manager are set up with a daily schedule by the NSX Manager.
5.6.1.6 Backup Policies
Use vSphere Data Protection backup policies to specify virtual machine backup options, the
schedule window, and retention policies.
5.6.1.7 Virtual Machine Backup Options
vSphere Data Protection provides the following options for performing a backup of a virtual
machine:
Hot Add. Provides full image backups of virtual machines, regardless of the guest operating
system.
o The virtual machine base disk is attached directly to vSphere Data Protection to back
up data. vSphere Data Protection uses Changed Block Tracking to detect and back
up blocks that are altered.
o The backup and restore performance is faster because the data flow is through the
VMkernel layer instead of over a network connection.
o A quiesced snapshot can be used to redirect the I/O of a virtual machine disk .vmdk
file.
o Hot Add does not work in multi-writer disk mode.
Network Block Device (NBD). Transfers virtual machine data across the network to allow
vSphere Data Protection to back up the data.
o The performance of the virtual machine network traffic might be lower.
o NBD takes a quiesced snapshot. As a result, it might interrupt the I/O operations of
the virtual machine to swap the .vmdk file or consolidate the data after the backup
is complete.
o The time to complete the virtual machine backup might be longer than the backup
window.
o NBD does not work in multi-writer disk mode.
vSphere Data Protection Agent Inside Guest OS. Provides backup of certain applications
that are running in the guest operating system through an installed backup agent.
o Enables application-consistent backup and recovery with Microsoft SQL Server,
Microsoft SharePoint, and Microsoft Exchange support.
o Provides more granularity and flexibility to restore on the file level.
For this design, Hot Add will be used for all backup policies, with the exception of the
infrastructure SQL Server backups, which will utilize the Guest OS agent in addition to local SQL
backups to disk.
5.6.1.8 Schedule Window
Even though vSphere Data Protection uses the Changed Block Tracking technology to optimize
the backup data, do not schedule the backup window during periods when the production storage is in
high demand, to avoid any business impact.
Warning: Do not perform any backup or other administrative activities during the vSphere Data
Protection maintenance window. Restore operations are allowed. By default, the vSphere Data
Protection maintenance window begins at 8 AM local server time and continues uninterrupted
until 8 PM or until the backup jobs are complete. Configure maintenance windows according to IT
organizational policy requirements.
The backup window is set to default and can be modified, based on customer requirements.
5.6.1.9 Retention Policies
Retention policies are properties of a backup job. If virtual machines are grouped by business
priority, it is possible to set the retention requirements according to the business priority.
This design requires that a dedicated NFS datastore be allocated for the vSphere Data Protection
appliance for the backup data in each region. The NFS datastore is allocated from IBM SoftLayer
Performance storage option with an initial allocation of 4 TB.
5.6.1.10 Component Backup Jobs
Backups are configured for each of this design’s management components separately. No
requirement to back up the entire design exists, and this design does not imply such an operation.
Some products can perform internal configuration backups.
Separate from this, NSX is configured to back up to the FTP server within the design. The SQL
Server will also be configured for local backups to disk on a daily basis.
5.6.1.11 Backup Jobs in Central Cloud
If multiple regions are deployed, create a single backup job for the components of a management
application according to the node configuration of the application in the central cloud.
Table 53 Backup Jobs in Central Cloud
Product | Image VM Backup Jobs in Central Cloud | Application VM Backup Jobs in Central Cloud
ESXi | N/A - No Backup |
Platform Services Controller | Part of the vCenter Server backup job |
vCenter Server | Management Job: mgmt01vc01.central.tornado.local, mgmt01psc01.central.tornado.local. Compute Job: comp01vc01.central.tornado.local, comp01psc01.central.tornado.local |
vRealize Automation | vra01vro01a.tornado.local, vra01vro01b.tornado.local, vra01dem01.tornado.local, vra01dem02.tornado.local, vra01ias01.tornado.local, vra01ias02.tornado.local, vra01ims01a.tornado.local, vra01ims01b.tornado.local, vra01iws01a.tornado.local, vra01iws01b.tornado.local, vra01svr01a.tornado.local, vra01svr01b.tornado.local, vra01mssql01.tornado.local, vra01ids01a.tornado.local | vra01mssql01.tornado.local
vRealize Log Insight | vrli-mstr-01, vrli-wrkr-01, vrli-wrkr-02 |
vRealize Operations Manager | vrops-mstrn-01, vrops-repln-02, vrops-datan-03, vrops-datan-04, vrops-rmtcol-01, vrops-rmtcol-02 |
vRealize Orchestrator | Part of the vRealize Automation backup job |
5.6.1.12 Backup Jobs in Additional Cloud Region
Create a single backup job for the components of a management application according to
the node configuration of the application in an additional cloud region.
Table 54 Backup Jobs in Additional Cloud Region
Product | Image VM Backup Jobs in Additional Cloud Region | Application VM Backup Jobs in Additional Cloud Region
ESXi | N/A - No Backup | None
Platform Services Controller | Part of the vCenter Server backup job |
vCenter Server | Management Job: mgmt01vc51.regiona.tornado.local, mgmt01psc51.regiona.tornado.local. Compute Job: comp01vc51.regiona.tornado.local, comp01psc51.regiona.tornado.local |
NSX for vSphere | Management Job: mgmt01nsxm51.regiona.tornado.local. Compute Job: comp01nsxm51.regiona.tornado.local |
vRealize Automation | vra01ias51.tornado.local, vra01ias52.tornado.local |
vRealize Log Insight | vrli-mstr-51, vrli-wrkr-51, vrli-wrkr-52 |
vRealize Operations Manager | vrops-rmtcol-51, vrops-rmtcol-52 |
vRealize Orchestrator | Part of the vRealize Automation backup job |
5.6.2 Disaster Recovery
A Site Recovery Manager instance is required for both the protected region and the recovery region. Site
Recovery Manager is installed after the installation and configuration of vCenter Server and the Platform
Services Controller in the region. Site Recovery Manager takes advantage of vCenter Server and Platform
Services Controller services such as storage management, authentication, authorization, and guest
customization. Site Recovery Manager uses the standard set of vSphere administrative tools to manage
these services. Site Recovery Manager is installed on a dedicated Windows host VM within each region of
the design.
5.6.2.1 Networking Design for Disaster Recovery
Moving a service physically from one region to another represents a networking challenge,
especially if applications have hard-coded IP addresses. Network address space and IP address
assignment considerations require that either the same IP address or a different IP address be used
at the recovery region. In many situations, new IP addresses are assigned, because VLANs do
not typically stretch between regions.
While protecting the management applications, it is possible to simplify the problem of IP address
assignment. This design leverages a load balancer to separate a public network segment and a
private network segment. The private network can remain unchanged and only the external load
balancer interface has to be reassigned.
On the public network segment, the management application is accessible via one or more virtual
IP (VIP) addresses
On the isolated private network segment, the application's virtual machines are isolated
After a failover, the recovered application is available under a different IPv4 address (VIP). The
use of the new IP address requires changes to the DNS records. DNS records are either changed
manually or by using a script in the Site Recovery Manager recovery plan.
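For example, a recovery plan step that re-points a management application VIP could perform a dynamic DNS update. The sketch below uses the dnspython library; the zone, record name, TTL, new VIP, and DNS server address are illustrative, and an Active Directory integrated zone would normally require secure (GSS-TSIG) updates, which are omitted here.

    import dns.update
    import dns.query

    # Hypothetical values: the zone, the application record, the recovery-region VIP,
    # and the DNS server that accepts dynamic updates for the zone.
    update = dns.update.Update("tornado.local")
    update.replace("vra01svr01", 300, "A", "192.168.11.50")
    response = dns.query.tcp(update, "172.16.11.5", timeout=10)
    print(response.rcode())  # 0 (NOERROR) indicates the record was updated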
Figure 34 Logical Network Design for Cross-Region Deployment with Management Application
Network Container
The IPv4 subnets (orange networks) are routed within the vSphere management network of each
region. Nodes on these network segments are reachable from within the SDDC. IPv4 subnets, such
as the subnet for the vRealize Automation primary components, overlap across regions. Make
sure that only the active IPv4 subnet is propagated in the region and beyond. The public facing
Ext-Mgmt network of both regions (grey networks) are reachable by users and provide connection
to external resources, such as Active Directory or DNS.
Load balancing functionality is provided by NSX Edge services gateways. In each region, the
same configuration for the management applications and their Site Recovery Manager shadow will
be used. Active Directory and DNS services must be running in both the protected and recovery
regions.
5.6.2.2 vSphere Replication
In a VMware Virtual SAN environment, array-based replication cannot be used. This design
utilizes vSphere Replication instead to transfer VMs between regions.
Within the design, vSphere Replication uses a VMkernel management interface on the ESXi host
to send replication traffic to the replication site's vSphere Replication appliance. To isolate
vSphere Replication traffic so that it does not impact other vSphere management traffic, the
vSphere Replication network is configured in the following way.
vSphere Replication appliances and vSphere Replication servers are the target for the replication
traffic that originates from the vSphere Replication VMkernel ports.
5.6.2.2.1 Placeholder Virtual Machines
Site Recovery Manager creates a placeholder virtual machine on the recovery region for every
machine from the Site Recovery Manager protection group. Placeholder virtual machine files
contain virtual machine configuration metadata but not virtual machine disks, and the files are
very small. Site Recovery Manager adds the placeholder virtual machines as recovery region
objects in the Management vCenter Server.
5.6.2.2.2 Snapshot Space
To perform failover tests, additional storage is provided for the snapshots of the replicated VMs.
This storage is minimal in the beginning, but grows as test VMs write to their disks. Replication
from the protected region to the recovery region continues during this time. The snapshots created
during testing are deleted after the failover test is complete.
5.6.2.2.3 Messages and Commands for Site Recovery Manager
Site Recovery Manager has options to present users with messages that provide notification and
accept acknowledgement. Site Recovery Manager also provides a mechanism to run commands
and scripts as necessary when executing a recovery plan. Pre-power-on or post-power-on
messages and commands can be inserted to the recovery plans. These messages and commands are
not specific to Site Recovery Manager, but support pausing the execution of the recovery plan to
complete other procedures, or executing customer-specific commands or scripts to enable
automation of recovery tasks.
5.6.2.2.4 Site Recovery Manager Messages
Additional steps are required when building more than the central cloud (i.e.,
adding cloud regions). For example, the environment should be set up such that a message appears
when a recovery plan is initiated and the cloud admin must acknowledge the message before
the recovery plan continues. Messages are specific to each IT organization.
Consider the following example messages and confirmation steps:
Verify that IP address changes are made on the DNS server and that the changes are
propagated.
Verify that the Active Directory services are available.
After the management applications are recovered, perform application tests to verify that
the applications are recovered correctly.
Additionally, confirmation steps can be inserted after every group of services that have a
dependency on other services. These confirmations can be used to pause the recovery plan so that
appropriate verification and testing can be performed before subsequent steps are taken. These
services are defined as follows:
Infrastructure services
Core services
Database services
Middleware services
Application services
Web services
Details on each message are specified in the workflow definition of the individual recovery plan.
5.6.2.2.5 Site Recovery Manager Commands
In this initial phase of the design, custom scripts are out of scope. However, custom scripts can
be run to perform infrastructure configuration updates or configuration changes on the virtual
machine environment. The scripts that a recovery plan executes are located on the Site Recovery
Manager server. The scripts can run against the Site Recovery Manager server or can impact a
virtual machine.
If a script must run in the virtual machine, Site Recovery Manager does not run it directly, but
instructs the virtual machine to do it. The audit trail that Site Recovery Manager provides does not
record the execution of the script because the operation is on the target virtual machine.
Scripts or commands must be available in the path on the virtual machine according to the
following guidelines:
Use full paths to all executables. For example, c:\windows\system32\cmd.exe
instead of cmd.exe.
Call only .exe or .com files from the scripts. Command-line scripts can call only
executables.
To run a batch file, start the shell command with
c:\windows\system32\cmd.exe.
The scripts that are run after powering on a virtual machine are executed under the Local Security
Authority of the Site Recovery Manager server. Store post-power-on scripts on the Site Recovery
Manager virtual machine. Do not store such scripts on a remote network share.
5.6.2.3 Recovery Plans for Site Recovery Manager
A recovery plan is the automated plan (runbook) for full or partial failover from the central cloud
to a cloud region.
5.6.2.3.1 Startup Order and Response Time
Virtual machine priority determines virtual machine startup order.
All priority 1 virtual machines are started before priority 2 virtual machines.
All priority 2 virtual machines are started before priority 3 virtual machines.
All priority 3 virtual machines are started before priority 4 virtual machines.
All priority 4 virtual machines are started before priority 5 virtual machines.
Additionally, a startup order of virtual machines within each priority group can be set.
The following timeout parameters are set:
Response time, which defines the time to wait after the first virtual machine powers on
before proceeding to the next virtual machine in the plan.
Maximum time to wait if the virtual machine fails to power on before proceeding to the
next virtual machine.
The response time values can be adjusted as necessary during execution of the recovery plan test
to determine the appropriate values.
5.6.2.3.2 Recovery Plan Test Network
When a recovery plan is created, the test network options must be configured. The following
options are available.
Isolated Network (Automatically Created). An isolated private network is created
automatically on each ESXi host in the cluster for a virtual machine that is being
recovered. Site Recovery Manager creates a standard switch and a port group on it.
A limitation of this automatic configuration is that a virtual machine connected to the
isolated port group on one ESXi host cannot communicate with a virtual machine on
another ESXi host. This option limits testing scenarios and provides an isolated test
network only for basic virtual machine testing.
Port Group. Selecting an existing port group provides a more granular configuration to
meet the client’s testing requirements. For virtual machines across ESXi hosts to
communicate, distributed switches with uplinks to the production network are used and a
port group is created on the switch that is tagged with a non-routable VLAN. In this way,
the network is isolated and cannot communicate with other production networks.
Because the isolated application networks are fronted by a load balancer, the recovery plan test
network is equal to the recovery plan production network and provides realistic verification of a
recovered management application.
5.6.2.3.3 Sizing
For each region, one SRM server and one vSphere Replication appliance are deployed. The SRM
application is installed on a Windows 2012 R2 virtual machine and utilizes the built-in PostgreSQL
database. The vSphere Replication application is deployed as a virtual appliance.
Table 55 SRM Windows server sizing
Attribute Specification
VM size Medium
Number of CPUs 4
Memory 8 GB
Disk size 60 GB
Table 56 vSphere Replication Appliance
Attribute Specification
VM size Medium
Number of CPUs 2
Memory 4 GB
Disk size 18 GB
5.6.3 Monitoring
Monitoring and Operations Management is a required element of a software-defined datacenter. Monitoring
operations support in vRealize Operations Manager provides capabilities for performance and capacity
management of related infrastructure and cloud management components.
5.6.3.1 Single-Region Logical Design
In this single-region design, vRealize Operations Manager is deployed with the following
configuration:
4-node (large-size) vRealize Operations Manager analytics cluster that is highly available
(HA). This topology provides high availability, scale-out capacity up to eight nodes, and
failover.
2-node (large-size) remote collector cluster. The remote collectors communicate directly
with the data nodes in the vRealize Operations Manager analytics cluster. For a multi-
region design, deploy two remote collectors in each region.
Note that a single region has its own remote collectors, whose role is to ease scalability by
performing the data collection from the applications that are not subject to failover and
periodically sending collected data to the analytics cluster. In cases of multiple regions, the
analytics cluster can be failed over because the analytics cluster is the construct that analyzes and
stores monitoring data. The multi-region configuration supports failover of the analytics cluster by
using Site Recovery Manager. In the event of a disaster, Site Recovery Manager migrates the
analytics cluster nodes to the failover region.
Figure 35 Logical Design of vRealize Operations Manager Central Cloud and a Cloud Region (Region
A) Deployment
5.6.3.2 Physical Design
The vRealize Operations Manager nodes run on the management cluster in each region of this
design.
5.6.3.3 Data Sources
vRealize Operations Manager collects data from the following virtual infrastructure and cloud
management components:
Management vCenter Server
o Platform Services Controller
o vCenter Server
Compute vCenter Server
o Platform Services Controller
o vCenter Server
Management, Edge and Compute ESXi hosts
NSX for vSphere for the management and compute clusters
o NSX Manager
o NSX Controller Instances
o NSX Edge instances
vRealize Automation
o vRealize Orchestrator
o vRealize Automation Components
vRealize Log Insight
vRealize Operations Manager (Self Health Monitoring)
5.6.3.4 vRealize Operations Manager Nodes
The analytics cluster of the vRealize Operations Manager deployment contains the nodes that
analyze and store data from the monitored components.
5.6.3.4.1 Compute for vRealize Operations Manager Nodes
The four-node vRealize Operations Manager analytics cluster is deployed in the management cluster with
an application virtual network. The analytics cluster consists of one master node, one master
replica node, and two data nodes to enable scale out and high availability. The vRealize
Operations Manager analytics cluster is sized according to VMware KB article 2130551 "vRealize
Operations Manager 6.1 Sizing Guidelines" and includes the following management packs:
Management Pack for VMware vCenter Server (installed by default)
Management Pack for NSX for vSphere
Management Pack for Storage Devices
Management Pack for vRealize Log Insight
Management Pack for vRealize Automation
Note that each node in the analytics cluster will be sized identically to support scale-out, high
availability, and design guidance for central cloud or cloud region infrastructure. As a result, the
configuration for each node is as follows:
Table 57 Analytics Cluster Node Configurations
Node vCPU Memory
Master 16 48 GB
Master Replica 16 48 GB
Data 16 48 GB
Data 16 48 GB
5.6.3.4.2 Storage for vRealize Operations Manager Nodes
Each vRealize Operations Manager node in this design requires 266 GB of free space for data. To
collect the required number of metrics, a 1 TB VMDK will be added to each analytics cluster
node.
5.6.3.4.3 Network for vRealize Operations Manager Nodes
In this design, the clusters of vRealize Operations Manager will be placed in application isolated
networks for secure access, load balancing, portability, and functionality-specific subnet
allocation.
5.6.3.4.4 High Availability for vRealize Operations Manager Nodes
To protect the vRealize Operations Manager virtual machines from a host-level failure, this design
configures vSphere DRS to run the virtual machines for the analytics cluster and for the remote
collectors on different hosts in the management cluster.
Table 58 DRS Cluster Anti-Affinity Rule for vRealize Operations Manager Nodes
Rule Attribute Value
Name vropscluster-antiaffinity-rule
Enable rule Yes
Type Separate Virtual Machines
Members vrops master node
vrops master replica node
vrops data node 1
vrops data node 2
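The rule in Table 58 can also be created programmatically. The following is a minimal sketch using the pyVmomi library; it assumes an existing connection to the Management vCenter Server and that the caller has already looked up the management cluster object and the four analytics virtual machines, so the function and its arguments are illustrative rather than part of this design.

    from pyVmomi import vim

    def add_vrops_antiaffinity_rule(cluster, vrops_vms):
        # cluster   -- vim.ClusterComputeResource for the management cluster
        # vrops_vms -- list of vim.VirtualMachine objects for the four analytics nodes
        rule = vim.cluster.AntiAffinityRuleSpec(
            name="vropscluster-antiaffinity-rule",
            enabled=True,
            vm=vrops_vms,
        )
        rule_spec = vim.cluster.RuleSpec(operation="add", info=rule)
        config_spec = vim.cluster.ConfigSpecEx(rulesSpec=[rule_spec])
        # modify=True merges this change into the existing cluster configuration.
        return cluster.ReconfigureComputeResource_Task(config_spec, modify=True)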
5.6.3.5 vRealize Operations Remote Collector Nodes
Unlike the analytics cluster nodes, remote collector nodes have only the collector role. Deploying
two remote collector nodes in each region does not increase the number of monitored objects, but
removes the load from the analytics cluster by collecting metrics from applications that do not fail
over between regions in a multi-region environment. This means that collectors are assigned when
configuring the monitoring solution.
5.6.3.5.1 Compute for vRealize Operations Remote Collector Nodes
In this design, the remote collector nodes are deployed on the management cluster. The nodes are
identically sized and consist of the following compute resources in each region:
Table 59 Remote Collector Node Sizes
Node vCPU Memory
Remote Collector 1 4 16 GB
Remote Collector 2 4 16 GB
5.6.3.5.2 Storage for Remote Collector Nodes
In this design, the remote collector nodes are deployed with thin-provisioned disks. Because remote
collectors do not perform analytics operations or store data, the default VMDK size is sufficient
for this design.
5.6.3.5.3 High Availability for vRealize Operations Manager Nodes
To protect the vRealize Operations Manager virtual machines from a host-level failure, this design
configures vSphere DRS to run the virtual machines for the analytics cluster and for the remote
collectors on different hosts in the management cluster.
Table 60 DRS Cluster Anti-Affinity Rule for vRealize Operations Remote Collector Nodes
Rule Attribute Value
Name vropscollector-antiaffinity-rule
Enable rule Yes
Type Separate Virtual Machines
Members vrops remote collector 1
vrops remote collector 2
5.6.3.6 Networking for vRealize Operations Manager
In this design, the clusters of vRealize Operations Manager will be placed in application isolated
networks for secure access, load balancing, portability, and functionality-specific subnet
allocation.
Figure 36 Networking Design of the vRealize Operations Manager Deployment
5.6.3.6.1 Application Isolated Network Design
In the single-region design, the two logical entities of the vRealize Operations Manager
deployment, the analytics cluster and the remote collectors in the central cloud, are installed in
application isolated networks. For multi-region, a third entity (i.e., remote collectors in an
additional cloud region) is also deployed on an application isolated network. As part of this
configuration, an NSX Edge services gateway will be placed in front of the application isolated
network to provide routing and load balancing. This networking design contains the following
features:
Each application virtual network of vRealize Operations Manager has connection to the
application virtual networks of vRealize Automation and vRealize Log Insight through a
dedicated network called networkExchange, whose role is to support transit traffic and the
exchange of routing tables.
5.6.3.6.3 DNS Names
vRealize Operations Manager node name resolution uses a region-specific suffix, such as
central.tornado.local or regiona.tornado.local, while the analytics node IP
addresses and the load balancer virtual IP address (VIP) are mapped to the root domain suffix
tornado.local. Access from the public network is provided through a VIP, the traffic to
which is handled by the NSX Edge services gateway. The following table shows example names
that can be used in single- and multi-region designs.
Table 62 DNS Names for the Application Virtual Networks
vRealize Operations Manager DNS Name | Node Type
vrops-cluster-01.tornado.local | Virtual IP of the analytics cluster
vrops-mstrn-01.tornado.local | Master node in the analytics cluster
vrops-repln-02.tornado.local | Master replica node in the analytics cluster
vrops-datan-03.tornado.local | First data node in the analytics cluster
vrops-datan-04.tornado.local | Second data node in the analytics cluster
vrops-rmtcol-01.central.tornado.local | First remote collector node in the central cloud
vrops-rmtcol-02.central.tornado.local | Second remote collector node in the central cloud
vrops-rmtcol-51.regiona.tornado.local | First remote collector node in any additional cloud region
vrops-rmtcol-52.regiona.tornado.local | Second remote collector node in any additional cloud region
5.6.3.6.4 Networking for Failover and Load Balancing
Each node in the vRealize Operations Manager analytics cluster runs a Tomcat server instance for
access to the product user interface. By default, vRealize Operations Manager does not provide a
solution for load-balanced UI users’ sessions across nodes in the cluster. The lack of load
balancing for users’ sessions results in the following limitations:
Cloud admins must know the URL of each node to access the UI. As a result, a single node
might be overloaded if all cloud admin actors access it at the same time.
Each node supports up to four simultaneous cloud admin sessions.
Taking a node offline for maintenance might cause an outage. Cloud admins cannot access
the UI of the node when the node is offline.
To avoid such problems, the analytics cluster is placed behind an NSX load balancer that is
configured to allow up to four connections per node. The load balancer must distribute the load
evenly to all cluster nodes. In addition, the load balancer is configured to redirect service requests
from the UI on port 80 to port 443. Load balancing and access to and from the public network is
not required for the remote collector nodes.
5.6.3.7 Security and Authentication
vRealize Operations Manager can use several sources for authentication. These sources include an
Active Directory service, vCenter Server, and local user inventory. Active Directory is used as the
primary authentication and authorization method in this design.
5.6.3.8 Identity Sources
The cloud admin will authenticate in vRealize Operations Manager using Active Directory
authentication. This provides access to vRealize Operations Manager by using standard Active
Directory accounts and ensures that authentication is available even if vCenter Server becomes
unavailable.
5.6.3.9 Encryption
Access to all vRealize Operations Manager Web interfaces requires an SSL connection. By
default, vRealize Operations Manager uses a self-signed certificate. In this design, the default self-
signed certificate is replaced with a Certificate Authority (CA) signed certificate to provide
secure access to the vRealize Operations Manager user interface.
5.6.3.10 Monitoring and Alerting
vRealize Operations Manager can monitor itself and display the following administrative alerts:
System alert. A component of the vRealize Operations Manager application has failed.
Environment alert. vRealize Operations Manager has stopped receiving data from one or
more resources. Such an alert might indicate a problem with system resources or network
infrastructure.
Log Insight log event. The infrastructure on which vRealize Operations Manager is running
has low-level issues. You can also use the log events for root cause analysis.
Custom dashboard. vRealize Operations Manager can show super metrics for datacenter
monitoring, capacity trends, and a single-pane-of-glass overview.
In order to enable the aforementioned alerts and events, vRealize Operations Manager must be
configured for SMTP outbound alerts. An SMTP service is included in
the design, which utilizes the SoftLayer SMTP service if no client on-premises SMTP service is
provided. If the client has an existing SMTP service, then that service will be configured instead. Additionally, the
design incorporates deeper root cause analysis and infrastructure alerting by including the
management pack for vRealize Log Insight.
5.6.3.11 Management Packs
This design contains several VMware products for network, storage, and cloud management. In
order to monitor and perform diagnostics on all these items, the following management packs are
used:
Management Pack for VMware vCenter Server (installed by default)
Management Pack for NSX for vSphere
Management Pack for Storage Devices
Management Pack for vRealize Log Insight
Management Pack for vRealize Automation
5.6.4 Log Consolidation and Analysis
In each region of this design, a vRealize Log Insight cluster is configured with three nodes. This allows for
continued availability and increased log ingestion rates.
Figure 38 Logical Design of vRealize Log Insight
5.6.4.1 Sources of Log Data
vRealize Log Insight collects logs from the following virtual infrastructure and cloud management
components:
Management vCenter Server
o Platform Services Controller
o vCenter Server
Compute vCenter Server
o Platform Services Controller
o vCenter Server
Management, Edge and Compute ESXi hosts
NSX for vSphere for the management and for compute and edge clusters
o NSX Manager
o NSX Controller instances
o NSX Edge instances
vRealize Automation
o vRealize Orchestrator
o vRealize Automation components
vRealize Operations Manager
o Analytics cluster nodes
5.6.4.2 Cluster Nodes
The vRealize Log Insight cluster consists of one master node and two worker nodes. The
Integrated Load Balancer (ILB) on the cluster is configured so that vRealize Log Insight balances
incoming traffic fairly among the available nodes. vRealize Log Insight clients, using both
the Web user interface and ingestion through syslog or the Ingestion API, connect to vRealize
Log Insight at the ILB address.
5.6.4.3 Sizing
In this design, the vRealize Log Insight virtual appliance has 2 vCPUs, 4 GB of virtual memory,
and 144 GB of disk space provisioned. vRealize Log Insight uses 100 GB of the disk space to
store raw data, indexes, and metadata.
5.6.4.4 Sizing Nodes
To accommodate all of the log data from the products in this design, the vRealize Log Insight nodes
must be sized properly.
Table 63 Node Sizing
Attribute Specification
Appliance size Medium
Number of CPUs 8
Memory 16 GB
IOPS 1,000 IOPS
Amount of processed log data 38 GB/day
Number of processed log messages 7,500
Environment Up to 250 syslog connections per node
Disk size 450 GB (see below)
5.6.4.5 Sizing Storage
For this design, 7 days of data are retained. The disk space needs are calculated as follows, for
250 syslog sources at a rate of 150 MB of logs ingested per day per source over 7 days:
250 sources * 150 MB of log data ≈ 37 GB of log data per day
37 GB * 7 days ≈ 260 GB of log data per vRealize Log Insight node
260 GB * 1.7 index overhead ≈ 450 GB
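The same sizing arithmetic, expressed as a short sketch:

    sources = 250            # syslog sources per node
    mb_per_source_day = 150  # MB of log data per source per day
    retention_days = 7
    index_overhead = 1.7

    daily_gb = sources * mb_per_source_day / 1024   # ~37 GB of log data per day
    retained_gb = daily_gb * retention_days         # ~256 GB per node over 7 days
    sized_gb = retained_gb * index_overhead         # ~436 GB, rounded up to the 450 GB in Table 63
    print(round(daily_gb), round(retained_gb), round(sized_gb))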
Note: vRealize Log Insight supports virtual hard disks of up to 2 TB. If more capacity is needed,
add another virtual hard disk. Do not extend existing retention virtual disks.
5.6.4.6 Networking Design
In a multi-region deployment of the design, the vRealize Log Insight instances are connected to
both the vSphere management (gray in the network diagram) and the external management (blue
in the network diagram) networks. Each vRealize Log Insight instance is deployed within its own
application isolated network (gray boxes in the network diagram).
Figure 39 Networking Design for the vRealize Log Insight Deployment
5.6.4.7 Application Isolated Network Design
Each of the two instances of the vRealize Log Insight deployment is installed in its own isolated
application network. An NSX Edge appliance is configured at the front of each isolated
application network to provide the network isolation. NSX Edge services gateway is deployed in
front of the application isolated network to provide routing and load balancing. This networking
design has the following features:
Each application virtual network of vRealize Log Insight has connection to the application
virtual networks of vRealize Automation and vRealize Operations Manager through a
dedicated network called networkExchange. The role of networkExchange is to support
transit traffic and the exchange of routing tables.
All nodes have routed access to the vSphere management network through the Management
NSX Edge for the home region.
Routing to the vSphere management network and the external network is dynamic, and is
based on the Open Shortest Path First (OSPF) protocol.
The NSX Edge instances for the vRealize Log Insight are configured to use Source NAT
(SNAT) address translation when the vRealize Log Insight nodes access the public network.
The NSX Edge instances for the vRealize Log Insight provide access to vRealize Log Insight
from the public network over Destination NAT (DNAT).
Figure 40 Application Virtual Networks in the vRealize Log Insight Topology
5.6.4.8 IP Subnets
The following example subnets are allocated to the vRealize Log Insight deployment:
Table 64 IP Subnets in the Application Isolated Networks
vRealize Log Insight Cluster IP Subnet
Central Cloud 192.168.31.0/24
Cloud Region 192.168.32.0/24
5.6.4.9 DNS Names
vRealize Log Insight node name resolution uses a region-specific suffix, such as
central.tornado.local or regiona.tornado.local, including the load balancer virtual
IP addresses (VIPs). The Log Insight components in both regions have the following node names:
Table 65 Example DNS names of Log Insight nodes
DNS Name Role Region
vrli-cluster-01.central.tornado.local Log Insight ILB VIP A
vrli-mstr01.central.tornado.local Master node A
vrli-wrkr01.central.tornado.local Worker node A
vrli-wrkr02.central.tornado.local Worker node A
vrli-cluster-51.regiona.tornado.local Log Insight ILB VIP B
vrli-mstr51.regiona.tornado.local Master node B
vrli-wrkr51.regiona.tornado.local Worker node B
vrli-wrkr52.regiona.tornado.local Worker node B
5.6.4.10 Retention and Archiving
In vRealize Log Insight, configure log retention for one week and archiving on storage sized for
90 days according to the vRealize Log Insight Design document.
5.6.4.10.1 Retention
vRealize Log Insight virtual appliances contain three default virtual disks and can use additional
virtual disks for storage, for example, hard disk 4.
Table 66 Virtual Disk Configuration in the vRealize Log Insight Virtual Appliance
Hard Disk | Size | Usage
Hard disk 1 | 12.125 GB | Root file system
Hard disk 2 | 270 GB for medium-size deployment | Contains two partitions: /storage/var (system logs) and /storage/core (storage for collected logs)
Hard disk 3 | 256 MB | First boot only
Hard disk 4 (additional virtual disk) | 190 GB | Storage for collected logs. The capacity from this disk is added to /storage/core.
Calculate the storage space that is available for log data in the following way:
/storage/core = hard disk 2 space + hard disk 4 space - system logs space on hard disk 2
Retention = /storage/core - 3% * /storage/core
For example, if /storage/core is 425 GB, vRealize Log Insight can use approximately 413 GB for retention.
Based on the size of the default and additional virtual disks in this design:
/storage/core = 270 GB + 190 GB - 20 GB = 440 GB
Retention = 440 GB - 3% * 440 GB ≈ 427 GB
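The same retention calculation as a short sketch:

    hard_disk_2_gb = 270   # medium-size deployment
    hard_disk_4_gb = 190   # additional virtual disk
    system_logs_gb = 20    # /storage/var on hard disk 2

    storage_core_gb = hard_disk_2_gb + hard_disk_4_gb - system_logs_gb   # 440 GB
    retention_gb = storage_core_gb * (1 - 0.03)                          # ~427 GB
    print(storage_core_gb, round(retention_gb))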
Configure a retention period of 7 days for the medium-size vRealize Log Insight appliance.
5.6.4.10.2 Archiving
vRealize Log Insight archives log messages as soon as possible. At the same time, they remain
on the virtual appliance until the free local space is nearly exhausted. Data therefore exists on both the
vRealize Log Insight appliance and the archive location for most of the retention period. The
archiving period must be longer than the retention period.
The archive location must be on an NFS version 3 shared storage. The NFS export is mounted
directly to the vRealize Log Insight nodes, configured via the management web interface.
This design will apply an archive policy of 90 days for the medium-size vRealize Log Insight
appliance, which takes up approximately 1 TB of shared storage.
5.6.4.11 Alerting
vRealize Log Insight supports alerts that trigger notifications about its health. The following types
of alerts exist in vRealize Log Insight:
System Alerts. vRealize Log Insight generates notifications when an important system
event occurs, for example when the disk space is almost exhausted and vRealize Log
Insight must start deleting or archiving old log files.
Content Pack Alerts. Content packs contain default alerts that can be configured to send
notifications; these alerts are specific to the content pack and are disabled by default.
User-Defined Alerts. Administrators and users can define their own alerts based on data
ingested by vRealize Log Insight.
vRealize Log Insight handles alerts in two ways:
Send an e-mail over SMTP
Send to vRealize Operations Manager
5.6.4.12 SMTP Notification
E-mail notifications are enabled for alerts in vRealize Log Insight and point to the SMTP relay
service residing on the Active Directory server VMs.
5.6.4.13 Integration with vRealize Operations Manager
vRealize Log Insight integrates with vRealize Operations Manager to provide a central location
for monitoring and diagnostics.
vRealize Log Insight integrates with vRealize Operations Manager in the following ways:
Notification Events. Forward notification events from vRealize Log Insight to vRealize
Operations Manager.
Launch in Context. Launch vRealize Log Insight from the vRealize Operations Manager user
interface. This requires that the vRealize Log Insight management pack be installed in vRealize
Operations Manager.
5.6.4.14 Security and Authentication
The vRealize Log Insight deployment utilizes centralized role-based authentication, using Microsoft
Active Directory as the authority.
5.6.4.15 Authentication
Role-based access control is enabled in vRealize Log Insight by using the existing tornado.local
Active Directory domain.
5.6.4.16 Encryption
The default self-signed certificates are replaced with a CA-signed certificate, generated by the design's
Active Directory certificate authority, to provide secure access to the vRealize Log Insight Web user interface.
5.6.4.17 Configuration for Collecting Logs
Client applications send logs to vRealize Log Insight in one of the following ways:
Directly to vRealize Log Insight over the syslog protocol.
By using vRealize Log Insight agents.
Both of these are supported in this design for the different applications that provide the cloud
management platform.
5.6.4.18 Time Synchronization
Time synchronization is critical for the core functionality of vRealize Log Insight. vRealize Log
Insight synchronizes time with the SoftLayer NTP service.
Consistent NTP sources are configured on all systems that send log data (vCenter Server, ESXi,
vRealize Operations Manager). See Time Synchronization under Common Services.
5.6.4.19 Connectivity in the Cluster
This design requires that all vRealize Log Insight cluster nodes within a region are connected to
the same LAN with no firewall or NAT between the nodes.
5.6.4.20 External Communication
vRealize Log Insight receives log data over the syslog TCP, syslog TLS/SSL, or syslog UDP
protocol. The default syslog UDP protocol is used in this design, because security is already
designed at the level of the management network.
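As an illustration, a client that forwards a log message to the cluster over syslog UDP can use the Python standard library; the host name below is the Integrated Load Balancer VIP from Table 65, and port 514 is the default syslog port.

    import logging
    import logging.handlers

    # SysLogHandler uses UDP by default, which matches the protocol chosen in this design.
    handler = logging.handlers.SysLogHandler(
        address=("vrli-cluster-01.central.tornado.local", 514))

    logger = logging.getLogger("sddc-example")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)

    logger.info("test message from the cloud management platform")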
5.6.4.21 Event Forwarding between Regions
vRealize Log Insight supports event forwarding to other clusters and standalone instances. While
forwarding events, the vRealize Log Insight instance still ingests, stores and archives events
locally.
5.6.4.22 Event Forwarding Protocol
Forwarding of syslog data in vRealize Log Insight is achieved by using the Ingestion API or a
native syslog implementation.
The vRealize Log Insight Ingestion API uses TCP communication. In contrast to syslog, the
forwarding module supports the following features for the Ingestion API:
Forwarding to other vRealize Log Insight instances
Both structured and unstructured data, that is, multi-line messages.
Metadata in the form of tags
Client-side compression
Configurable disk-backed queue to save events until the server acknowledges the ingestion.
5.6.4.23 Disaster Recovery
Each region is configured to forward log information to the vRealize Log Insight instance in the
other region. Configuration of failover is not required.
5.6.5 Patching
VMware vSphere Update Manager (vUM) automates patch management and eliminates manual tracking
and patching of vSphere hosts and virtual machines. It compares the state of vSphere hosts with baselines,
then applies updates and patches to enforce compliance. Although vSphere Update Manager can be installed on the
same server as a vCenter Server, this design utilizes a vUM server installed on its own Windows Server
virtual machine due to the size of the environment and the use of the vCenter Server Appliance. Additionally,
this design deploys two unique vUM servers in each region, each associated and registered with a unique
vCenter Server. This means that one vUM server is registered to the vCenter Server managing the
management clusters and another vUM server is registered to the vCenter Server managing the capacity and
edge clusters.
In addition to being associated with a unique vCenter Server, vUM requires the use of
a dedicated database. In this design, the vUM associated with updating the capacity and edge clusters
connects to an instance of the Microsoft SQL Server 2012 database running on the same virtual machine.
The vUM associated with the management cluster uses the Microsoft SQL Server 2012 database that is
installed on the same machine as Update Manager.
5.6.5.1 vUM for vCenter Managing the Management Cluster
A vUM instance is linked to a vCenter Server instance. The following section describes the
configuration of the vUM instance for the vCenter Server that manages the Management Cluster.
5.6.5.1.1 Compute and Storage Design
In this design, vUM is installed on a virtual machine running Windows 2012 Server with an
instance of Microsoft SQL Server 2012 installed. The virtual machine resides on the management
cluster and utilizes a VSAN-backed datastore for storage. The resources for this virtual machine are
as follows in Table 67 Compute Resources for vUM vCenter Managing the Management Cluster.
Note that disk space was calculated using the vSphere Update Manager Sizing Estimator and
includes a 1.5 GB monthly utilization rate.
Table 67 Compute Resources for vUM vCenter Managing the Management Cluster
vCPU Memory Disk Space Disk Type
2 4 GB 60 GB Thin
5.6.5.1.2 Network Design
In this design, vUM is placed on the management network which will provide appropriate access
for it to upgrade and patch ESXi hosts, install and update third-party software on hosts, and
upgrade virtual machine hardware, VMware tools, and virtual appliances.
5.6.5.2 vUM for vCenter Managing Compute and Edge Clusters
A vUM instance is linked to a vCenter Server instance. The following section describes the
configuration of the vUM instance for the vCenter Server that manages the Edge and Compute Clusters.
5.6.5.2.1 Compute and Storage Design
In this design, vUM is installed on a virtual machine running Windows 2012 Server and is
connected to an instance of Microsoft SQL Server 2012 installed on the same VM. The virtual
machine resides on the management cluster and utilizes a VSAN-backed datastore for storage. The
resources for this virtual machine are as follows in Table 68 Compute Resources for vUM vCenter
Managing the Compute and Edge Clusters. Note that disk space was calculated using the vSphere
Update Manager Sizing Estimator and includes a 1.5 GB monthly utilization rate.
Table 68 Compute Resources for vUM vCenter Managing the Compute and Edge Clusters
vCPU Memory Disk Space Disk Type
2 4 GB 60 GB Thin
5.6.5.2.2 Network Design
In this design, vUM is placed on the management network, which provides appropriate access for
it to upgrade and patch ESXi hosts, install and update third-party software on hosts, and upgrade
virtual machine hardware, VMware Tools, and virtual appliances.
5.7 Business Services
This design uses vRealize Business Standard Edition to provide metering and chargeback functionality for
the cloud. vRealize Business gives the director of cloud operations greater visibility into the
financial aspects of IaaS service delivery, enabling these operations to be optimized and improved.
5.7.1.1 vRealize Business Standard Design
The following figure presents the design of vRealize Business Standard:
Figure 41 vRealize Business Logical Design
Data Collector
This component is responsible for connecting to vCenter Server instances and retrieving both
inventory information (servers, virtual machines, clusters, storage devices, and associations
between them) and usage (CPU and memory) statistics.
Reference Database
This component is responsible for providing default, out-of-the-box costs for each of the
supported cost drivers. Reference values are updated periodically. For this solution, the reference
database is downloaded and installed manually, and the new values then affect cost calculation. The
reference data that is used depends on the currency selected during installation; USD is used by default.
Communication between Data Collector and the Server
The data collector and the server communicate through a database: the data collector writes to the
database, and the server reads the data. The data collector time-stamps inventory information,
so it is possible to retrieve and view the inventory as it existed at an earlier point in time. The architecture of the data collector
tables is flexible and stores properties retrieved from vCenter Server as key-value pairs (an illustrative record shape follows).
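The key-value storage pattern described above can be illustrated with the shape of a single time-stamped inventory record. The field names here are assumptions for illustration; the actual vRealize Business table layout is not documented in this design.

```python
# Illustrative shape of a time-stamped key/value inventory record of the kind
# the data collector writes and the server reads. Field names are assumptions.
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class InventoryProperty:
    object_id: str          # identifier of the vCenter object (for example, a VM)
    key: str                # property name retrieved from vCenter Server
    value: str              # property value stored as text
    collected_at: datetime  # timestamp enabling views of the inventory back in time


record = InventoryProperty(
    object_id="vm-1001",
    key="config.hardware.memoryMB",
    value="4096",
    collected_at=datetime.now(timezone.utc),
)
print(record)
```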
5.7.1.2 vRealize Business Scalability
vRealize Business Standard Edition scales up to 20,000 virtual machines across four VMware
vCenter Server instances. This design implements one vRealize Business appliance for each
deployed vCenter.
vRealize Business Standard is deployed as a single virtual appliance hosted in the management
network of vRA. The appliance is configured with 2 vCPU, 4 GB of RAM and 50 GB disk.
5.7.1.3 vRealize Business Standard Integration
vRealize Business Standard Edition is integrated with vCenter Server and extracts the inventory
list from it. The inventory list contains virtual machine configuration information, ESXi host and
cluster capacity, storage profiles and capacity, and vCenter Server attributes and tags.
vRealize Business Standard Edition is tightly integrated with vRealize Automation. It uses
common services of the vRealize Automation framework, such as SSO authentication and authorization.
The Infrastructure as a Service (IaaS) component of vRealize Automation consumes the base rate
APIs of vRealize Business Standard Edition to compute the blueprint price of a virtual machine (sketched below).
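The base-rate pricing consumed by the IaaS component can be illustrated with a simple arithmetic sketch. The rates and the linear model below are assumptions for illustration; they are not the actual vRealize Business API or rate card.

```python
# Hedged sketch of base-rate pricing for a blueprint. Rates are hypothetical.
BASE_RATES = {
    "vcpu": 25.00,       # assumed monthly rate per vCPU
    "memory_gb": 5.00,   # assumed monthly rate per GB of RAM
    "storage_gb": 0.10,  # assumed monthly rate per GB of disk
}


def blueprint_price(vcpu, memory_gb, storage_gb):
    """Estimate a monthly blueprint price from per-unit base rates."""
    return (
        vcpu * BASE_RATES["vcpu"]
        + memory_gb * BASE_RATES["memory_gb"]
        + storage_gb * BASE_RATES["storage_gb"]
    )


# Example: a 2 vCPU / 4 GB RAM / 60 GB disk machine.
print(f"Estimated price: {blueprint_price(2, 4, 60):.2f} USD/month")  # 76.00
```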
Appendix A – Bare Metal Summary
The bare metal servers used in this design consist of the components listed in the following tables for the
management, compute, and edge clusters.
Management Cluster Nodes
The following table shows the bill of materials for the SoftLayer bare metal servers in the management
cluster.
Table 69 Management - Bare Metal Bill of Materials
Component Manufacturer / Model
Chassis SuperMicro PIO-628U-TR4T+-ST031
Motherboard SuperMicro X10DRU-i+_R1.02b
Processor 2 x 2.6GHz Intel Xeon-Haswell (E5-2690-V3-DodecaCore)
Memory 16 x 16GB
Network Interface Card 4 x Intel Ethernet Controller 10 Gigabit X540-AT2
Disk Controller Avago Technologies MegaRAID SAS 9361-8i
Disks 2 x 1 TB Seagate Constellation ES
2 x 1.2 TB Intel S3710
8 x 2 TB Western Digital
Compute Cluster Nodes
The following table shows the bill of materials for the SoftLayer bare metal servers in the compute cluster.
Table 70 Compute - Bare Metal Bill of Materials
Component Manufacturer / Model
Chassis SuperMicro PIO-628U-TR4T+-ST031
Motherboard SuperMicro X10DRU-i+_R1.02b
Processor 2 x 2.6GHz Intel Xeon-Haswell (E5-2690-V3-DodecaCore)
Memory 16 x 32GB
Network Interface Card 4 x Intel Ethernet Controller 10 Gigabit X540-AT2
Disk Controller Avago Technologies MegaRAID SAS 9361-8i
Disks 2 x 1 TB Seagate Constellation ES
2 x 1.2 TB Intel S3710
8 x 2 TB Western Digital
Edge Cluster Nodes
The following table shows the bill of materials for the SoftLayer bare metal servers in the Edge cluster.
Table 71 Edge - Bare Metal Bill of Materials
Component Manufacturer / Model
Chassis SuperMicro PIO-628U-TR4T+-ST031
Motherboard SuperMicro X10DRU-i+_R1.02b
Processor 2 x 2.6GHz Intel Xeon-Haswell (E5-2690-V3-DodecaCore)
Memory 8 x 16GB
Network Interface Card 4 x Intel Ethernet Controller 10 Gigabit X540-AT2
Disk Controller Avago Technologies MegaRAID SAS 9361-8i
Disks 2 x 1 TB Seagate Constellation ES
1 x 1.2 TB Intel S3710
4 x 2 TB Western Digital
Appendix B – Software Bill of Materials
The following software products and versions are used in this design for the cloud management platform.
This does not include any software products that are deployed by users when utilizing the cloud
management platform.
Table 72 Software Bill of Materials
Cloud Component Product Item Version
Virtual Infrastructure ESXi 6.0 U1b
vCenter Server Appliance (VIMISO) 6.0 U1
Virtual SAN 6.0 U1
vSphere Replication 6.1
VMware vCenter Site Recovery Manager 6.1
NSX for vSphere 6.2.1
Cloud Management vRealize Automation Appliance 6.2.3
vRealize Automation Identity Appliance 6.2.3
vRealize Orchestrator 6.0.3
vRealize Orchestrator Plug-in for NSX 1.0.2
Service Management vRealize Operations Manager Appliance 6.1.0
Management Pack for NSX for vSphere 2.0
Management Pack for vRealize Log Insight 1.0
Management Pack for vRealize Automation 1.0
Management Pack for Storage Devices 1.0
vRealize Log Insight 3.0
Business Continuity vSphere Data Protection 6.1
Business Management vRealize Business Standard 6.2.3
Patching vSphere Update Manager 6.0U1b
Infrastructure Microsoft SQL Server 2012 R2
Microsoft Windows Server 2012 R2
Ubuntu Server 14.04 LTS
Software Orchestration Chef Server 12
Salt Stack 2015.8.1
Appendix C – Management Virtual Machine Summary
The following virtual machines are configured in the management cluster for the cloud management
platform by default.
Table 73 List of Management Cluster Virtual Machines and Sizes
Function vCPU vRAM (GB) vDisk (GB)
Analytics Edge #1 6 8 4.5
Analytics Edge #2 6 8 4.5
Analytics Master 8 32 1024
Analytics Replica 8 32 1024
Certificate Authority Server - Master CA 2 4 60
Certificate Authority Server - Subordinate 2 4 60
Chef Server and Software Binaries 2 4 300
Collector Edge #1 6 8 4.5
Collector Edge #2 6 8 4.5
Data Node #1 8 32 1024
Data Node #2 8 32 1024
DEM Worker #1 4 8 60
DEM Worker #2 4 8 60
IaaS Web Server Appliance #1 4 4 60
IaaS Web Server Appliance #2 4 4 60
Identity Appliance 1 2 10
Log Insight Edge #1 4 1 0.5
Log Insight Edge #2 4 1 0.5
Log Insight - Master 8 16 450
Log Insight - Slave #1 8 16 450
Log Insight - Slave #2 8 16 450
Management North-South Edge #1 6 8 4.5
Management North-South Edge #2 6 8 4.5
Model Manager and DEM Orchestrator 2 4 60
MS SQL Server 8 16 80
Primary AD, Master DNS and Master NTP 4 8 80
RDS Edge #1 4 1 0.5
RDS Edge #2 4 1 0.5
Remote Collector #1 4 16 250
Remote Collector #2 4 16 250
Salt Stack Master 2 4 50
Secondary AD, Master DNS and Master NTP 4 8 80
Site Recovery Manager 4 8 60
vCenter - Compute and Edge 16 32 295
vCenter - Management 4 16 136
VMware Data Protection Appliance 4 12 8
vRealize Business Service 2 4 50
vRealize Automation Appliance #1 4 16 30
vRealize Automation Appliance #2 4 16 30
vRealize Automation Proxy Agent #1 2 4 60
vRealize Automation Proxy Agent #2 2 4 60
vRealize Edge #1 6 8 4.5
vRealize Edge #2 6 8 4.5
vRealize Orchestrator #1 2 4 16
vRealize Orchestrator #2 2 4 16
vSphere Disk Replicator 2 4 18
VMware Update Manager Edge #1 4 1 0.5
VMware Update Manager Edge #2 4 1 0.5
VMware Update Manager – Management 2 4 60
VMware Update Manager – Compute & Edge 2 4 60
NSX Manager - Management 4 16 60
NSX Controller #1 – Management 4 4 20
NSX Controller #2 - Management 4 4 20
NSX Controller #3 - Management 4 4 20
NSX Manager – Compute and Edge 4 16 60
PSC – Management 2 2 30
PSC – Compute and Edge 2 2 30
TOTAL 255 536 8144
The following virtual machines are configured in the Edge cluster by default:
Table 74 List of Default Edge Cluster Virtual Machines
Function vCPU vRAM (GB) vDisk (GB)
Edge North-South Compute #1 6 8 4.5
Edge North-South Compute #2 6 8 4.5
Edge East-West #1 4 1 0.5
Edge East-West #2 4 1 0.5
NSX Controller #1 Compute 4 4 20
NSX Controller #2 Compute 4 4 20
NSX Controller #3 Compute 4 4 20
Grand Total 32 30 70
Appendix D – Maximum Configurations
The Advanced VMware SDDC on IBM Cloud solution supports the following maximums (an illustrative validation sketch follows the list):
One Central Cloud
One business entity (although multiple tenants are supported, the design is not intended for resale
purposes)
Up to four Cloud Regions
A Central Cloud of up to 10,000 VMs or 1,000 compute nodes
Each Cloud Region up to 10,000 VMs or 1,000 compute nodes
Up to a total of 50,000 VMs can be supported by a single central cloud portal. This includes any
virtual machines on a user's premises and management VMs.
Up to 100 concurrent transactions
Up to 2,500 catalog items on the self service portal across all users and groups
Up to 48 nodes per compute cluster per central cloud or cloud region
Up to 3 compute clusters per central cloud or cloud region
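The limits above can be checked mechanically against a proposed deployment. The sketch below simply mirrors the documented maximums; the proposed values are examples only.

```python
# Illustrative validation of a proposed layout against the maximums above.
LIMITS = {
    "vms_per_region": 10_000,
    "vms_total_portal": 50_000,
    "cloud_regions": 4,
    "concurrent_transactions": 100,
    "catalog_items": 2_500,
    "nodes_per_compute_cluster": 48,
    "compute_clusters_per_region": 3,
}


def violations(proposed):
    """Return a list of documented limits that the proposed values exceed."""
    return [
        f"{name}: {proposed[name]} exceeds the documented maximum of {limit}"
        for name, limit in LIMITS.items()
        if proposed.get(name, 0) > limit
    ]


proposed = {
    "vms_per_region": 8_000,
    "cloud_regions": 2,
    "nodes_per_compute_cluster": 48,
    "compute_clusters_per_region": 3,
}
print(violations(proposed) or "Within the documented maximums")
```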
Appendix E – Compatibility Guide
Browsers
The following web browsers are supported by the design:
Internet Explorer 10
Internet Explorer 11
Google Chrome
Mozilla Firefox
Guest Operating Systems
The following operating systems are supported as provisioned virtual machines:
Windows 7
Windows 8
Windows 8.1
Windows Server 2008 R2
Windows Server 2012
Windows Server 2012 R2
RHEL 5.9
RHEL 5.10
RHEL 6.1 Server
RHEL 6.4 Server
RHEL 6.5 Server
RHEL 7.0 Server
SLES 11 SP 2
SLES 11 SP 3
CentOS 5.10
CentOS 6.4
CentOS 6.5
CentOS 7.0
Ubuntu 12.04 LTS
Ubuntu 13.10
ESX 4.1 Update 2
ESXi 4.1 Update 2
ESXi 5.1 and updates
ESXi 5.5 and updates
ESXi 6.0
Appendix F – VMware VSAN Supported Configuration
VMware has validated the VSAN configuration described in this architecture and has provided the
following statement of support as RPQ1107:
Supported Configuration
VMware will provide support for the following supported configuration:
Support for Virtual SAN deployment using the Avago (LSI) 9361-8i IO controller, with the recommendations
given below:
ESXi Version: VMware ESXi 6.0.0 Update 2 (bld#3800324)
Driver version: megaraid_sas Version 6.610.15.00 (bld# 2494585)
Firmware version: 4.650.00-6383
SSD Model: INTEL SSDSC2BA012T4
HDD Model: WDC WD2000FYYZ-01UL1B2
Mode supported: RAID-0 (refer to KB article http://kb.vmware.com/kb/2111266 for creating RAID-0)
Checksum must remain enabled (enabled by default in VSAN 6.2)
Refer to the VSAN IO timeout settings mentioned in KB article:
http://kb.vmware.com/kb/2144936
To avoid high latencies on HDDs, keep the working set small enough to fit in the caching device
Supported with Write cache turned Off on HDD
Supported with no VMs on VMFS
Maximum number of drives tested with: 10
Hot-Plug feature has not been tested
Restrictions
The VMware support policy purchased by SoftLayer will now cover the configuration described above.
Deviations from this configuration or any other VMware supported configuration will be considered
unsupported.
Please see https://www.vmware.com/support/policies/policy_index.html for VMware support policy
details.
Installation and Upgrade Support
No upgrade or patching of the recommended configuration is supported without VMware consent or
recommendation.
Support Duration
Support shall commence on 26-Jul-2016 and continue for a period of 1 year. If additional
support is desired for this RPQ, a request stating the desired support period must be submitted to
VMware for consideration.
Transition from one-off (RPQ) to general support
Currently there are no plans to include this configuration in any GA release. However, if the above functionality is
included in a future GA VMware build, VMware requires that SoftLayer transition to this new release in
order to obtain ongoing support. VMware will support SoftLayer in the use of the “supported
configuration” for a period of no more than 90 days following the GA release. This grace period is
provided to facilitate transition to the new release. Once the transition has occurred, SoftLayer may obtain
support through the standard support processes and policies. At that time, or at the conclusion of the 90-day
grace period, whichever comes first, support under this RPQ will terminate.