Dell EMC Ready Solutions for VDI

Designs for Citrix Virtual Apps and Desktops on VxRail and vSAN Ready Nodes

Validation Guide

Abstract

This validation guide describes the architecture and performance of the integration of Citrix Virtual Apps and Desktops components for virtual desktop infrastructure (VDI) and hosted shared desktops on Dell EMC VxRail hyperconverged appliances or vSAN Ready Nodes in a VMware vSphere environment.

Dell Technologies Solutions

Part Number: H17344.3
October 2020

Notes, cautions, and warnings

NOTE: A NOTE indicates important information that helps you make better use of your product.

CAUTION: A CAUTION indicates either potential damage to hardware or loss of data and tells you how to avoid the problem.

WARNING: A WARNING indicates a potential for property damage, personal injury, or death.

© 2018-2020 Dell Inc. or its subsidiaries. All rights reserved. Dell, EMC, and other trademarks are trademarks of Dell Inc. or its subsidiaries. Other trademarks may be trademarks of their respective owners.

Contents

Chapter 1: Executive Summary
  Document purpose
  Audience
  We value your feedback

Chapter 2: Test Environment Configuration and Best Practices
  Validated hardware resources
  Validated software resources
  Virtual networking configuration
  Management server infrastructure
    NVIDIA Virtual GPU Software License Server
    SQL Server databases
    DNS
  High availability
  Citrix Virtual Apps and Desktops solution architecture

Chapter 3: Login VSI Performance Testing
  Login VSI performance testing process
    Load generation
    Login VSI workloads
    Resource monitoring
    Desktop VM configurations
  Login VSI test results and analysis
    Summary of test results
    Login VSI Knowledge Worker, 522 users, ESXi 6.7u3, Citrix Virtual Desktops, 1912 LTSR
    Login VSI Power Worker, 435 users, ESXi 6.7u3, Citrix Virtual Desktops, 1912 LTSR
    Login VSI Task Worker, 800 users, ESXi 6.7u3, Citrix Virtual Apps, 1912 LTSR
    Login VSI Multimedia Worker, 48 users, ESXi 6.7u3, Citrix Virtual Desktops, 1912 LTSR

Chapter 4: NVIDIA nVector Performance Testing
  nVector performance testing process
    Load generation
    nVector Knowledge Worker workload
    Resource monitoring
    Desktop VM configurations
  nVector performance test results and analysis
    Summary of test results
    nVector Knowledge Worker, 48 vGPU users, ESXi 6.7u3, Citrix Virtual Desktops, 1912 LTSR
    nVector Knowledge Worker, 48 users, non-graphics, ESXi 6.7u3, Citrix Virtual Desktops, 1912 LTSR

Chapter 5: Conclusion
  Conclusion
  User density recommendations
  Summary

Chapter 6: References
  Dell Technologies documentation
  VMware documentation
  Citrix documentation
  NVIDIA documentation

Appendix A: Login VSI metrics

Executive Summary

This chapter presents the following topics:

Topics:

• Document purpose
• Audience
• We value your feedback

Document purpose

This validation guide details the architecture, components, testing methods, and test results for Dell EMC VxRail appliances and vSAN Ready Nodes with Citrix Virtual Apps and Desktops. It includes the test environment configuration and best practices for systems that have undergone testing.

Audience

This guide is intended for architects, developers, and technical administrators of IT environments. It provides an in-depth explanation of the testing methodology and basis for VDI densities. It also demonstrates the value of Dell EMC Ready Solutions for VDI, which deliver Microsoft Windows virtual desktops to users of Citrix Virtual Apps and Desktops VDI components on VxRail appliances or vSAN Ready Nodes.

We value your feedback

Dell Technologies and the authors of this document welcome your feedback on the solution and the solution documentation. Contact the Dell EMC Solutions team by email or provide your comments by completing our documentation survey.

Authors: Dell EMC Ready Solutions for VDI Team.

NOTE: The following website provides additional documentation for VDI Ready Solutions: VDI Info Hub for Ready Solutions.

Test Environment Configuration and Best Practices

This chapter presents the following topics:

Topics:

• Validated hardware resources
• Validated software resources
• Virtual networking configuration
• Management server infrastructure
• High availability
• Citrix Virtual Apps and Desktops solution architecture

Validated hardware resources

The Dell EMC Ready Solutions for VDI team validated the Citrix Virtual Apps and Desktops solution on Dell EMC VxRail appliances with the specific hardware resources listed in this section.

Enterprise platforms

We performed the testing with the Density Optimized configuration. Configuration details are given in the following table:

Table 1. Validated hardware configurations

Server configuration: Density Optimized
Platform: VxRail V570F, BIOS version 2.4.8
CPU: 2 x Intel Xeon Gold 6248 (20C, 2.5 GHz)
Memory: 768 GB @ 2,933 MT/s
RAID controller: HBA 330 Adapter - 16.17.00.03
HD configuration: Cache - 2 x 800 GB SAS SSD, 2.5-inch disk drives; Capacity - 6 x 1.92 TB NL-SAS SSD, 2.5-inch disk drives
Network: Broadcom Dual-Port 25G rNDC - 21.40.16.60

NOTE: With the introduction of the six-channels-per-CPU requirement for Skylake, and now Cascade Lake, processors, the Density Optimized memory configuration recommendation has increased from the previous guidance of 512 GB to 768 GB. This change was necessary to ensure a balanced memory configuration and optimized performance for your VDI solution. The additional memory is advantageous, considering the resulting increase in operating system resource utilization and the enhanced experience for users when they have access to additional memory allocations.

Graphics hardware

We used NVIDIA T4 Tensor Core GPUs for the graphics workload testing. The NVIDIA T4 is a single-slot, 70 W, PCI Express Gen3 universal GPU with 16 GB of GDDR6 memory, designed for data center workflows. The NVIDIA T4 is flexible enough to run Knowledge Worker VDI or professional graphics workloads. You can configure up to six NVIDIA T4 GPU cards in a VxRail V570F appliance to enable 96 GB of graphics frame buffer. For modernized data centers, use this card in off-peak hours to run your inferencing workloads.

Network hardware

We used the following network hardware for this testing:

● Dell EMC Networking S3048-ON (1 GbE ToR switch)—The S3048-ON switch accelerates applications in high-performance environments with a low-latency top-of-rack (ToR) switch that features 48 x 1 GbE and 4 x 10 GbE ports, a dense 1U design, and up to 260 Gbps performance. This switch also supports ONIE for zero-touch installation of network operating systems.

● Dell EMC Networking S5248F-ON (25 GbE ToR switch)—The S5248F-ON switch provides optimum flexibility and cost-effectiveness for demanding compute and storage traffic environments. This ToR switch features 48 x 25 GbE SFP28 ports, 4 x 100 GbE QSFP28 ports, and 2 x 100 GbE QSFP28-DD ports. This switch also supports ONIE.

For more information, see PowerSwitch Data Center Switches.

Validated software resources

We validated this solution with the software components listed in the following table:

Table 2. Software components

Category | Software/version
Hypervisor | VMware ESXi 6.7u3
Broker technology | Citrix Virtual Apps and Desktops 7, Long Term Service Release (LTSR) 1912
Broker database | Microsoft SQL Server 2017
Management VM operating system | Microsoft Windows Server 2016
Virtual desktop operating system | Microsoft Windows 10 Enterprise, 64-bit, 1909
Office application suite | Microsoft Office 2019 Professional Plus
Login VSI | Test suite version 4.1.32.1
Platform | Dell EMC VxRail v4.7.410
NVIDIA vGPU software (for graphics testing) | 10.1

Virtual networking configuration

We used 25 GbE networking for this validation effort. The VLAN configurations used in the testing were as follows:

● VLAN configuration:

○ Management VLAN: Configured for hypervisor infrastructure traffic—L3 routed using core switch
○ VDI VLAN: Configured for VDI session traffic—L3 routed using core switch
○ VMware vSAN VLAN: Configured for VMware vSAN traffic—L2 switched only using ToR switch
○ vMotion VLAN: Configured for Live Migration traffic—L2 switched only, trunked from core (HA only)
○ VDI Management VLAN: Configured for VDI infrastructure traffic—L3 routed using core switch

● An iDRAC VLAN was configured for all hardware management traffic—L3 routed using core switch
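To illustrate how a VLAN plan like this can be applied to hosts, the following Python sketch uses pyVmomi to create a tagged port group on a standard vSwitch. It is a minimal sketch only: the vCenter address, credentials, port group name, VLAN ID, and vSwitch name are placeholders and are not part of the validated configuration, and an environment built on vSphere Distributed Switches (as is common on VxRail) would use a different workflow.

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Placeholder connection details; replace with values for your environment.
ctx = ssl._create_unverified_context()
si = SmartConnect(host="vcenter.example.local",
                  user="administrator@vsphere.local",
                  pwd="changeme", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True)
    for host in view.view:
        net_sys = host.configManager.networkSystem
        # Create a hypothetical "VDI" port group tagged with VLAN ID 20
        # on the default standard vSwitch of each host. Assumes the port
        # group does not already exist.
        spec = vim.host.PortGroup.Specification()
        spec.name = "VDI"
        spec.vlanId = 20
        spec.vswitchName = "vSwitch0"
        spec.policy = vim.host.NetworkPolicy()
        net_sys.AddPortGroup(portgrp=spec)
finally:
    Disconnect(si)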

Management server infrastructure

The following table lists the management server component sizing requirements:

Table 3. Sizing for VxRail Appliances or vSAN Ready Nodes

Component | vCPU | RAM (GB) | NIC | Operating system + data vDisk (GB) | Tier 2 volume (GB)
VMware vCenter Appliance | 2 | 16 | 1 | 290 | N/A
Platform Services Controller | 2 | 2 | 1 | 30 | N/A
Citrix Delivery Controller and License Server | 4 | 8 | 1 | 40 | N/A
SQL Server | 5 | 8 | 1 | 40 | 210 (VMDK)
File Server | 2 | 4 | 1 | 40 | 2048 (VMDK)
VxRail Appliance Manager | 2 | 8 | 1 | 32 | N/A
NVIDIA vGPU License Server | 2 | 4 | 1 | 40 + 5 | N/A

NVIDIA Virtual GPU Software License Server

When using NVIDIA vGPUs, graphics-enabled VMs must obtain a license from the NVIDIA vGPU Software License Server on your network.

We installed the vGPU license server software on a system running the Windows Server 2016 operating system to test vGPU configurations.

We made the following changes to the NVIDIA license server to address licensing requirements:

● Used a reserved fixed IP address
● Configured a single MAC address
● Applied time synchronization to all hosts on the same network

SQL Server databases

We used SQL Server 2017 for hosting the management component databases. We separated SQL data, logs, and tempdb into their respective volumes.

DNS

DNS is the basis for Microsoft Active Directory and also controls access to various software components for Citrix and VMware services. All hosts, VMs, and consumable software components must have a presence in DNS. We used a dynamic namespace integrated with Active Directory and adhered to Microsoft's best practices.

High availability

Although we did not enable high availability (HA) during the validation that is documented in this guide, we strongly recommend that you factor HA into any VDI design and deployment. This process follows the N+1 model with redundancy at both the hardware and software layers. The design guide for this architecture provides additional recommendations for HA and is available at the VDI Info Hub for Ready Solutions.

Citrix Virtual Apps and Desktops solution architecture

Citrix Virtual Apps and Desktops enables organizations to deliver on-demand Windows applications and desktops to any device, over any network. It provides secure remote access to applications and data residing in the data center. Citrix HDX technology delivers an excellent user experience by addressing the application performance and network challenges presented by virtualization.

We tested with the Citrix Virtual Apps and Desktops 7, 1912 Long Term Service Release (LTSR) version. We used the Citrix Machine Creation Services (MCS) linked-clone provisioning method to create pooled-random desktops. MCS is a collection of services that work together to create virtual desktops from a master image on demand, optimizing storage utilization and providing a pristine virtual machine to users each time they log in. Citrix ThinWire Plus was used as the remote display protocol in this validation effort.

The following figure shows the architecture of the validated Citrix Virtual Apps and Desktops solution stack, including the network, compute, management, and storage layers. The solution runs on the VxRail HCI platform based on VMware vSAN software-defined storage.

Figure 1. Citrix Virtual Apps and Desktops on VxRail architecture

We validated this solution with the Login VSI and NVIDIA nVector performance tools. We used a 4-node VxRail cluster. One of the hosts was used to host both management and compute VMs, and the other three hosts were used only for hosting compute VMs.

For the nVector test involving graphics workloads, only one compute node was used for the desktop VMs, with six NVIDIA T4 Tensor Core GPUs configured on that host. See the Design Guide for this solution for more information about the solution design and best practices.

Login VSI Performance Testing

This chapter presents the following topics:

Topics:

• Login VSI performance testing process
• Login VSI test results and analysis

Login VSI performance testing process

We conducted the performance analysis and characterization testing (PAAC) on this solution using the Login VSI load-generation tool. Login VSI is an industry-standard tool for benchmarking VDI workloads. It uses a carefully designed, holistic methodology that monitors both hardware resource utilization parameters and end-user experience (EUE) during load testing.

We tested each user load against four runs: a pilot run to validate that the infrastructure was performing properly and valid data could be captured, and three subsequent runs to enable data correlation.

During testing, while the environment was under load, we logged in to a session and completed tasks that correspond to the user workload. While this test is subjective, it helps to provide a better understanding of the EUE in the desktop sessions, particularly under high load. It also helps to ensure reliable data gathering.

Load generation

Login VSI installs a standard collection of desktop application software, including Microsoft Office and Adobe Acrobat Reader, on each VDI desktop testing instance. It then uses a configurable launcher system to connect a specified number of simulated users to available desktops within the environment. When the simulated user is connected, a login script configures the user environment and starts a defined workload. Each launcher system can launch connections to several VDI desktops (target machines). A centralized management console configures and manages the launchers and the Login VSI environment.

We used the following login and boot conditions:

● Users were logged in within a login timeframe of 1 hour, except during testing of low-density solutions such as GPU/graphic-based configurations, in which case users were logged in every 10 to 15 seconds.

● All desktops were started before users were logged in.

Login VSI workloads

The following table describes the Login VSI workloads that we tested:

Table 4. Login VSI workloads

Task Worker: The least intensive of the standard workloads. This workload primarily runs Microsoft Excel and Microsoft Internet Explorer, with some minimal Microsoft Word activity, as well as Microsoft Outlook, Adobe, and copy and zip actions. The applications are started and stopped infrequently, which results in lower CPU, memory, and disk I/O usage.

Knowledge Worker: Designed for virtual machines with 2 vCPUs. This workload includes the following activities:
● Microsoft Outlook—Browse messages.
● Internet Explorer—Browse websites and open a YouTube style video (480p movie trailer) three times in every loop.
● Word—Start one instance to measure response time and another to review and edit a document.
● Doro PDF Printer and Acrobat Reader—Print a Word document and export it to PDF.
● Excel—Open a large randomized sheet.
● PowerPoint—Review and edit a presentation.
● FreeMind—Run a Java-based Mind Mapping application.
● Other—Perform various copy and zip actions.

Power Worker: The most intensive of the standard workloads. The following activities are performed with this workload:
● Begin by opening four instances of Internet Explorer and two instances of Adobe Reader that remain open throughout the workload.
● Perform more PDF printer actions than in the other workloads.
● Watch a 720p and a 1080p video.
● Reduce the idle time to two minutes.
● Perform various copy and zip actions.

Multimedia Worker: A workload that is designed to heavily stress the CPU when using software graphics acceleration. GPU-accelerated computing offloads the most compute-intensive sections of an application to the GPU while the CPU processes the remaining code. This modified workload uses the following applications for its GPU/CPU-intensive operations:
● Adobe Acrobat
● Google Chrome
● Google Earth
● Microsoft Excel
● HTML5 3D spinning balls
● Internet Explorer
● MP3
● Microsoft Outlook
● Microsoft PowerPoint
● Microsoft Word
● Streaming video

Resource monitoring

To ensure that the user experience was not compromised, we monitored the following important resources:

● Compute host servers—Solutions based on VMware vCenter for VMware vSphere gather key data (CPU, memory, disk, and network usage) from each of the compute hosts during each test run. This data is exported to .csv files for single hosts and then consolidated to show data from all hosts. While the report does not include specific performance metrics for the management host servers, these servers are monitored during testing to ensure that they are performing at an expected level with no bottlenecks.

● Hardware resources—Resource overutilization can cause poor EUE. We monitored the relevant resource utilization parameters and compared them to relatively conservative thresholds. The thresholds, as shown in the following table, were selected based on industry best practices and our experience to provide an optimal trade-off between good EUE and cost-per-user while also allowing sufficient burst capacity for seasonal or intermittent spikes in demand.

Table 5. Parameter pass/fail thresholds for steady state utilization

Parameter | Pass/fail threshold
Physical host CPU utilization | 85% (a)
Physical host memory utilization | 85%
Network throughput | 85%
Disk latency | 20 milliseconds
Login VSI failed sessions | 2%
Physical host CPU readiness | 10%

a. The Ready Solutions for VDI team recommends that average CPU utilization not exceed 85 percent in a production environment. A 5 percent margin of error was allocated for this validation effort. Therefore, CPU utilization sometimes exceeds our recommended percentage. Because of the nature of Login VSI testing, these exceptions are reasonable for determining our sizing guidance.
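As a simple illustration of how these pass/fail thresholds can be applied to measured values, the following Python sketch encodes the table above and flags any parameter that exceeds its limit. It assumes steady-state averages have already been computed elsewhere; the metric key names and the sample values passed in are illustrative, not results from this validation.

# Pass/fail thresholds from Table 5. Disk latency is in milliseconds;
# the other values are percentages.
THRESHOLDS = {
    "cpu_utilization_pct": 85.0,
    "memory_utilization_pct": 85.0,
    "network_throughput_pct": 85.0,
    "disk_latency_ms": 20.0,
    "failed_sessions_pct": 2.0,
    "cpu_readiness_pct": 10.0,
}

def evaluate(steady_state):
    """Return parameter -> 'pass'/'fail' for the supplied steady-state
    measurements (same keys as THRESHOLDS)."""
    results = {}
    for name, limit in THRESHOLDS.items():
        value = steady_state.get(name)
        if value is None:
            results[name] = "not measured"
        else:
            results[name] = "pass" if value <= limit else "fail"
    return results

# Example with made-up measurements:
print(evaluate({"cpu_utilization_pct": 86.0, "disk_latency_ms": 0.8}))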

● GPU resources—vSphere Client monitoring collects data about GPU resource use from a script that is run on ESXi 6.7 and later hosts. The script runs for the duration of the test and contains NVIDIA System Management Interface (nvidia-smi) commands. The commands query each GPU and log the GPU processor, temperature, and memory use data to a .csv file.
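The validation team's script itself is not published with this guide; the following Python sketch shows one way to collect comparable per-GPU utilization, temperature, and memory data with nvidia-smi and append it to a .csv file. The output file name, polling interval, and duration are arbitrary placeholder choices.

import csv
import subprocess
import time
from datetime import datetime

QUERY = "index,utilization.gpu,temperature.gpu,memory.used"

def sample_gpus():
    """Return one row per GPU: [index, gpu_util_pct, temp_c, mem_used_mib]."""
    out = subprocess.check_output(
        ["nvidia-smi", f"--query-gpu={QUERY}",
         "--format=csv,noheader,nounits"], text=True)
    return [[field.strip() for field in line.split(",")]
            for line in out.strip().splitlines()]

def log_gpus(path="gpu_metrics.csv", interval_s=5, duration_s=3600):
    """Poll nvidia-smi at a fixed interval and append timestamped rows."""
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        end = time.time() + duration_s
        while time.time() < end:
            stamp = datetime.now().isoformat(timespec="seconds")
            for row in sample_gpus():
                writer.writerow([stamp] + row)
            f.flush()
            time.sleep(interval_s)

if __name__ == "__main__":
    log_gpus()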

Desktop VM configurations

The following table summarizes the desktop VM configurations used for the Login VSI workloads that we tested:

Table 6. Desktop VM specifications

Login VSI workload | vCPUs | ESXi configured memory | ESXi reserved memory | Screen resolution | Operating system
Task Worker | 8* | 32 GB* | 32 GB* | 1280 x 720 | Windows Server 2016
Knowledge Worker | 2 | 4 GB | 2 GB | 1920 x 1080 | Windows 10 Enterprise 64-bit
Power Worker | 2 | 6 GB | 3 GB | 1920 x 1080 | Windows 10 Enterprise 64-bit
Multimedia Worker | 4 | 8 GB | 8 GB | 1920 x 1080 | Windows 10 Enterprise 64-bit

*Task Worker testing was carried out using an RDSH hosted-desktop approach: each RDSH VM hosted a number of virtual desktop users, so the associated memory and CPU resources are shared between multiple users.

Login VSI test results and analysis

Summary of test results

The testing was performed on a 4-node VxRail cluster. We used the Citrix MCS linked-clone provisioning method to provision pooled-random desktop VMs. Citrix ThinWire Plus was used as the remote display protocol.

The following table summarizes the host utilization metrics for the different Login VSI workloads that we tested, and the user density derived from Login VSI performance testing. The CPU was the bottleneck in all of the test cases. In all of the tests we performed, the CPU utilization metric reached the 85 percent threshold (with a +5 percent margin) that we set for CPU utilization.

Table 7. Login VSI test results summary

Server configuration | Login VSI workload | User density | Average CPU (a) | Average consumed memory | Average active memory | Average IOPS per user | Average network Mbps per user
Density Optimized | Knowledge Worker | 131 | 85% | 578 GB | 167 GB | 5.22 | 1.4 Mbps
Density Optimized | Power Worker | 110 | 86% | 704 GB | 155 GB | 5.20 | 1.8 Mbps
Density Optimized | Task Worker | 200 | 85% | 297 GB | 88 GB | 1.19 | 1.2 Mbps
Density Optimized + (6 x NVIDIA T4) | Multimedia Worker (NVIDIA T4-2B) | 48 | 92% (b) | 449 GB | 385 GB | 12.73 | 17.06 Mbps

a. The Ready Solutions for VDI team recommends that average CPU utilization not exceed 85 percent in a production environment. A 5 percent margin of error was allocated for this validation effort. Therefore, CPU utilization sometimes exceeds our recommended percentage. Because of the nature of Login VSI testing, these exceptions are reasonable for determining our sizing guidance.

b. Note that 48 users were achieved at 92 percent CPU utilization. The CPU utilization threshold of 85 percent was relaxed when testing with the Multimedia Worker using a GPU. This test represents maximum utilization of the graphical resources available to the system as well as full user concurrency. Ideally, in a production environment, you should decrease the user density slightly or use higher bin processors to bring the CPU utilization closer to the 85 percent threshold.

These threshold values, as shown in Table 5, are carefully selected to deliver an optimal combination of excellent EUE and low cost per user while also providing burst capacity for seasonal or intermittent spikes in usage. We did not load the system beyond these thresholds to reach Login VSImax. Login VSImax shows the number of sessions that can be active on a system before the system is saturated.

Memory was not a constraint during testing. The total memory of 768 GB was sufficient for all of the Login VSI workloads to run without any constraints. With a dual-port 25 GbE NIC available on the hosts, network bandwidth was also not an issue. Disk latency was also under the threshold that we set, and disk performance was good.

For the Multimedia Worker workload test, we relaxed the 85 percent threshold to test with 48 users per VxRail node, which is the maximum number of users that can be hosted on a VxRail node configured with six NVIDIA T4 GPUs and NVIDIA T4-2B vGPU profiles (2 GB of frame buffer per user). The total available frame buffer on the host with six NVIDIA T4 GPUs configured is 96 GB. The CPU utilization recorded was 92 percent. However, the Login VSI scores and host metric results indicate that both user experience and performance were good during the running of this graphics-intensive workload.
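The 48-user figure follows directly from the frame buffer arithmetic described above. The short sketch below reproduces the calculation; the only inputs are numbers already stated in this guide (six T4 GPUs with 16 GB each and the 2 GB T4-2B profile).

# Frame buffer sizing for the Multimedia Worker test host.
gpus_per_host = 6
frame_buffer_per_gpu_gb = 16      # NVIDIA T4
frame_buffer_per_user_gb = 2      # T4-2B vGPU profile

total_frame_buffer_gb = gpus_per_host * frame_buffer_per_gpu_gb      # 96 GB
max_vgpu_users = total_frame_buffer_gb // frame_buffer_per_user_gb   # 48 users

print(total_frame_buffer_gb, max_vgpu_users)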

We have recommended user densities based on the Login VSI test results and considering the thresholds that we set for host utilization parameters. To maintain good EUE, do not exceed these thresholds. You can load more user sessions and exceed these thresholds, but you might experience a degradation in user experience.

The host utilization metrics mentioned in the table are defined as follows:

● User density—The number of users per compute host that successfully completed the workload test within the acceptable resource limits for the host. For clusters, this number reflects the average of the density achieved for all compute hosts in the cluster.

● Average CPU—The average CPU usage over the steady state period. For clusters, this number represents the combined average CPU usage of all compute hosts. On the latest Intel processors, the ESXi host CPU metrics exceed the rated 100 percent for the host if Turbo Boost is enabled, which is the default setting. An additional 35 percent of CPU is available from the Turbo Boost feature, but this additional CPU headroom is not reflected in the VMware vSphere metrics where the performance data is gathered.

● Average consumed memory—The average consumed memory during the steady state period. ESXi consumed memory is the amount of host physical memory granted within a host. For clusters, this is the average consumed memory across all compute hosts over the steady state period.

● Average active memory—For ESXi hosts, the amount of memory that is actively used, as estimated by the VMkernel based on recently accessed memory pages. For clusters, this is the average amount of physical guest memory that is actively used across all compute hosts over the steady state period.

● Average IOPS per user—IOPS calculated from the average cluster disk IOPS over the steady state period divided by the number of users.

● Average network Mbps per user—Average network usage on all hosts calculated over the steady state period divided by the number of users (a worked sketch follows this list).
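As a worked example of the two per-user definitions above, the following sketch computes them from steady-state cluster averages. The input numbers are taken from the Knowledge Worker results later in this chapter (370 read IOPS, 2,356 write IOPS, and 753 Mbps at 522 users); any other run can be substituted.

def per_user_metrics(avg_read_iops, avg_write_iops, avg_network_mbps, users):
    """Per-user IOPS and network bandwidth from steady-state cluster averages."""
    return {
        "iops_per_user": (avg_read_iops + avg_write_iops) / users,
        "network_mbps_per_user": avg_network_mbps / users,
    }

# Knowledge Worker run: 522 users, 370/2,356 read/write IOPS, 753 Mbps.
print(per_user_metrics(370, 2356, 753, 522))
# -> roughly 5.22 IOPS per user and 1.4 Mbps per user, matching Table 7.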

Login VSI Knowledge Worker, 522 users, ESXi 6.7u3, Citrix Virtual Desktops, 1912 LTSR

We performed this test with the Login VSI Knowledge Worker workload. We used a 4-node VxRail cluster. Host 3 hosted both management and desktop VMs. We populated the compute hosts with 131 desktop VMs each and the management host with 129 desktop VMs. We created pooled-random desktops with the Citrix MCS linked-clone provisioning method. We used Citrix ThinWire Plus as the remote display protocol.

CPU usage

CPU usage with all VMs powered on was approximately 12 percent before the test started. The CPU usage steadily increased during the login phase, as shown in the following figure.

Figure 2. CPU usage

During the steady state phase, we recorded an average CPU utilization of 85 percent. This value is close to the pass/fail threshold we set for average CPU utilization (see Table 5). To maintain good EUE, do not exceed this threshold. You can load more user sessions while exceeding this threshold for CPU, but you might experience a degradation in user experience.

As shown in the following figure, the CPU readiness was well below the 10 percent threshold that we set. CPU readiness is defined as the percentage of time that the virtual machine was ready but could not get scheduled to run on the physical CPU. The CPU readiness percentage was low throughout testing, indicating that the VMs had no significant delays in scheduling CPU time.

Figure 3. CPU readiness

The average steady-state CPU core utilization across the four hosts was 74 percent, as shown in the following figure.

Figure 4. CPU core utilization

Memory

We observed no memory constraints during the testing on either the management or compute hosts. Out of 768 GB of available memory per node, the compute host reached a maximum consumed memory of 594 GB and a steady state average of 578 GB.

Figure 5. Consumed memory usage

Active memory usage reached a maximum of 199 GB and a steady state average of 167 GB. There was no memory ballooning or swapping on the hosts.

Figure 6. Active memory usage

Network usage

Network bandwidth was not an issue during the testing. The network usage, as shown in the following figure, reached a steady state average of 753 Mbps. The busiest period for network traffic was during the steady state phase, when a peak value of 926 Mbps was recorded. The average steady state network usage per user was 1.4 Mbps.

Figure 7. Network usage

Cluster IOPS

Cluster IOPS reached a maximum value of 436 for read IOPS and 2,635 for write IOPS. The average steady state read and write IOPS were 370 and 2,356, respectively. The average disk IOPS (read and write) per user was 5.22.

Figure 8. Cluster IOPS

Disk I/O latency

Cluster disk latency reached a maximum read latency of 0.41 milliseconds and a maximum write latency of 0.78 milliseconds. The average steady state read latency was 0.4 milliseconds, and the average write latency was 0.75 milliseconds.

Figure 9. Disk latency

User experience

The baseline score for the Login VSI test was 952. This score falls in the 800 to 1,199 range rated as "Good" by Login VSI. For more information about Login VSI baseline ratings and baseline calculations, see this Login VSImax article. The Login VSI test was run for 522 user sessions for the Knowledge Worker workload. As indicated by the blue line in the following figure, the system reached a VSImax average score of 1,280 when 510 sessions were loaded. This value is well below the VSI threshold score of 1,953 set by the Login VSI tool.

Figure 10. Login VSI graph

During the testing, VSImax was never reached, which typically indicates a stable system and a better user experience. See Appendix A, which explains the Login VSI metrics discussed here.

The Login VSImax user experience score for this test was not reached. During manual interaction with the sessions during the steady state phase, the mouse and window movement were responsive, and video playback was good. No "stuck sessions" were reported during the testing, indicating that the system was not overloaded at any point.

Table 8. Login VSI score summary

Login VSI baseline | VSI index average | VSImax reached | VSI threshold
952 | 1,280 | No | 1,953

Login VSI Power Worker, 435 users, ESXi 6.7u3, Citrix Virtual Desktops, 1912 LTSR

We performed this test with the Login VSI Power Worker workload. We used a 4-node VxRail cluster. Host 3 was provisioned with both management and desktop VMs. We populated the compute hosts with 110 desktop VMs each and the management host with 105 desktop VMs. We created pooled-random desktops with the Citrix MCS linked-clone provisioning method. We used Citrix ThinWire Plus as the remote display protocol.

CPU usage

As shown in the following figure, CPU usage with all VMs powered on was approximately 12 percent before the test started. The CPU usage steadily increased during the login phase. During steady state, an average CPU utilization of 86 percent was recorded. This value is close to the pass/fail threshold we set for average CPU utilization (see Table 5). To maintain good EUE, do not exceed this threshold. You can load more user sessions while exceeding this threshold for CPU, but you might experience a degradation in user experience.

Figure 11. CPU usage

As shown in the following figure, the CPU readiness was well below the 10 percent threshold that we set. The CPU readiness percentage was low throughout testing, indicating that the VMs had no significant delays in scheduling CPU time.

Figure 12. CPU readiness

As shown in the following figure, the average steady state CPU core utilization across the four hosts was 77 percent.

Figure 13. CPU core utilization

Memory

We observed no memory constraints during the testing on either the management or compute hosts. Out of 768 GB of available memory per node, the compute host reached a maximum consumed memory of 706 GB and a steady state average of 704 GB, as shown in the following figure.

Figure 14. Consumed memory usage

Active memory usage reached a maximum of 178 GB, and steady state average memory usage was 155 GB, as shown in the following figure. There was no memory ballooning or swapping on the hosts.

Figure 15. Active memory usage

Network usage

Network bandwidth was not an issue in this test. A steady state average of 786 Mbps was recorded during the test. The busiest period for network traffic was towards the end of the logout phase, when it reached a maximum network usage of 1,044 Mbps. The average steady state network usage per user was 1.8 Mbps.

Figure 16. Network usage

Cluster IOPS

As shown in the following figure, cluster IOPS reached a maximum value of 391 for read IOPS and 2,269 for write IOPS. The average steady state read and write IOPS were 253 and 2,011, respectively. The average disk IOPS per user during the steady state period was 5.2.

Figure 17. IOPS

Disk I/O latency

As shown in the following figure, cluster disk latency reached a maximum read latency of 0.44 milliseconds and a maximum write latency of 0.89 milliseconds during the steady state phase. The average steady state read latency was 0.41 milliseconds, and the average steady state write latency was 0.85 milliseconds.

Figure 18. Disk latency

User experience

The baseline score for the Login VSI test was 889. This score falls in the 800 to 1,199 range rated as "Good" by Login VSI. For more information about Login VSI baseline ratings and baseline calculations, see this Login VSImax article. The Login VSI test was run for 435 user sessions for the Power Worker workload. As indicated by the blue line in the following figure, the system reached a VSImax average score of 1,240 when 435 sessions were loaded. This is well below the VSI threshold score of 1,889 set by the Login VSI tool.

Figure 19. Login VSI graph

During the testing, VSImax was never reached, which normally indicates a stable system and a better user experience. See Appendix A, which explains the Login VSI metrics discussed here.

During manual interaction with the sessions in the steady state phase, the mouse and window movement were responsive, and video playback was good. No "stuck sessions" were reported during the testing, indicating that the system was not overloaded at any time.

Table 9. Login VSI score summary

Login VSI baseline | VSI index average | VSImax reached | VSI threshold
889 | 1,240 | No | 1,889

Login VSI Task Worker, 800 users, ESXi 6.7u3, Citrix Virtual Apps, 1912 LTSR

We performed this test with the Login VSI Task Worker workload. We used a 4-node VxRail cluster. Host 3 was provisioned with both management and Remote Desktop Session Host (RDSH) VMs. We populated each host with eight RDSH VMs. Each host ran 200 Task Worker sessions. The RDSH VMs were provisioned using Citrix MCS. We used Citrix ThinWire Plus as the remote display protocol.

CPU usage

As shown in the following figure, CPU usage with all VMs powered on was approximately 1 percent before the test started. The CPU usage steadily increased during the login phase. During the steady state phase, an average CPU utilization of 85 percent was recorded across the hosts. This value is close to the pass/fail threshold we set for average CPU utilization (see Table 5). To maintain good EUE, do not exceed this threshold. You can load more user sessions while exceeding this threshold for CPU, but you might experience a degradation in user experience.

Figure 20. CPU usage

As shown in the following figure, the CPU readiness was well below the 10 percent threshold that we set. The CPU readiness percentage was low throughout testing, indicating that the VMs had no significant delays in scheduling CPU time.

Figure 21. CPU readiness

The average steady state CPU core utilization across the four hosts was 69 percent, as shown in the following figure.

Figure 22. CPU core utilization

Memory

We observed no memory constraints during the testing on either the management or compute hosts. Out of 768 GB of available memory per node, the compute host reached a maximum consumed memory of 317 GB and a steady state average of 297 GB, as shown in the following figure.

Figure 23. Consumed memory usage

Active memory usage reached a maximum of 148 GB and a steady state average of 88 GB, as shown in the following figure. There was no memory ballooning or swapping on the hosts.

Figure 24. Active memory usage

Network usage

Network bandwidth was not an issue in this test. A steady state average of 956 Mbps was recorded during the test. The busiest period for network traffic was in the steady state phase, recording a maximum network usage of 1,320 Mbps. The average steady state network usage per user was 1.2 Mbps.

Figure 25. Network usage

Cluster IOPS

Cluster IOPS reached a maximum value of 172 for read IOPS and a maximum value of 1,952 for write IOPS. The average steady state read and write IOPS figures were 80 and 871, respectively. The average disk IOPS (read and write) per user was 1.2.

Figure 26. IOPS

Disk I/O latency

As shown in the following figure, cluster disk latency reached a maximum read latency of 1.18 milliseconds and a maximum write latency of 2.74 milliseconds during the steady state phase. The average steady state read latency was 1.14 milliseconds, and the average steady state write latency was 2.51 milliseconds.

Figure 27. Disk latency

User experience

The baseline score for the Login VSI test was 635. This score falls in the 0 to 799 range rated as "Very Good" by Login VSI. For more information about Login VSI baseline ratings and baseline calculations, see this Login VSImax article. The Login VSI test was run for 800 RDSH user sessions for the Task Worker workload. As indicated by the blue line in the following figure, the system reached a VSImax average score of 964 when 800 sessions were loaded. This is well below the VSI threshold score of 1,636 set by the Login VSI tool.

Figure 28. Login VSI graph

During the testing, VSImax was never reached, which normally indicates a stable system and a better user experience. See Appendix A, which explains the Login VSI metrics discussed here.

During manual interaction with the sessions in the steady state phase, the mouse and window movement were responsive, and video playback was good. Only three "stuck sessions" were reported during the testing, indicating that the system was not overloaded at any time.

Table 10. Login VSI score summary

Login VSI baseline | VSI index average | VSImax reached | VSI threshold
635 | 964 | No | 1,636

Login VSI Multimedia Worker, 48 users, ESXi 6.7u3, Citrix Virtual Desktops, 1912 LTSR

We performed this test with the Login VSI Multimedia Worker workload. We configured six NVIDIA T4 GPUs on compute host 1. We provisioned the host with 48 vGPU-enabled VMs, and each of the desktop VMs used the NVIDIA T4-2B vGPU profile. The desktop VMs were provisioned using Citrix MCS. We used Citrix ThinWire Plus as the remote display protocol.

CPU usage

As shown in the following figure, CPU usage with all GPU-enabled VMs powered on was approximately 8 percent before the test started. The CPU usage steadily increased during the login phase. During the steady-state phase, an average CPU utilization of 92 percent was recorded on GPU-enabled compute host 1. We relaxed the 85 percent threshold for this testing so that we could test with 48 users. In a production environment, you can decrease the user density or use a higher bin processor to achieve the 85 percent CPU utilization threshold.

Figure 29. CPU usage

As shown in the following figure, CPU readiness was well below the 10 percent threshold that we set. The CPU readiness percentage was low throughout testing, indicating that the VMs had no significant delays in scheduling CPU time.

Figure 30. CPU readiness

The average steady-state CPU core utilization across the four hosts was 81 percent, as shown in the following figure:

Figure 31. CPU core utilization

GPU usage

The following figure shows the GPU usage across the six NVIDIA T4 GPUs that we configured on the host. The GPU usage during the steady-state period across the six GPUs averaged approximately 39 percent. A peak GPU usage of 78 percent was recorded on GPU 4 during the steady-state phase.

Figure 32. GPU utilization

Memory

We observed no memory constraints during the testing on the GPU-enabled compute host. Out of 768 GB of available memory per node, the GPU-enabled compute host reached a maximum consumed memory of 449 GB and a maximum active memory of 385 GB, as shown in the following figures. No variations in memory usage occurred throughout the test because all vGPU-enabled VM memory was reserved. There was no memory ballooning or swapping on the hosts.

Figure 33. Consumed memory usage

Figure 34. Active memory usage

Network usage

Network bandwidth was not an issue in this test. A steady-state average of 802 Mbps was recorded during the test, as shown in the following figure. The busiest period for network traffic was during the steady-state phase, when a maximum network usage of 1,320 Mbps was recorded. The average steady-state network usage per user was 16.71 Mbps.

Figure 35. Network usage

Cluster IOPS

As shown in the following figure, cluster IOPS reached a maximum value of 498 for read IOPS and a maximum value of 537 for write IOPS. The average steady state read and write IOPS figures were 184 and 427, respectively. The average disk IOPS (read and write) per user was 12.73.

Figure 36. Cluster IOPS

Disk I/O latency

As shown in the following figure, cluster disk latency reached a maximum read latency of 0.51 milliseconds and a maximum write latency of 2.84 milliseconds during the steady-state phase. The average steady-state read latency was 0.39 milliseconds, and the average steady-state write latency was 1.34 milliseconds.

Figure 37. Disk latency

User experience

The baseline score for the Login VSI test was 1,204. This score falls within the 1,200 to 1,599 range, which Login VSI rates as "Fair." For more information about Login VSI baseline ratings and baseline calculations, see this Login VSImax article. The Login VSI test was run for 48 multimedia user sessions. As indicated by the blue line in the following figure, the system reached a VSImax average score of 1,940 when 48 sessions were loaded. This is well below the VSI threshold score of 2,204 set by the Login VSI tool.

Figure 38. Login VSI graph

For the duration of testing, VSImax was never reached, which indicates a stable system and a better user experience. For an explanation of the Login VSI metrics, see Appendix A.

During manual interaction with the sessions during the steady-state phase, the mouse and window movement were responsive, and video playback was good. No "stuck sessions" were reported during the testing, indicating that the system was not overloaded at any time. The following table shows the score summary:

Table 11. Login VSI score summary

Login VSI baseline    VSI index average    VSImax reached    VSI threshold
1,204                 1,940                No                2,204

NVIDIA nVector Performance Testing

This chapter presents the following topics:

Topics:

• nVector performance testing process
• nVector performance test results and analysis

nVector performance testing process

NVIDIA nVector is a performance testing tool for benchmarking VDI workloads. The nVector tool creates a load on the system by simulating a workload that matches a typical VDI environment. The tool assesses the experience at the endpoint device rather than the response time of the virtual desktop.

The nVector tool captures performance metrics from the endpoints that quantify user experience, including image quality, frame rate, and user latency. These metrics, when combined with resource utilization information from the servers under test, enable IT teams to assess the needs of their graphics-accelerated VDI environment.

We tested multiple runs for each user load scenario to eliminate single-test bias. We used a pilot run to validate that the solution was functioning as expected and that testing data was being captured. Subsequent runs provided data confirming that the results we obtained were consistent.

To assess the EUE, we logged in to a VDI session and completed several tasks that are typical of a normal user workload. This small incremental load on the system did not significantly affect our ability to provide reproducible results. While this assessment is undoubtedly subjective, it helps to provide a better understanding of the EUE under high load. It also helps to assess the reliability of the overall testing data.

Load generation

The nVector tool runs the simulated workflow of a typical VDI workload at a predesignated scale. This part of the test requires performance monitoring to measure resource utilization. Acting as an execution engine, nVector orchestrates the following necessary stages that are involved in measuring EUE for a predefined number of VDI instances:

1. Provision VDI instances with predefined settings such as vCPU, vRAM, and frame buffer, and provision an equal number of VMs that act as virtual thin clients.

2. Establish remote connections to VDI desktops using virtual clients.
3. Measure resource utilization statistics on the server and on the guest operating system of the VDI desktop.
4. Run the designated workload on all the VDI instances.
5. Collect and analyze performance data and EUE measurements.
6. Generate a report that reflects the trade-off between EUE and user density (scale).

The following figure shows the stages in the NVIDIA benchmarking tool's measurement of user experience:

Figure 39. NVIDIA nVector workflow

nVector Knowledge Worker workload

We performed this testing exercise with NVIDIA's nVector Knowledge Worker workload. This workload contains a mixture of typical office applications, including some multimedia usage, and is representative of what a typical office worker does during the working day. The activities performed include:

● Working on Microsoft Excel files
● Scrolling through PDFs
● Opening and working on Microsoft Word documents
● Opening and presenting a Microsoft PowerPoint presentation
● Opening and viewing web pages and web videos using Google Chrome
● Opening and closing applications and saving or copying content

Resource monitoring

Host metrics

We used VMware vCenter to gather key host utilization metrics, including CPU, GPU, memory, disk, and network usage from the compute host during each test run. This data was exported to .csv files for each host and then consolidated for reporting.
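As an illustration of that consolidation step, the following minimal sketch merges per-host vCenter exports into a single table with pandas. The file pattern and column names are assumptions for the example, not the actual export layout.

```python
# Minimal consolidation sketch (illustrative only): merge per-host vCenter CSV
# exports into a single DataFrame for reporting. The file pattern and column
# names ("Time", "CPU Usage (%)") are assumptions, not the real export format.
import glob
import pandas as pd

frames = []
for path in sorted(glob.glob("exports/host*_metrics.csv")):   # hypothetical export files
    df = pd.read_csv(path, parse_dates=["Time"])              # assumes a "Time" column
    df["host"] = path                                          # tag each row with its source file
    frames.append(df)

consolidated = pd.concat(frames, ignore_index=True)
# Example summary: average CPU usage per host across the run
print(consolidated.groupby("host")["CPU Usage (%)"].mean())
```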

Resource overutilization can cause poor EUE. We monitored the relevant resource utilization parameters and compared them to relatively conservative thresholds. The thresholds were selected based on industry best practices and our experience to provide an optimal trade-off between good EUE and cost per user while also allowing enough burst capacity for seasonal or intermittent spikes in demand. The following table shows the pass/fail thresholds for host utilization metrics:

Table 12. Resource utilization thresholds

Parameter                            Pass/fail threshold
Physical host CPU utilization        85%
Physical host memory utilization     85%
Network throughput                   85%
Disk latency                         20 milliseconds
Physical host CPU readiness          10%
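A minimal sketch of the pass/fail comparison against these thresholds follows; the observed values are placeholders standing in for the steady-state measurements of a test run, not results from this validation.

```python
# Pass/fail evaluation against the Table 12 thresholds. The observed values
# below are placeholders, not measured results from this validation.
THRESHOLDS = {
    "cpu_utilization_pct": 85,
    "memory_utilization_pct": 85,
    "network_throughput_pct": 85,
    "disk_latency_ms": 20,
    "cpu_readiness_pct": 10,
}

def evaluate(observed):
    """Return PASS/FAIL per parameter; a run passes only if every parameter passes."""
    return {name: "PASS" if observed[name] <= limit else "FAIL"
            for name, limit in THRESHOLDS.items()}

example_run = {          # placeholder steady-state values
    "cpu_utilization_pct": 46,
    "memory_utilization_pct": 59,
    "network_throughput_pct": 5,
    "disk_latency_ms": 2.8,
    "cpu_readiness_pct": 1.2,
}
results = evaluate(example_run)
print(results, "-> overall:", "PASS" if all(v == "PASS" for v in results.values()) else "FAIL")
```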

Measuring EUE

This section explains the EUE metrics measured by the nVector tool. These metrics include image quality, frame rate, and end-user latency.

Metric 1: Image quality—NVIDIA nVector uses a lightweight agent on the VDI desktop and the client to measure image quality. These agents take multiple screen captures on the VDI desktop and on the thin client to compare later. The structural similarity (SSIM) of the screen capture taken on the client is computed by comparing it to the one taken on the VDI desktop. When the two images are similar, the heatmap reflects more colors higher up the spectrum, with an SSIM value closer to 1.0, as shown on the right-hand side in the following figure. As the images become less similar, the heatmap reflects more colors farther down the spectrum, with a value of less than 1.0. More than a hundred pairs of images across an entire set of user sessions are obtained. The average SSIM index of all pairs of images is computed to provide the overall remote session quality for all users.

Figure 40. SSIM as a measure of image quality
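To make the SSIM comparison concrete, the following sketch scores one desktop/client capture pair with scikit-image. This is only an illustration of the underlying calculation, not the nVector agents' implementation, and the image file names are hypothetical. In practice, many such pairs are scored across all sessions and the average is reported, as described above.

```python
# Illustrative SSIM comparison of a VDI-side and a client-side screen capture.
# Not nVector's implementation; file names are hypothetical.
from skimage import color, io
from skimage.metrics import structural_similarity as ssim

# Load both captures, drop any alpha channel, and convert to grayscale floats in [0, 1].
desktop_img = color.rgb2gray(io.imread("capture_desktop.png")[..., :3])
client_img = color.rgb2gray(io.imread("capture_client.png")[..., :3])

score = ssim(desktop_img, client_img, data_range=1.0)
print(f"SSIM: {score:.3f}")   # values close to 1.0 mean the endpoint image closely matches the source
```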

Metric 2: Frame rate—Frame rate is a common measure of user experience that defines how smooth the experience is. It measures the rate at which frames are delivered on the screen of the endpoint device. During the workload testing, NVIDIA nVector collects data on the frames per second (FPS) sent to the display device on the end client. This data is collected from thousands of samples, and the value of the 90th percentile is taken for reporting. A larger FPS indicates a more fluid user experience.
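The percentile summary described above reduces to a single calculation; in the following sketch the FPS samples are synthetic stand-ins for the per-frame data that nVector gathers at the endpoint.

```python
# 90th-percentile frame-rate summary, as described above. The samples are
# synthetic placeholders for the per-frame data collected at the endpoint.
import numpy as np

fps_samples = np.random.default_rng(0).normal(loc=14, scale=2, size=5000).clip(min=0)
fps_p90 = np.percentile(fps_samples, 90)
print(f"90th-percentile FPS: {fps_p90:.1f}")
```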

Metric 3: End-user latency—The end-user latency metric defines the level of response of a remote desktop or application. It measures the duration of any lag that an end user experiences when interacting with a remote desktop or application.

Desktop VM configurations

The following table summarizes the compute VM configurations for the nVector workloads that we tested:

Table 13. Desktop VM specifications

Test case   nVector workload         vCPUs   Memory   Reserved memory   vGPU profile   Operating system     HD size   Screen resolution
Non-GPU     Knowledge Worker         4       8 GB     8 GB              N/A            Windows 10, 64-bit   60 GB     1920 x 1080
GPU         Knowledge Worker + GPU   4       8 GB     8 GB              T4-2B          Windows 10, 64-bit   60 GB     1920 x 1080

nVector performance test results and analysis

Summary of test results

We performed GPU and non-GPU tests with the NVIDIA nVector Knowledge Worker workload to compare the metrics from both test cases. For the GPU testing, we used a single compute host provisioned with six NVIDIA T4 GPUs. We ran 48 virtual machines enabled with NVIDIA T4-2B vGPU profiles on this compute host. The vGPU scheduling policy was set to "Fixed Share Scheduler." For the non-GPU test, we performed testing on a compute host running 48 virtual machines without enabling vGPU profiles.

The compute host was part of a 4-node VMware vSAN software-defined storage cluster. We used the Citrix MCS linked-clone provisioning method to provision desktop VMs. Citrix ThinWire Plus was used as the remote display protocol.

Both tests were performed with the NVIDIA nVector Knowledge Worker workload. Table 14 compares the host utilization metrics gathered from vCenter for both GPU and non-GPU test cases, while Table 15 compares the EUE metrics generated by the nVector tool. For definitions of the nVector EUE metrics, see the "Measuring EUE" section. For definitions of host utilization metrics, see the Login VSI "Summary of test results" section.

The host utilization metrics in both tests were well below the thresholds that we set (see Table 12). Both tests produced the same image quality (SSIM value 0.99). However, with GPUs enabled, the frames per second (FPS) rate increased by 7.7 percent, and the end-user latency decreased by 22.6 percent.

Table 14. Host utilization metrics

Test case   Server configuration                 nVector workload                  Density per host   Average CPU usage   Average GPU usage   Average active memory   Average IOPS per user   Average net Mbps per user
GPU         Density Optimized + six NVIDIA T4s   Knowledge Worker (NVIDIA T4-2B)   48                 46%                 18%                 385 GB                  7.56                    2.8
Non-GPU     Density Optimized                    Knowledge Worker                  48                 65%                 N/A                 67 GB                   11                      2.7

Table 15. NVIDIA nVector EUE metrics

Test configuration   nVector workload    GPU profile    Density per host   End-user latency   Frame rate   Image quality
GPU                  Knowledge Worker    NVIDIA T4-2B   48                 82 milliseconds    14           0.99
Non-GPU              Knowledge Worker    N/A            48                 106 milliseconds   13           0.99
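As a quick check, the percentage differences quoted above follow directly from the Table 15 values:

```python
# Relative differences derived from the Table 15 values.
gpu_fps, non_gpu_fps = 14, 13
gpu_latency_ms, non_gpu_latency_ms = 82, 106

fps_gain = (gpu_fps - non_gpu_fps) / non_gpu_fps * 100                            # ~7.7 percent higher frame rate
latency_drop = (non_gpu_latency_ms - gpu_latency_ms) / non_gpu_latency_ms * 100   # ~22.6 percent lower latency
print(f"FPS increase: {fps_gain:.1f}%  Latency decrease: {latency_drop:.1f}%")
```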

nVector Knowledge Worker, 48 vGPU users, ESXi 6.7u3, Citrix Virtual Desktops, 1912 LTSR

We performed this test on a 4-node VxRail cluster. Host 1, configured with six NVIDIA T4 GPUs, ran 48 virtual desktop VMs. Each virtual desktop VM was configured with the NVIDIA T4-2B vGPU profile. Host 2 ran the nVector launcher VMs. A launcher is an endpoint VM from which the desktop VM sessions are initiated. Host 2 was configured with three P40 GPUs, and the launcher VMs on host 2 were GPU-enabled through an NVIDIA P40-1B profile. The nVector tool requires that launcher VMs are GPU-enabled. Hosts 3 and 4 did not carry any load.

The total GPU frame buffer available on compute host 1 was 96 GB. With each vGPU VM enabled with the NVIDIA T4-2B profile consuming 2 GB of frame buffer, the maximum number of GPU-enabled users that can be hosted on compute host 1 is 48.
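The 48-user limit follows from simple frame-buffer arithmetic, sketched below. The 16 GB per T4 card and 2 GB per T4-2B VM figures are standard NVIDIA values rather than measurements from this test.

```python
# Frame-buffer arithmetic behind the 48-user vGPU density on compute host 1.
T4_COUNT = 6
FRAME_BUFFER_PER_T4_GB = 16      # standard NVIDIA T4 frame buffer
FRAME_BUFFER_PER_VM_GB = 2       # the "2" in the T4-2B profile name

total_frame_buffer_gb = T4_COUNT * FRAME_BUFFER_PER_T4_GB          # 96 GB
max_vgpu_users = total_frame_buffer_gb // FRAME_BUFFER_PER_VM_GB   # 48 GPU-enabled users
print(total_frame_buffer_gb, max_vgpu_users)
```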

CPU

The following figure shows CPU utilization across the GPU-enabled host 1 and launcher host 2 during the testing. We can see a spike in CPU usage for compute host 1 during linked-clone creation and the login phase. During the steady-state phase, an average CPU utilization of 46 percent was recorded on the GPU-enabled compute host 1. This value was lower than the pass/fail threshold that we set for average CPU utilization (see Table 12).

Figure 41. CPU usage

As shown in the following figure, the CPU readiness percentage was well below the 10 percent threshold that we set. The CPU readiness percentage was low throughout testing, indicating that the VMs had no significant delays in scheduling CPU time.

Figure 42. CPU readiness

As shown in the following figure, the average steady-state CPU core utilization was 41 percent on the GPU-enabled compute host 1.

Figure 43. CPU core utilization

GPU usage

The following figure shows the GPU usage across the six NVIDIA T4 GPUs configured on the GPU-enabled compute host 1. The GPU usage during the steady-state period across the six GPUs averaged approximately 18 percent.

Figure 44. GPU usage

Memory

We observed no memory constraints during the testing on the compute host or the management host. Out of 768 GB of available memory per node, compute host 1 reached a maximum consumed memory of 439 GB.

Figure 45. Consumed memory usage

Active memory usage reached a maximum of 386 GB. There were no variations in memory usage throughout the test because all vGPU-enabled VM memory was reserved. There was no memory ballooning or swapping on the hosts.

Figure 46. Active memory usage

Network usage

Network bandwidth was not an issue in this test. An average network usage of 134 Mbps was recorded during the steady-state phase of the testing. The busiest period for network traffic was during the VM creation phase, when we recorded a maximum network usage of 8,775 Mbps on compute host 1. The steady-state average network usage per user was 2.8 Mbps.

Figure 47. Network usage

Cluster IOPS

As shown in the following figure, the cluster IOPS reached a maximum value of 28,454 for read IOPS and 4,905 for write IOPS during the testing. The average steady-state read IOPS was 111, and the average steady-state write IOPS was 252. The average steady-state IOPS (read and write) per user was 7.6.

Figure 48. Cluster IOPS

Disk latency

As shown in the following figure, cluster disk latency reached a maximum read latency of 1.51 milliseconds and a maximum write latency of 2.76 milliseconds. The average steady-state read latency was 0.15 milliseconds, and the average steady-state write latency was 0.8 milliseconds.

Figure 49. Disk latency

nVector Knowledge Worker, 48 users, non-graphics, ESXi 6.7u3, Citrix Virtual Desktops, 1912 LTSR

We ran this non-graphics test to compare the performance of a GPU-enabled host and a non-GPU host while running the nVector Knowledge Worker workload. We performed this test on a 4-node VxRail cluster. Compute host 1 ran 48 desktop VMs; no GPUs were configured on this host. Host 2 ran the nVector launcher virtual machines. We configured host 2 with three P40 GPUs, and the launcher VMs on host 2 were GPU-enabled through an NVIDIA GRID P40-1B profile. The nVector tool requires that launcher VMs are enabled with vGPUs. Hosts 3 and 4 did not carry any load.

CPU

The following figure shows the CPU utilization across desktop host 1 and launcher host 2 during the testing. During the steady-state phase, an average CPU utilization of 65 percent was recorded on compute host 1. This value was lower than the pass/fail threshold that we set for average CPU utilization (see Table 12). Launcher host 2 had very low CPU usage during the steady-state phase.

Figure 50. CPU usage

As shown in the following figure, the CPU readiness was well below the 10 percent threshold that we set (see Table 12).

Figure 51. CPU readiness

As shown in the following figure, the average steady state CPU core utilization on the compute host 1 was 56 percent.

Figure 52. CPU core utilization

Memory

We observed no memory constraints during the testing on the compute host or the management host. There was no memory ballooning or swapping on the hosts. The steady-state average consumed memory on compute host 1 was 434 GB.

Figure 53. Consumed memory usage

As shown in the following figure, the average steady state active memory usage was 67 GB.

Figure 54. Active memory usage

Network usage

Network bandwidth was not an issue in this test. An average network usage of 131 Mbps was recorded on compute host 1 during the steady-state phase of the testing. The average network usage per user on compute host 1 was 2.7 Mbps.

Figure 55. Network usage

Cluster IOPS

As shown in the following figure, the cluster IOPS reached a maximum value of 24,279 for read IOPS and 3,968 for write IOPS during the testing. The steady-state average read IOPS was 198, and the average write IOPS was 333. The average steady-state disk IOPS (read and write) per user was 11.

Figure 56. Cluster IOPS

Disk latency

As shown in the following figure, cluster disk latency reached a maximum read latency of 1.48 milliseconds and a maximum write latency of 2.34 milliseconds during the testing. The average steady-state read latency was 0.12 milliseconds, and the average steady-state write latency was 0.47 milliseconds.

Figure 57. Disk latency

Conclusion

This chapter presents the following topics:

Topics:

• Conclusion
• User density recommendations
• Summary

Conclusion

This guide describes the integration of vSAN-based appliances from Dell Technologies and Citrix Virtual Apps and Desktops brokering software to create virtual application and desktop environments. This architecture provides exceptional scalability and an excellent user experience, and empowers IT teams to play a proactive strategic role in the organization.

Dell Technologies offers comprehensive, flexible, and efficient VDI solutions that are designed and optimized for the organization's needs. These VDI solutions are easy to plan, deploy, and run.

Dell EMC Ready Solutions for VDI offer several key benefits to clients:

● Predictable costs, performance, and scalability to support a growing workforce
● Rapid deployments
● Rapid scaling, ready to serve enterprises of any size
● Dell Technologies support

All the Dell EMC Ready Solutions for VDI are configured to produce similar results. You can be sure that the vSAN-based appliances you choose have been designed and optimized for your organization's needs.

User density recommendations

The recommended user densities in the following table were achieved during performance testing on VxRail appliances. We followed the VMware best practice of FTT = 1 and configured a reserved slack space of 30 percent. All configurations were tested with Microsoft Windows 10, 64-bit, version 1909, and Microsoft Office 2019. We implemented all mitigations to patch the Spectre, Meltdown, and L1TF vulnerabilities at the hardware, firmware, and software levels; the performance impact of these mitigations is reflected in the user densities that we achieved.

Table 16. User density recommendations for VMware vSphere ESXi 6.7 with Citrix Virtual Apps and Desktops, 1912 LTSR

Server configuration                 Workload                                           User density
Density Optimized                    Login VSI Knowledge Worker                         131
Density Optimized                    Login VSI Power Worker                             110
Density Optimized + six NVIDIA T4s   Login VSI Multimedia Worker (Virtual PC: T4-2B)    48
Density Optimized                    Login VSI Task Worker (a)                          200
Density Optimized + six NVIDIA T4s   nVector Knowledge Worker (Virtual PC: T4-2B)       48

a. This testing was carried out using Citrix Virtual Apps with Remote Desktop Session Host shared desktops.
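The densities above were validated with FTT = 1 and a 30 percent reserved slack space. As a rough illustration of what those storage-policy settings imply for capacity planning, the sketch below applies them to a placeholder raw capacity; use the VxRail sizing tools mentioned in the Summary for actual sizing.

```python
# Rough capacity arithmetic for FTT = 1 (RAID-1 mirroring) with 30 percent
# slack space. The raw capacity is a placeholder; this is not a sizing tool.
RAW_CAPACITY_TB = 40.0       # hypothetical total raw vSAN capacity for the cluster
SLACK_FRACTION = 0.30        # capacity reserved as slack space
FTT1_MIRROR_FACTOR = 2       # FTT = 1 with mirroring stores two copies of each object

usable_tb = RAW_CAPACITY_TB * (1 - SLACK_FRACTION) / FTT1_MIRROR_FACTOR
print(f"Capacity available for VM data: {usable_tb:.1f} TB")   # 14.0 TB in this example
```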

Summary

We have provided extensive performance testing results and guidance based on the PAAC testing carried out with the Login VSI Task Worker, Knowledge Worker, Power Worker, Multimedia Worker, and nVector Knowledge Worker workloads. The 2nd Generation Intel Xeon Scalable processors in our Density Optimized configuration provide performance, density, and agility for your VDI workloads. The NVIDIA GPU options offer exceptional performance for graphics-intensive workloads.

For the Login VSI workloads that we tested, the recommended user densities were achieved before reaching the Login VSImax scores, indicating that the system was not saturated. We ensured that the host utilization metrics were under the thresholds that we set to maintain a stable and high-performing VDI environment. You will get exceptional user experience and performance by running our Citrix Virtual Apps and Desktops solution with these recommended user densities. From the nVector Knowledge Worker workload test, we found that desktop VMs running on hosts configured with GPUs have an increased frame rate and decreased end-user latency when compared to a host running with no GPUs. You may get more value out of GPUs if you run more graphics-intensive workloads or if users have multiple monitors with 4K resolution.

The configurations for the VxRail appliances are optimized for VDI. We selected the memory and CPU configurations that provide optimal performance. You can change these configurations to meet your environmental requirements, but keep in mind that changing the memory and CPU configurations from those that have been validated in this document will affect the user density per host. The guidance provided for VxRail appliances also applies to solutions based on VMware vSAN Ready Nodes. Use VxRail sizing tools for sizing the solution, and reserve resources for management tools when designing your VDI environment. For further assistance with our solutions, contact your Dell Technologies account representative.

With VDI solutions from Dell Technologies, you can streamline the design and implementation process and be assured that you have a solution that is optimized for performance, density, and cost-effectiveness.

References

This chapter presents the following topics:

Topics:

• Dell Technologies documentation
• VMware documentation
• Citrix documentation
• NVIDIA documentation

Dell Technologies documentation

The following links provide additional information from Dell Technologies. Access to these documents depends on your login credentials. If you do not have access to a document, contact your Dell Technologies representative. Also see the VDI Info Hub for a complete list of VDI resources.

● VDI Validation Guide—Citrix Virtual Apps and Desktops on VxRail and vSAN Ready Nodes
● VDI Design Guide—Citrix Virtual Apps and Desktops on VxRail and vSAN Ready Nodes
● Dell Technologies Virtual Desktop Infrastructure
● Dell EMC VxRail Hyperconverged Infrastructure
● Dell EMC vSAN Ready Nodes

Previous versions

Previous versions of the documentation for this solution can be found here:

● VDI Info Hub Archive

VMware documentation

● VMware vSphere documentation
● vSAN Ready Node Configurator
● VMware Compatibility Guide
● vSAN Hardware Quick Reference Guide

Citrix documentation

● Citrix Virtual Apps and Desktops
● Citrix Virtual Apps and Desktops technical documentation
● Citrix Virtual Apps and Desktops 7 1912 LTSR

NVIDIA documentation

● NVIDIA Virtual GPU Software Quick Start Guide

Appendix A: Login VSI metrics

Table 17. Description of Login VSI metrics

VSImax: VSImax shows the number of sessions that can be active on a system before the system is saturated. It is the point where the VSImax V4 average graph line meets the VSImax V4 threshold graph line. The intersection is indicated by a red X in the Login VSI graph. This number gives you an indication of the scalability of the environment (higher is better).

VSIbase: VSIbase is the best performance of the system during a test (the lowest response times). This number is used to determine what the performance threshold will be. VSIbase gives an indication of the base performance of the environment (lower is better).

VSImax v4 average: VSImax v4 average is calculated on the number of active users that are logged in to the system, but removes the two highest and two lowest samples to provide a more accurate measurement.

VSImax v4 threshold: VSImax v4 threshold indicates at which point the environment's saturation point is reached (based on VSIbase).
