Azure Stack HCI: The best infrastructure for hybrid. Module 3: Core Networking
Transcript
Page 1: Azure Stack HCI

Azure Stack HCI: The best infrastructure for hybrid

Module 3: Core Networking

Page 2: Azure Stack HCI

Core Networking

Page 3: Azure Stack HCI

Learnings Covered in this Unit

Simplifying the Network

Network Deployment Options

The Virtual Switch

Acceleration Technologies

Virtual Network Adapters

Page 4: Azure Stack HCI

Diagram: Physical NIC feeding the Virtual Switch

Page 5: Azure Stack HCI

Diagram: Physical NIC feeding the Virtual Switch, with host vNICs (MGMT, SMB1, SMB2) and guest vmNICs on top

Network | VLAN ID | QOS | Weight | Accelerations
Management | 0 | | 5 |
Storage A | 5 | DCB | 50 | vRDMA
Storage B | 6 | DCB | 50 | vRDMA
Guests | 10-99 | | 1-5 | SR-IOV

Page 6: Azure Stack HCI

Three Types of Core Networks

Management

• Part of North-South Network
• Used for Host Communication

Compute

• Part of North-South Network
• Virtual Machine Traffic
• Needs varying levels of QOS
• May need SR-IOV, vRDMA

Storage

• East-West Only
• Needs RDMA
• 10Gb+
• Can host Live Migration

Page 7: Azure Stack HCI

Traffic in HCI

Cluster Heartbeats & Inter-Node comms

[SMB] Storage Bus Layer

[SMB] Cluster Shared Volume

[SMB] Storage Rebuild

[SMB, possibly] Live Migrations

Generally RDMA Traffic

East-West

Page 8: Azure Stack HCI

Traffic in S2D

External (to the S2D cluster)

VM Tenant traffic

Could be any protocol

North-South

Page 9: Azure Stack HCI

pNIC = Physical NIC on the Host

vNIC = Host Hyper-V Virtual Network Adapter

vmNIC = Virtual Machine Hyper-V Virtual Network Adapter

tNIC = Microsoft Team Network Interface Controller (VLAN tagged in the LBFO team)

NIC Terminology Refresher

Page 10: Azure Stack HCI

Cluster Network Deployment Options

Converged Network

Combining Multiple Network Intents (MGMT, Compute, Storage)

Best if deploying 3+ Physical Nodes

Connect pNics to Top-Of-Rack Switches

RoCEv2 highly recommended

Switchless

North-South communication is a team combining the Compute and Management networks

Storage (E-W) is directly connected node to node

iWARP recommended

No need to configure Data Center Bridging (DCB) features

Really only for HCI Clusters with 2 Physical Nodes

Hybrid

Best of both, easy deployment of Compute/Mgmt on North-South

Storage NICs are separate adapters, not teamed

iWARP or RoCE, it doesn't matter

DCB config not required, but recommended.

Page 11: Azure Stack HCI

Converged

Diagram: SET VM Switch over two 10Gb Hyper-V host physical NICs; host vNICs for MGMT, CSV, and LM; VM1, VM2, and VM3; SMB Multichannel across the storage vNICs

Page 12: Azure Stack HCI

Switchless

Diagram: two nodes whose SET vSwitches uplink to a Top of Rack switch, while the SMB1 (10Gb) and SMB2 (10Gb) storage adapters connect directly node to node

Page 13: Azure Stack HCI

Hybrid

Diagram: SET VM Switch over two 10Gb Hyper-V host physical NICs uplinked to Top of Rack switches, carrying the MGMT host vNIC and VM1-VM3; two additional 10Gb physical NICs (SMB1, SMB2) are dedicated to storage

Page 14: Azure Stack HCI

Networking Stack overview

Diagram: Azure Stack HCI node with two pNICs (with DCB), the Hyper-V Switch (SDN) with integrated NIC teaming, and the host partition carrying VM and SMB storage traffic. Legend: RDMA, TCP/IP

Page 15: Azure Stack HCI

Networking Stack overview

• Virtual Switch

• ManagementOS NICs

• VM NICs

Diagram: same networking stack as above, highlighting the virtual switch, the Management OS NICs, and the VM NICs

Page 16: Azure Stack HCI

Networking Stack overview

• Physical NICs

• ManagementOS NICs

• VM NICs

Diagram: same networking stack as above, highlighting the physical NICs, the Management OS NICs, and the VM NICs

Page 17: Azure Stack HCI

High availability

Load Balancing and Failover (LBFO) vs. Switch Embedded Teaming (SET)

Diagram: with LBFO, the pNICs are bound into a Team (tNIC) and the vSwitch sits on top of the team; with SET, the vSwitch teams the pNICs directly. Host vNICs and vmNICs attach to the vSwitch in both cases

Page 18: Azure Stack HCI

SET Switch benefits

Switch Embedded Teaming

Diagram: SET vSwitch teaming two NICs connected to two switches, with host vNICs and a vmNIC attached

Page 19: Azure Stack HCI

SET Switch limitations

• Network adapters must be identical

• Hyper-V/Dynamic load balancing only

• Switch Independent only

• LACP/Static not supported

• Active/Passive not supported

Switch Embedded Teaming

Diagram: SET vSwitch teaming two NICs connected to two switches, with host vNICs and a vmNIC attached

Page 20: Azure Stack HCI

New-VMSwitch -Name SETSwitch -EnableEmbeddedTeaming $TRUE -NetAdapterName (Get-NetIPAddress -IPAddress 10.*).InterfaceAlias

• Automatically creates one management network adapter

• InterfaceAlias can also be queried with commands such as (Get-NetAdapter -InterfaceDescription Mellanox*).InterfaceAlias to select only adapters of a specific model

PowerShell command
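A quick follow-up check (not from the original slide; assumes the SETSwitch name used above):

# List the SET team members and teaming settings
Get-VMSwitchTeam -Name SETSwitch | Format-List

# The automatically created management vNIC shows up as a Management OS adapter
Get-VMNetworkAdapter -ManagementOS | Where-Object SwitchName -eq SETSwitch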

Page 21: Azure Stack HCI

Network Quality of Service

Page 22: Azure Stack HCI

Bandwidth Mode

New-VMSwitch -Name "vSwitch" -AllowManagementOS $true -NetAdapterName NIC1-MinimumBandwidthMode <Absolute | Default | None | Weight >

Page 23: Azure Stack HCI

New-VMSwitch -Name "vSwitch" -AllowManagementOS $true -NetAdapterName NIC1-MinimumBandwidthMode Weight

New-VMSwitch -Name "vSwitch" -AllowManagementOS $true -NetAdapterName NIC1-MinimumBandwidthMode Absolute
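A minimal sketch of what typically follows in Weight mode (vNIC names and weight values here are illustrative, not from the slide):

# Reserve relative shares of outbound bandwidth; weights are relative, not percentages
Set-VMSwitch -Name "vSwitch" -DefaultFlowMinimumBandwidthWeight 40
Set-VMNetworkAdapter -ManagementOS -Name "MGMT" -MinimumBandwidthWeight 5
Set-VMNetworkAdapter -ManagementOS -Name "SMB01" -MinimumBandwidthWeight 25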

Page 24: Azure Stack HCI
Page 25: Azure Stack HCI
Page 26: Azure Stack HCI

Warning!

Page 27: Azure Stack HCI
Page 28: Azure Stack HCI

Common Networking Challenges in Azure Stack HCI

Deployment Time Complexity Error Prone

Page 29: Azure Stack HCI

Common Networking Challenges in Azure Stack HCI

Deployment Time Complexity Error Prone

Page 30: Azure Stack HCI

Network ATC

New host management service on Azure Stack HCI

Install-WindowsFeature -Name NetworkATC

Available for All Azure Stack HCI Subscribers (via feature update) in 2021

Page 31: Azure Stack HCI

Intent

Complexity: HCI Converged Example

Diagram: the intent spans the pNICs, default OS settings, host vNICs, and guest settings

Management VLAN: 100
Storage VLAN 1: 711
Storage VLAN 2: 712
Storage MTU: 9K
Cluster Traffic Class: 7
Cluster Bandwidth Reservation: 1%
RDMA Traffic Class: 3
RDMA Bandwidth Reservation: 50%

Page 32: Azure Stack HCI

Rename-NetAdapter -Name <OldName> -NewName ConvergedPNIC1

Set-NetAdapterAdvancedProperty -Name ConvergedPNIC1 -RegistryKeyword VLANID -RegistryValue 0

New-VMSwitch -Name ConvergedSwitch -AllowManagementOS $false -EnableIov $true -EnableEmbeddedTeaming $true -NetAdapterName ConvergedPNIC1

Rename-NetAdapter -Name <OldName> -NewName ConvergedPNIC2

Set-NetAdapterAdvancedProperty -Name ConvergedPNIC2 -RegistryKeyword VLANID -RegistryValue 0

Add-VMSwitchTeamMember -VMSwitchName ConvergedSwitch -NetAdapterName ConvergedPNIC2

Set-NetAdapterRss -Name ConvergedPNIC1 -NumberOfReceiveQueues 16 -MaxProcessors 16 -BaseProcessorNumber 2 -MaxProcessorNumber 19

Set-NetAdapterRss -Name ConvergedPNIC2 -NumberOfReceiveQueues 16 -MaxProcessors 16 -BaseProcessorNumber 2 -MaxProcessorNumber 19

Complexity: Physical NICs, vSwitch, and VMQ

Page 33: Azure Stack HCI

Add-VMNetworkAdapter -ManagementOS -SwitchName ConvergedSwitch -Name Management

Rename-NetAdapter -Name *Management* -NewName Management

New-NetIPAddress -InterfaceAlias Management -AddressFamily IPv4 -IPAddress 192.168.0.51 -PrefixLength 24 -DefaultGateway 192.168.0.1

Set-VMNetworkAdapterIsolation -ManagementOS -VMNetworkAdapterName Management -AllowUntaggedTraffic $True -IsolationMode VLAN -DefaultIsolationID 10

Add-VMNetworkAdapter -ManagementOS -SwitchName ConvergedSwitch -Name SMB01

Rename-NetAdapter -Name *SMB01* -NewName SMB01

Set-VMNetworkAdapterIsolation -ManagementOS -VMNetworkAdapterName SMB01 -AllowUntaggedTraffic $True -IsolationMode VLAN -DefaultIsolationID 11

Set-VMNetworkAdapterTeamMapping -ManagementOS -SwitchName ConvergedSwitch -VMNetworkAdapterName SMB01 -PhysicalNetAdapterName ConvergedPNIC1

Set-DnsClient -InterfaceAlias *SMB01* -RegisterThisConnectionsAddress $true

Complexity: Virtual NICs (Mgmt and SMB01)

Page 34: Azure Stack HCI

Add-VMNetworkAdapter -ManagementOS -SwitchName ConvergedSwitch -Name SMB02

Rename-NetAdapter -Name *SMB02* -NewName SMB02

Set-VMNetworkAdapterIsolation -ManagementOS -VMNetworkAdapterName SMB02 -AllowUntaggedTraffic $True -IsolationMode VLAN -DefaultIsolationID 12

Set-VMNetworkAdapterTeamMapping -ManagementOS -SwitchName ConvergedSwitch -VMNetworkAdapterName SMB02 -PhysicalNetAdapterName ConvergedPNIC2

Set-DnsClient -InterfaceAlias *SMB02* -RegisterThisConnectionsAddress $true

New-NetIPAddress -InterfaceAlias *SMB01* -AddressFamily IPv4 -IPAddress 192.168.1.1 -PrefixLength 24

New-NetIPAddress -InterfaceAlias *SMB02* -AddressFamily IPv4 -IPAddress 192.168.2.1 -PrefixLength 24

Complexity: Virtual NICs (SMB02)

Page 35: Azure Stack HCI

Install-WindowsFeature -Name Data-Center-Bridging

New-NetQosPolicy -Name 'Cluster' -Cluster -PriorityValue8021Action 7

New-NetQosTrafficClass -Name 'Cluster' -Priority 7 -BandwidthPercentage 1 -Algorithm ETS

New-NetQosPolicy -Name 'SMB' -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3

New-NetQosTrafficClass 'SMB' -Priority 3 -BandwidthPercentage 50 -Algorithm ETS

New-NetQosPolicy -Name 'DEFAULT' -Default -PriorityValue8021Action 0

Disable-NetQosFlowControl -Priority 0, 1, 2, 4, 5, 6, 7

Enable-NetQosFlowControl -Priority 3

Set-NetQosDcbxSetting -InterfaceAlias ConvergedPNIC1 -Willing $False

Set-NetQosDcbxSetting -InterfaceAlias ConvergedPNIC2 -Willing $False

Enable-NetAdapterQos -InterfaceAlias ConvergedPNIC1, ConvergedPNIC2

<< Customer must now get the physical fabric configured to match these settings >>

Complexity: Configure DCB for Storage NICs

Page 36: Azure Stack HCI

> 30+ cmdlets…

> 90+ parameters…

Match Settings on Switch

Repeat Exactly on Node 2, 3, 4…

Repeat Exactly on cluster a, b, c…

Page 37: Azure Stack HCI

Goals

• Deploy your host networking through only a few commands

• Don't worry about turning every knob

• Don't worry about changed defaults between OS versions

• Don’t worry about latest best practices

• Don’t worry about it changing (configuration drift)

You have enough to worry about

Page 38: Azure Stack HCI

Add-NetIntent -Management -Compute -Storage -ClusterName HCI01 -AdapterName pNIC1, pNIC2
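A sketch of how the same intent model could look for a hybrid layout, plus the status check (intent and adapter names are examples; cmdlet behavior as documented for Network ATC in 21H2):

# Converged management/compute intent and a separate storage intent
Add-NetIntent -Name MgmtCompute -Management -Compute -ClusterName HCI01 -AdapterName pNIC1, pNIC2
Add-NetIntent -Name Storage -Storage -ClusterName HCI01 -AdapterName SMB1, SMB2

# Review what Network ATC deployed and whether provisioning completed on every node
Get-NetIntent -ClusterName HCI01
Get-NetIntentStatus -ClusterName HCI01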

Page 39: Azure Stack HCI
Page 40: Azure Stack HCI

Networking in Azure Stack HCI with Network ATC

Deployment Time Complexity Error Prone

Page 41: Azure Stack HCI

Summary: Network ATC

Intent-based host network deployment

Deploy the whole cluster with ~1 command

Easily replicate the same configuration to another cluster

Outcome driven; we’ll handle default OS changes

Always deployed with the latest, Microsoft supported and validated best practices

You stay in control with overrides

Auto-remediates configuration drift

Available in Azure Stack HCI 21H2

Page 42: Azure Stack HCI

Networking Stack overview

Components

• Physical NICs

• Virtual Switch

Supporting technologies

• LBFO Teaming/SET

• Offloading technologies

• SMB Direct (RDMA)

• …

Diagram: same networking stack as above (pNICs with DCB, Hyper-V Switch with integrated NIC teaming, host partition with VM and SMB storage traffic; legend: RDMA, TCP/IP)

Page 43: Azure Stack HCI

Management OS vNICs

Page 44: Azure Stack HCI

Almost the same as Management OS vNICs, except they are connected to VMs

• Azure Stack HCI supports Guest RDMA on the vmNIC

• You can use SR-IOV in VMs (More information in the SR-IOV slides)

Virtual Machine vmNICs
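A minimal sketch for turning this on for one VM (VM name is an example; assumes an SR-IOV and RDMA capable SET switch and suitable pNIC/VF drivers):

# Expose a Virtual Function to the guest and allow RDMA on the vmNIC
Set-VMNetworkAdapter -VMName VM1 -IovWeight 100
Set-VMNetworkAdapterRdma -VMName VM1 -RdmaWeight 100

# Inside the guest, the VF adapter should then report RDMA capability:
# Get-NetAdapterRdma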

Page 45: Azure Stack HCI
Page 46: Azure Stack HCI

RDMA (Remote Direct Memory Access)

• Typically East-West traffic

• Transfers data from an application (SMB) to pre-allocated memory of another system

• Low latency; high throughput; minimal host CPU processing. Use Diskspd to test (must leverage SMB over the network)

• Two predominant RDMA "transports" on Windows:

iWARP – S2D recommended (lossless out of the box)

RoCE (RDMA over Converged Ethernet) (lossless with DCB)

Vendor iWARP RoCE

Broadcom No Yes

Cavium Yes Yes

Chelsio Yes No

Intel Yes No

Mellanox No Yes

Page 47: Azure Stack HCI

Remote Direct Memory Access (RDMA) Traffic Flow

51

Diagram: file client to file server data path. Without RDMA, data is copied through the app, SMB, OS, and driver buffers on both sides via the NIC adapter buffers; with RDMA (iWARP or RoCE), the rNICs move data directly between the SMB buffers

• Higher performance through offloading of network I/O processing onto network adapter

• Higher throughput with low latency and ability to take advantage of high-speed networks (such as RoCE, iWARP and InfiniBand*)

• Remote storage at the speed of direct storage

• Transfer rate of around 50 Gbps on a single NIC PCIe x8 port

• Compatible with SMB Multichannel for load balancing and failover

• Windows Server 2016 added support for RDMA on vNICs

*InfiniBand is not supported on SET Switch vNIC
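A few host-side checks (not from the slide) that RDMA is actually available and used by SMB:

# RDMA-capable adapters and whether RDMA is enabled on them
Get-NetAdapterRdma

# Interfaces SMB will consider, including their RSS/RDMA capability
Get-SmbClientNetworkInterface

# Live SMB connections; the RDMA-capable columns show whether SMB Direct is in play
Get-SmbMultichannelConnection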

Page 48: Azure Stack HCI

Remote Direct Memory Access (RDMA) Performance Limits

52

PCI Express version | Transfer rate | x1 | x4 | x8 | x16
1.0 | 2.5 GT/s | 250 MB/s | 1 GB/s | 2 GB/s | 4 GB/s
2.0 | 5 GT/s | 500 MB/s | 2 GB/s | 4 GB/s | 8 GB/s
3.0 | 8 GT/s | 984 MB/s | ~4 GB/s | ~8 GB/s | ~16 GB/s
4.0 | 16 GT/s | 1969 MB/s | ~8 GB/s | ~16 GB/s | ~32 GB/s

• Performance is limited by the PCI Express slot

Example with Mellanox Connect X3 Pro Dual-Port 40/56 Gigabit:

• PCI Express 3.0 x8 card

• The dual-port card will not be able to deliver 80 Gb/s or 112 Gb/s

o Maximum will be around 60 Gb/s in an 8 GT/s slot

o Maximum will be around 30 Gb/s in a 5 GT/s slot

• For best performance, use two single-port cards.
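A rough back-of-the-envelope for the example above (the encoding overhead figures are general PCIe facts, not from the slide):

# PCIe 3.0 x8: 8 GT/s per lane, 8 lanes, 128b/130b encoding
$GBps = 8 * 8 * (128/130) / 8      # ~7.88 GB/s
$Gbps = $GBps * 8                  # ~63 Gb/s raw, ~60 Gb/s after protocol overhead
"{0:N1} GB/s (~{1:N0} Gb/s)" -f $GBps, $Gbps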

Page 49: Azure Stack HCI

Remote Direct Memory Access (RDMA) Technologies

53

• Infiniband (IB)

o IB

• Internet Wide Area RDMA Protocol (iWARP)

o iWARP

• RDMA over Converged Ethernet (RoCE)

o RoCE Version 1

o RoCE Version 2

Page 50: Azure Stack HCI

Remote Direct Memory Access (RDMA) Hardware

54

• Infiniband (IB)

o Mellanox

• Internet Wide Area RDMA Protocol (iWARP)

o Chelsio T580-LP-CR (10-40Gbps)

o Chelsio T62100-LP-CR (40-50-100Gbps)

o QLogic FastLinQ QL45611HLCU (100Gbps)

o Intel

• RDMA over Converged Ethernet (RoCE)

o Mellanox (Connect X3 Pro, X4 EN and X5 EN)

▪ Connect X4 LX (10-25-50Gbps )

▪ Connect X4 EN (10-25-25-100Gbps)

▪ Connect X5 EN

o Cisco (UCS VIC 1385)

o QLogic FastLinQ QL45611HLCU (100Gbps)

o Emulex/Broadcom (XE100 Series)

Page 51: Azure Stack HCI

RDMA Network Layers (iWARP and RoCE v2)

55

Diagram: protocol layer stacks compared for iWARP and RoCE v2

Page 52: Azure Stack HCI

RDMA Network Layers (RoCE v1 and v2)

56

Diagram: protocol layer stacks compared for RoCE v1 and RoCE v2

Page 53: Azure Stack HCI

Azure Stack HCI supports Guest RDMA (Mode 3) on the vmNIC

Virtual Machine vmNICs (Guest RDMA)

Page 54: Azure Stack HCI

• Guest RDMA (Mode 3) on the vmNIC

• Device Manager will show both the Hyper-V Network Adapter and the VF for each vmNIC

• A driver for the pNIC/VF is required in the VMs

Virtual Machine vmNICs (Guest RDMA)

Diagram: Server 01 with a pNIC uplinked to physical switches (pSwitch); the vSwitch hosts the MGMT, SMB1, and SMB2 vNICs, and each VM's vmNIC has an SR-IOV Virtual Function (VF); SMB traffic uses RDMA

Page 55: Azure Stack HCI

Mapping vNICs (vRDMA) to pNICs

59

• Needed to avoid two vRDMA vNICs ending up on the same RDMA pNIC

• By default the mapping logic uses round robin, so this can happen: https://technet.microsoft.com/en-us/library/mt732603.aspx

Invoke-Command -ComputerName $servers -ScriptBlock {
    $physicaladapters = Get-NetAdapter | where Status -eq Up | where Name -NotLike vEthernet* | Sort-Object
    Set-VMNetworkAdapterTeamMapping -VMNetworkAdapterName "SMB1" -ManagementOS -PhysicalNetAdapterName ($physicaladapters[0]).Name
    Set-VMNetworkAdapterTeamMapping -VMNetworkAdapterName "SMB2" -ManagementOS -PhysicalNetAdapterName ($physicaladapters[1]).Name
}
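A verification sketch (assumes the SMB1/SMB2 vNIC names above and that Get-VMNetworkAdapterTeamMapping is available as the read counterpart of the Set cmdlet):

# Confirm the vNIC-to-pNIC affinity that was just configured on each node
Invoke-Command -ComputerName $servers -ScriptBlock {
    Get-VMNetworkAdapterTeamMapping -ManagementOS -VMNetworkAdapterName "SMB1"
    Get-VMNetworkAdapterTeamMapping -ManagementOS -VMNetworkAdapterName "SMB2"
}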

Page 56: Azure Stack HCI

Taken from snia.org: https://www.snia.org/sites/default/files/ESF/How_Ethernet_RDMA_Protocols_Support_NVMe_over_Fabrics_Final.pdf

What does “Lossless” mean anyway?

HCI enables Hyper-V compute, S2D, and SDN to be co-located on the same host (and switch ports)

But now you have congestion…

- iWARP uses TCP Transport

- RoCE uses IB Transport

- RoCE uses UDP as Tunnel

And congestion causes packet drops…

Data Center Bridging is REQUIRED for RoCE to handle congestion

Page 57: Azure Stack HCI

Data Center Bridging (DCB)

• Data Center Bridging (DCB): Can make RoCE “lossless”

• Priority Flow Control (PFC)

• Required for RoCE

• Optional for iWARP

• Enhanced Transmission Selection (ETS)

• TX reservation requirements (minimums, not limits)

• ECN and DCBX not used in Windows

• Implementation Guide: https://aka.ms/ConvergedRDMA

• (Windows) Configuration Collector: https://aka.ms/Get-NetView

• (Windows) Validation Tool: https://aka.ms/Validate-DCB

• RDMA Connectivity Tool: https://aka.ms/Test-RDMA

• Not a stress tool – connectivity only!

• Instructions in the Deployment Guide

Must be configured across all network hops for RoCE to be lossless under congestion.

Page 58: Azure Stack HCI

QoS Inspection

Inspect NetQos

Use configuration guides

S2D/SDDC: https://aka.ms/ConvergedRDMA

Similar guide with developer annotations: https://github.com/Microsoft/SDN/blob/master/Diagnostics/WS2016_ConvergedNIC_Configuration.docx

LocalAdminUser @ TK5-3WP07R0512:

PS C:\DELETEME> Get-NetAdapterQos -Name "RoCE-01" -IncludeHidden -ErrorAction SilentlyContinue | Out-String -Width 4096

Name : RoCE-01

Enabled : True

Capabilities : Hardware Current

-------- -------

MacSecBypass : NotSupported NotSupported

DcbxSupport : IEEE IEEE

NumTCs(Max/ETS/PFC) : 8/8/8 8/8/8

OperationalTrafficClasses : TC TSA Bandwidth Priorities

-- --- --------- ----------

0 ETS 39% 0-2,4,6-7

1 ETS 1% 5

2 ETS 60% 3

OperationalFlowControl : Priorities 3,5 Enabled

OperationalClassifications : Protocol Port/Type Priority

-------- --------- --------

Default 0

NetDirect 445 3

Page 59: Azure Stack HCI

Validate the host with Validate-DCB

• https://aka.ms/Validate-DCB

• Primary Benefit
  • Validate the expected configuration on one to N systems or clusters
  • Validate that the configuration meets best practices

• Secondary Benefits
  • Doubles as DCB documentation for the expected configuration of your systems
  • Answers "What changed?" when faced with an operational issue

Page 60: Azure Stack HCI

Test-RDMA

• https://aka.ms/Test-RDMA

• PoSH tool to test Network Direct (RDMA)

• Ping doesn’t do it!

Page 61: Azure Stack HCI

65

>> C:\TEST\Test-RDMA.PS1 -IfIndex 3 -IsRoCE $true -RemoteIpAddress 192.168.2.111 -PathToDiskspd C:\TEST

VERBOSE: Diskspd.exe found at C:\TEST\Diskspd-v2.0.17\amd64fre\diskspd.exe

VERBOSE: The adapter Test-40G-2 is a physical adapter

VERBOSE: Underlying adapter is RoCE. Checking if QoS/DCB/PFC is configured on each physical adapter(s)

VERBOSE: QoS/DCB/PFC configuration is correct.

VERBOSE: RDMA configuration is correct.

VERBOSE: Checking if remote IP address, 192.168.2.111, is reachable.

VERBOSE: Remote IP 192.168.2.111 is reachable.

VERBOSE: Disabling RDMA on adapters that are not part of this test. RDMA will be enabled on them later.

VERBOSE: Testing RDMA traffic now. Traffic will be sent in a parallel job. Job details:

VERBOSE: 34251744 RDMA bytes sent per second

VERBOSE: 967346308 RDMA bytes written per second

VERBOSE: 35698177 RDMA bytes sent per second

VERBOSE: 976601842 RDMA bytes written per second

VERBOSE: Enabling RDMA on adapters that are not part of this test. RDMA was disabled on them prior to sending RDMA traffic.

VERBOSE: RDMA traffic test SUCCESSFUL: RDMA traffic was sent to 192.168.2.111

Page 62: Azure Stack HCI
Page 63: Azure Stack HCI

SMB Multichannel

Full Throughput

• Bandwidth aggregation with multiple NICs

• Multiple CPUs cores engaged when using Receive Side Scaling (RSS)

Automatic Failover

• SMB Multichannel implements end-to-end failure detection

• Leverages NIC teaming if present, but does not require it

Automatic Configuration

• SMB detects and uses multiple network paths
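To see what Multichannel actually negotiated on a running system, a quick inspection sketch (not from the slide):

# One row per active SMB path between this client and its servers
Get-SmbMultichannelConnection

# Local interfaces SMB considers, with their RSS and RDMA capability
Get-SmbClientNetworkInterface

# Manual path constraints, if any have been defined (usually none)
Get-SmbMultichannelConstraint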

Sample Configurations (diagram): Single RSS-capable NIC, Multiple NICs (1GbE and 10GbE), Team of NICs, and Multiple RDMA NICs (10GbE/IB), each shown as an SMB Client connected to an SMB Server through the corresponding switches

Page 64: Azure Stack HCI

SMB Multichannel – Single 10GbE NIC

1 session, without Multichannel

Diagram: SMB client and server, each with a single RSS-capable 10GbE NIC through a 10GbE switch; CPU utilization shown per core (Core 1-4)

Page 65: Azure Stack HCI

SMB Multichannel – Single 10GbE NIC

1 session, without Multichannel:

• No failover

• Can't use full 10Gbps

o Only one TCP/IP connection

o Only one CPU core engaged

1 session, with Multichannel:

• No failover

• Full 10Gbps available

o Multiple TCP/IP connections

o Receive Side Scaling (RSS) helps distribute load across CPU cores

Diagram: the same client/server pair with a single 10GbE NIC; without Multichannel only one core is busy, with Multichannel RSS spreads the load across Core 1-4

Page 66: Azure Stack HCI

SMB Multichannel – Multiple NICs

1 session, without Multichannel

Diagram: two client/server pairs, each with two RSS-capable 10GbE NICs and two 10GbE switches

No automatic failover

Can’t use full bandwidth

Only one NIC engaged

Only one CPU core engaged

Page 67: Azure Stack HCI

SMB Multichannel – Multiple NICs

1 session, without Multichannel:

• No automatic failover

• Can't use full bandwidth

o Only one NIC engaged

o Only one CPU core engaged

1 session, with Multichannel:

• Automatic NIC failover

• Combined NIC bandwidth available

o Multiple NICs engaged

o Multiple CPU cores engaged

Diagram: the same two client/server pairs; with Multichannel all NICs and multiple CPU cores are engaged

Page 68: Azure Stack HCI

• Linear bandwidth scaling

o 1 NIC – 1150 MB/sec

o 2 NICs – 2330 MB/sec

o 3 NICs – 3320 MB/sec

o 4 NICs – 4300 MB/sec

• Leverages NIC support for RSS (Receive Side Scaling)

• Bandwidth for small IOs is bottlenecked on CPU

SMB Multichannel Performance

Chart: SMB Client Interface Scaling – Throughput (MB/sec) versus I/O size (512 B to 1 MB) for 1x, 2x, 3x, and 4x 10GbE interfaces

Page 69: Azure Stack HCI

SMB Multichannel + NIC Teaming

1 session, with NIC Teaming, no MC

Diagram: client/server pairs with 1GbE and 10GbE NICs bound into NIC Teaming on each side

• Automatic NIC failover

• Can’t use full bandwidth

o Only one NIC engaged

o Only one CPU core engaged

Page 70: Azure Stack HCI

SMB Multichannel + NIC Teaming

1 session, with NIC Teaming, no MC:

• Automatic NIC failover

• Can't use full bandwidth

o Only one NIC engaged

o Only one CPU core engaged

1 session, with NIC Teaming and MC:

• Automatic NIC failover (faster with NIC Teaming)

• Combined NIC bandwidth available

o Multiple NICs engaged

o Multiple CPU cores engaged

Diagram: the same teamed 1GbE and 10GbE configurations; with Multichannel both team members and multiple CPU cores are engaged

Page 71: Azure Stack HCI

SMB Direct and SMB Multichannel

Diagram: client/server pairs with RDMA-capable NICs (10GbE R-NICs and 54Gb InfiniBand R-NICs) through matching switches

1 session, without Multichannel

• No automatic failover

• Can’t use full bandwidth

o Only one NIC engaged

o RDMA capability not used

Page 72: Azure Stack HCI

SMB Direct and SMB Multichannel

1 session, without Multichannel:

• No automatic failover

• Can't use full bandwidth

o Only one NIC engaged

o RDMA capability not used

1 session, with Multichannel:

• Automatic NIC failover

• Combined NIC bandwidth available

o Multiple NICs engaged

o Multiple RDMA connections

Diagram: the same RDMA-capable configurations; with Multichannel both R-NICs are engaged with multiple RDMA connections

Page 73: Azure Stack HCI
Page 74: Azure Stack HCI

Single Root I/O Virtualization (SR-IOV)

78

• “Network Switch in the Network Adapter”

• Direct I/O to the VMs' vNICs. Removes the CPU from the process of moving data to/from a VM; data is DMA'd directly to/from the VM without the Virtual Switch "touching" it

• High I/O workloads

• New in Windows Server 2016: support for SR-IOV in the SET Switch

Page 75: Azure Stack HCI

Single Root I/O Virtualization (SR-IOV)

79

Diagram: Hyper-V host with an SR-IOV NIC exposing Virtual Functions (VFs); the VM's network stack can use a VF directly alongside the synthetic NIC path through the Hyper-V Extensible Switch

• Direct I/O to the NIC

• High I/O workloads

Requires

• SR-IOV capable NICs

• Windows Server 2012 or higher VMs

• SET Switch

Benefits

• Maximizes use of host system processors and memory

• Reduces host CPU overhead for processing network traffic (by up to 50%)

• Reduces network latency (by up to 50%)

• Provides higher network throughput (by up to 30%)

• Full support for Live Migration

Page 76: Azure Stack HCI

Single Root I/O Virtualization (SR-IOV): Installation Workflow

80

1. Check with the server vendor if the chipset supports SR-IOV. Note that some older systems have an SR-IOV menu item in their BIOS but do not enable all of the IOMMU functionality needed for Windows Server 2012.

2. Check that the NIC firmware and drivers support SR-IOV.

3. Enable SR-IOV in the server BIOS.

4. Enable SR-IOV in the NIC BIOS. Here we define the number of VFs per NIC.

5. Install the NIC drivers on the host. Check the driver advanced properties to ensure SR-IOV is enabled at the driver level.

6. Create an SR-IOV enabled Hyper-V switch.

7. Enable SR-IOV for each VM we want to use it with (default).

8. Install the NIC driver inside the VM.
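A compressed sketch of the PowerShell side of this workflow (switch, adapter, and VM names are examples; steps 1-5 and 8 are firmware/BIOS/driver tasks outside PowerShell):

# Confirm the platform reports SR-IOV support (see the troubleshooting slide later)
(Get-VMHost).IovSupport
(Get-VMHost).IovSupportReasons

# Step 6: create an SR-IOV enabled Hyper-V switch
New-VMSwitch -Name SRIOVSwitch -NetAdapterName pNIC1 -EnableIov $true

# Step 7: request a Virtual Function for the VM's network adapter
Set-VMNetworkAdapter -VMName VM1 -IovWeight 100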

Page 77: Azure Stack HCI

Single Root I/O Virtualization (SR-IOV): SETSwitch

81

Windows Server 2016

• Support of SR-IOV in SET Switch

• Host pNIC team with the SET Switch

• No need for Guest Team, only one vNIC in VM

Page 78: Azure Stack HCI

Single Root I/O Virtualization (SR-IOV): SETSwitch

82

Windows Server 2016

• Support of SR-IOV in SET Switch

• Host pNIC team with the SET Switch

• No need for Guest Team, only one vNIC in VM

Page 79: Azure Stack HCI

Single Root I/O Virtualization (SR-IOV): VM Performance

83

SR-IOV vNIC Performance

VM Guest with 4 VPs

30 Gigabit (Mellanox Connect X3-Pro 40G)

3100 MB/s

Page 80: Azure Stack HCI

84

Page 81: Azure Stack HCI

Single Root I/O Virtualization (SR-IOV): VM Performance

85

SR-IOV vNIC Performance

VM Guest with 8 VPs

40 Gigabit (Mellanox Connect X3-Pro 40G)

4510 MB/s

Page 82: Azure Stack HCI

Single Root I/O Virtualization (SR-IOV): VM Performance

86

• Windows Server 2012 R2 VM Guest

• Performance example with SR-IOV vNIC

• Host pNIC Mellanox Connect X3-Pro 40 Gigabit

• 8 VPs in the VM Guest (vRSS not enabled)

• VM uses only Core 2 (no vRSS in the VM Guest)

• Test with Microsoft ctsTraffic.exe (send/receive)

Page 83: Azure Stack HCI

Single Root I/O Virtualization (SR-IOV): VM Performance

87

• Windows Server 2012 R2 VM Guest

• Performance example with SR-IOV vNIC

• Host pNIC Mellanox Connect X3-Pro 40 Gigabit

• 8 VPs in the VM Guest (vRSS Enabled)

• VM uses only HT cores

• Test with Microsoft ctsTraffic.exe (send/receive)

Page 84: Azure Stack HCI

Single Root I/O Virtualization (SR-IOV): VM Performance

88

• Windows Server 2012 R2 VM Guest

• Performance example with SR-IOV vNIC

• Host pNIC Mellanox Connect X3-Pro 40 Gigabit

• 16 VPs in the VM Guest (vRSS Enabled)

• VM uses all cores

• Test with Microsoft ctsTraffic.exe (send/receive)

Page 85: Azure Stack HCI

SR-IOV and Virtual Machine Mobility

89

Live Migration, Quick Migration, and Snapshots are supported with SR-IOV

1. The VF is presented to the Virtual Machine using SR-IOV on the source Hyper-V host:
Set-VMNetworkAdapter VM -IOVWeight 1

2. The VF is removed from the VM once migration is started:
Set-VMNetworkAdapter VM -IOVWeight 0

3. The VF will be presented again on the destination host if it supports SR-IOV; network connectivity will continue without SR-IOV if the destination host does not support it:
Set-VMNetworkAdapter VM -IOVWeight 1

Page 86: Azure Stack HCI

Troubleshooting SR-IOV

90

1. The Networking tab in Hyper-V Manager for each VM will show if SR-IOV is not operational.

2. Using Windows PowerShell, we can validate why SR-IOV is not operational. In this example, the BIOS and the NIC do not support SR-IOV:

PS C:\Windows\system32> (Get-VMHost).iovsupport
False
PS C:\Windows\system32> (Get-VMHost).iovsupportreasons
Ensure that the system has chipset support for SR-IOV and that I/O virtualization is enabled in the BIOS. The chipset on the system does not do DMA remapping, without which SR-IOV cannot be supported. The chipset on the system does not do interrupt remapping, without which SR-IOV cannot be supported. To use SR-IOV on this system, the system BIOS must be updated to allow Windows to control PCI Express. Contact your system manufacturer for an update. SR-IOV cannot be used on this system as the PCI Express hardware does not support Access Control Services (ACS) at any root port. Contact your system vendor for further information.

3. Event Viewer also indicates if an error exists when enabling SR-IOV on the VM network adapter. Check the Hyper-V SynthNIC log.

Page 87: Azure Stack HCI
Page 88: Azure Stack HCI

Dynamic VMMQ

Chart: virtual NIC throughput over time, Static queues versus Dynamic VMMQ (examples at ~5 Gbps and ~20 Gbps)

WS2016

Multiple VMQs for the same virtual NIC (VMMQ)

Statically assigned queues

WS2019\Azure Stack HCI

Autotunes queues to:

Maximize virtual NIC throughput

Maintain consistent virtual NIC throughput

Maximize Host CPU efficiency

Premium Certified Adapters Required

Supports Windows and Linux Guests

Try it out! (without special drivers)

https://aka.ms/DVMMQ-Validation

Page 89: Azure Stack HCI

Virtual Machine Multi-Queue (VMMQ)

93

• Feature that allows Network traffic for a VM to be spread across multiple queues

• VMMQ is the evolution of VMQ with Software vRSS

• High traffic VMs will benefit from the CPU load spreading that multiple queues can provide.

• Default disabled in the SET Switch

• Default disabled in the VM Guest

Page 90: Azure Stack HCI

Virtual Machine Multi-Queue (VMMQ)

94

For VMMQ to be enabled for a VM, RSS inside the VM needs to be enabled

• VrssEnabled: true

• VmmqEnabled: false (Off by default)

• VmmqQueuePairs: 16

• Use VMQ for low Traffic VMs

• Use VMMQ for high Traffic VMs
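A minimal sketch for a high-traffic VM (VM name and queue count are examples):

# Enable VMMQ on the VM's vNIC and cap the number of queue pairs
Set-VMNetworkAdapter -VMName VM01 -VmmqEnabled $true -VmmqQueuePairs 8

# Confirm the settings referenced above (VrssEnabled / VmmqEnabled / VmmqQueuePairs)
Get-VMNetworkAdapter -VMName VM01 | Format-List VrssEnabled, VmmqEnabled, VmmqQueuePairs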

Page 91: Azure Stack HCI

Virtual Machine Multi-Queue (VMMQ) Performance

95

Performance example with VMQ vs. VMMQ

Hardware used

• Mellanox Connect X3-Pro 40 Gigabit

• Dell T430 with Intel E5-2620v4

Software used:

• Microsoft NTttcp.exe

• Microsoft ctsTraffic.exe

Note:

• VM111-VM140 Windows Server 2016

• VM151-VM152 Windows Server 2012R2

Page 92: Azure Stack HCI

Virtual Machine Multi-Queue (VMQ) Performance

96

• Performance example with VMMQ Disabled

• Host pNIC Mellanox Connect X3-Pro 40 Gigabit

• 4 VPs in the VM Guest

• VM uses VMQ BaseProcessor 2

Page 93: Azure Stack HCI

Virtual Machine Multi-Queue (VMMQ) Performance

97

• Performance example with VMMQ enabled

• Host pNIC Mellanox Connect X3-Pro 40 Gigabit

• 4 VPs in the VM Guest

• 4 Queues used (VMMQ uses 4 cores from 8 to 14)

Page 94: Azure Stack HCI

Virtual Machine Multi-Queue (VMMQ) Performance

98

• Performance example with VMMQ enabled

• Host pNIC Mellanox Connect X3-Pro 40 Gigabit

• 8 VPs in the VM Guest

• 8 Queues used (VMQ uses 8 cores from 0 to 14)

Page 95: Azure Stack HCI

Virtual Machine Multi-Queue (VMMQ) Performance

99

• Performance example with VMMQ enabled

• Host pNIC Mellanox Connect X3-Pro 40 Gigabit

• 8 VPs in the VM Guest

• 7 Queues used (VMQ uses 7 cores from 2 to 14)

• Test with Microsoft ctsTraffic.exe (send/receive)

Page 96: Azure Stack HCI

Virtual Machine Multi-Queue (VMQ) Performance

100

• Windows Server 2012 R2 VM Guest

• Performance example with VMMQ Disabled

• Host pNIC Mellanox Connect X3-Pro 40 Gigabit

• 4 VPs in the VM Guest (vRSS default Disabled)

• VM uses Core 10 (100%)

Page 97: Azure Stack HCI

Virtual Machine Multi-Queue (VMQ) Performance

101

• Windows Server 2012 R2 VM Guest

• Performance example with VMMQ Disabled

• Host pNIC Mellanox Connect X3-Pro 40 Gigabit

• 4 VPs in the VM Guest (vRSS Enabled in VM)

• VM uses Core 10 (100%)

Page 98: Azure Stack HCI

Virtual Machine Multi-Queue (VMMQ) Performance

102

• Windows Server 2012 R2 VM Guest

• Performance example with VMMQ Enabled

• Host pNIC Mellanox Connect X3-Pro 40 Gigabit

• 4 VPs in the VM Guest (vRSS Enabled in VM)

• VM uses Cores 2-14

Page 99: Azure Stack HCI

Virtual Machine Multi-Queue (VMMQ) Performance

103

• Windows Server 2012 R2 VM Guest

• Performance example with VMMQ Enabled

• Host pNIC Mellanox Connect X3-Pro 40 Gigabit

• 4 VPs in the VM Guest (vRSS Enabled in VM)

• VM uses Cores 2-14

• Test with Microsoft ctsTraffic.exe (send/receive)

Page 100: Azure Stack HCI
Page 101: Azure Stack HCI

Virtual Receive-side Scaling (vRSS)

105

• vRSS is enabled by default in Windows Server 2016 and higher VMs

• vRSS is supported on Host vNICs

• vRSS works with VMQ or VMMQ (VMMQ = use RSS queues in hardware)

• Not compatible with SR-IOV vmNIC

Page 102: Azure Stack HCI

Virtual RSS Azure Stack

106

Diagram: incoming packets arriving on a vNIC are spread across virtual processors (vProcs) on NUMA nodes 0-3

• vRSS provides near line rate to a Virtual Machine on existing hardware, making it possible to virtualize traditionally network intensive physical workloads

• Maximizes resource utilization by spreading Virtual Machine traffic across multiple virtual processors

• Helps virtualized systems reach higher speeds with 10 to 100 Gbps NICs

• Requires no hardware upgrade and works with any NICs that support RSS

Page 103: Azure Stack HCI

Virtual RSS

107

• Supported Guest OS with the latest Integration Services Installed

• NIC must support RSS and VMQ

• VMQ must be enabled

• SR-IOV must be disabled for the Network Card using vRSS

• RSS enabled inside the Guest

• Enable-NetAdapterRSS -Name "AdapterName"

• RSS configuration inside the Guest is Required (same as physical computer)
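A small in-guest sketch (adapter name is a placeholder):

# Check and, if needed, tune RSS inside the guest, just as on a physical computer
Get-NetAdapterRss -Name "AdapterName"
Set-NetAdapterRss -Name "AdapterName" -BaseProcessorNumber 2 -MaxProcessors 4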
