Azure Stack HCI


Azure Stack HCI: The best infrastructure for hybrid

Module 3: Core Networking

Core Networking

Learnings Covered in this Unit

Simplifying the Network

Network Deployment Options

The Virtual Switch

Acceleration Technologies

Virtual Network Adapters

[Diagram: two physical NICs bound to a virtual switch, exposing host vNICs (MGMT, SMB1, SMB2) and guest vmNICs]

Network      VLAN ID   QoS Weight   Accelerations
Management   0         5            -
Storage A    5         DCB 50       vRDMA
Storage B    6         DCB 50       vRDMA
Guests       10-99     1-5          SR-IOV

Three Types of Core Networks

Management
• Part of North-South network
• Used for host communication

Compute
• Part of North-South network
• Virtual machine traffic
• Needs varying levels of QoS
• May need SR-IOV, vRDMA

Storage
• East-West only
• Needs RDMA
• 10 Gb+
• Can host Live Migration

Traffic in HCI (East-West)

Cluster heartbeats & inter-node comms
[SMB] Storage Bus Layer
[SMB] Cluster Shared Volume
[SMB] Storage rebuild
[SMB, possibly] Live Migrations
Generally RDMA traffic

Traffic in S2D (North-South), external to the S2D cluster

VM tenant traffic
Could be any protocol

pNIC = Physical NIC on the Host

vNIC = Host Hyper-V Virtual Network Adapter

vmNIC = Virtual Machine Hyper-V Virtual Network Adapter

tNIC = Microsoft Team Network Interface Controller (VLAN-tagged interface in the LBFO team)

NIC Terminology Refresher

Cluster Network Deployment Options

Converged Network

Combining Multiple Network Intents (MGMT, Compute, Storage)

Best if deploying 3+ Physical Nodes

Connect pNics to Top-Of-Rack Switches

RoCEv2 highly recommended

Switchless

North-South Communication is a Team, combining Compute and Management Networks

Storage (E-W) is directly connected Node to Node

iWARP recommended

No need to configure Data Center Bridging (DCB) features

Really only for HCI Clusters with 2 Physical Nodes

Hybrid

Best of both, easy deployment of Compute/Mgmt on North-South

Separate Storage Nics into separate adapters, not teamed

iWARP or RoCE; it doesn't matter

DCB Config not required, but recommended.

Converged

[Diagram: each Hyper-V host has two 10Gb physical NICs in a SET VM switch connected to Top-of-Rack switches; host vNICs for MGMT, CSV, and LM plus VM1-VM3; SMB Multichannel runs across the storage vNICs]

Switchless

[Diagram: each node has a SET vSwitch for North-South traffic via the Top-of-Rack switch; the 10Gb storage NICs SMB1 and SMB2 are cabled directly node to node]

Hybrid

[Diagram: each Hyper-V host has two 10Gb physical NICs in a SET VM switch for MGMT and VM1-VM3 via Top-of-Rack switches, plus two separate, un-teamed 10Gb physical NICs (SMB1, SMB2) for storage]

Networking Stack overview

[Diagram: Azure Stack HCI node with two pNICs (DCB) joined by integrated NIC teaming into the Hyper-V switch (SDN); the host partition carries SMB storage traffic over RDMA, VM traffic flows over TCP/IP]

Networking Stack overview

• Virtual Switch

• ManagementOS NICs

• VM NICs


Networking Stack overview

• Physical NICs

• ManagementOS NICs

• VM NICs


High availability

Load Balancing and Failover (LBFO) vs. Switch Embedded Teaming (SET):

[Diagram, LBFO: two switches -> two NICs -> LBFO team (tNIC) -> vSwitch -> host vNICs and vmNICs]
[Diagram, SET: two switches -> two NICs teamed directly inside the vSwitch -> host vNICs and vmNICs]

SET Switch benefits

Switch Embedded Teaming

[Diagram: two switches -> two NICs teamed inside the SET vSwitch -> host vNICs and vmNICs]

SET Switch limitations

• Network adapters have to be identical

• Hyper-V/Dynamic load balancing only

• Switch Independent only

• LACP/Static not supported

• Active/Passive not supported

Switch Embedded Teaming


New-VMSwitch -Name SETSwitch -EnableEmbeddedTeaming $true -NetAdapterName (Get-NetIPAddress -IPAddress 10.*).InterfaceAlias

• Automatically creates one management network adapter

• The InterfaceAlias can also be queried with commands such as (Get-NetAdapter -InterfaceDescription Mellanox*).InterfaceAlias to select only a specific NIC model

PowerShell command
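A quick way to confirm the result, as a minimal sketch (assuming the switch was named SETSwitch as above):

# Show the SET team behind the switch: member adapters, teaming mode, load-balancing algorithm
Get-VMSwitchTeam -Name SETSwitch

# The automatically created management vNIC appears as a ManagementOS adapter on that switch
Get-VMNetworkAdapter -ManagementOS | Where-Object SwitchName -eq SETSwitch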

Network Quality of Service

Bandwidth Mode

New-VMSwitch -Name "vSwitch" -AllowManagementOS $true -NetAdapterName NIC1-MinimumBandwidthMode <Absolute | Default | None | Weight >

New-VMSwitch -Name "vSwitch" -AllowManagementOS $true -NetAdapterName NIC1-MinimumBandwidthMode Weight

New-VMSwitch -Name "vSwitch" -AllowManagementOS $true -NetAdapterName NIC1-MinimumBandwidthMode Absolute

Warning!
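Per-vNIC reservations are then applied with Set-VMNetworkAdapter; a minimal sketch (adapter and VM names are illustrative, and note that the bandwidth mode is fixed when the switch is created):

# Weight mode: give the host management vNIC a relative share of the uplink
Set-VMNetworkAdapter -ManagementOS -Name Management -MinimumBandwidthWeight 5

# Absolute mode: reserve bandwidth in bits per second for a VM adapter (here 1 Gbps)
Set-VMNetworkAdapter -VMName VM1 -MinimumBandwidthAbsolute 1000000000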

Common Networking Challenges in Azure Stack HCI

Deployment Time | Complexity | Error Prone


Network ATC

New host management service on Azure Stack HCI

Install-WindowsFeature -Name NetworkATC

Available for All Azure Stack HCI Subscribers (via feature update) in 2021

Intent

Complexity: HCI Converged Example

[Settings span the pNIC, default OS, host vNIC, and guest layers]

Management VLAN: 100
Storage VLAN 1: 711
Storage VLAN 2: 712
Storage MTU: 9K
Cluster Traffic Class: 7
Cluster Bandwidth Reservation: 1%
RDMA Traffic Class: 3
RDMA Bandwidth Reservation: 50%

Rename-NetAdapter -Name <OldName> -NewName ConvergedPNIC1
Set-NetAdapterAdvancedProperty -Name ConvergedPNIC1 -RegistryKeyword VLANID -RegistryValue 0
New-VMSwitch -Name ConvergedSwitch -AllowManagementOS $false -EnableIov $true -EnableEmbeddedTeaming $true -NetAdapterName ConvergedPNIC1
Rename-NetAdapter -Name <OldName> -NewName ConvergedPNIC2
Set-NetAdapterAdvancedProperty -Name ConvergedPNIC2 -RegistryKeyword VLANID -RegistryValue 0
Add-VMSwitchTeamMember -VMSwitchName ConvergedSwitch -NetAdapterName ConvergedPNIC2
Set-NetAdapterRss -Name ConvergedPNIC1 -NumberOfReceiveQueues 16 -MaxProcessors 16 -BaseProcessorNumber 2 -MaxProcessorNumber 19
Set-NetAdapterRss -Name ConvergedPNIC2 -NumberOfReceiveQueues 16 -MaxProcessors 16 -BaseProcessorNumber 2 -MaxProcessorNumber 19

Complexity: Physical NICs, vSwitch, and VMQ

Add-VMNetworkAdapter -ManagementOS -SwitchName ConvergedSwitch -Name Management
Rename-NetAdapter -Name *Management* -NewName Management
New-NetIPAddress -InterfaceAlias Management -AddressFamily IPv4 -IPAddress 192.168.0.51 -PrefixLength 24 -DefaultGateway 192.168.0.1
Set-VMNetworkAdapterIsolation -ManagementOS -VMNetworkAdapterName Management -AllowUntaggedTraffic $True -IsolationMode VLAN -DefaultIsolationID 10
Add-VMNetworkAdapter -ManagementOS -SwitchName ConvergedSwitch -Name SMB01
Rename-NetAdapter -Name *SMB01* -NewName SMB01
Set-VMNetworkAdapterIsolation -ManagementOS -VMNetworkAdapterName SMB01 -AllowUntaggedTraffic $True -IsolationMode VLAN -DefaultIsolationID 11
Set-VMNetworkAdapterTeamMapping -ManagementOS -SwitchName ConvergedSwitch -VMNetworkAdapterName SMB01 -PhysicalNetAdapterName ConvergedPNIC1
Set-DnsClient -InterfaceAlias *SMB01* -RegisterThisConnectionsAddress $true

Complexity: Virtual NICs (Mgmt and SMB01)

Add-VMNetworkAdapter -ManagementOS -SwitchName ConvergedSwitch -Name SMB02
Rename-NetAdapter -Name *SMB02* -NewName SMB02
Set-VMNetworkAdapterIsolation -ManagementOS -VMNetworkAdapterName SMB02 -AllowUntaggedTraffic $True -IsolationMode VLAN -DefaultIsolationID 12
Set-VMNetworkAdapterTeamMapping -ManagementOS -SwitchName ConvergedSwitch -VMNetworkAdapterName SMB02 -PhysicalNetAdapterName ConvergedPNIC2
Set-DnsClient -InterfaceAlias *SMB02* -RegisterThisConnectionsAddress $true
New-NetIPAddress -InterfaceAlias *SMB01* -AddressFamily IPv4 -IPAddress 192.168.1.1 -PrefixLength 24
New-NetIPAddress -InterfaceAlias *SMB02* -AddressFamily IPv4 -IPAddress 192.168.2.1 -PrefixLength 24

Complexity: Virtual NICs (SMB02)

Install-WindowsFeature -Name Data-Center-Bridging

New-NetQosPolicy -Name 'Cluster' -Cluster -PriorityValue8021Action 7

New-NetQosTrafficClass -Name 'Cluster' -Priority 7 -BandwidthPercentage 1 -Algorithm ETS

New-NetQosPolicy -Name 'SMB' -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3

New-NetQosTrafficClass 'SMB' -Priority 3 -BandwidthPercentage 50 -Algorithm ETS

New-NetQosPolicy -Name 'DEFAULT' -Default -PriorityValue8021Action 0

Disable-NetQosFlowControl -Priority 0, 1, 2, 4, 5, 6, 7

Enable-NetQosFlowControl -Priority 3

Set-NetQosDcbxSetting -InterfaceAlias ConvergedPNIC1 -Willing $False

Set-NetQosDcbxSetting -InterfaceAlias ConvergedPNIC2 -Willing $False

Enable-NetAdapterQos -InterfaceAlias ConvergedPNIC1, ConvergedPNIC2

<< Customer must now get the physical fabric configured to match these settings >>

Complexity: Configure DCB for Storage NICs

> 30+ cmdlets…

> 90+ parameters…

Match Settings on Switch

Repeat Exactly on Node 2, 3, 4…

Repeat Exactly on cluster a, b, c…

Goals

• Deploy your Network Host through only a few commands

• Don’t worry about turning every knob

• Don't worry about changed defaults between OS versions

• Don’t worry about latest best practices

• Don’t worry about it changing (configuration drift)

You have enough to worry about

Add-NetIntent -Management -Compute -Storage -ClusterName HCI01 -AdapterName pNIC1, pNIC2

Networking in Azure Stack HCI with Network ATC

Deployment Time | Complexity | Error Prone

Summary: Network ATC

Intent-based host network deployment

Deploy the whole cluster with ~1 command

Easily replicate the same configuration to another cluster

Outcome driven; we’ll handle default OS changes

Always deployed with the latest, Microsoft supported and validated best practices

You stay in control with overrides

Auto-remediates configuration drift

Available in Azure Stack HCI 21H2
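A minimal sketch of checking what Network ATC deployed, assuming the NetworkATC feature from the earlier slide is installed and the commands run from a cluster node:

# Show the configured intents and the traffic types (Management, Compute, Storage) they carry
Get-NetIntent -ClusterName HCI01

# Per-node provisioning and configuration status for each intent
Get-NetIntentStatus -ClusterName HCI01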

Networking Stack overview

Components

• Physical NICs

• Virtual Switch

Supporting technologies

• LBFO Teaming/SET

• Offloading technologies

• SMB Direct (RDMA)

• …


Management OS vNICs

Almost the same as Management OS vNICs, except connected to VMs

• Azure Stack HCI supports Guest RDMA on the vmNIC

• You can use SR-IOV in VMs (More information in the SR-IOV slides)

Virtual Machine vmNICs

RDMA (Remote Direct Memory Access)

• Typically East-West traffic

• Transfers data from an application (SMB) to pre-allocated memory of another system

• Low latency, high throughput, minimal host CPU processing; use Diskspd to test (must leverage SMB over the network)

• Two predominant RDMA "transports" on Windows:
  iWARP – recommended for S2D (lossless out of the box)
  RoCE (RDMA over Converged Ethernet) – lossless with DCB

Vendor     iWARP   RoCE
Broadcom   No      Yes
Cavium     Yes     Yes
Chelsio    Yes     No
Intel      Yes     No
Mellanox   No      Yes
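To see which adapters on a host expose RDMA and to toggle it, the in-box NetAdapter cmdlets can be used; a minimal sketch (the adapter name is illustrative):

# List adapters with RDMA support and whether it is currently enabled
Get-NetAdapterRdma

# Enable or disable RDMA on a specific adapter
Enable-NetAdapterRdma -Name "SLOT 3 Port 1"
Disable-NetAdapterRdma -Name "SLOT 3 Port 1"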

Remote Direct Memory Access (RDMA) Traffic Flow

[Diagram: file client to file server data path. Without RDMA: App buffer -> SMB buffer -> OS buffer -> Driver buffer -> NIC adapter buffer on each side. With RDMA: App buffer -> SMB buffer -> rNIC adapter buffer directly, over iWARP or RoCE.]

• Higher performance through offloading of network I/O processing onto network adapter

• Higher throughput with low latency and ability to take advantage of high-speed networks (such as RoCE, iWARP and InfiniBand*)

• Remote storage at the speed of direct storage

• Transfer rate of around 50 Gbps on a single NIC PCIe x8 port

• Compatible with SMB Multichannel for load balancing and failover

• Windows Server 2016 added support for RDMA on host vNICs

*InfiniBand is not supported on SET Switch vNIC

Remote Direct Memory Access (RDMA) Performance Limits

• Performance is limited by the PCI Express slot

PCI Express version   Transfer rate   x1          x4        x8        x16
1.0                   2.5 GT/s        250 MB/s    1 GB/s    2 GB/s    4 GB/s
2.0                   5 GT/s          500 MB/s    2 GB/s    4 GB/s    8 GB/s
3.0                   8 GT/s          984 MB/s    ~4 GB/s   ~8 GB/s   ~16 GB/s
4.0                   16 GT/s         1969 MB/s   ~8 GB/s   ~16 GB/s  ~32 GB/s

Example with a Mellanox ConnectX-3 Pro dual-port 40/56 Gigabit adapter:

• PCI Express 3.0 x8 card
• The dual port will not be able to deliver 80 Gb/s or 112 Gb/s
  o Maximum will be around 60 Gigabit/s in an 8 GT/s slot (8 GT/s x 8 lanes with 128b/130b encoding is roughly 63 Gbit/s of usable PCIe bandwidth)
  o Maximum will be around 30 Gigabit/s in a 5 GT/s slot
• For best performance use two single-port cards.
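To check which PCIe slot, link width, and link speed an adapter actually negotiated, a minimal sketch using the in-box hardware-info cmdlet:

# PCIe location and link details for each physical NIC
Get-NetAdapterHardwareInfo | Format-List *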

Remote Direct Memory Access (RDMA) Technologies

53

• InfiniBand (IB)

o IB

• Internet Wide Area RDMA Protocol (iWARP)

o iWARP

• RDMA over Converged Ethernet (RoCE)

o RoCE Version 1

o RoCE Version 2

Remote Direct Memory Access (RDMA) Hardware

54

• InfiniBand (IB)

o Mellanox

• Internet Wide Area RDMA Protocol (iWARP)

o Chelsio T580-LP-CR (10-40Gbps)

o Chelsio T62100-LP-CR (40-50-100Gbps)

o QLogic FastLinQ QL45611HLCU (100Gbps)

o Intel

• RDMA over Converged Ethernet (RoCE)

o Mellanox (Connect X3 Pro, X4 EN and X5 EN)

▪ Connect X4 LX (10-25-50Gbps )

▪ Connect X4 EN (10-25-25-100Gbps)

▪ Connect X5 EN

o Cisco (UCS VIC 1385)

o QLogic FastLinQ QL45611HLCU (100Gbps)

o Emulex/Broadcom (XE100 Series)

RDMA Network Layers (iWARP and RoCE v2)

[Diagram: protocol stacks – iWARP carries RDMA over TCP/IP over Ethernet; RoCE v2 carries RDMA over UDP/IP over Ethernet]

RDMA Network Layers (RoCE v1 and v2)

[Diagram: RoCE v1 runs directly on the Ethernet link layer (not routable); RoCE v2 runs over UDP/IP (routable)]

Azure Stack HCI supports Guest RDMA (Mode 3) on the vmNIC

Virtual Machine vmNICs (Guest RDMA)

• Guest RDMA (Mode 3) on the vmNIC

• Device Manager will show both the Hyper-V Network Adapter and the VF for each vmNIC

• A driver for the pNIC/VF is required inside the VM

Virtual Machine vmNICs (Guest RDMA)

[Diagram, Server 01: pNICs connect to the physical switch (pSwitch); the vSwitch in the host OS exposes MGMT, SMB1, and SMB2 host vNICs; each VM has a vmNIC plus an SR-IOV Virtual Function (VF), and SMB/RDMA traffic from the guests flows through the VFs]

Mapping vNICs (vRDMA) to pNICs

• Needed to avoid two vRDMA-enabled vNICs ending up on the same RDMA pNIC

• By default the mapping logic uses round robin, so this can happen: https://technet.microsoft.com/en-us/library/mt732603.aspx

Invoke-Command -ComputerName $servers -ScriptBlock {
    $physicaladapters = Get-NetAdapter | where Status -eq Up | where Name -NotLike vEthernet* | Sort-Object
    Set-VMNetworkAdapterTeamMapping -VMNetworkAdapterName "SMB1" -ManagementOS -PhysicalNetAdapterName ($physicaladapters[0]).Name
    Set-VMNetworkAdapterTeamMapping -VMNetworkAdapterName "SMB2" -ManagementOS -PhysicalNetAdapterName ($physicaladapters[1]).Name
}
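To read the affinity back after running the script above, a minimal sketch:

# Shows which pNIC each ManagementOS vNIC (SMB1, SMB2) is mapped to
Get-VMNetworkAdapterTeamMapping -ManagementOS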

Taken from snia.org: https://www.snia.org/sites/default/files/ESF/How_Ethernet_RDMA_Protocols_Support_NVMe_over_Fabrics_Final.pdf

What does “Lossless” mean anyway?

HCI enables Hyper-V compute, S2D, and SDN to be co-located on the same hosts (and switch ports)

But now you have congestion…

- iWARP uses TCP Transport

- RoCE uses IB Transport

- RoCE uses UDP as Tunnel

And congestion causes packet drops…

Data Center Bridging is REQUIRED for RoCE to handle congestion

Data Center Bridging (DCB)

• Data Center Bridging (DCB): Can make RoCE “lossless”

• Priority Flow Control (PFC)

• Required for RoCE

• Optional for iWARP

• Enhanced Transmission Selection (ETS)

• TX reservation requirements (minimums, not limits)

• ECN and DCBX not used in Windows

• Implementation Guide: https://aka.ms/ConvergedRDMA

• (Windows) Configuration Collector: https://aka.ms/Get-NetView

• (Windows) Validation Tool: https://aka.ms/Validate-DCB

• RDMA Connectivity Tool: https://aka.ms/Test-RDMA

• Not a stress tool – connectivity only!

• Instructions in the Deployment Guide

Must be configured across all network hops for RoCE to be lossless under congestion.

QoS Inspection

Inspect NetQos

Use configuration guides

S2D/SDDC: https://aka.ms/ConvergedRDMA

Similar guide with developer annotations: https://github.com/Microsoft/SDN/blob/master/Diagnostics/WS2016_ConvergedNIC_Configuration.docx

LocalAdminUser @ TK5-3WP07R0512:

PS C:\DELETEME> Get-NetAdapterQos -Name "RoCE-01" -IncludeHidden -ErrorAction SilentlyContinue | Out-String -Width 4096

Name : RoCE-01

Enabled : True

Capabilities : Hardware Current

-------- -------

MacSecBypass : NotSupported NotSupported

DcbxSupport : IEEE IEEE

NumTCs(Max/ETS/PFC) : 8/8/8 8/8/8

OperationalTrafficClasses : TC TSA Bandwidth Priorities

-- --- --------- ----------

0 ETS 39% 0-2,4,6-7

1 ETS 1% 5

2 ETS 60% 3

OperationalFlowControl : Priorities 3,5 Enabled

OperationalClassifications : Protocol Port/Type Priority

-------- --------- --------

Default 0

NetDirect 445 3
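The hardware view above can be cross-checked against the host's software QoS policy with the in-box DCB cmdlets; a minimal sketch:

# Traffic classes, flow control, and classification policies defined on the host
Get-NetQosTrafficClass
Get-NetQosFlowControl
Get-NetQosPolicy

# DCBX willing setting (should be False when the host, not the switch, owns the configuration)
Get-NetQosDcbxSetting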

Validate the host with Validate-DCB

• https://aka.ms/Validate-DCB

• Primary benefits
  • Validate the expected configuration on one to N systems or clusters
  • Validate that the configuration meets best practices

• Secondary benefits
  • Doubles as DCB documentation for the expected configuration of your systems
  • Answers "What changed?" when faced with an operational issue

Test-RDMA

• https://aka.ms/Test-RDMA

• PoSH tool to test Network Direct (RDMA)

• Ping doesn’t do it!

65

>> C:\TEST\Test-RDMA.PS1 -IfIndex 3 -IsRoCE $true -RemoteIpAddress 192.168.2.111 -PathToDiskspd C:\TEST
VERBOSE: Diskspd.exe found at C:\TEST\Diskspd-v2.0.17\amd64fre\diskspd.exe
VERBOSE: The adapter Test-40G-2 is a physical adapter
VERBOSE: Underlying adapter is RoCE. Checking if QoS/DCB/PFC is configured on each physical adapter(s)
VERBOSE: QoS/DCB/PFC configuration is correct.
VERBOSE: RDMA configuration is correct.
VERBOSE: Checking if remote IP address, 192.168.2.111, is reachable.
VERBOSE: Remote IP 192.168.2.111 is reachable.
VERBOSE: Disabling RDMA on adapters that are not part of this test. RDMA will be enabled on them later.
VERBOSE: Testing RDMA traffic now for. Traffic will be sent in a parallel job. Job details:
VERBOSE: 34251744 RDMA bytes sent per second
VERBOSE: 967346308 RDMA bytes written per second
VERBOSE: 35698177 RDMA bytes sent per second
VERBOSE: 976601842 RDMA bytes written per second
VERBOSE: Enabling RDMA on adapters that are not part of this test. RDMA was disabled on them prior to sending RDMA traffic.
VERBOSE: RDMA traffic test SUCCESSFUL: RDMA traffic was sent to 192.168.2.111

[Diagrams: SMB Client and SMB Server connected via multiple RDMA NICs, multiple standard NICs, or a single RSS-capable NIC]

Full Throughput

• Bandwidth aggregation with multiple NICs

• Multiple CPU cores engaged when using Receive Side Scaling (RSS)

Automatic Failover

• SMB Multichannel implements end-to-end failure detection

• Leverages NIC teaming if present, but does not require it

Automatic Configuration

• SMB detects and uses multiple network paths

SMB Multichannel
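A minimal sketch for seeing Multichannel in action on a running node (run on the SMB client side):

# Interfaces SMB can use, with their RSS and RDMA capability
Get-SmbClientNetworkInterface

# Active multichannel connections and the SMB sessions using them
Get-SmbMultichannelConnection
Get-SmbConnection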

Sample Configurations

[Diagrams: SMB Client to SMB Server through (a) a single RSS-capable 10GbE NIC, (b) dual 1GbE NICs and switches, (c) dual 10GbE/IB NICs and switches, and (d) an LBFO team of NICs]

SMB Multichannel – Single 10GbE NIC

[Diagram: SMB Client and SMB Server, each with one RSS-capable 10GbE NIC, connected through a 10GbE switch; CPU utilization shown per core (Core 1-4)]

1 session, without Multichannel
• No failover
• Can't use full 10 Gbps
  o Only one TCP/IP connection
  o Only one CPU core engaged

1 session, with Multichannel
• No failover
• Full 10 Gbps available
  o Multiple TCP/IP connections
  o Receive Side Scaling (RSS) helps distribute load across CPU cores

SMB Multichannel – Multiple NICs

[Diagram: SMB Clients and SMB Servers, each with two RSS-capable 10GbE NICs, connected through two 10GbE switches]

1 session, without Multichannel
• No automatic failover
• Can't use full bandwidth
  o Only one NIC engaged
  o Only one CPU core engaged

SMB Multichannel – Multiple NICs

1 session, with Multichannel

• Automatic NIC failover

• Combined NIC bandwidth available

o Multiple NICs engaged

o Multiple CPU cores engaged


• Linear bandwidth scaling

o 1 NIC – 1150 MB/sec

o 2 NICs – 2330 MB/sec

o 3 NICs – 3320 MB/sec

o 4 NICs – 4300 MB/sec

• Leverages NIC support for RSS (Receive Side Scaling)

• Bandwidth for small IOs is bottlenecked on CPU

SMB Multichannel Performance

[Chart: SMB Client Interface Scaling – Throughput in MB/sec vs. I/O size (512 B to 1 MB) for 1 x 10GbE, 2 x 10GbE, 3 x 10GbE, and 4 x 10GbE]

SMB Multichannel + NIC Teaming

[Diagram: SMB Clients and SMB Servers, each with an LBFO team of two NICs (1GbE or 10GbE), connected through matching switches]

1 session, with NIC Teaming, no Multichannel
• Automatic NIC failover
• Can't use full bandwidth
  o Only one NIC engaged
  o Only one CPU core engaged

SMB Multichannel + NIC Teaming

1 session, with NIC Teaming and Multichannel

• Automatic NIC failover (faster with NIC Teaming)

• Combined NIC bandwidth available

o Multiple NICs engaged

o Multiple CPU cores engaged


SMB Direct and SMB Multichannel

[Diagram: SMB Clients and SMB Servers, each with two RDMA-capable NICs (10GbE R-NICs or 54Gb InfiniBand R-NICs), connected through matching switches]

1 session, without Multichannel
• No automatic failover
• Can't use full bandwidth
  o Only one NIC engaged
  o RDMA capability not used

SMB Direct and SMB Multichannel

1 session, with Multichannel

• Automatic NIC failover

• Combined NIC bandwidth available

o Multiple NICs engaged

o Multiple RDMA connections


Single Root I/O Virtualization (SR-IOV)

• "Network switch in the network adapter"

• Direct I/O to the VM's vmNIC: removes the CPU from the process of moving data to/from a VM. Data is DMA'd directly to/from the VM without the virtual switch "touching" it

• For high-I/O workloads

• New in Windows Server 2016: support for SR-IOV in the SET Switch

Single Root I/O Virtualization (SR-IOV)

[Diagram: the VM's network stack can use either the synthetic NIC path through the Hyper-V Extensible Switch or a Virtual Function (VF) path directly to the SR-IOV NIC]

• Direct I/O to the NIC

• High I/O workloads

Requires

• SR-IOV capable NICs

• Windows Server 2012 or higher VMs

• SET Switch

Benefits

• Maximizes use of host system processors and memory

• Reduces host CPU overhead for processing network traffic (by up to 50%)

• Reduces network latency (by up to 50%)

• Provides higher network throughput (by up to 30%)

• Full support for Live Migration

Single Root I/O Virtualization (SR-IOV): Installation Workflow

1. Check with the server vendor whether the chipset supports SR-IOV. Note that some older systems have an SR-IOV menu item in their BIOS but did not enable all of the necessary IOMMU functionality needed for Windows Server 2012.

2. Check that the NIC firmware and drivers support SR-IOV.

3. Enable SR-IOV in the server BIOS.

4. Enable SR-IOV in the NIC BIOS. Here we define the number of VFs per NIC.

5. Install the NIC drivers on the host. Check the driver advanced properties to ensure SR-IOV is enabled at the driver level.

6. Create an SR-IOV-enabled Hyper-V switch (see the PowerShell sketch after this list).

7. Enable SR-IOV for each VM we want to use it with (default).

8. Install the NIC driver inside the VM.
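Steps 6 and 7 translate roughly into the PowerShell below; a minimal sketch (switch, adapter, and VM names are illustrative):

# Step 6: create an SR-IOV-capable virtual switch (IOV can only be enabled at creation time)
New-VMSwitch -Name SriovSwitch -NetAdapterName pNIC1 -EnableIov $true -AllowManagementOS $false

# Step 7: give the VM's network adapter an IOV weight so a Virtual Function is assigned
Set-VMNetworkAdapter -VMName VM1 -IovWeight 100

# Verify host support and the adapter's resulting settings
(Get-VMHost).IovSupport
(Get-VMHost).IovSupportReasons
Get-VMNetworkAdapter -VMName VM1 | Format-List Name, IovWeight, Status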

Single Root I/O Virtualization (SR-IOV): SET Switch

81

Windows Server 2016

• Support of SR-IOV in SET Switch

• Host pNIC team with the SET Switch

• No need for Guest Team, only one vNIC in VM


Single Root I/O Virtualization (SR-IOV): VM Performance

83

SR-IOV vNIC Performance

VM Guest with 4 VPs

30 Gigabit (Mellanox Connect X3-Pro 40G)

3100 MB/s

84

Single Root I/O Virtualization (SR-IOV): VM Performance

85

SR-IOV vNIC Performance

VM Guest with 8 VPs

40 Gigabit (Mellanox Connect X3-Pro 40G)

4510 MB/s

Single Root I/O Virtualization (SR-IOV): VM Performance

86

• Windows Server 2012 R2 VM Guest

• Performance example with SR-IOV vNIC

• Host pNIC Mellanox Connect X3-Pro 40 Gigabit

• 8 VPs in the VM Guest (vRSS not enabled)

• VM uses only Core 2 (no vRSS in the VM Guest)

• Test with Microsoft ctsTraffic.exe (send/receive)

Single Root I/O Virtualization (SR-IOV): VM Performance

87

• Windows Server 2012 R2 VM Guest

• Performance example with SR-IOV vNIC

• Host pNIC Mellanox Connect X3-Pro 40 Gigabit

• 8 VPs in the VM Guest (vRSS Enabled)

• VM uses only HT cores

• Test with Microsoft ctsTraffic.exe (send/receive)

Single Root I/O Virtualization (SR-IOV): VM Performance

88

• Windows Server 2012 R2 VM Guest

• Performance example with SR-IOV vNIC

• Host pNIC Mellanox Connect X3-Pro 40 Gigabit

• 16 VPs in the VM Guest (vRSS Enabled)

• VM uses all cores

• Test with Microsoft ctsTraffic.exe (send/receive)

SR-IOV and Virtual Machine Mobility

89

Live Migration, Quick Migration, and snapshots are supported with SR-IOV

1. The VF is presented to the virtual machine using SR-IOV on the source Hyper-V host
   Set-VMNetworkAdapter VM -IovWeight 1

2. The VF on the VM is removed once migration is started
   Set-VMNetworkAdapter VM -IovWeight 0

3. The VF is presented again on the destination host if it supports SR-IOV; network connectivity continues without SR-IOV if the destination host does not support it
   Set-VMNetworkAdapter VM -IovWeight 1


Troubleshooting SR-IOV

1. The Networking tab in Hyper-V Manager shows for each VM whether SR-IOV is not operational.

2. Using Windows PowerShell we can validate why SR-IOV is not operational. In this example the BIOS and the NIC do not support SR-IOV:

PS C:\Windows\system32> (Get-VMHost).IovSupport
False
PS C:\Windows\system32> (Get-VMHost).IovSupportReasons
Ensure that the system has chipset support for SR-IOV and that I/O virtualization is enabled in the BIOS.
The chipset on the system does not do DMA remapping, without which SR-IOV cannot be supported.
The chipset on the system does not do interrupt remapping, without which SR-IOV cannot be supported.
To use SR-IOV on this system, the system BIOS must be updated to allow Windows to control PCI Express. Contact your system manufacturer for an update.
SR-IOV cannot be used on this system as the PCI Express hardware does not support Access Control Services (ACS) at any root port. Contact your system vendor for further information.

3. Event Viewer also indicates if an error exists when enabling SR-IOV on the VM network adapter. Check the Hyper-V SynthNIC log.

Dynamic VMMQ

[Chart: virtual NIC throughput with static vs. dynamic VMMQ queue assignment, varying between roughly ~5 Gbps and ~20 Gbps]

WS2016

Multiple VMQs for the same virtual NIC (VMMQ)

Statically assigned queues

WS2019\Azure Stack HCI

Autotunes queues to:

Maximize virtual NIC throughput

Maintain consistent virtual NIC throughput

Maximize Host CPU efficiency

Premium Certified Adapters Required

Supports Windows and Linux Guests

Try it out! (without special drivers)

https://aka.ms/DVMMQ-Validation

Virtual Machine Multi-Queue (VMMQ)

93

• Feature that allows Network traffic for a VM to be spread across multiple queues

• VMMQ is the evolution of VMQ with Software vRSS

• High-traffic VMs will benefit from the CPU load spreading that multiple queues can provide.

• Disabled by default in the SET Switch

• Disabled by default in the VM Guest

Virtual Machine Multi-Queue (VMMQ)

94

For VMMQ to be enabled for a VM, RSS needs to be enabled inside the VM

• VrssEnabled: true

• VmmqEnabled: false (Off by default)

• VmmqQueuePairs: 16

• Use VMQ for low Traffic VMs

• Use VMMQ for high Traffic VMs
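Turning VMMQ on for a high-traffic VM is a per-adapter setting; a minimal sketch (the VM name is illustrative and the queue count depends on the NIC):

# Check the current vRSS/VMMQ state of the VM's adapters
Get-VMNetworkAdapter -VMName VM1 | Format-List Name, VrssEnabled, VmmqEnabled, VmmqQueuePairs

# Enable VMMQ with 16 queue pairs
Set-VMNetworkAdapter -VMName VM1 -VmmqEnabled $true -VmmqQueuePairs 16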

Virtual Machine Multi-Queue (VMMQ) Performance

95

Performance example with VMQ vs. VMMQ

Hardware used

• Mellanox Connect X3-Pro 40 Gigabit

• Dell T430 with Intel E5-2620v4

Software used:

• Microsoft NTttcp.exe

• Microsoft ctsTraffic.exe

Note:

• VM111-VM140 Windows Server 2016

• VM151-VM152 Windows Server 2012R2

Virtual Machine Multi-Queue (VMQ) Performance

96

• Performance example with VMMQ Disabled

• Host pNIC Mellanox Connect X3-Pro 40 Gigabit

• 4 VPs in the VM Guest

• VM uses VMQ BaseProcessor 2

Virtual Machine Multi-Queue (VMMQ) Performance

97

• Performance example with VMMQ enabled

• Host pNIC Mellanox Connect X3-Pro 40 Gigabit

• 4 VPs in the VM Guest

• 4 queues used (VMMQ uses 4 cores from 8 to 14)

Virtual Machine Multi-Queue (VMMQ) Performance

98

• Performance example with VMMQ enabled

• Host pNIC Mellanox Connect X3-Pro 40 Gigabit

• 8 VPs in the VM Guest

• 8 queues used (VMQ uses 8 cores from 0 to 14)

Virtual Machine Multi-Queue (VMMQ) Performance

99

• Performance example with VMMQ enabled

• Host pNIC Mellanox Connect X3-Pro 40 Gigabit

• 8 VPs in the VM Guest

• 7 queues used (VMQ uses 7 cores from 2 to 14)

• Test with Microsoft ctsTraffic.exe (send/receive)

Virtual Machine Multi-Queue (VMQ) Performance

100

• Windows Server 2012 R2 VM Guest

• Performance example with VMMQ Disabled

• Host pNIC Mellanox Connect X3-Pro 40 Gigabit

• 4 VPs in the VM Guest (vRSS default Disabled)

• VM uses Core 10 (100%)

Virtual Machine Multi-Queue (VMQ) Performance

101

• Windows Server 2012 R2 VM Guest

• Performance example with VMMQ Disabled

• Host pNIC Mellanox Connect X3-Pro 40 Gigabit

• 4 VPs in the VM Guest (vRSS Enabled in VM)

• VM uses Core 10 (100%)

Virtual Machine Multi-Queue (VMMQ) Performance

102

• Windows Server 2012 R2 VM Guest

• Performance example with VMMQ Enabled

• Host pNIC Mellanox Connect X3-Pro 40 Gigabit

• 4 VPs in the VM Guest (vRSS Enabled in VM)

• VM uses Cores 2-14

Virtual Machine Multi-Queue (VMMQ) Performance

103

• Windows Server 2012 R2 VM Guest

• Performance example with VMMQ Enabled

• Host pNIC Mellanox Connect X3-Pro 40 Gigabit

• 4 VPs in the VM Guest (vRSS Enabled in VM)

• VM uses Cores 2-14

• Test with Microsoft ctsTraffic.exe (send/receive)

Virtual Receive-side Scaling (vRSS)

105

• vRSS is enabled by default in Windows Server 2016 and higher VMs

• vRSS is supported on Host vNICs

• vRSS works with VMQ or VMMQ (VMMQ = RSS queues in hardware)

• Not compatible with SR-IOV vmNIC

Virtual RSS in Azure Stack HCI

[Diagram: incoming packets arriving on a vNIC are spread across virtual processors on NUMA nodes 0-3]

• vRSS provides near line rate to a Virtual Machine on existing hardware, making it possible to virtualize traditionally network intensive physical workloads

• Maximizes resource utilization by spreading Virtual Machine traffic across multiple virtual processors

• Helps virtualized systems reach higher speeds with 10 to 100 Gbps NICs

• Requires no hardware upgrade and works with any NICs that support RSS

Virtual RSS

107

• Supported Guest OS with the latest Integration Services Installed

• NIC must support RSS and VMQ

• VMQ must be enabled

• SR-IOV must be disabled for the Network Card using vRSS

• RSS enabled inside the Guest

• Enable-NetAdapterRSS -Name "AdapterName"

• RSS configuration inside the guest is required (same as on a physical computer)
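Inside the guest this is ordinary RSS configuration; a minimal sketch (adapter name and processor range are illustrative):

# Run inside the VM: confirm RSS is available on the guest adapter
Get-NetAdapterRss -Name "Ethernet"

# Enable RSS and constrain it to a processor range, as on a physical server
Enable-NetAdapterRss -Name "Ethernet"
Set-NetAdapterRss -Name "Ethernet" -BaseProcessorNumber 1 -MaxProcessors 4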