
VMworld 2016: vSphere 6.x Host Resource Deep Dive

Transcript
Page 1: VMworld 2016: vSphere 6.x Host Resource Deep Dive

vSphere 6.x Host Resource Deep Dive
Frank Denneman
Niels Hagoort

INF8430

#INF8430

Page 2: VMworld 2016: vSphere 6.x Host Resource Deep Dive

Agenda
• Compute
• Storage
• Network
• Q&A

Page 3: VMworld 2016: vSphere 6.x Host Resource Deep Dive

Introduction

www.cloudfix.nl

Niels Hagoort
• Independent Architect
• VMware VCDX #212
• VMware vExpert (NSX)

Frank Denneman
• Enjoying Summer 2016
• VMware VCDX #29
• VMware vExpert

www.frankdenneman.nl

Page 4: VMworld 2016: vSphere 6.x Host Resource Deep Dive

Compute (NUMA, NUMA, NUMA)

Page 5: VMworld 2016: vSphere 6.x Host Resource Deep Dive

Insights In Virtual Data Centers

Page 6: VMworld 2016: vSphere 6.x Host Resource Deep Dive

Modern dual-socket CPU servers are Non-Uniform Memory Access (NUMA) systems
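The NUMA layout of a host is easy to confirm from the ESXi shell. A minimal sketch, assuming the standard vSphere 6.x esxcli namespaces:

esxcli hardware memory get      # reports Physical Memory and NUMA Node Count
esxcli hardware cpu global get  # reports CPU Packages (sockets), CPU Cores and CPU Threads

On a typical dual-socket server the NUMA Node Count is 2, one node per socket.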

Page 7: VMworld 2016: vSphere 6.x Host Resource Deep Dive

Local and Remote Memory

Page 8: VMworld 2016: vSphere 6.x Host Resource Deep Dive

NUMA Focus Points

• Caching Snoop modes

• DIMM configuration

• Size VM match CPU topology

Page 9: VMworld 2016: vSphere 6.x Host Resource Deep Dive

CPU Cache (the forgotten hero)

Page 10: VMworld 2016: vSphere 6.x Host Resource Deep Dive

CPU Architecture

Page 11: VMworld 2016: vSphere 6.x Host Resource Deep Dive

Caching Snoop Modes

Page 12: VMworld 2016: vSphere 6.x Host Resource Deep Dive

DIMM Configuration (and why 384 GB is not an optimal configuration)

Page 13: VMworld 2016: vSphere 6.x Host Resource Deep Dive

Memory Constructs

Page 14: VMworld 2016: vSphere 6.x Host Resource Deep Dive

3-DPC - 384 GB – 2400 MHz DIMM

Page 15: VMworld 2016: vSphere 6.x Host Resource Deep Dive

DIMMs Per Channel

Page 16: VMworld 2016: vSphere 6.x Host Resource Deep Dive

2-DPC - 384 GB – 2400 MHz DIMM

Page 17: VMworld 2016: vSphere 6.x Host Resource Deep Dive

Current Sweet Spot: 512GB
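A back-of-the-envelope illustration of why 384 GB is awkward and 512 GB lines up better, assuming a dual-socket Xeon E5 host with four memory channels per socket and up to three DIMM slots per channel (exact frequency penalties depend on the DIMM type and server model):

2 sockets x 4 channels x 3 DPC x 16 GB DIMMs = 384 GB  (3 DPC: the memory bus drops below 2400 MHz)
2 sockets x 4 channels x 2 DPC x 16 GB DIMMs = 256 GB  (2 DPC at full speed, but less capacity)
2 sockets x 4 channels x 2 DPC x 32 GB DIMMs = 512 GB  (2 DPC at full speed: the sweet spot above)

Reaching 384 GB at 2 DPC instead typically means mixing DIMM sizes or leaving channels unevenly populated, which sacrifices bandwidth in a different way.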

Page 18: VMworld 2016: vSphere 6.x Host Resource Deep Dive

Right-Size Your VM (alignment equals consistent performance)

Page 19: VMworld 2016: vSphere 6.x Host Resource Deep Dive

ESXi NUMA focus points

• CPU scheduler allocates core or HT cycles

• NUMA scheduler handles initial placement (IP) + load balancing (LB)

• vCPU configuration impacts IP & LB

Page 20: VMworld 2016: vSphere 6.x Host Resource Deep Dive

Scheduling constructs

Page 21: VMworld 2016: vSphere 6.x Host Resource Deep Dive

12 vCPU On 20 Core System

Page 22: VMworld 2016: vSphere 6.x Host Resource Deep Dive

Align To CPU Topology

• Resize vCPU configuration to match core count

• Use numa.vcpu.preferHT (see the .vmx sketch after this list)

• Use cores per socket (CORRECTLY)

• Attend INF8089 at 5 PM in this room
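A minimal .vmx sketch of the "Prefer HT + 12 Cores Per Socket" configuration for the 12 vCPU VM on the 20-core (2 x 10 core) host from the earlier slide; the values are illustrative, not a recommendation:

numvcpus = "12"
cpuid.coresPerSocket = "12"
numa.vcpu.preferHT = "TRUE"

numa.vcpu.preferHT lets the NUMA scheduler count HT threads, so all 12 vCPUs fit inside one 10-core / 20-thread NUMA node, and 12 cores per socket presents them to the guest as a single socket. Without preferHT the VM would be split across both nodes, because it has more vCPUs than one socket has cores.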

Page 23: VMworld 2016: vSphere 6.x Host Resource Deep Dive

Prefer HT + 12 Cores Per Socket

Page 24: VMworld 2016: vSphere 6.x Host Resource Deep Dive

Storage (how far away is your data?)

Page 25: VMworld 2016: vSphere 6.x Host Resource Deep Dive

The Importance of Access Latency

Location of operands    CPU Cycles    Perspective
CPU Register            1             Brain (Nanosecond)
L1/L3 cache             10            End of this room
Local Memory            100           Entrance of building
Disk                    10^6          New York

Page 26: VMworld 2016: vSphere 6.x Host Resource Deep Dive

Every Layer = CPU Cycles & Latency

Page 27: VMworld 2016: vSphere 6.x Host Resource Deep Dive

Industry Moves Toward NVMe

• SSD bandwidth capabilities exceed current controller bandwidth

• Protocol inefficiencies are a dominant contributor to access time

• NVMe is architected from the ground up for non-volatile memory

Page 28: VMworld 2016: vSphere 6.x Host Resource Deep Dive

I/O Queue Per CPU

Page 29: VMworld 2016: vSphere 6.x Host Resource Deep Dive

Driver Stack

Page 30: VMworld 2016: vSphere 6.x Host Resource Deep Dive

Not All Drivers Are Created Equal

Page 31: VMworld 2016: vSphere 6.x Host Resource Deep Dive

Network

Page 32: VMworld 2016: vSphere 6.x Host Resource Deep Dive

pNIC considerations for VXLAN performance

Page 33: VMworld 2016: vSphere 6.x Host Resource Deep Dive

• Additional layer of packet processing

• Consumes CPU cycles for each packet for encapsulation/de-capsulation

• Some of the offload capabilities of the NIC cannot be used (TCP-based)

• VXLAN offloading! (TSO / CSO)

VXLAN
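Whether VXLAN offload is actually used often hinges on a driver module parameter. A hedged example for the bnx2x driver (the enable_vxlan_ofld parameter appears in the module listings on the following slides; it defaults to disabled, and a module reload or host reboot is needed before the change applies):

esxcli system module parameters set -m bnx2x -p "enable_vxlan_ofld=1"
esxcli system module parameters list -m bnx2x | grep vxlan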

Page 34: VMworld 2016: vSphere 6.x Host Resource Deep Dive
Page 35: VMworld 2016: vSphere 6.x Host Resource Deep Dive


Page 36: VMworld 2016: vSphere 6.x Host Resource Deep Dive

[root@ESXi02:~] vmkload_mod -s bnx2x
vmkload_mod module information
 input file: /usr/lib/vmware/vmkmod/bnx2x
 Version: Version 1.78.80.v60.12, Build: 2494585, Interface: 9.2 Built on: Feb 5 2015
 Build Type: release
 License: GPL
 Name-space: com.broadcom.bnx2x#9.2.3.0
 Required name-spaces:
  com.broadcom.cnic_register#9.2.3.0
  com.vmware.driverAPI#9.2.3.0
  com.vmware.vmkapi#v2_3_0_0
 Parameters:
  skb_mpool_max: int
   Maximum attainable private socket buffer memory pool size for the driver.
  skb_mpool_initial: int
   Driver's minimum private socket buffer memory pool size.
  heap_max: int
   Maximum attainable heap size for the driver.
  heap_initial: int
   Initial heap size allocated for the driver.
  disable_feat_preemptible: int
   For debug purposes, disable FEAT_PREEMPTIBLE when set to value of 1
  disable_rss_dyn: int
   For debug purposes, disable RSS_DYN feature when set to value of 1
  disable_fw_dmp: int
   For debug purposes, disable firmware dump feature when set to value of 1
  enable_vxlan_ofld: int
   Allow vxlan TSO/CSO offload support.[Default is disabled, 1: enable vxlan offload, 0: disable vxlan offload]
  debug_unhide_nics: int
   Force the exposure of the vmnic interface for debugging purposes[Default is to hide the nics]1. In SRIOV mode expose the PF
  enable_default_queue_filters: int
   Allow filters on the default queue. [Default is disabled for non-NPAR mode, enabled by default on NPAR mode]
  multi_rx_filters: int
   Define the number of RX filters per NetQueue: (allowed values: -1 to Max # of RX filters per NetQueue, -1: use the default number of RX filters; 0: Disable use of multiple RX filters; 1..Max # the number of RX filters per NetQueue: will force the number of RX filters to use for NetQueue
  ........

Page 37: VMworld 2016: vSphere 6.x Host Resource Deep Dive

[root@ESXi01:~] esxcli system module parameters list -m bnx2x
Name                           Type  Value  Description
-----------------------------  ----  -----  -----------
RSS                            int          Control the number of queues in an RSS pool. Max 4.
autogreeen                     uint         Set autoGrEEEn (0:HW default; 1:force on; 2:force off)
debug                          uint         Default debug msglevel
debug_unhide_nics              int          Force the exposure of the vmnic interface for debugging purposes[Default is to hide the nics]1. In SRIOV mode expose the PF
disable_feat_preemptible       int          For debug purposes, disable FEAT_PREEMPTIBLE when set to value of 1
disable_fw_dmp                 int          For debug purposes, disable firmware dump feature when set to value of 1
disable_iscsi_ooo              uint         Disable iSCSI OOO support
disable_rss_dyn                int          For debug purposes, disable RSS_DYN feature when set to value of 1
disable_tpa                    uint         Disable the TPA (LRO) feature
dropless_fc                    uint         Pause on exhausted host ring
eee                                         set EEE Tx LPI timer with this value; 0: HW default
enable_default_queue_filters  int          Allow filters on the default queue. [Default is disabled for non-NPAR mode, enabled by default on NPAR mode]
enable_vxlan_ofld              int          Allow vxlan TSO/CSO offload support.[Default is disabled, 1: enable vxlan offload, 0: disable vxlan offload]
gre_tunnel_mode                uint         Set GRE tunnel mode: 0 - NO_GRE_TUNNEL; 1 - NVGRE_TUNNEL; 2 - L2GRE_TUNNEL; 3 - IPGRE_TUNNEL
gre_tunnel_rss                 uint         Set GRE tunnel RSS mode: 0 - GRE_OUTER_HEADERS_RSS; 1 - GRE_INNER_HEADERS_RSS; 2 - NVGRE_KEY_ENTROPY_RSS
heap_initial                   int          Initial heap size allocated for the driver.
heap_max                       int          Maximum attainable heap size for the driver.
int_mode                       uint         Force interrupt mode other than MSI-X (1 INT#x; 2 MSI)
max_agg_size_param             uint         max aggregation size
mrrs                           int          Force Max Read Req Size (0..3) (for debug)
multi_rx_filters               int          Define the number of RX filters per NetQueue: (allowed values: -1 to Max # of RX filters per NetQueue, -1: use the default number of RX filters; 0: Disable use of multiple RX filters; 1..Max # the number of RX filters per NetQueue: will force the number of RX filters to use for NetQueue
native_eee                     uint
num_queues                     uint         Set number of queues (default is as a number of CPUs)
num_rss_pools                  int          Control the existence of a RSS pool. When 0,RSS pool is disabled. When 1, there will bea RSS pool (given that RSS > 0).
........

Page 38: VMworld 2016: vSphere 6.x Host Resource Deep Dive

• Check the supported features of your pNIC

• Check the HCL for supported features in the driver module

• Check the driver module; does it require you to enable features?

• Other async (vendor) driver available?

Driver Summary
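The same checklist in command form; a hedged sketch using standard esxcli / ESXi shell commands, with vmnic0 and the driver name as placeholders:

esxcli network nic list                    # which driver each vmnic uses
esxcli network nic get -n vmnic0           # driver and firmware version of one pNIC
vmkload_mod -s <driver>                    # module info and exposed parameters (see the previous slides)
esxcli software vib list | grep <driver>   # inbox or async (vendor) driver installed?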

Page 39: VMworld 2016: vSphere 6.x Host Resource Deep Dive

RSS & NetQueue

• NIC support required (RSS / VMDq)

• VMDq is the hardware feature, NetQueue is the feature baked into vSphere

• RSS & NetQueue are similar in basic functionality

• RSS uses hashes based on IP / TCP port / MAC

• NetQueue uses MAC filters

Page 40: VMworld 2016: vSphere 6.x Host Resource Deep Dive

Without RSS for VXLAN (1 thread per pNIC)

Page 41: VMworld 2016: vSphere 6.x Host Resource Deep Dive

RSS enabled (>1 thread per pNIC)

Page 42: VMworld 2016: vSphere 6.x Host Resource Deep Dive

How to enable RSS (Intel)

1. Unload module: esxcfg-module -u ixgbe

2. Enable inbox driver: vmkload_mod ixgbe RSS="4,4"

Enable async driver: vmkload_mod ixgbe RSS="1,1"
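vmkload_mod takes effect immediately but is not persistent. A hedged sketch for making the RSS setting stick across reboots, assuming the same RSS parameter name shown above:

esxcli system module parameters set -m ixgbe -p "RSS=4,4"
esxcli system module parameters list -m ixgbe | grep RSS   # verify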

Page 43: VMworld 2016: vSphere 6.x Host Resource Deep Dive

Receive throughput with VXLAN using 10GbE

Page 44: VMworld 2016: vSphere 6.x Host Resource Deep Dive

Intel examples:

Intel Ethernet products          RSS for VXLAN technology
Intel Ethernet X520/540 series   Scale RSS on VXLAN outer UDP information
Intel Ethernet X710 series       Scale RSS on VXLAN inner or outer header information

X710 series = better at balancing over queues > CPU threads

Page 45: VMworld 2016: vSphere 6.x Host Resource Deep Dive

“What is the maximum performance of the vSphere (D)vSwitch?”

Page 46: VMworld 2016: vSphere 6.x Host Resource Deep Dive

• By default, one transmit (Tx) thread per VM

• By default, one receive (Netpoll) thread per pNIC

• Transmit (Tx) and receive (Netpoll) threads consume CPU cycles

• Each additional thread provides capacity (1 thread = 1 core)

Network IO CPU consumption

Page 47: VMworld 2016: vSphere 6.x Host Resource Deep Dive
Page 48: VMworld 2016: vSphere 6.x Host Resource Deep Dive

Netpoll Thread

%SYS is ±100% during test. pNIC receives. (This is the Netpoll thread.)

Page 49: VMworld 2016: vSphere 6.x Host Resource Deep Dive

NetQueue Scaling

{"name": "vmnic0", "switch": "DvsPortset-0", "id": 33554435, "mac": "38:ea:a7:36:78:8c", "rxmode": 0, "uplink": "true", "txpps": 247, "txmbps": 9.4, "txsize": 4753, "txeps": 0.00, "rxpps": 624291, "rxmbps": 479.9, "rxsize": 96, "rxeps": 0.00,"wdt": [ {"used": 0.00, "ready": 0.00, "wait": 41.12, "runct": 0, "remoteactct": 0, "migct": 0, "overrunct": 0, "afftype": "pcpu", "affval": 39, "name": "242.vmnic0-netpoll-10"}, {"used": 0.00, "ready": 0.00, "wait": 41.12, "runct": 0, "remoteactct": 0, "migct": 0, "overrunct": 0, "afftype": "pcpu", "affval": 39, "name": "243.vmnic0-netpoll-11"}, {"used": 82.56, "ready": 0.49, "wait": 16.95, "runct": 8118, "remoteactct": 1, "migct": 9, "overrunct": 33, "afftype": "pcpu", "affval": 45, "name": "244.vmnic0-netpoll-12"}, {"used": 18.71, "ready": 0.75, "wait": 80.54, "runct": 6494, "remoteactct": 0, "migct": 0, "overrunct": 0, "afftype": "vcpu", "affval": 19302041, "name": "245.vmnic0-netpoll-13"}, {"used": 55.64, "ready": 0.55, "wait": 43.81, "runct": 7491, "remoteactct": 0, "migct": 4, "overrunct": 5, "afftype": "vcpu", "affval": 19299346, "name": "246.vmnic0-netpoll-14"}, {"used": 0.14, "ready": 0.10, "wait": 99.48, "runct": 197, "remoteactct": 6, "migct": 6, "overrunct": 0, "afftype": "vcpu", "affval": 19290577, "name": "247.vmnic0-netpoll-15"}, {"used": 0.00, "ready": 0.00, "wait": 0.00, "runct": 0, "remoteactct": 0, "migct": 0, "overrunct": 0, "afftype": "pcpu", "affval": 45, "name": "1242.vmnic0-0-tx"}, {"used": 0.00, "ready": 0.00, "wait": 0.00, "runct": 0, "remoteactct": 0, "migct": 0, "overrunct": 0, "afftype": "pcpu", "affval": 22, "name": "1243.vmnic0-1-tx"}, {"used": 0.00, "ready": 0.00, "wait": 0.00, "runct": 0, "remoteactct": 0, "migct": 0, "overrunct": 0, "afftype": "pcpu", "affval": 24, "name": "1244.vmnic0-2-tx"}, {"used": 0.00, "ready": 0.00, "wait": 0.00, "runct": 0, "remoteactct": 0, "migct": 0, "overrunct": 0, "afftype": "pcpu", "affval": 39, "name": "1245.vmnic0-3-tx"} ],

3 Netpoll threads are used (3 worldlets).

Page 50: VMworld 2016: vSphere 6.x Host Resource Deep Dive

Tx Thread

PKTGEN is polling, consuming nearly 100% CPU

%SYS = ±100%. This is the Tx thread.

Page 51: VMworld 2016: vSphere 6.x Host Resource Deep Dive

• VMXNET3 is required!

• Example for vNIC2:

ethernet2.ctxPerDev = "1"

Additional Tx Thread
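As a hedged illustration, the setting is per vNIC, so each VMXNET3 adapter that should get its own Tx thread needs its own line (added while the VM is powered off, or via the VM's advanced configuration parameters):

ethernet0.ctxPerDev = "1"
ethernet2.ctxPerDev = "1"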

Page 52: VMworld 2016: vSphere 6.x Host Resource Deep Dive

Additional Tx thread

%SYS = ±200%. CPU threads in same NUMA node as VM.

{"name": "pktgen_load_test21.eth0", "switch": "DvsPortset-0", "id": 33554619, "mac": "00:50:56:87:10:52", "rxmode": 0, "uplink": "false", "txpps": 689401, "txmbps": 529.5, "txsize": 96, "txeps": 0.00, "rxpps": 609159, "rxmbps": 467.8, "rxsize": 96, "rxeps": 54.09, "wdt": [ {"used": 99.81, "ready": 0.19, "wait": 0.00, "runct": 1176, "remoteactct": 0, "migct": 12, "overrunct": 1176, "afftype": "vcpu", "affval": 15691696, "name": "323.NetWdt-Async-15691696"}, {"used": 99.85, "ready": 0.15, "wait": 0.00, "runct": 2652, "remoteactct": 0, "migct": 12, "overrunct": 12, "afftype": "vcpu", "affval": 15691696, "name": "324.NetWorldlet-Async-33554619"} ],

2 worldlets

Page 53: VMworld 2016: vSphere 6.x Host Resource Deep Dive

• Transmit (Tx) and receive (Netpoll) threads can be scaled!

• Take the extra CPU cycles for network IO into account!

Summary

Page 54: VMworld 2016: vSphere 6.x Host Resource Deep Dive

Q&A

Page 55: VMworld 2016: vSphere 6.x Host Resource Deep Dive

Keep an eye out for our upcoming book!

@frankdenneman
@NHagoort

Page 56: VMworld 2016: vSphere 6.x Host Resource Deep Dive

@frankdenneman
@NHagoort

Page 57: VMworld 2016: vSphere 6.x Host Resource Deep Dive
