Date post: 31-Oct-2014
Upload: suresh-kumar
VMware Performance Troubleshooting

Presented by Chris Kranz

Topics Covered

• Introduction
• Root Cause Analysis
• Performance Characteristics
• CPU
• Networking
• Memory
• Disk
• Virtual Machine optimisation
• ESXTop
• vm-support
• Service Console
• Resource Groups
• Design Guidelines
• Capacity Planner limitations and cautions
• Conclusion
• Reference Articles

Introduction

Multiple layers of virtualisation are used to increase service levels, availability and manageability

However, multiple layers of virtualisation often mask performance and configuration issues making it more of a challenge to troubleshoot and correct

The worst outcome is that performance issues after a virtualisation project create the perception that VMware reduces performance, which can damage future confidence in VMware

Performance Basics

• Virtual Machine Resources
– CPU
– Memory
– Disk
– Networking

Resource Maximums

Resource                Host   Guest
Logical Processors      64     N/A
Virtual CPUs            N/A    8
Virtual CPUs per Core   20     N/A
Memory                  1TB    256GB

http://www.vmware.com/pdf/vsphere4/r40/vsp_40_config_max.pdf

Typical Host

vSphere 1U Host
CPUs: 2 x Quad Core
Memory: 32-64GB RAM

Typically 3 VMs per core = 24 VMs per host
Each VM has 2GB of RAM = 48GB of RAM
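
The sizing arithmetic on this slide can be checked with a short sketch. The function name and its parameters are illustrative only and do not come from the presentation; the input figures are the slide's own.

```python
def host_sizing(sockets, cores_per_socket, vms_per_core, ram_per_vm_gb):
    """Return (vm_count, total_vm_ram_gb) for a simple per-core consolidation ratio."""
    cores = sockets * cores_per_socket
    vm_count = cores * vms_per_core
    return vm_count, vm_count * ram_per_vm_gb


# Slide figures: 2 x quad core, 3 VMs per core, 2GB of RAM per VM.
vms, ram_gb = host_sizing(sockets=2, cores_per_socket=4, vms_per_core=3, ram_per_vm_gb=2)
print(vms, ram_gb)  # 24 48
```

The 48GB result sits inside the 32-64GB range quoted for the host, before any memory overhead or page sharing is considered.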

Root Cause Analysis

http://www.vmware.com/resources/techresources/10066

Root Cause ...

Monitoring Performance

• Do not rely on guest tools, although they:
– Can show high CPU and memory utilisation
– Can measure latency and throughput of disk and network interfaces
• Use the virtualisation layer to diagnose the cause:
– The guest is unaware of the virtualisation workload
– Guest OSs account for time differently
– The guest has no visibility of available resources

Performance Analysis Tools

• esxtop (service console only)
• resxtop (remote command line utilities)
• Performance graphs in vCenter

esxtop

• esxtop can be run:
– Interactively
– In batch mode (e.g. esxtop -a -b > analysis.csv)
– Load the batch output into Windows perfmon or MS Excel
• Two keys to remember:
– H : help
– F : fields to display
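
Batch output can also be summarised with a script instead of perfmon or Excel. The sketch below is a minimal example that reports the peak value of every column whose header ends with a given counter name. It assumes perfmon-style headers of the form `\\host\Group Cpu(id:name)\% Ready`; the host name, VM names, and sample values are made up, so check the header layout against your own capture.

```python
import csv
import io


def peak_by_suffix(csv_text, suffix):
    """Return {column_header: max_value} for every column whose header ends with suffix."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, samples = rows[0], rows[1:]
    peaks = {}
    for idx, name in enumerate(header):
        if not name.endswith(suffix):
            continue
        # Skip short rows and empty cells rather than failing on them.
        values = [float(r[idx]) for r in samples if idx < len(r) and r[idx] != ""]
        if values:
            peaks[name] = max(values)
    return peaks


# Made-up two-sample capture in the assumed layout (not real esxtop output).
SAMPLE = (
    '"Time","\\\\esx01\\Group Cpu(100:vm-a)\\% Ready","\\\\esx01\\Group Cpu(101:vm-b)\\% Ready"\n'
    '"10:00:00","2.5","11.0"\n'
    '"10:00:05","4.0","9.5"\n'
)

for column, peak in peak_by_suffix(SAMPLE, "% Ready").items():
    print(column, peak)
```

For a real capture, read the text of analysis.csv and pass it in place of SAMPLE.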

esxtop basics

Screenshot callouts:
• Host Resources
• Number of Worlds
• Name of Resource Pool, Virtual Machine or World

Performance Characteristics

CPU: Slow Processing, High CPU Wait
Networking: Packet Loss, Slow Network
Memory: Slow Processing, Disk Swapping
Disk: Log Stalls, Disk Queue

Slow Application Performance
Reduced User Experience
Data Loss and Corruption

CPU - ESX Scheduler

Service Console
Virtual Machine

Limits / Shares / Reservations

Basic World States: Ready / Run / Wait

CPU States: Ready / Usage / Wait

CPU - esxtop

• PCPU(%): CPU utilisation
• %USED: Utilisation
• %RDY: Ready Time
• %RUN: Run Time
• %WAIT: Wait and idling time

High %RDY + high %USED can imply over-commitment

CPU - VI Client

Used Time > Ready Time: Possible CPU over-commitment

Used Time

Ready Time
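
The VI Client charts report ready time in milliseconds, while esxtop reports %RDY, so it helps to convert one to the other when comparing tools. The sketch below is an assumption-laden example, not part of the presentation. It assumes the chart value is ready time summed over one 20-second real-time sample, and `ready_percent` is a hypothetical helper name. Longer roll-up intervals need a different `interval_s`.

```python
def ready_percent(ready_ms, interval_s=20):
    """Convert a CPU ready summation (ms) over one sample interval into a percentage."""
    return ready_ms / (interval_s * 1000.0) * 100.0


# 2000 ms of ready time inside a 20 s sample.
print(ready_percent(2000))  # 10.0
```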

CPU - Further Investigation

%MLMTD shows this VM has been limited

CPU - Further Investigation

High ready time caused by CPU resource limit

VMware Memory Management

• Transparent Page Sharing
• VMware Tools Balloon Driver to force the VM to swap to disk
• Virtual Machine Page File

Memory - Ballooning vs. Swapping

The balloon driver causes the guest OS to swap pages that it chooses to disk.

ESX swapping will swap any pages to disk.

Memory

• Ballooning can be disabled (value 0) or controlled per virtual machine using: sched.mem.maxmemctl
• The default is 65%, and can be controlled at host level.
• Ballooning is only an issue in resource contention scenarios (or for VMs that need low latency, e.g. Citrix).

Memory - Host

VI Client shows memory usage of the host. This is calculated as “consumed + overhead memory + Service Console”.

Performance charts are a very good way of showing the Virtual Machine memory breakdown.

• Consumed Memory
• Ballooned Memory
• Shared Memory
• Swapped Memory

Memory - Guest

Host Memory = Consumed + Overhead Memory
Guest Memory = Active Memory for Guest OS

Memory – Guest Overhead

Memory - Virtual Machine Memory Metrics (VI Client)

Metric: Description
Memory Active (KB): Physical pages touched recently by a VM
Memory Usage (%): Active memory / configured memory
Memory Consumed (KB): Machine memory mapped to a virtual machine, including its portion of shared pages. Doesn’t include overhead memory.
Memory Granted (KB): Physical pages allocated to a virtual machine. May be less than configured memory. Includes shared pages. Doesn’t include overhead memory.
Memory Shared (KB): Physical pages shared with other virtual machines
Memory Balloon (KB): Physical memory ballooned from a virtual machine
Memory Swapped (KB): Physical memory in swap file (approx. “swap out – swap in”). Swap out and swap in are cumulative.
Overhead Memory (KB): Machine pages used for virtualisation

Memory - Host Memory Metrics (VI Client)

Metric: Description
Memory Active (KB): Physical pages touched recently by the host
Memory Usage (%): Active memory / configured memory
Memory Consumed (KB): Total host physical memory – free memory on host. Includes overhead and Service Console memory.
Memory Granted (KB): Sum of physical pages allocated to all virtual machines. Doesn’t include overhead memory.
Memory Shared (KB): Physical pages shared by virtual machines on host
Shared Common (KB): Total machine pages used by shared pages
Memory Balloon (KB): Machine pages ballooned from virtual machines
Memory Swap Used (KB): Physical memory in swap file (approx. “swap out – swap in”). Swap out and swap in are cumulative.
Overhead Memory (KB): Machine pages used for virtualisation

Memory - esxtop

PMEM: Total physical memory breakdown
VMKMEM: Memory managed by vmkernel
COSMEM: Service Console memory breakdown
PSHARE: Page sharing statistics
SWAP: Swap statistics
MEMCTL: Balloon driver data

Memory - esxtop / VI Client metrics: Virtual Machines

VI Client: esxtop
Active Memory: TCHD
Memory Usage: %ACTV
Consumed Memory: N/A
Memory Granted: N/A (SZTGT and CMTTGT represent memory scheduler targets)
Memory Shared: SHRD (+SHRDSVD per VM). Must enable COW stats in esxtop.
Memory Balloon: MCTLSZ
Memory Swapped: SWCUR (SWR/s and SWW/s are rates)
Overhead Memory: OVHD and OVHDMAX

Memory - esxtop / VI Client metrics: Host Usage

VI Client: esxtop
Memory Active: N/A (try /proc/vmware/sched/mem-verbose)
Memory Usage: N/A (try /proc/vmware/sched/mem-verbose)
Memory Consumed: PMEM total – PMEM free
Memory Granted: N/A (SZTGT and CMTTGT represent memory scheduler targets)
Memory Shared: PSHARE (shared)
Memory Shared Common: PSHARE (common)
Memory Balloon: MEMCTL
Memory Swap Used: SWAP (r/s and w/s are rates)
Overhead Memory: OVHD and OVHDMAX

Memory - VI Client memory usage graph

Memory - Troubleshooting memory usage issues

Networking

Network configuration is more likely to blame than resource contention

• Switch Assisted Teaming (IP Hash)
• VLAN Trunking
• Flow Control (full)
• Speed & Duplex (1000Mb / Full)
• Port Fast
• BPDU Disabled
• STP Disabled
• Link State Tracking
• Jumbo Frames

Networking - esxtop

Transmit and Receive in Mb/s

Transmit and Receive in Packets

Networking - esxtop

Dropped Packets Received

Dropped Packets Transmitted

Disk

Varying Factors
• File system performance
• Disk subsystem configuration (SAN, NAS, iSCSI, local disk)
• Disk caching
• Disk formats (thick, sparse, thin)

ESX Storage Stack
• Different latencies for different disks
• Queuing within the kernel

K: Kernel
D: Device
G: Guest

Disk - VI Client statistics

Quite Coarse Statistics
• Disk read / write rate (KB/s)
• Disk usage: sum of read BW and write BW (KB/s)
• Disk read / write requests (per 20s interval)
• Bus resets / Command aborts (per 20s interval)
• Per LUN or aggregated stats

Disk - esxtop statistics

Aggregated stats similar to VI Client
• Disk reads / writes per sec (READS/s, WRITES/s)
• MB read / written per sec (MBREAD/s, MBWRTN/s)

Latency Statistics
• Kernel Average / command (KAVG/cmd)
• Device Average / command (DAVG/cmd)
• Guest Average / command (GAVG/cmd)

Queuing Information
• Adapter Queue Length (AQLEN)
• LUN Queue Length (LQLEN)
• VMKernel (QUED)
• Active Queue (ACTV)
• %Used (%USD = ACTV/LQLEN)
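
The relationships between these counters can be written down as a short sketch. The %USD formula is the one given on the slide. The second function rests on an assumption that is not stated in the presentation: that guest latency is approximately device latency plus kernel latency. Function names and sample values are illustrative.

```python
def lun_queue_used_percent(actv, lqlen):
    """%USD as defined on the slide: active commands over LUN queue length."""
    return 100.0 * actv / lqlen


def guest_latency_ms(davg_ms, kavg_ms):
    """Assumed relation (not from the slide): GAVG is roughly DAVG + KAVG."""
    return davg_ms + kavg_ms


print(lun_queue_used_percent(actv=24, lqlen=32))    # 75.0
print(guest_latency_ms(davg_ms=12.0, kavg_ms=0.5))  # 12.5
```

Under that assumption, a high GAVG with a low KAVG points at the storage device or fabric, while a high KAVG points at queuing inside the kernel.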

Disk - SAN Rough Estimates

Purely looking at a single ESX host, roughly:
Throughput (in MBps) = (Outstanding IOs * Block size in KB) / latency in msec

FC, rough maximums:
Effective Link Bandwidth = ~80-90% of Real Bandwidth
• Effective (2Gbps) = 200 – 230 MBps
• Effective (4Gbps) = 410 – 460 MBps
• Effective (8Gbps) = 820 – 920 MBps

iSCSI / NFS / FCoE, rough maximums:
Effective Link Bandwidth = ~70-80% of Real Bandwidth
• Effective (1GigE) = 90 – 100 MBps
• Effective (10GigE) = 900 – 1000 MBps
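
The single-host throughput estimate above translates directly into code. This is a minimal sketch of the slide's formula; the function name and the sample inputs (32 outstanding IOs, 32KB blocks, 10 msec latency) are illustrative. KB per msec is treated as MB per second, which is the same rough approximation the slide makes.

```python
def throughput_mbps(outstanding_ios, block_kb, latency_ms):
    """Rough single-host throughput: (outstanding IOs * block size in KB) / latency in msec."""
    return outstanding_ios * block_kb / latency_ms


print(throughput_mbps(outstanding_ios=32, block_kb=32, latency_ms=10))  # 102.4
```

Comparing the result with the effective link bandwidth figures shows whether the link or the latency is the likely ceiling.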

Disk - Desired Latency Calculations

Desired Latency in msec <= (Outstanding IOs * Block size in KB) / Throughput per host

Example:
Number of Hosts: 16
Effective Link Bandwidth: 90 MBps
Throughput per host: 90 / 16 = 5.6 MBps
Desired Latency: (32 * 32) / 5.6 = 182.86 msec

Workload                  Cached Sequential Read   Cached Sequential Write
Desired Latency (msec)    182.86                   182.86
Observed Latency (msec)   ~350                     ~180
Throughput Drop?          Yes                      No
Throughput (MBps)         ~45                      ~90
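
The worked example can be reproduced with a short sketch. The function name is illustrative. Note that the slide rounds the per-host throughput (90 / 16 = 5.625) to 5.6 before dividing, which is why it arrives at 182.86 msec; using the unrounded figure gives a slightly lower target.

```python
def desired_latency_ms(outstanding_ios, block_kb, throughput_per_host_mbps):
    """Upper bound on latency (msec) that still sustains the per-host throughput."""
    return outstanding_ios * block_kb / throughput_per_host_mbps


# Slide figures: 32 outstanding IOs, 32KB blocks, 5.6 MBps per host (rounded).
print(round(desired_latency_ms(32, 32, 5.6), 2))      # 182.86
# Same calculation with the unrounded per-host throughput.
print(round(desired_latency_ms(32, 32, 90 / 16), 2))  # 182.04
```

The observed read latency of ~350 msec is well above either figure, which matches the throughput drop shown in the table.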

Disk - VI Client

SAN cache disabled: poor throughput

SAN cache enabled: high throughput

Disk - esxtop

Latency is quite high

After enabling cache, latency is reduced

Virtual Machine Optimisation

Deploy all machines from an optimised template!

• VMware Tools MUST be installed
• The disks MUST be block aligned to the storage (even when using NFS and SAN)
• Where possible, always separate data disks from OS disks
• Windows performance settings should be optimised for application performance
• Guest operating system timeouts should be set as defined by the SAN vendor
• Pagefile should be separated where appropriate (this can impact VMware SRM however)
• Unused Windows services should be disabled (wireless config, print spooler, audio, etc.)
• Last access update time should be disabled (unless required)
• Logging of the VM should be disabled (only enabled for troubleshooting)
• Remove any unused virtual hardware (floppy drives, USB, etc.)
• Disable screen savers and power saving features, including logon screen saver
• Enable Remote Desktop, avoid using the VI Client for remote administration
• Install standard applications into the template (bginfo, AntiVirus, any host agents, etc.)
• Multiple CPUs should be allocated sparingly

Virtual Machine Optimisation

Block alignment is vital to good disk performance!

esxtop - Command Options when inside esxtop

Command   Action
space     Update the display
?         Show the help page
q         Quit
f / F     Add or remove columns from the display
o / O     Change the order the display is sorted
s         Change the update interval
#         Change the number of instances to display
W         Write configuration to file
e         Expand / Rollup CPU stats
V         View only VM instances
L         Change the length of the NAME field
m         Display memory statistics
n         Display network statistics
i         Display interrupt statistics
d         Display disk adapter statistics
u         Display disk device statistics
v         Display disk VM statistics

esxtop - Command Line Options from the console

Option   Action
-b       Batch mode
-l       Locks the objects available in the first snapshot
-s       Enables secure mode
-a       Show all statistics
-c       Sets the configuration file
-R       Enables replay mode (used with “vm-support -S”)
-d       Sets the update interval
-n       Runs esxtop for n iterations
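
When captures are scripted, it can help to build the command line from the options listed above rather than typing it each time. The sketch below only assembles the argument list and does not run esxtop. `esxtop_batch_args` is a hypothetical helper, and the 5-second interval and 12 iterations are arbitrary example values.

```python
def esxtop_batch_args(interval_s, iterations, all_stats=True):
    """Build an esxtop batch-mode argument list from the options on this slide."""
    args = ["esxtop", "-b"]  # -b: batch mode
    if all_stats:
        args.append("-a")    # -a: show all statistics
    # -d: update interval, -n: number of iterations
    args += ["-d", str(interval_s), "-n", str(iterations)]
    return args


print(" ".join(esxtop_batch_args(interval_s=5, iterations=12)))
# esxtop -b -a -d 5 -n 12
```

On the service console the output would normally be redirected to a CSV file, as in the earlier batch example.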

esxtop

Expand the default window size for your session to get all statistics

vm-support

Creates a packaged zip file containing the following sections:
• boot
– contains the grub configuration
• etc
– contains the Console OS configuration files (cron, tcpwrappers, syslog, etc.)
• proc
– contains much of the hardware configuration modules and variables
• tmp
– contains a lot of the ESX specific configuration output
• var
– contains log files and any core dumps
• vmfs
– contains the structure of the VMFS datastores
• esx3-installation (where appropriate)
– contains a copy of the previous esx3 configuration variables

vm-support

Using vm-support to extract performance information:

vm-support -S -d <duration> -i <interval>
<duration> and <interval> are in seconds

Once the output has been extracted, it can be replayed in esxtop for review:

esxtop -R <path_to_vm-support_output>

Service Console Performance

• Multiple Service Console networks – for network resiliency
• Increased Service Console memory – up to 800MB
• Use host agents supplied by your vendors
• Apply storage vendor recommended tweaks such as HBA Queue Depth and IO timeouts
• Minimal use of the VI Client console – RDP or SSH instead
• Properly sized vCenter server – 64bit OS where possible

Resource Groups

Dynamically reallocate resource shares

Add a VM: shares allow you to over-commit resources and have them re-allocated gracefully

Remove a VM: the extra resources are exploited across all remaining VMs
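
The effect of adding or removing a VM can be illustrated with a simplified proportional-share model. This is a sketch under stated assumptions, not the ESX scheduler's actual algorithm. It assumes every VM is fully contending for the resource and ignores reservations, limits, and idle VMs. The VM names, share values, and 6000 MHz capacity are made up.

```python
def allocate_by_shares(capacity, shares):
    """Split capacity between VMs in proportion to their shares (full contention assumed)."""
    total = sum(shares.values())
    return {vm: capacity * s / total for vm, s in shares.items()}


three_vms = allocate_by_shares(6000, {"vm1": 1000, "vm2": 1000, "vm3": 1000})
four_vms = allocate_by_shares(6000, {"vm1": 1000, "vm2": 1000, "vm3": 1000, "vm4": 1000})
print(three_vms["vm1"], four_vms["vm1"])  # 2000.0 1500.0
```

Adding a fourth VM shrinks every entitlement evenly, and removing it returns the capacity to the remaining three.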

Design Guidelines

• Full Resilience / Multiple paths
• Standard configuration across all aspects (ESX, Storage, Networking, etc.)
• Standard naming conventions
• Learn from others' mistakes
• Follow vendors' best-practice guidelines
• Rule out the basics before requesting support

Capacity Planner & P2V Cautions and Limitations

• Peak CPU usage can sometimes be misleading
• Back-end storage system performance
• P2V machines will require block-aligning to the storage
• P2V machines will still require guest OS optimisation

Conclusion

• Performance issues can often be traced with simple root cause analysis using basic tools (VI Client / esxtop)
• Performance tools help diagnose issues and help rule out non-issues
• Performance tools are useful in different contexts, not always either/or
– Real-time data and troubleshooting: esxtop
– Historical data: VI Client
– Coarse resource / cluster usage: VI Client
– Detailed resource usage: esxtop
• Combine information from various tools to get a complete picture
• Always benchmark your systems first so you know what the optimal performance is that you can achieve

Reference Articles

• http://www.vmware.com/pdf/esx3_memory.pdf
• http://www.vmworld.com/docs/DOC-2370
• http://blogs.vmware.com/performance/
• http://communities.vmware.com/docs/DOC-5420
• http://kb.vmware.com/kb/1008205
• http://communities.vmware.com/community/vmtn/general/performance
• http://www.vmware.com/products/vmmark/
• http://www.vmware.com/pdf/vsphere4/r40/vsp_40_san_cfg.pdf
• http://www.vmware.com/pdf/vsphere4/r40/vsp_40_iscsi_san_cfg.pdf
• http://www.vmware.com/pdf/vsphere4/r40/vsp_40_resource_mgmt.pdf
• http://www.vmware.com/pdf/GuestOS_guide.pdf
• http://www.vmware.com/resources/techresources/10066
• http://www.vmware.com/resources/techresources/10059
• http://www.vmware.com/resources/techresources/10062

