IBM SAN Volume Controller Performance Analysis

Page 1: IBM SAN Volume Controller Performance Analysis

IBM Global Technology Services

© 2008 IBM Corporation

SAN Volume Controller Performance Analysis

July 25, 2008


Trademarks & Disclaimer

The following terms are trademarks of the IBM Corporation:

Enterprise Storage Server® - Abbreviated: ESS

TotalStorage® Expert - Abbreviated: TSE

FAStT/DS4000/DS8000

AIX®

IBM SAN Volume Controller

Other trademarks appearing in this report may be considered trademarks of their respective companies.

SANavigator, EFCM: McDATA

UNIX is a registered trademark in the United States and other countries, licensed exclusively through X/Open Company Limited.

Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.

EMC is a registered trademark of EMC Inc.

HP-UX is a registered trademark of HP Inc.

Solaris is a registered trademark of Sun Microsystems, Inc.

Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.

UNIX is a registered trademark of The Open Group in the United States and other countries.

Disclaimer

The views in this presentation are those of the author and are not necessarily those of IBM.


Abstract

SAN Volume Controller (SVC) is a flexible, scalable platform for block-level storage virtualization. SVC adds flexibility in storage provisioning and provides features that support higher availability, but it also adds complexity to performance design. The impact is most acute in performance analysis: SVC adds a new striping layer to the data path, and that layer makes the analysis more complex. We will provide a technical overview of a SAN environment with SVC and explore the performance and scalability considerations of using SVC. We will review some of the tools, metrics, and methods needed to identify the root causes of the most common performance issues.


Table of Contents

Introduction

Storage Problems and Limitations with Native Storage

SVC Overview

SVC Physical and Logical Overview

Performance and Scalability Implications

Types of Problems

Performance Analysis Techniques

Performance Analysis Tools for SVC

Performance Analysis Metrics for SVC

Online Banking Example

Summary


SVC High Level Logical View


Figure: SVC Combined Physical & Logical View. An SVC cluster contains I/O Group 1 and I/O Group 2, each a pair of nodes. Backend LUNs are presented to SVC as managed disks: four 10 GB FAStT LUNs (mdisk0 to mdisk3) and three 20 GB ESS LUNs (mdisk4 to mdisk6). The managed disks form two managed disk groups, mdiskgrp0 [FAStT Group] (40 GB) and mdiskgrp1 [ESS Group] (60 GB). Five 20 GB virtual disks (vdisk0 to vdisk4) are created from these groups and mapped to hosts.

Virtual Disks are associated with particular I/O Groups.

Managed Disk Groups are accessible by all I/O Groups in the Cluster.
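A small model makes the extra striping layer concrete. The Python sketch below maps a byte offset on a striped vdisk to an MDisk. It assumes that extents are handed out round-robin across the group's MDisks, starting with the first one, and that the extent size is a fixed 16 MB. Both are simplifications: the extent size is chosen when an MDisk group is created, and real allocation may start on a different MDisk or skip full ones. The function name `locate` is hypothetical.

```python
from __future__ import annotations

MB = 1024 * 1024


def locate(offset_bytes: int, mdisks: list[str], extent_mb: int = 16) -> tuple[str, int]:
    """Return (mdisk, offset into the extents this vdisk owns on that mdisk).

    Simplified striping model (an assumption, not SVC's exact allocator):
    vdisk extent i is placed on mdisks[i % len(mdisks)].
    """
    extent_bytes = extent_mb * MB
    extent_index, within_extent = divmod(offset_bytes, extent_bytes)
    mdisk = mdisks[extent_index % len(mdisks)]
    # Number of earlier extents of this vdisk that already landed on the same mdisk.
    slot = extent_index // len(mdisks)
    return mdisk, slot * extent_bytes + within_extent


# mdiskgrp0 from the figure: four 10 GB FAStT-backed MDisks.
mdiskgrp0 = ["mdisk0", "mdisk1", "mdisk2", "mdisk3"]
print(locate(5 * 16 * MB + 100, mdiskgrp0))  # ('mdisk1', 16777316)
```

In this model, consecutive extents land on different MDisks. A single vdisk's sequential stream therefore fans out across every backend array in the group, which is one reason a single busy vdisk can affect the other vdisks in the same MDisk group.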


Performance and Scalability Limitations

Shared resources!

– Cache, fibre ports, CPU, fabric

Cache implications

– Completely random workloads are 'cache unfriendly'

– Highly sequential workloads, e.g., large DB hot backups

Fabric implications

– Increases the number of fabric hops!

– Additional fabric traffic to synchronize write data between the two nodes of an I/O group

– Traffic flows in and out of the same ports:
• Read cache misses
• Write synchronizations
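A back-of-the-envelope model shows why shared ports matter. The Python sketch below estimates inbound and outbound MB/s on one node's fibre ports from host read/write rates and a read hit ratio. It relies on four assumptions, all labeled in the code:
- The I/O group is balanced.
- Write mirroring to the partner node crosses the same ports, which the "Port to Local Node Send/Receive" metric suggests.
- Every host write is destaged exactly once.
- Read misses are fetched from the backend through the same ports.

The function name and the example rates are hypothetical.

```python
def node_port_traffic(read_mbs: float, write_mbs: float, read_hit_ratio: float) -> tuple:
    """Estimate (inbound, outbound) MB/s on one node's ports under a simple model.

    Assumptions: the I/O group is balanced, so this node receives as much
    mirrored write data from its partner as it sends; every host write is
    destaged to the backend exactly once (no coalescing); reads that miss
    the cache are fetched from the backend through the same ports.
    """
    read_miss_mbs = read_mbs * (1.0 - read_hit_ratio)
    inbound = (
        write_mbs        # host writes arriving
        + write_mbs      # partner node's mirrored writes arriving
        + read_miss_mbs  # backend data for read misses
    )
    outbound = (
        read_mbs         # data returned to hosts
        + write_mbs      # write data mirrored to the partner node
        + write_mbs      # destages to backend storage
    )
    return inbound, outbound


print(node_port_traffic(read_mbs=100.0, write_mbs=50.0, read_hit_ratio=0.5))  # (150.0, 200.0)
```

With 150 MB/s of host traffic in this example, the node's ports carry 150 MB/s in and 200 MB/s out. A cache-unfriendly workload pushes the read hit ratio toward zero and raises the inbound load further.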


Types of Problems

Application
– Configuration
– Design issues
– Defects
– DB queries, etc.

Host
– Multipathing software compatibility
– HBA microcode/device driver
– OS compatibility

SVC
– Microcode level, performance features
– Front-end contention: I/O group, node
– Back-end contention: MDG, MDisk

Backend Storage
– Front-end port, cache, NVS
– Back-end controller, RAID group (disks)

Fabric
– ISL congestion


Performance Analysis Process

1. Gather host multipathing, SVC, and storage configuration/firmware.

2. Ensure device support and compatibility:
– Check the SVC Support Matrix; if host or storage devices are unsupported, resolve that first!
• http://www-03.ibm.com/systems/storage/software/virtualization/svc/interop.html
– Update SVC firmware to the latest level (and ensure host multipathing is supported and configured correctly).

3. After resolving configuration issues:
– Gather end-to-end response time (e.g., host iostat/perfmon data).
– If response time is elongated, drill down to the next layer.

4. Measurement points:
– Application: transactional latency
– Host: LV and disk I/O response times, disk utilization, throughput
– Fabric: throughput, utilization
– SVC: I/O group, MD group, MDisks, VDisks
– Storage: depends on technology
• EMC: FA, cache, DA, disk, volume
• DS8K/ESS: front-end port, array (physical), volume
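Step 3's "drill down to the next layer" can be written as a simple loop. The Python sketch below walks the measurement points from the host downward. It stops at the first measured layer that is within its limit, because the delay must have been introduced above that layer. The layer names, the per-layer limits, and the response times are all hypothetical.

```python
from typing import Dict, Optional

# Layers ordered from the application's point of view down to the disks.
LAYERS = ["host", "fabric", "svc_vdisk", "svc_mdisk", "storage"]


def drill_down(resp_ms: Dict[str, float], limit_ms: Dict[str, float]) -> Optional[str]:
    """Return the deepest layer that is still elongated, or None if the top is fine.

    Layers without measurements are skipped. As soon as a measured layer is
    within its limit, the delay was added above it, so the walk stops.
    """
    suspect = None
    for layer in LAYERS:
        if layer not in resp_ms:
            continue  # no data for this layer, e.g. fabric latency
        if resp_ms[layer] > limit_ms[layer]:
            suspect = layer  # still elongated: keep drilling down
        else:
            break
    return suspect


# Hypothetical limits (ms) and one hypothetical set of measurements.
limits = {"host": 10, "fabric": 1, "svc_vdisk": 10, "svc_mdisk": 10, "storage": 10}
print(drill_down({"host": 40, "svc_vdisk": 35, "svc_mdisk": 30, "storage": 4}, limits))
```

In this example the answer is `svc_mdisk`. The storage array reports fast responses, but SVC sees slow MDisk responses, which points at queueing inside SVC or on the fabric between SVC and the backend.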


Performance Analysis Tools for SVC

Tivoli Total Storage Productivity Center (TPC)
– Complex and expensive to deploy
– Provides lots of detail

Native command-line interface
– Data is in XML format, but there is no publicly available post-processing
– A custom-written text parser is not ideal
– XSL and Ant, or other XML parsers/viewers, are good options
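The slide suggests XSL, Ant, or another XML parser. As one alternative, the Python sketch below uses the standard-library ElementTree to turn two XML statistics snapshots into per-second I/O rates. The `<stats>`/`<vdisk id reads writes>` layout is a hypothetical stand-in; the real SVC statistics files use their own element and attribute names, which are not reproduced here. The sketch also assumes the counters are cumulative, so computing a rate needs two samples, and it ignores counter wrap.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified statistics layout; the real SVC files differ.
PREV = """<stats><vdisk id="vdisk1" reads="1000" writes="200"/>
<vdisk id="vdisk2" reads="50" writes="10"/></stats>"""
CURR = """<stats><vdisk id="vdisk1" reads="1600" writes="260"/>
<vdisk id="vdisk2" reads="80" writes="10"/></stats>"""


def load_counters(xml_text: str) -> dict:
    """Return {vdisk id: {counter name: cumulative value}}."""
    root = ET.fromstring(xml_text)
    return {
        el.get("id"): {"reads": int(el.get("reads")), "writes": int(el.get("writes"))}
        for el in root.iter("vdisk")
    }


def io_rates(prev_xml: str, curr_xml: str, interval_s: float) -> dict:
    """Turn two cumulative snapshots into per-second rates."""
    prev, curr = load_counters(prev_xml), load_counters(curr_xml)
    rates = {}
    for vdisk_id, counters in curr.items():
        before = prev.get(vdisk_id)
        if before is None:
            continue  # vdisk appeared between samples; no baseline yet
        rates[vdisk_id] = {
            name: (value - before[name]) / interval_s for name, value in counters.items()
        }
    return rates


print(io_rates(PREV, CURR, interval_s=60))
```

Keeping parsing separate from the rate calculation means that only `load_counters` has to change when it is pointed at the real file format.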


SVC Key Performance Metrics

IO Group
– Front-end & back-end latency (read/write), queue time (read/write), throughput (read/write), transfer size (read/write), I/O rates (read/write)
– Cache hits

Node
– Same as IO Group, plus CPU and port-to-local-node send & receive I/O rates

MD Group
– Front-end & back-end latency (read/write), queue time (read/write), throughput (read/write), transfer size (read/write), I/O rates (read/write)

MDisk
– Back-end latency (read/write), queue time (read/write), throughput (read/write), transfer size (read/write), I/O rates (read/write)

Vdisk
– Front-end latency, queue time (read/write), throughput (read/write), transfer size (read/write), I/O rates (read/write)
– NVS full & delays, cache hits

Explanations
– Overall response time = vdisk response time.
– If an I/O is a cache hit, the vdisk response time is the only response time.
– Backend response time = MDisk fabric response time (i.e., from the point SVC sends the I/O to the controller to when it gets it back).
– Backend queue = MDisk queue time (time inside SVC waiting to be sent onto the fabric + fabric response time).
– Backend responses are also for 32 KB tracks, so a vdisk doing 256 KB I/Os needs many backend I/Os to complete each one (if it is a cache miss); many of these will be concurrent.
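The 32 KB track rule reduces to a small calculation. The Python sketch below counts the backend I/Os that one cache-missing vdisk I/O needs. It then gives a rough miss latency under a hypothetical model: backend I/Os run in waves of a fixed concurrency, and each takes the same time. The 5 ms backend time and the concurrency values are illustrative assumptions. A misaligned I/O can also touch one extra track, which the sketch ignores.

```python
import math

TRACK_KB = 32  # backend transfer unit, per the slide


def backend_ios(transfer_kb: float, track_kb: int = TRACK_KB) -> int:
    """Backend I/Os needed to service one cache-missing vdisk I/O (aligned case)."""
    return math.ceil(transfer_kb / track_kb)


def miss_latency_ms(transfer_kb: float, backend_ms: float, concurrency: int) -> float:
    """Rough miss latency if backend I/Os run in waves of `concurrency`.

    Assumes every backend I/O takes `backend_ms` and waves run back to back.
    """
    waves = math.ceil(backend_ios(transfer_kb) / concurrency)
    return waves * backend_ms


print(backend_ios(256))               # 8 backend I/Os for one 256 KB vdisk I/O
print(miss_latency_ms(256, 5.0, 8))   # all 8 concurrent: one wave, 5.0 ms
print(miss_latency_ms(256, 5.0, 2))   # only 2 at a time: four waves, 20.0 ms
```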


Real World Example: Online Banking Application (OLB) – Problem Statement

An online banking application and other applications that rely on SAN I/O are experiencing intermittent, severe performance impacts.

The impacts are typified by a daily performance degradation between 3:15 am and 6:00 am.

SVC response time outside of problem window is acceptable.


OLB – Host Impact – Increase in copy times

Chart: Sum of copy time (s), 0 to 600 s, by date and hour from 9/20/2007 14:00 to 9/21/2007 8:00, broken down by host and file system: /apps/olbfs (Host2, Host4, host5 to host11, host13 to host16), /data/olb_input (Host4, host10), /ora/SOMEDB/data001/DBs/export (Host1, Host3, host12), /data/archive (host7), /data/output (host9), /localtest (host10).
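A chart like this can also be screened automatically for the hours where copy times jump. The Python sketch below flags periods whose copy time exceeds a multiple of the median. The median is used because it is barely moved by the spikes themselves. The factor of 3 is an arbitrary assumption. The sample numbers are illustrative only, since the chart's underlying data is not reproduced here.

```python
from statistics import median


def elevated_periods(copy_time_s: dict, factor: float = 3.0) -> list:
    """Return the periods whose copy time exceeds `factor` x the median copy time."""
    baseline = median(copy_time_s.values())
    return [period for period, secs in copy_time_s.items() if secs > factor * baseline]


# Illustrative numbers only, not the case-study data.
samples = {"14:00": 40, "15:00": 45, "16:00": 50, "03:00": 400, "04:00": 500}
print(elevated_periods(samples))  # ['03:00', '04:00']
```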


OLB: Performance Analysis – Host Configuration

Collect host configuration data
– Prior to microcode 4.3.1, it is very important that host multipath software communicates with the SVC preferred node!
– Use IBM SDD/PCM where possible, as they work!
– If using others (DMP/MPxIO), only one multipath software should be active.
– Special procedures and/or configuration changes may be required for non-IBM multipath software.

Hosts were running an improperly configured MPxIO, which needed a patch and an SVC configuration change:

http://www-1.ibm.com/support/docview.wss?rs=591&context=STC7HAC&context=STCWGAV&context=STCWGBP&dc=DB520&dc=DB530&dc=DB510&dc=DB550&q1=mpxio&uid=ssg1S1002938&loc=en_US&cs=utf-8&lang=en

Hosts running an unsupported DMP configuration needed a patch from Veritas: VxVM 5.0 requires RP3 (Rolling Patch 3) and Hotfix 127320-02.

Identify and repair host configuration.
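The rules on this slide can be turned into a per-host checklist. In the Python sketch below, the inputs (SVC code level as a version tuple, the list of active multipath drivers, and whether I/O goes to the preferred node) are hypothetical. How they are gathered from each host is out of scope. The driver labels in `IBM_MULTIPATH` are also assumptions.

```python
from typing import List, Tuple

IBM_MULTIPATH = {"SDD", "SDDPCM"}  # hypothetical labels for IBM's drivers


def check_host(svc_code: Tuple[int, ...], active_multipath: List[str],
               uses_preferred_node: bool) -> List[str]:
    """Return a list of findings for one host, based on the slide's rules."""
    findings = []
    if len(active_multipath) > 1:
        findings.append("more than one multipath driver active: " + ", ".join(active_multipath))
    if svc_code < (4, 3, 1) and not uses_preferred_node:
        findings.append("multipath driver is not sending I/O to the SVC preferred node")
    non_ibm = [d for d in active_multipath if d not in IBM_MULTIPATH]
    if non_ibm:
        findings.append("check special procedures for non-IBM multipath: " + ", ".join(non_ibm))
    return findings


print(check_host((4, 2, 0), ["SDD"], uses_preferred_node=True))            # []
print(check_host((4, 2, 0), ["DMP", "MPxIO"], uses_preferred_node=False))  # three findings
```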


OLB: Upgrade SVC to Latest Firmware

Make sure you are at least at 4.x.

The latest SVC firmware (4.2.x) has many fixes

– Including fixes that increase MDisk queue-depth settings

Compared with 3.x, SVC 4.x takes advantage of all node ports

Cache partitioning is available for governing workloads

4.x provides enhanced performance metrics


OLB: Gather End to End Response Time

Initially, gather enough information to confirm that there are I/O-related issues.

Identify whether the I/O throughput degradation is systemic, i.e., whether it affects:
– All devices on a given host
– All devices on all hosts
– All devices on a given SVC or SAN component

In this case, all hosts were impacted by throughput degradation.

Watch for large transfer sizes: destages from cache to backend storage are done in 32 KB writes, so each large host write turns into many backend writes.
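The three "is it systemic?" patterns can be checked mechanically once devices are mapped to hosts and to SVC components. In the Python sketch below, the device names, hosts, and I/O group mapping are hypothetical. The I/O group stands in for "a given SVC or SAN component".

```python
from typing import Dict, Set


def degradation_scope(degraded: Set[str], device_host: Dict[str, str],
                      device_component: Dict[str, str]) -> str:
    """Classify which of the slide's patterns a set of degraded devices matches."""
    if not degraded:
        return "no degradation"
    if degraded == set(device_host):
        return "all devices on all hosts"
    hosts = {device_host[d] for d in degraded}
    if len(hosts) == 1:
        (host,) = hosts
        if degraded == {d for d, h in device_host.items() if h == host}:
            return f"all devices on host {host}"
    components = {device_component[d] for d in degraded}
    if len(components) == 1:
        (comp,) = components
        if degraded == {d for d, c in device_component.items() if c == comp}:
            return f"all devices on component {comp}"
    return "isolated devices"


# Hypothetical inventory: device -> host, device -> SVC I/O group.
device_host_map = {"d1": "hostA", "d2": "hostA", "d3": "hostB", "d4": "hostB"}
device_iogrp_map = {"d1": "io_grp0", "d2": "io_grp0", "d3": "io_grp0", "d4": "io_grp1"}
print(degradation_scope({"d1", "d2", "d3"}, device_host_map, device_iogrp_map))
```

The checks run from broadest to narrowest: all devices, then one host, then one component. A degradation that covers a whole I/O group spanning several hosts is therefore reported as a component problem rather than a host problem.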


OLB: Gather SVC MD Group data

SVC MD Group averages (IO rates in I/Os per second, data rates in MB/s, transfer sizes in KB)

Cluster MD Group         Read IO/s Write IO/s Total IO/s  Read MB/s Write MB/s Total MB/s    Read KB   Write KB     Avg KB
SVC001  SVC1_12345_R5_1     459.80     119.80     579.60      29.00      28.00      57.00      64.70     518.90     109.20
SVC001  SVC1_22222_R1_3     395.10      78.50     473.60      28.10      29.30      57.40      72.80     381.90     125.40
SVC001  SVC1_12345_R5_3     309.30     124.00     433.30      22.80      20.50      43.40      74.80     308.70     106.00
SVC001  SVC1_12345_R5_2     293.10      60.20     353.30      17.70      18.00      35.70      62.50     359.70     105.00
SVC001  SVC1_33333_R5_9     286.70      91.30     378.00      14.60       7.40      22.00      56.00      75.70      67.90
SVC001  SVC1_12345_R5_4     233.30     102.20     335.60      17.50      13.10      30.60      77.90     242.90      99.10
SVC001  SVC1_33333_R5_2     224.70      78.20     302.90       8.40       2.30      10.80      32.20      33.10      34.00
SVC001  SVC1_33333_R5_0     207.70      68.10     275.80       8.30       1.90      10.20      30.20      22.00      31.30
SVC001  SVC1_33333_R5_1     197.30     105.30     302.60      10.30       6.90      17.20      41.10      76.40      70.50
SVC001  SVC1_33333_R5_4     191.80      87.80     279.60      16.50       3.00      19.40      76.30      37.70      62.10

Focus on the MDGs with the most throughput during the period.
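Picking the busiest MDGs depends on which throughput metric is used. The Python sketch below ranks the MD groups from the SVC MD Group table, either by Avg Total Data Rate (MB/s) or by Avg Total IO Rate. The values are copied from the table; the function name is hypothetical.

```python
# (Avg Total IO Rate, Avg Total Data Rate MB) from the SVC MD Group table.
MDG_TOTALS = {
    "SVC1_12345_R5_1": (579.60, 57.00),
    "SVC1_22222_R1_3": (473.60, 57.40),
    "SVC1_12345_R5_3": (433.30, 43.40),
    "SVC1_12345_R5_2": (353.30, 35.70),
    "SVC1_33333_R5_9": (378.00, 22.00),
    "SVC1_12345_R5_4": (335.60, 30.60),
    "SVC1_33333_R5_2": (302.90, 10.80),
    "SVC1_33333_R5_0": (275.80, 10.20),
    "SVC1_33333_R5_1": (302.60, 17.20),
    "SVC1_33333_R5_4": (279.60, 19.40),
}


def top_mdgs(n: int, by: str = "mb") -> list:
    """Return the n busiest MD groups, ranked by total MB/s ('mb') or total IO/s ('io')."""
    index = {"io": 0, "mb": 1}[by]
    ranked = sorted(MDG_TOTALS, key=lambda name: MDG_TOTALS[name][index], reverse=True)
    return ranked[:n]


print(top_mdgs(4, by="mb"))
print(top_mdgs(4, by="io"))
```

The two rankings disagree. By data rate, SVC1_22222_R1_3 comes first and SVC1_12345_R5_2 is fourth. By I/O rate, SVC1_12345_R5_1 leads and SVC1_33333_R5_9, with its smaller average transfer size, takes fourth place.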


OLB: Drill Down To Vdisk

What are these hosts doing during this time period?

VDisk averages (IO rates in I/Os per second, data rates in MB/s, transfer sizes in KB)

VDisk   Servers       Read IO/s Write IO/s Total IO/s  Read MB/s Write MB/s Total MB/s    Read KB   Write KB     Avg KB
vdisk1  Host1, Host2       72.8        1.1       73.9          3          0          3       13.7          8       13.7
vdisk2  Host1, Host2       69.4        0.1       69.5        2.9          0        2.9       13.8          8       13.8
vdisk3  Host1, Host2       68.4       39.7      108.1        3.2        1.6        4.8       58.7         41       47.2
vdisk4  Host1, Host2       40.9          4         45        2.6          0        2.6       17.3          8       17.1
vdisk5  Host3              19.7        2.4       22.1        1.9          0        1.9       12.2        6.7         17
vdisk6  Host4               5.4        2.1        7.4        0.5          0        0.5       11.1        6.1       15.3
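A quick way to characterize these vdisks is their read share of total I/O. The Python sketch below uses the VDISK table's values to flag read-dominated vdisks. The 0.9 threshold is an arbitrary assumption. A read-heavy profile is consistent with backup reads but does not prove them; checking the schedulers does that.

```python
# (Avg Read IO Rate, Avg Total IO Rate) from the VDISK table.
VDISK_IO = {
    "vdisk1": (72.8, 73.9),
    "vdisk2": (69.4, 69.5),
    "vdisk3": (68.4, 108.1),
    "vdisk4": (40.9, 45.0),
    "vdisk5": (19.7, 22.1),
    "vdisk6": (5.4, 7.4),
}


def read_dominated(threshold: float = 0.9) -> list:
    """Vdisks whose reads make up at least `threshold` of their total I/O rate."""
    return [v for v, (reads, total) in VDISK_IO.items() if reads / total >= threshold]


for vdisk, (reads, total) in VDISK_IO.items():
    print(f"{vdisk}: {reads / total:.0%} reads")
print(read_dominated())  # ['vdisk1', 'vdisk2', 'vdisk4']
```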


OLB: Identify Processes and Scheduled Jobs Initiating I/O

Check native schedulers (cron/at) for:
– Application users
– DB users
– root

Check third-party schedulers (Autosys).

Cron entries for DB servers on hosts with high I/O identified 103 database backup schedules within the problem period!
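Finding scheduled jobs that start inside the problem window can be scripted against crontab text. The Python sketch below expands the minute and hour fields (`*`, ranges, lists, and `/step`) and reports entries starting between 3:15 and 6:00. It makes several assumptions:
- The per-user crontab format applies. The system crontab has an extra user column.
- Every job runs daily: the day-of-month, month, and day-of-week fields are ignored.
- `@daily`-style macros and Autosys jobs are not covered.

The script paths in the sample are hypothetical.

```python
from typing import List, Set, Tuple


def expand(field: str, lo: int, hi: int) -> Set[int]:
    """Expand one cron time field ('*', '*/15', '3-5', '0,30', '1-10/3') into values."""
    values = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, stop = lo, hi
        elif "-" in base:
            first, last = base.split("-")
            start, stop = int(first), int(last)
        else:
            start = stop = int(base)  # single value; a step on it is ignored here
        values.update(range(start, stop + 1, step))
    return values


def jobs_in_window(crontab: str, start: Tuple[int, int] = (3, 15),
                   end: Tuple[int, int] = (6, 0)) -> List[str]:
    """Return crontab lines with a start time in [start, end), treating every job as daily."""
    hits = []
    for line in crontab.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 6:
            continue  # environment settings, @daily-style macros, etc.
        minutes = expand(fields[0], 0, 59)
        hours = expand(fields[1], 0, 23)
        if any(start <= (h, m) < end for h in hours for m in minutes):
            hits.append(line)
    return hits


CRONTAB = """MAILTO=dba
30 3 * * * /u01/scripts/rman_backup.sh DB1
0 2 * * * /u01/scripts/rman_backup.sh DB2
*/20 4-5 * * * /u01/scripts/rman_backup.sh DB3
# 45 3 * * * /u01/scripts/rman_backup.sh DB4
0 6 * * * /u01/scripts/rman_backup.sh DB5
"""
for hit in jobs_in_window(CRONTAB):
    print(hit)
```

Only DB1 and DB3 are reported: DB2 starts before the window, DB4 is commented out, and DB5 starts exactly at the window's end.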


OLB – Root Cause

The root cause of the online banking performance degradation is a flood of streaming read I/Os into the SAN Volume Controller, originating from Oracle RMAN backups initiated on 103 databases within a 20-minute period.

This read I/O flood is cache hostile: it causes other read and write requests to queue, which degrades performance.

With the current host read-ahead settings, at peak (concurrent Oracle RMAN incremental backups between 3 am and 6 am), the SVC cannot process the combined volume and composition of I/O without a knock-on performance impact.


OLB: Actions Taken During Analysis

Step                      SVC MB/s  SVC CPU (%)  SVC Read Resp (ms)
Initial inspection             600           80                 120
SVC 4.2.03 upgrade             800           80                  90
Host - DMP patch              1300           70                  65
Host - MPxIO corrected        2350           60                  60
Target peak                   2500           60                  15
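The table's progress can be summarized as ratios. The Python sketch below computes each step's throughput multiple and read-response fraction relative to the initial inspection, plus the remaining gap to the target peak. The values are copied from the table; the function names are hypothetical.

```python
# (step, SVC MB/s, SVC CPU %, SVC read response ms) from the table.
STEPS = [
    ("Initial inspection", 600, 80, 120),
    ("SVC 4.2.03 upgrade", 800, 80, 90),
    ("Host - DMP patch", 1300, 70, 65),
    ("Host - MPxIO corrected", 2350, 60, 60),
]
TARGET_MBS, TARGET_RESP_MS = 2500, 15


def progress() -> list:
    """Throughput multiple and read-response fraction of each step vs. the initial state."""
    _, base_mbs, _, base_ms = STEPS[0]
    return [(name, round(mbs / base_mbs, 2), round(ms / base_ms, 2)) for name, mbs, _, ms in STEPS]


def remaining_gap() -> tuple:
    """(throughput still needed as a multiple, read response time reduction still needed)."""
    _, mbs, _, ms = STEPS[-1]
    return round(TARGET_MBS / mbs, 2), ms / TARGET_RESP_MS


for row in progress():
    print(row)
print(remaining_gap())  # (1.06, 4.0)
```

After the MPxIO correction, throughput is about 3.9 times the initial level and read response time has halved. Throughput is within about 6% of the target, but read response time is still four times the 15 ms target, so the remaining gap is in latency rather than bandwidth.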


OLB Final Recommendations (by priority):

1. Implement the production backup policy/strategy in the test environment.
– Veritas snapshot backups for hosts operating large databases: reduce data transferred; schedule!

2. Tune RMAN, Oracle, and the SVC to control I/O composition and I/O availability.
– Scheduling / transfer size / isolation / governance on vdisk

3. Add a new I/O group to SVC001: isolation!

4. Replace the SVC 2145-8F4 nodes currently in use with 2145-8G4 nodes: hardware upgrade!
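The "schedule!" recommendation amounts to capping how many backups start together. The Python sketch below assigns start times in waves of at most `max_concurrent` jobs inside a window and emits daily cron entries. It rejects a plan whose last wave would overrun the window. This is one simple way to stagger jobs, not the author's method. The job duration and concurrency cap are assumptions that would need to be measured, and the script path is hypothetical.

```python
from typing import List, Tuple


def stagger(jobs: List[str], window_start_min: int, window_len_min: int,
            max_concurrent: int, est_duration_min: int) -> List[Tuple[str, int]]:
    """Assign start times (minutes after midnight) in waves of `max_concurrent` jobs."""
    window_end = window_start_min + window_len_min
    schedule = []
    for i, job in enumerate(jobs):
        start = window_start_min + (i // max_concurrent) * est_duration_min
        if start + est_duration_min > window_end:
            raise ValueError(f"{job} would finish after the window; "
                             "raise concurrency or widen the window")
        schedule.append((job, start))
    return schedule


def cron_line(start_min: int, command: str) -> str:
    """Format a daily cron entry (minute hour * * * command)."""
    return f"{start_min % 60} {(start_min // 60) % 24} * * * {command}"


for job, start in stagger(["DB1", "DB2", "DB3"], window_start_min=60, window_len_min=120,
                          max_concurrent=2, est_duration_min=30):
    print(cron_line(start, f"/u01/scripts/rman_backup.sh {job}"))
```

Raising an error instead of silently spilling past the window keeps the trade-off visible: either allow more concurrent backups or widen the window.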


SVC Performance Analysis Summary

Identify performance requirements/expectations!

Determine compatibility/Resolve incompatibilities

Utilize latest SVC firmware if possible

Measure hosts

Measure SVC

Measure backend storage

Identify bottlenecks and resolve


Appendix A: Additional Resources

These publications are also relevant as further information sources:

IBM System Storage SAN Volume Controller, SG24-6423-05

Get More Out of Your SAN with IBM Tivoli Storage Manager, SG24-6687

IBM Tivoli Storage Area Network Manager: A Practical Introduction, SG24-6848

IBM System Storage: Implementing an IBM SAN, SG24-6116

IBM System Storage Open Software Family SAN Volume Controller: Planning Guide, GA22-1052

IBM System Storage Master Console: Installation and User’s Guide, GC30-4090

IBM System Storage Open Software Family SAN Volume Controller: Installation Guide, SC26-7541

IBM System Storage Open Software Family SAN Volume Controller: Service Guide, SC26-7542

IBM System Storage Open Software Family SAN Volume Controller: Configuration Guide, SC26-7543

IBM System Storage Open Software Family SAN Volume Controller: Command-Line Interface User's Guide, SC26-7544

IBM System Storage Open Software Family SAN Volume Controller: CIM Agent Developers Reference, SC26-7545

IBM TotalStorage Multipath Subsystem Device Driver User's Guide, SC30-4096

IBM System Storage Open Software Family SAN Volume Controller: Host Attachment Guide, SC26-7563


Biography

Brett Allison has been doing distributed systems performance work since 1997, including J2EE application analysis, UNIX/NT, and storage technologies. His current role is Performance and Capacity Management team lead for ITDS. He has developed tools, processes, and service offerings to support storage performance and capacity. He has spoken at a number of conferences and is the author of several white papers on performance.

