Intro to Linux Performance Analysis

Post on 08-May-2015

description

LOPSA SD 2014.03.27 Presentation on Linux Performance Analysis An introduction using the USE method and showing how several tools fit into those resource evaluations.

transcript

Intro to Linux Performance Analysis

Chris McEniry

LOPSA-SD March 27, 2014

Me

• Systems Architect

• Sony Network Entertainment

• 18 years running stuff

• Majority of the last 14 years: medium-large Internet services

Read this book…

And look here:

http://www.brendangregg.com/

http://www.brendangregg.com/methodology.html

http://www.brendangregg.com/Slides/LISA2012_methodologies.pdf

http://www.amazon.com/Systems-Performance-Enterprise-Brendan-Gregg/dp/0133390098

The website is down!!! It’s just too slow! The DB is too slow! The disk is too slow!

SLOW!!!

http://farm4.staticflickr.com/3190/2976755407_6a6a574596_o.jpg

SLOW!!!!

• What does slow mean anyways?

• Is it not transferring fast enough?

• Is it not handling enough requests?

http://commons.wikimedia.org/wiki/File:United_States_sign_-_Slow_Traffic_Ahead.svg

Slow can mean…

• Latency: How long it takes

• ms, s, request time, etc

• Throughput: How much can happen at the same time

• bandwidth, IOPS, rps, tps, etc

http://upload.wikimedia.org/wikipedia/commons/2/2e/Miniature_DNF_Dictionary_055_ubt.JPG
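
To make the two meanings of "slow" concrete, here is a minimal sketch that summarizes one batch of request timings both ways. The function name, the sample data, and the choice of a nearest-rank 95th percentile are assumptions for illustration, not something from the talk.

```python
import statistics


def summarize(durations_s, window_s):
    """Summarize requests that completed within one observation window.

    durations_s: per-request durations in seconds (the latency view).
    window_s: length of the observation window in seconds (for throughput).
    """
    ordered = sorted(durations_s)
    n = len(ordered)
    # Nearest-rank 95th percentile, computed with integers to avoid float surprises.
    rank = (95 * n + 99) // 100
    return {
        "mean_latency_s": statistics.mean(ordered),
        "p95_latency_s": ordered[rank - 1],
        "max_latency_s": ordered[-1],
        "throughput_rps": n / window_s,
    }
```

The same sample can look fine on throughput while a few users still wait a long time, which is why it helps to ask which kind of slow is being reported.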

Slowness comes from…

• Full utilization of a resource

• Waiting in a saturated queue

• Generated errors!

• The USE Method

http://farm6.staticflickr.com/5181/5614813544_a30d693a50_o.jpg
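
As a rough sketch of how the three USE questions can be asked of any one resource, the snippet below flags which of them a sample trips. The ResourceSample type, its field names, and the 0.9 utilization limit are hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class ResourceSample:
    name: str
    utilization: float  # fraction of capacity in use, 0.0 to 1.0
    queue_length: int   # units of work waiting for the resource
    errors: int         # error events since the previous sample


def use_findings(sample, utilization_limit=0.9):
    """Return which of the USE questions this sample answers 'yes' to."""
    findings = []
    if sample.utilization >= utilization_limit:
        findings.append("utilization")
    if sample.queue_length > 0:
        findings.append("saturation")
    if sample.errors > 0:
        findings.append("errors")
    return findings
```

For example, use_findings(ResourceSample("cpu", 0.95, 3, 0)) reports both utilization and saturation, while a lightly loaded interface with a nonzero error count reports only errors.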

Utilization

• You have fully used up what’s been allocated

• aka 5 lb bag

http://farm3.staticflickr.com/2524/4000641774_3331fe06fb_o.jpg

Saturation

• Waiting for someone else to get done so you can do yours

• Typically because a resource is fully utilized, but not necessarily directly

http://www.fotocommunity.com/pc/pc/display/30396619

Errors

• Dropped packets

• Incorrect responses

• Deadlocks

• Timeouts

• Not all failures fail fast

http://farm8.staticflickr.com/7001/6509400855_aaaf915871_b.jpg

How do we determine?

• Different types of tools for different examinations

• Depends on what you’re looking for (which can be a problem in and of itself)

http://farm5.staticflickr.com/4083/5086955738_61f6455ace_b.jpg

Resource vs Transaction

• Do you care if…

• a CPU is maxed out?

• processes are blocked?

• packets are lost?

• or if…

• a user’s request fails?

• a user gives up on waiting for a response?

Maturity

• Using tracing tools, especially in production, requires a level of maturity

• I’m not that mature… ;)

• No, really just focusing on the basics first

http://upload.wikimedia.org/wikipedia/commons/b/bd/OFLC_large_R18%2B.svg

http://image.slidesharecdn.com/scalelinuxperformance-130224171331-phpapp01/95/slide-15-638.jpg?cb=1362166290

http://image.slidesharecdn.com/scalelinuxperformance-130224171331-phpapp01/95/slide-16-638.jpg?cb=1362166290

General

?

/var/log/messages

Errors (mostly; sometimes stats go here)

/var/log/messages

CPU

?

uptime

Saturation of the scheduler

uptime
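
The load averages that uptime prints are also available in /proc/loadavg. A minimal sketch of the saturation check, assuming the common rule of thumb that a 1-minute load above the CPU count means work is queuing (the function names are made up for this example):

```python
import os


def parse_loadavg(text):
    """Return the 1, 5, and 15 minute load averages from /proc/loadavg text."""
    one, five, fifteen = (float(value) for value in text.split()[:3])
    return one, five, fifteen


def scheduler_saturated(load_1min, cpu_count):
    """Rule of thumb only: more runnable work than CPUs suggests queuing."""
    return load_1min > cpu_count


def check_local_host():
    """Read the live values; only meaningful on a Linux host."""
    with open("/proc/loadavg") as handle:
        one, _, _ = parse_loadavg(handle.read())
    return scheduler_saturated(one, os.cpu_count() or 1)
```

On Linux the load average also counts tasks in uninterruptible sleep (often waiting on disk), so a high value is a hint to look further rather than proof of CPU saturation.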

?

top

top

Saturation

Utilization
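
The CPU percentages in top are derived from the cumulative counters in /proc/stat. The sketch below shows one way to turn two snapshots of the aggregate "cpu" line into a utilization figure; treating idle plus iowait as "not busy" is an assumption of this example, and the function names are illustrative.

```python
def parse_cpu_line(stat_text):
    """Return the first eight counters from the aggregate 'cpu' line.

    Order: user, nice, system, idle, iowait, irq, softirq, steal.
    The guest fields are skipped because they are already counted in user/nice.
    """
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:9]]
    raise ValueError("no aggregate cpu line found")


def cpu_utilization(before, after):
    """Fraction of time spent non-idle between two snapshots."""
    deltas = [new - old for old, new in zip(before, after)]
    total = sum(deltas)
    if total == 0:
        return 0.0
    not_busy = deltas[3] + deltas[4]  # idle + iowait
    return 1.0 - not_busy / total
```

In practice the two snapshots would be the contents of /proc/stat read a second or so apart.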

Memory

?

free

Utilization

free
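
The numbers free reports come from /proc/meminfo. A minimal sketch of a memory utilization figure, assuming that page cache and buffers should count as reclaimable; the fallback arithmetic for kernels without MemAvailable (added in Linux 3.14) is only an approximation.

```python
def parse_meminfo(text):
    """Map each /proc/meminfo key to its integer value (kB for most keys)."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0])
    return values


def memory_utilization(meminfo):
    """Fraction of memory that is not readily available to new work."""
    total = meminfo["MemTotal"]
    if "MemAvailable" in meminfo:
        available = meminfo["MemAvailable"]
    else:
        # Rough fallback for older kernels: free + buffers + page cache.
        available = (meminfo["MemFree"]
                     + meminfo.get("Buffers", 0)
                     + meminfo.get("Cached", 0))
    return 1.0 - available / total
```

Looking only at the free column overstates utilization, because Linux deliberately fills otherwise idle memory with cache.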

?

vmstat

vmstat

Saturation

Utilization

Counts
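
For the memory saturation signal in vmstat (the si and so columns), the underlying counters are pswpin and pswpout in /proc/vmstat. The sketch below compares two snapshots; reading any swap traffic at all as saturation is an assumption of this example rather than a rule from the talk.

```python
def parse_vmstat(text):
    """Map each counter name in /proc/vmstat text to its integer value."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_activity(before, after):
    """Pages swapped in and out between two snapshots."""
    return {name: after[name] - before[name] for name in ("pswpin", "pswpout")}


def memory_saturated(before, after):
    """True if any pages moved to or from swap during the interval."""
    return any(delta > 0 for delta in swap_activity(before, after).values())
```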

?

slabtop

Utilization

slabtop

Disk

?

df

Utilization

df
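
A small sketch of the same utilization question df answers, using the Python standard library. The function names are illustrative; the inode check is included because a filesystem can run out of inodes before it runs out of blocks (df -i shows that view).

```python
import os
import shutil


def space_utilization(path):
    """Fraction of the filesystem's bytes in use at the given path."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return usage.used / usage.total


def inode_utilization(path):
    """Fraction of inodes in use; 0.0 when the filesystem reports none."""
    stats = os.statvfs(path)
    if stats.f_files == 0:
        return 0.0
    return 1.0 - stats.f_ffree / stats.f_files
```

The result can differ slightly from the Use% column of df, which accounts for blocks reserved for root.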

?

iostat -x

Maybe you can get additional utilization if you know the max r/s or w/s - but not as clear based on different properties.

iostat -x
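
The extended iostat columns are computed from deltas of the counters in /proc/diskstats. A minimal sketch for one device, assuming the standard field layout of that file; the dictionary keys are names chosen for this example.

```python
def parse_diskstats(text):
    """Map device name to selected cumulative counters from /proc/diskstats."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue
        stats[fields[2]] = {
            "reads": int(fields[3]),            # reads completed
            "writes": int(fields[7]),           # writes completed
            "io_ms": int(fields[12]),           # ms the device had I/O in flight
            "weighted_io_ms": int(fields[13]),  # ms weighted by queue depth
        }
    return stats


def disk_use(before, after, interval_s):
    """Per-interval rates for one device from two counter snapshots."""
    interval_ms = interval_s * 1000.0
    return {
        "r_per_s": (after["reads"] - before["reads"]) / interval_s,
        "w_per_s": (after["writes"] - before["writes"]) / interval_s,
        "utilization": (after["io_ms"] - before["io_ms"]) / interval_ms,
        "avg_queue": (after["weighted_io_ms"] - before["weighted_io_ms"]) / interval_ms,
    }
```

The utilization value here is time-based (how busy the device was), while an average queue above roughly 1 points at saturation. For devices that serve many requests in parallel, a busy figure near 100% does not necessarily mean there is no headroom left.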

IO (Network)

?

ping

Errors

ping
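
When ping is run from a script, the error signal is the packet loss figure in its summary line. A minimal sketch that extracts it, assuming the summary wording used by the common Linux ping ("N% packet loss"); other implementations may phrase it differently.

```python
import re


def packet_loss_percent(ping_output):
    """Return the packet loss percentage from ping's summary line."""
    match = re.search(r"(\d+(?:\.\d+)?)% packet loss", ping_output)
    if match is None:
        raise ValueError("no packet loss summary found")
    return float(match.group(1))
```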

?

netstat

Saturation

netstat

?

netstat -s

Errors

netstat -s
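
Much of what netstat -s prints comes from /proc/net/snmp, which stores each protocol as a header line followed by a value line. The sketch below parses that layout and computes a TCP retransmit ratio; which ratio counts as "bad" is left open, since that depends on the network.

```python
def parse_snmp(text):
    """Parse /proc/net/snmp text into {protocol: {counter_name: value}}."""
    tables = {}
    lines = [line for line in text.splitlines() if line.strip()]
    # Lines come in pairs: "Tcp: name name ..." then "Tcp: value value ...".
    for header, values in zip(lines[0::2], lines[1::2]):
        protocol, _, names = header.partition(":")
        _, _, numbers = values.partition(":")
        tables[protocol] = dict(zip(names.split(),
                                    (int(n) for n in numbers.split())))
    return tables


def tcp_retransmit_ratio(tcp):
    """Retransmitted segments as a fraction of all segments sent."""
    if tcp["OutSegs"] == 0:
        return 0.0
    return tcp["RetransSegs"] / tcp["OutSegs"]
```

These counters are cumulative since boot, so comparing two snapshots gives a better picture of what is happening now.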

?

ifconfig

ifconfig

Saturation

Utilization

Errors
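
The per-interface counters that ifconfig shows are also exposed in /proc/net/dev. A minimal sketch that pulls out the error and drop counters, assuming the usual two header lines and sixteen columns per interface; the key names are chosen for this example.

```python
def parse_net_dev(text):
    """Map interface name to selected counters from /proc/net/dev text."""
    interfaces = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        name, separator, rest = line.partition(":")
        fields = rest.split()
        if not separator or len(fields) < 16:
            continue
        values = [int(field) for field in fields]
        interfaces[name.strip()] = {
            "rx_bytes": values[0], "rx_errs": values[2], "rx_drop": values[3],
            "tx_bytes": values[8], "tx_errs": values[10], "tx_drop": values[11],
        }
    return interfaces


def interface_problems(counters):
    """Return only the error and drop counters that are nonzero."""
    return {key: count for key, count in counters.items()
            if key.endswith(("_errs", "_drop")) and count > 0}
```

Errors map to the E in USE, drops usually hint at saturation, and the byte counters sampled over an interval give utilization once they are compared with the link speed.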

What are your examples?

http://upload.wikimedia.org/wikipedia/commons/f/f3/Uncle_Sam_(pointing_finger).jpg

Applications

Running out of Apache Threads

• Lots of incoming requests

• Apache hits ServerLimit of threads (Utilization!)

• Requests start to get stuck in TCP backlog (Saturation!)

• Apache endpoints are removed from load balancers (Error!)

• Fail!

http://upload.wikimedia.org/wikipedia/commons/9/96/Colorful_Threads_(3965274345).jpg
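
The progression on this slide (full pool, then a growing queue, then failures) can be reproduced with a toy model. This is a deliberately simplified simulation, not how Apache or the kernel behaves: time moves in ticks, every request takes the same number of ticks, and anything beyond the backlog limit is dropped. All names and parameters are invented for the example.

```python
def simulate(arrivals_per_tick, workers, service_ticks, backlog_limit):
    """Return one (utilization, backlog, dropped_total) tuple per tick."""
    busy = []      # remaining ticks for each request currently being served
    backlog = 0    # requests waiting for a free worker
    dropped = 0    # requests rejected because the backlog was full
    history = []
    for arrivals in arrivals_per_tick:
        # Requests on their last tick finish; the rest move one tick closer.
        busy = [remaining - 1 for remaining in busy if remaining > 1]
        backlog += arrivals
        # Free workers pick up waiting requests.
        while backlog and len(busy) < workers:
            busy.append(service_ticks)
            backlog -= 1
        # Whatever does not fit in the backlog is lost.
        if backlog > backlog_limit:
            dropped += backlog - backlog_limit
            backlog = backlog_limit
        history.append((len(busy) / workers, backlog, dropped))
    return history
```

With simulate([2, 2, 2, 2], workers=2, service_ticks=2, backlog_limit=1), utilization sits at 1.0 from the first tick, the backlog fills on the second, and the drop count starts climbing: the same Utilization, Saturation, Error order as the bullets above.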

Cold DB Start

• DBs like to be in memory, but can’t start that way

• All data requests go to disk (which is SAN backed)

• SAN controller CPU gets maxed out (Utilization!)

• HBA queues get deep (Saturation!)

• Requests time out (Error!)

• Fail!

Summary

Methods > Tools

• Don’t let tools get in the way of solutions

• It’s easy to think that all you’re missing is a tool.

• But are you actually following a method to your performance madness?

http://upload.wikimedia.org/wikipedia/commons/6/6d/Three_Card_Monte.jpg

Anti-Methods

• Blame Someone Else

• Streetlight

• Drunk Man

• Random Change

• Passive Benchmark

• Don’t do these…

http://www.brendangregg.com/methodology.html

http://upload.wikimedia.org/wikipedia/commons/a/af/Villainc.svg

Methods

• Ad Hoc Checklist

• Problem Statement

• Scientific

• Workload Characterization

• Drill-down Analysis

• By-layer

• Latency Analysis

• Tools

• Stack Profile

• Off-CPU Analysis

• Thread State Analysis

• Active Benchmark

http://www.brendangregg.com/methodology.html

http://memegenerator.net/instance/9192015

Linux Performance Tools

Chris McEniry LOPSA-SD

March 27, 2014