Post on 08-May-2015
Intro to Linux Performance Analysis
Chris McEniry
LOPSA-SD, March 27, 2014
Me
• Systems Architect
• Sony Network Entertainment
• 18 years running stuff
• Majority of the last 14 years: medium-large Internet services
Read this book: Systems Performance: Enterprise and the Cloud, by Brendan Gregg
And look here:
http://www.brendangregg.com/
http://www.brendangregg.com/methodology.html
http://www.brendangregg.com/Slides/LISA2012_methodologies.pdf
http://www.amazon.com/Systems-Performance-Enterprise-Brendan-Gregg/dp/0133390098
The website is down!!! It’s just too slow! The DB is too slow! The disk is too slow!
SLOW!!!
http://farm4.staticflickr.com/3190/2976755407_6a6a574596_o.jpg
SLOW!!!!
• What does slow mean, anyway?
• Is it not transferring fast enough?
• Is it not handling enough requests?
http://commons.wikimedia.org/wiki/File:United_States_sign_-_Slow_Traffic_Ahead.svg
Slow can mean…
• Latency: How long it takes
• ms, s, request time, etc
• Throughput: How much work gets done per unit of time
• bandwidth, IOPS, rps, tps, etc
http://upload.wikimedia.org/wikipedia/commons/2/2e/Miniature_DNF_Dictionary_055_ubt.JPG
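To make the latency/throughput distinction concrete, here is a minimal Python sketch. It assumes a hypothetical list of (start, end) timestamps in seconds for completed requests: latency is measured per request, while throughput is a rate over the whole observation window.

```python
from statistics import median


def summarize(requests):
    """Summarize completed requests given as (start_s, end_s) tuples.

    The tuple format is a hypothetical example, not any particular log format.
    """
    # Latency: how long each individual request took.
    latencies = [end - start for start, end in requests]
    # Throughput: how many requests completed per second over the whole window.
    window_s = max(end for _, end in requests) - min(start for start, _ in requests)
    return {
        "median_latency_s": median(latencies),
        "max_latency_s": max(latencies),
        "throughput_rps": len(requests) / window_s,
    }


if __name__ == "__main__":
    # One slow request dominates the worst-case latency
    # but does not change how many requests completed.
    print(summarize([(0.0, 0.2), (0.1, 0.4), (0.5, 0.6), (1.0, 2.0)]))
```

The two numbers can move independently: a system can keep up its request rate while individual users still see slow responses, which is why "slow" needs to be pinned down first.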
Slowness comes from…
• Full utilization of a resource
• Waiting in a saturated queue
• Generated errors!
• Together, these make up the USE Method: Utilization, Saturation, Errors
http://farm6.staticflickr.com/5181/5614813544_a30d693a50_o.jpg
Utilization
• You have fully used up what’s been allocated
• aka 10 lbs in a 5 lb bag
http://farm3.staticflickr.com/2524/4000641774_3331fe06fb_o.jpg
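CPU time is the most common utilization metric. Here is a minimal Python sketch, assuming a Linux /proc/stat. Its first line holds cumulative CPU time counters in clock ticks, so utilization over an interval is the change in busy time divided by the change in total time between two samples. top and vmstat read these same counters. Counting iowait as idle is a choice made for this sketch.

```python
import os
import time


def parse_cpu_line(line):
    """Return (busy, total) ticks from the aggregate 'cpu' line of /proc/stat."""
    fields = [int(x) for x in line.split()[1:]]
    # Columns: user nice system idle iowait irq softirq steal [guest guest_nice].
    # Guest time is already included in user/nice, so only the first 8 are summed.
    user, nice, system, idle, iowait, irq, softirq, steal = (fields + [0] * 8)[:8]
    total = user + nice + system + idle + iowait + irq + softirq + steal
    # This sketch treats iowait as idle: the CPU was free to run other work.
    busy = total - idle - iowait
    return busy, total


def cpu_utilization(before_line, after_line):
    """Fraction of the interval the CPUs spent busy, between two samples."""
    busy0, total0 = parse_cpu_line(before_line)
    busy1, total1 = parse_cpu_line(after_line)
    return (busy1 - busy0) / (total1 - total0)


def read_cpu_line(path="/proc/stat"):
    with open(path) as f:
        return f.readline()  # the first line is the aggregate "cpu" line


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    first = read_cpu_line()
    time.sleep(1.0)
    print(f"CPU utilization: {cpu_utilization(first, read_cpu_line()):.1%}")
```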
Saturation
• Waiting for someone else to get done so you can do yours
• Typically because a resource is fully utilized, but not necessarily directly
http://www.fotocommunity.com/pc/pc/display/30396619
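Saturation shows up as time spent waiting. The sketch below is a toy single-server FIFO queue in Python, not a model of any specific kernel queue. When arrivals are perfectly even, the server can be 100% busy with nobody waiting. Once requests arrive faster than the server can finish them, each one waits longer than the last.

```python
def fifo_waits(arrivals, service_times):
    """Waiting time for each request in a single-server FIFO queue.

    arrivals: sorted arrival times; service_times: time each request needs.
    """
    waits = []
    server_free_at = 0.0
    for arrive, service in zip(arrivals, service_times):
        # A request starts when it arrives or when the server frees up, whichever is later.
        start = max(arrive, server_free_at)
        waits.append(start - arrive)
        server_free_at = start + service
    return waits


if __name__ == "__main__":
    # One request per second, each needing 1 s: fully busy, but nobody waits.
    print(fifo_waits([0, 1, 2, 3], [1, 1, 1, 1]))
    # Arrivals outpace service: the waits keep growing.
    print(fifo_waits([0, 0.5, 1.0, 1.5], [1, 1, 1, 1]))
```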
Errors
• Dropped packets
• Incorrect responses
• Deadlocks
• Timeouts
• Not all failures fail fast
http://farm8.staticflickr.com/7001/6509400855_aaaf915871_b.jpg
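Not every failure fails fast, so it can help to count a request that blew past its deadline as an error, even if it eventually returned success. Here is a minimal Python sketch. It assumes a hypothetical list of (HTTP status, latency in seconds) pairs and a hypothetical timeout budget.

```python
def error_rate(responses, timeout_s):
    """Fraction of responses that failed outright or were too slow to be useful."""
    total = errors = 0
    for status, latency_s in responses:
        total += 1
        # A 5xx is an obvious error; a 200 that arrives after the client
        # has given up is an error too, just a slow one.
        if status >= 500 or latency_s > timeout_s:
            errors += 1
    return errors / total if total else 0.0


if __name__ == "__main__":
    sample = [(200, 0.1), (500, 0.05), (200, 3.0), (200, 0.2)]
    print(error_rate(sample, timeout_s=1.0))
```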
How do we determine?
• Different types of tools for different examinations
• Depends on what you’re looking for (which can be a problem in and of itself)
http://farm5.staticflickr.com/4083/5086955738_61f6455ace_b.jpg
Resource vs Transaction
• Do you care if…
• a CPU is maxed out?
• processes are blocked?
• packets are lost?
• or if…
• a user’s request fails?
• a user gives up on waiting for a response?
Maturity
• Tracing tools, especially when used in production, require a level of maturity
• I’m not that mature… ;)
• No, really: this talk just focuses on the basics first
http://upload.wikimedia.org/wikipedia/commons/b/bd/OFLC_large_R18%2B.svg
http://image.slidesharecdn.com/scalelinuxperformance-130224171331-phpapp01/95/slide-15-638.jpg?cb=1362166290
http://image.slidesharecdn.com/scalelinuxperformance-130224171331-phpapp01/95/slide-16-638.jpg?cb=1362166290
General
• /var/log/messages: Errors (mostly; sometimes stats go here)
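A first pass over /var/log/messages is often just counting suspicious lines. Here is a minimal Python sketch. The keyword list is a hypothetical starting point, not an exhaustive one, and some distributions log to /var/log/syslog instead.

```python
import os
import re
from collections import Counter

# Hypothetical keyword list; tune it for the systems you run.
PATTERN = re.compile(
    r"\b(error|fail(?:ed|ure)?|out of memory|segfault|timed out)\b", re.IGNORECASE
)


def count_error_lines(lines):
    """Count log lines by the first matching keyword in each line."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group(1).lower()] += 1
    return counts


if __name__ == "__main__" and os.access("/var/log/messages", os.R_OK):
    with open("/var/log/messages", errors="replace") as f:
        for keyword, n in count_error_lines(f).most_common():
            print(f"{n:8d} {keyword}")
```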
CPU
• uptime: Saturation of the scheduler
• top: Saturation, Utilization
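The three numbers uptime prints are the 1-, 5- and 15-minute load averages, which Linux also exposes in /proc/loadavg. Here is a minimal Python sketch that divides them by the CPU count. Values that stay above 1.0 per CPU suggest tasks are queueing. On Linux the load average also counts tasks in uninterruptible sleep (usually disk I/O), so a high value is a hint to dig further, not proof of CPU saturation.

```python
import os


def load_per_cpu(loadavg_text, ncpu):
    """Divide the 1/5/15-minute load averages by the number of CPUs."""
    one, five, fifteen = (float(x) for x in loadavg_text.split()[:3])
    return one / ncpu, five / ncpu, fifteen / ncpu


if __name__ == "__main__" and os.path.exists("/proc/loadavg"):
    with open("/proc/loadavg") as f:
        text = f.read()
    ncpu = os.cpu_count() or 1
    for label, value in zip(("1m", "5m", "15m"), load_per_cpu(text, ncpu)):
        print(f"{label}: {value:.2f} per CPU")
```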
Memory
• free: Utilization
• vmstat: Saturation, Utilization
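free reads its numbers from /proc/meminfo. Here is a minimal Python sketch of a similar utilization calculation. It assumes a kernel new enough to report MemAvailable (added in Linux 3.14), which estimates how much memory can be used without swapping. For saturation, vmstat's si/so (swap-in/swap-out) columns are the ones to watch. This sketch only covers utilization, plus how much swap is in use.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo into a dict of integers (most values are in kB)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            info[key.strip()] = int(parts[0])
    return info


def memory_utilization(info):
    """Fraction of RAM that is not available for new allocations."""
    return 1 - info["MemAvailable"] / info["MemTotal"]


def swap_utilization(info):
    """Fraction of swap in use (0.0 if there is no swap)."""
    total = info.get("SwapTotal", 0)
    return (total - info.get("SwapFree", 0)) / total if total else 0.0


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    if "MemAvailable" in info:
        print(f"memory utilization: {memory_utilization(info):.1%}")
    print(f"swap utilization: {swap_utilization(info):.1%}")
```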
Counts
• slabtop: Utilization
Disk
• df: Utilization
• iostat -x: you may be able to derive additional utilization figures if you know the device’s maximum r/s or w/s, but this is less clear-cut because devices differ in their properties.
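Both views can be sketched in Python on Linux. df's Use% is roughly used / (used + available), which shutil.disk_usage provides. iostat -x derives its busy percentage (%util) and average queue size from two cumulative counters in /proc/diskstats: "time spent doing I/O" and "weighted time spent doing I/O" (fields 10 and 11 after the device name). %util is less meaningful for devices that serve many requests in parallel, such as SSDs or RAID arrays. For those, 100% busy does not necessarily mean they are at their limit.

```python
import shutil


def df_utilization(path="/"):
    """Roughly df's Use%: used / (used + available to unprivileged users)."""
    usage = shutil.disk_usage(path)
    denom = usage.used + usage.free
    return usage.used / denom if denom else 0.0


def diskstats_line(device, path="/proc/diskstats"):
    """Return the /proc/diskstats line for one device (e.g. 'sda')."""
    with open(path) as f:
        for line in f:
            if line.split()[2] == device:
                return line
    raise ValueError(f"device not found: {device}")


def disk_busy(before_line, after_line, elapsed_ms):
    """Return (busy fraction like %util, average queue size) between two samples."""
    b, a = before_line.split(), after_line.split()
    # Index 12 = ms spent doing I/O; index 13 = weighted ms doing I/O
    # (fields 10 and 11 after the major number, minor number and device name).
    busy = (int(a[12]) - int(b[12])) / elapsed_ms
    avg_queue = (int(a[13]) - int(b[13])) / elapsed_ms
    return busy, avg_queue


if __name__ == "__main__":
    print(f"/ is {df_utilization('/'):.1%} full")
```

To use disk_busy, sample diskstats_line for a device (for example "sda") twice, about a second apart, and pass in the elapsed milliseconds.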
IO (Network)
• ping: Errors
• netstat: Saturation
• netstat -s: Errors
• ifconfig: Saturation, Utilization, Errors
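The RX/TX errors and dropped counts that ifconfig shows are per-interface counters, which the kernel also publishes in /proc/net/dev. Here is a minimal Python sketch that extracts the error and drop columns. These counters are cumulative since boot, so what matters is whether they are still increasing between two samples.

```python
import os

# Column order after "iface:" in /proc/net/dev:
# receive:  bytes packets errs drop fifo frame compressed multicast
# transmit: bytes packets errs drop fifo colls carrier compressed
RX_ERRS, RX_DROP, TX_ERRS, TX_DROP = 2, 3, 10, 11


def net_errors(text):
    """Map interface name to its error/drop counters parsed from /proc/net/dev."""
    result = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip the two header lines
        name, data = line.split(":", 1)
        values = [int(v) for v in data.split()]
        result[name.strip()] = {
            "rx_errs": values[RX_ERRS],
            "rx_drop": values[RX_DROP],
            "tx_errs": values[TX_ERRS],
            "tx_drop": values[TX_DROP],
        }
    return result


if __name__ == "__main__" and os.path.exists("/proc/net/dev"):
    with open("/proc/net/dev") as f:
        for iface, counters in net_errors(f.read()).items():
            print(iface, counters)
```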
What are your examples?
http://upload.wikimedia.org/wikipedia/commons/f/f3/Uncle_Sam_(pointing_finger).jpg
Applications
Running out of Apache Threads
• Lots of incoming requests
• Apache hits ServerLimit of threads (Utilization!)
• Requests start to get stuck in TCP backlog (Saturation!)
• Apache endpoints are removed from load balancers (Error!)
• Fail!
http://upload.wikimedia.org/wikipedia/commons/9/96/Colorful_Threads_(3965274345).jpg
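If mod_status is enabled, Apache reports its own view of worker utilization. Here is a minimal Python sketch. It assumes you have already fetched the Scoreboard string, for example from the machine-readable server-status?auto page. In that string, '_' marks a worker waiting for a connection, '.' an open slot with no process, and the other letters workers in various states. The sketch counts every slot that is neither '_' nor '.' as busy. That is close to, but not exactly, how mod_status computes its busy-worker count.

```python
def scoreboard_utilization(scoreboard):
    """Fraction of scoreboard slots occupied by workers that are not idle.

    '_' = waiting for a connection, '.' = open slot with no process;
    everything else is treated as busy (a simplification).
    """
    if not scoreboard:
        return 0.0
    busy = sum(1 for slot in scoreboard if slot not in "_.")
    return busy / len(scoreboard)


if __name__ == "__main__":
    print(scoreboard_utilization("__WW.K..R_"))
```

As this approaches 1.0, new connections have nowhere to go but the TCP backlog, which is the utilization-to-saturation step on the slide.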
Cold DB Start
• DBs like to be in memory, but can’t start that way
• All data requests go to disk (which is SAN-backed)
• SAN controller CPU gets maxed out (Utilization!)
• HBA queues get deep (Saturation!)
• Requests timeout (Error!)
• Fail!
Summary
Methods > Tools
• Don’t let tools get in the way of solutions
• It’s easy to think that all you’re missing is a tool.
• But are you actually following a method to your performance madness?
http://upload.wikimedia.org/wikipedia/commons/6/6d/Three_Card_Monte.jpg
Anti-Methods
• Blame Someone Else
• Streetlight
• Drunk Man
• Random Change
• Passive Benchmark
• Don’t do these…
http://www.brendangregg.com/methodology.html
http://upload.wikimedia.org/wikipedia/commons/a/af/Villainc.svg
Methods
• Ad Hoc Checklist
• Problem Statement
• Scientific
• Workload Characterization
• Drill-down Analysis
• By-layer
• Latency Analysis
• Tools
• Stack Profile
• Off-CPU Analysis
• Thread State Analysis
• Active Benchmark
http://www.brendangregg.com/methodology.html
http://memegenerator.net/instance/9192015
Linux Performance Tools
Chris McEniry LOPSA-SD
March 27, 2014