Date post: | 26-Jan-2015 |
Category: | Technology |
Upload: | jclarity |
Hotspot Garbage Collection
Tuning Guide
http://www.jclarity.com
Thursday, 2 May 13
Who are we?
• Martijn Verburg (@karianna)
– CTO of jClarity
– aka "The Diabolical Developer"
– co-leader of the LJC
• Dr. John Oliver (@johno_oliver)
– Research Mentat at jClarity
• Strange title? Yes, we're a start-up
– Can read raw GC log files
• "Empirical Science Wins"
What we're going to cover
• Part I - Shining a light into the Darkness
– Retrospective from Talk I
– Collector Flags Ahoy
– Tooling and Basic Data
• Part II - Setting the stage
– When to tune GC
– Pause times vs Throughput vs Heap Size
– Application Lifecycle
• Part III - Real World Scenarios
– Possible Memory Leak(s), Long Pause Times
– Premature Promotion, System GCs, Low Throughput
– Healthy Application, Maxed Allocation Rate
What we're not covering
• G1 Collector
– It's supported in production now
– But we doubt any of you are using it yet
• Non-Hotspot JVMs
– Again, most of you are using OpenJDK/Oracle
– Azul's Zing VM is a specialist VM you can look at
Part I - Shining a light into the dark
• Retrospective
• Collector Flags ahoy
• Reading CMS Log records
• Tooling and basic data
Java Heap Layout
Copyright - Oracle Corporation
Weak Generational Hypothesis
Copyright - Oracle Corporation
Copy Collectors
• aka "stop-and-copy"
– Some literature will discuss "Cheney's algorithm"
• Used in many managed runtimes
– Including Hotspot
• GC thread(s) trace from root(s) to find live objects
• Typically involves copying live objects
– From one space to another space in memory
– The result typically looks like a move as opposed to a copy
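To make "trace from the roots and copy the live objects" concrete, here is a toy sketch in Java. It is not how Hotspot implements its young generation collector: the class `ToySemispaceCopy`, the `Obj` type and the list standing in for "to-space" are all hypothetical, and real collectors copy raw memory rather than Java objects. The scan loop is Cheney-style in that the to-space itself doubles as the work queue.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Toy model of a copying collector (illustration only, all names hypothetical).
final class ToySemispaceCopy {
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    // Copies everything reachable from the roots into a fresh to-space and returns it.
    // The roots list is updated in place so that it points at the copies.
    static List<Obj> collect(List<Obj> roots) {
        List<Obj> toSpace = new ArrayList<>();
        Map<Obj, Obj> forwarding = new IdentityHashMap<>();
        for (int i = 0; i < roots.size(); i++) {
            roots.set(i, forward(roots.get(i), toSpace, forwarding));
        }
        // Cheney-style scan: objects appended to toSpace are visited in order.
        for (int scan = 0; scan < toSpace.size(); scan++) {
            Obj copy = toSpace.get(scan);
            for (int i = 0; i < copy.refs.size(); i++) {
                copy.refs.set(i, forward(copy.refs.get(i), toSpace, forwarding));
            }
        }
        return toSpace; // anything never forwarded is garbage and is simply left behind
    }

    // Returns the to-space copy of an object, creating it on first visit.
    private static Obj forward(Obj old, List<Obj> toSpace, Map<Obj, Obj> forwarding) {
        Obj copy = forwarding.get(old);
        if (copy == null) {
            copy = new Obj(old.name);
            copy.refs.addAll(old.refs); // still from-space references, fixed up by the scan
            forwarding.put(old, copy);
            toSpace.add(copy);
        }
        return copy;
    }

    public static void main(String[] args) {
        Obj a = new Obj("a");
        Obj b = new Obj("b");
        Obj dead = new Obj("dead");
        a.refs.add(b);
        dead.refs.add(a); // dead points at a, but nothing points at dead
        List<Obj> roots = new ArrayList<>(List.of(a));
        List<Obj> toSpace = collect(roots);
        System.out.println("survivors copied: " + toSpace.size());
    }
}
```

Note that the cost of this scheme is proportional to the number of live objects, not the amount of garbage, which is why it suits a space where most objects die young.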
Mark and Sweep Collectors
• Used by many modern collectors
– Including Hotspot, usually for old generation collection
• Typically 2 mandatory steps and 1 optional step
1. Find live objects (mark)
2. 'Delete' dead objects (sweep)
3. Tidy up - optional (compact)
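The two mandatory steps can be sketched as a toy model in Java. This is an illustration under stated assumptions, not Hotspot's implementation: `ToyMarkSweep`, `Node` and the list used as a "heap" are hypothetical names, and the optional compact step is left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model of mark and sweep (illustration only, all names hypothetical).
final class ToyMarkSweep {
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        boolean marked;
        Node(String name) { this.name = name; }
    }

    // Step 1 (mark): trace from the roots and flag everything reachable.
    static void mark(List<Node> roots) {
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (n.marked) {
                continue; // already visited, which also makes cycles safe
            }
            n.marked = true;
            pending.addAll(n.refs);
        }
    }

    // Step 2 (sweep): drop unmarked nodes and clear marks for the next cycle.
    // Returns how many nodes were reclaimed.
    static int sweep(List<Node> heap) {
        int before = heap.size();
        heap.removeIf(n -> !n.marked);
        heap.forEach(n -> n.marked = false);
        return before - heap.size();
    }

    static int collect(List<Node> heap, List<Node> roots) {
        mark(roots);
        return sweep(heap);
    }

    public static void main(String[] args) {
        Node a = new Node("a");
        Node b = new Node("b");
        Node c = new Node("c");
        Node d = new Node("d");
        a.refs.add(b);
        c.refs.add(d);
        d.refs.add(c); // c and d reference each other but are unreachable from the root
        List<Node> heap = new ArrayList<>(List.of(a, b, c, d));
        int reclaimed = collect(heap, List.of(a));
        System.out.println("reclaimed=" + reclaimed + " live=" + heap.size());
    }
}
```

Unlike a copying collector, the survivors stay where they are, so free space ends up fragmented unless the optional compact step runs.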
More Flags than your Deity
Copyright Frank Pavageau
'Mandatory' Flags
• -Xloggc:<pathtofile>
– Path to the log output, make sure you've got disk space!
• -XX:+PrintGCDetails
– Minimum information for tools to help
– Replace -verbose:gc with this
• -XX:+PrintTenuringDistribution
– Premature promotion information
Basic Heap Sizing Flags
• -Xms<size>
– Set the initial (minimum) size reserved for the heap
• -Xmx<size>
– Set the maximum size reserved for the heap
• -XX:MaxPermSize=<size>
– Set the maximum size of your perm gen
– Good for Spring apps and App servers
• We'll cover other flags in a tuning context
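A quick way to check what the sizing flags actually produced is to ask the running JVM. The sketch below uses the standard `java.lang.Runtime` methods; the class name `HeapSizeReport` and the helper `toMiB` are hypothetical, and the figures reported will only approximate the `-Xms`/`-Xmx` values because the JVM aligns and adjusts sizes.

```java
// Prints the heap limits as seen from inside the JVM (names are hypothetical).
final class HeapSizeReport {
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory: the most the JVM will attempt to use, roughly what -Xmx sets.
        System.out.println("max heap:       " + toMiB(rt.maxMemory()) + " MiB");
        // totalMemory: heap currently committed, starts near -Xms and may grow.
        System.out.println("committed heap: " + toMiB(rt.totalMemory()) + " MiB");
        // freeMemory: unused part of the committed heap.
        System.out.println("free in heap:   " + toMiB(rt.freeMemory()) + " MiB");
    }
}
```

Running it as, for example, `java -Xms256m -Xmx1g HeapSizeReport` should show the committed figure near 256 MiB and the maximum near 1024 MiB.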
Beware of Magic Happening
• When you touch GC Flags a Puppy dies
• Your Tenuring Threshold jumps to 15
• -XX:MaxTenuringThreshold=n
– To reset this to what you really want
Tooling
• HPJMeter (Google it)
– Solid, but no longer supported / enhanced
• GCViewer (http://www.tagtraum.com/gcviewer.html)
– Has rudimentary G1 support
• GarbageCat (http://code.google.com/a/eclipselabs.org/p/garbagecat/)
– Best name
• IBM GCMV (http://www.ibm.com/developerworks/java/jdk/tools/gcmv/)
– J9 support
• jClarity Censum (http://www.jclarity.com/products/censum)
– The prettiest and most useful, but we're biased!
Don't listen to the vendors ;-)
• Single log with consistent format?
– You can probably grep for stuff
– This doesn't scale
• Existing free tools are adequate*
– *For older JVMs especially
– Most are no longer actively maintained
• Latest tooling does more for you
– Supports latest JVMs & Collectors
– Has more meaningful visualisations
– Starts to do some of the human analysis for you
– Correlates and performs historical analysis
– Parses certain data out that the others don't
Summary Data
Heap Usage After GC
Recovered Heap
Allocation Rates
Pause Times
Perm Space
Tenuring Threshold
Part II - Setting the stage
• When to Tune
• Latency / Throughput / Footprint
– aka Performance goals
• Application Lifecycle
• Know your Hardware
When to tune GC
• As part of a performance diagnostic process
– After looking at machine metrics
– Before the execution profiler
• It's cheap to switch on GC flags
– It's cheap to eliminate GC as the cause or pin the issue on GC
– It's not cheap to set up execution profilers
• Result is either "GC is OK" or "GC is not OK"
– Tune the GC and/or
– Bring out the memory profiler
Latency vs Throughput vs Footprint
• aka performance goals:
– e.g. "Max Pause Times / 95th% Pause Times" vs
– "Object Allocation Rate" vs
– "Heap Size"
– Throughput ~= % of time doing application work
• Tuning tradeoff
– Latency x Throughput x Footprint = Z
– You can typically tune for 2 out of these 3
– To increase Z you need to
  • increase allocated hardware OR
  • rewrite your app
• Decide what characteristics you want!
– Before tuning
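Before settling on goals it helps to know where the application stands today. One rough option, sketched below, is to read the collectors' own counters through the standard `java.lang.management` API. The class `GcTimeShare` and its method names are hypothetical. Treat the result as an approximation only: `getCollectionTime()` reports accumulated collection time, which for a concurrent collector is not the same thing as pause time, so a GC log remains the better source.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Rough in-process estimate of time spent outside GC (names are hypothetical).
final class GcTimeShare {
    // Percentage of JVM uptime not attributed to garbage collection.
    static double appTimePercent(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 100.0;
        }
        return 100.0 * (1.0 - (double) gcMillis / uptimeMillis);
    }

    // Sums the accumulated collection time over all collectors in this JVM.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": count=" + gc.getCollectionCount()
                    + " timeMs=" + gc.getCollectionTime());
        }
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.printf("approx. time outside GC: %.1f%%%n",
                appTimePercent(totalGcMillis(), uptime));
    }
}
```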
Latency vs Throughput vs Footprint
• Better Throughput
– Usually means worse Latency and Footprint
• Better Latency
– Usually means worse Throughput
• Better Footprint
– Usually means worse Throughput
Application Lifecycle
• Very little point in tuning based off limited information
– Have you gathered enough data?
– Has your application gone through its typical lifecycle?
– This is why we don't run 'Live Demos'
• Very little point in tuning off incorrect information
– Application start-up, shutdown and batch jobs are all outliers
• You can infer amazing things from GC logs
– When Richard went to lunch
– When John stopped playing Minecraft
– When Ben kicked off the weekly customer report
– .....
Know your Hardware
• Number of CPU cores matters
– Allocate X threads to do GC work with a concurrent collector
– How many is 'safe'?
– How does that affect throughput?
• Memory bandwidth matters
– How quickly can your hardware allocate?
– See your manufacturer
– Object Allocation Rates != Memory Bandwidth != Real Metric
• Use Hawkshaw to explore your hardware
– Produces GC behaviour according to statistical models
– http://www.github.com/jclarity/hawkshaw
Part III - Tuning Scenarios
• Tuning can make it worse!
• Grain of Salt
• Scenarios
– Possible Memory Leak(s)
– Long Pause Times
– Premature Promotion
– System GCs
– Low Throughput
– Healthy Application
– Maxed Allocation Rate
Tuning can make it worse*
• Performance Tuning is an iterative process
– Sometimes solving one problem uncovers a 2nd, worse problem
– e.g. Fix the app, then the database gets hammered
  • Overall performance goes down
• Only fix one aspect of GC at a time
– Measure the next cycle with fresh eyes
– Have you met your goals or made them worse?
• GC tuning still needs human interaction
– Azul's Zing can/will claim otherwise.
Grain of Salt
"Nothing that we say should be held as performance tuning tips for *your* application"
"There is *always* more than one way to tune in order to meet your goal"
"Don't just use our numbers!"
A Likely Memory Leak
• Memory leaks can't truly be ascertained from a GC log
– It could just be an undersized heap!
– Needs human domain knowledge of the app (periodicity)
• First rule of thumb is to increase your heap
– Rule out having an undersized heap
• Second rule of thumb is to fire up the memory profiler
– Visual VM will do in most cases
A Likely Memory Leak
• Only 1000 seconds of log: look at the number of Full GCs, highly indicative of a leak. Note the trend along the bottom.
A Possible Memory Leak - I
• Note: trend along the bottom, slow leak possible. Look for cycles in the log, e.g. a full day in an application's life.
A Possible Memory Leak - II
• Note: Trend along the bottom, slow leak possible. Again, look for cycles in the log.
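"Trend along the bottom" can be put into numbers by fitting a straight line through the heap occupancy left after each collection. The sketch below is one simple way to do that (an ordinary least-squares slope); it is an assumption about how you might quantify the trend, not what any particular tool does. `HeapFloorTrend` and the sample values in `main` are hypothetical. A clearly positive slope over several full application cycles is a hint worth investigating, not proof of a leak.

```java
// Least-squares trend of heap occupancy after GC (names and data are hypothetical).
final class HeapFloorTrend {
    // Returns the fitted growth in MB per second of the post-GC heap occupancy.
    static double slope(double[] seconds, double[] heapAfterGcMb) {
        if (seconds.length != heapAfterGcMb.length || seconds.length < 2) {
            throw new IllegalArgumentException("need two or more paired samples");
        }
        int n = seconds.length;
        double meanX = 0;
        double meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += seconds[i];
            meanY += heapAfterGcMb[i];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0;
        double sxx = 0;
        for (int i = 0; i < n; i++) {
            double dx = seconds[i] - meanX;
            sxy += dx * (heapAfterGcMb[i] - meanY);
            sxx += dx * dx;
        }
        if (sxx == 0) {
            throw new IllegalArgumentException("timestamps must differ");
        }
        return sxy / sxx;
    }

    public static void main(String[] args) {
        // Made-up samples: time of each Full GC and the heap left afterwards.
        double[] t = {0, 600, 1200, 1800, 2400};
        double[] floorMb = {210, 214, 221, 226, 230};
        System.out.printf("heap floor trend: %.4f MB/s%n", slope(t, floorMb));
    }
}
```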
Using a Memory Profiler
• Visual VM
– Memory profiler - invasive and slow on large apps
– Look at object ages (aka Generations)
• Look for a high number of generations
– They're a candidate
– Make sure you switch on 'record allocation stack traces'
• Use the allocation stack trace to find the root cause
– Track back from core JRE classes to your code
– Yes, it's always your code that's the problem!
• Can also try jmap -histo
Visual VM - Memory Profiler
• Note: Objects in many generations! Indicative they're leaking
Visual VM - Stack Trace
• NThreadedManagedCache$ProduceKey.run() root cause
Long Pause Times
• The #1 complaint relating to GC
– Lots of ways to mitigate
– From small tuning tweaks --> off-heap solutions
• User reports paused/locked application!
– e.g. Web pages taking ages to load
– e.g. Progress bars stalling
• Tech Support want to uninstall Java!
Long Pause Time Example
• User has set heap to: -Xms5G -Xmx5G
• NOTE: Resident Set Size ~1GB
Long Pause Time Example
• ~125ms young gen pauses & ~500ms Full GC pauses
– OK for a web app, but this is a new prototype low latency trading app or media streaming app or advertising service, oh dear!
Long Pause Time partial fix
• Reduce heap size to -Xmx1500M: more frequent, shorter pauses
Long Pause Time partial fix
• ~20ms young gen pauses & ~250ms Full GC pauses, Better!
Long Pause Time 'fixed'
• Move to a CMS collector, hopefully shorter pauses
• No Full GC's! Therefore minimal Tenured pauses
Long Pause Time 'fixed!'
• ~10ms young gen pauses, ~2ms tenured pauses, Better!
• BUT: Throughput decreased from 69% down to 49% :-(
Other Long Pause Time Solutions
• Increase number of threads performing GC
– -XX:ParallelGCThreads=N
– Rule of thumb is to use 3/4 of the available physical cores
– Can reduce application throughput - can be bad
– Can increase context switching - bad
• Try an alternative collector
– ParNew/CMS vs PSScavenge/ParOld vs iCMS vs G1 etc
– Match the collector to your application and hardware
• Special note on G1
– You can set pause time goals
– BUT: We haven't reliably succeeded for <100ms pause times
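The 3/4 rule of thumb for -XX:ParallelGCThreads is simple arithmetic, sketched below as a starting point only. `GcThreadRule` and `threeQuarters` are hypothetical names. One caveat is an assumption you must check on your own machine: `Runtime.availableProcessors()` reports the logical processors visible to the JVM, which can differ from the physical core count the slide refers to (hyper-threading, containers, CPU pinning).

```java
// Suggests a starting value for -XX:ParallelGCThreads (names are hypothetical).
final class GcThreadRule {
    // Three quarters of the given core count, rounded down, never below 1.
    static int threeQuarters(int cores) {
        if (cores < 1) {
            throw new IllegalArgumentException("cores must be at least 1");
        }
        return Math.max(1, (cores * 3) / 4);
    }

    public static void main(String[] args) {
        // Logical processors seen by the JVM, NOT necessarily physical cores.
        int logical = Runtime.getRuntime().availableProcessors();
        System.out.println("logical processors: " + logical);
        System.out.println("starting point: -XX:ParallelGCThreads=" + threeQuarters(logical));
    }
}
```

Whatever value you start from, measure throughput and context switching afterwards, since both can get worse.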
Extreme Long Pause Time Solutions
• Azul's Zing JVM
– This has proven low pause time goal settings
– JCK/TCK compliant
– Typically needs a very large heap (15GB+)
• Take memory off heap
– Good for caches in particular
• GC in offline mode
– Cluster the app and take nodes offline in order to run GC on them
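As a minimal illustration of "take memory off heap", the sketch below stores `long` values in a direct `ByteBuffer`, whose contents live outside the Java heap and are therefore not traced or copied by the collector. `OffHeapLongs` is a hypothetical class, not a recommended cache design: a real off-heap cache also needs serialisation, eviction and careful release of the native memory.

```java
import java.nio.ByteBuffer;

// Fixed-size array of longs held in native memory (names are hypothetical).
final class OffHeapLongs {
    private final ByteBuffer buffer;

    OffHeapLongs(int capacity) {
        // allocateDirect reserves native memory; only the small wrapper object is on the heap.
        this.buffer = ByteBuffer.allocateDirect(Math.multiplyExact(capacity, Long.BYTES));
    }

    void set(int index, long value) {
        buffer.putLong(index * Long.BYTES, value); // absolute put, no position bookkeeping
    }

    long get(int index) {
        return buffer.getLong(index * Long.BYTES);
    }

    boolean isOffHeap() {
        return buffer.isDirect();
    }

    public static void main(String[] args) {
        OffHeapLongs cache = new OffHeapLongs(1_000_000); // about 8 MB outside the heap
        cache.set(42, 123_456_789L);
        System.out.println("value=" + cache.get(42) + " offHeap=" + cache.isOffHeap());
    }
}
```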
Premature Promotion
• User reports more pauses and/or longer pauses
• Tech support reports there are more Full GCs
• Objects are promoted to Tenured too early
– Recall the Weak Generational Hypothesis!
– This causes more Old Gen collections
  • Which can lead to more Full GCs
Premature Promotion Example
Customer had set:
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:+PrintGCDetails
-Xloggc:gc.log
-Xmx1024m
-XX:+PrintTenuringDistribution
-XX:NewRatio=2
-XX:MaxTenuringThreshold=4
NewRatio=2 means young gen gets ~1/3 of the total heap
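The 1/3 figure follows from NewRatio being the ratio of old to young generation: the young generation gets 1 part out of (NewRatio + 1). A small sketch of that arithmetic, with hypothetical names (`NewRatioMath`, `youngFraction`, `youngMiB`); the sizes the JVM actually picks will differ slightly because of alignment, and the young generation figure includes the survivor spaces.

```java
// Arithmetic behind -XX:NewRatio (names are hypothetical).
final class NewRatioMath {
    // NewRatio = old / young, so young = heap / (NewRatio + 1).
    static double youngFraction(int newRatio) {
        if (newRatio <= 0) {
            throw new IllegalArgumentException("this sketch expects a positive NewRatio");
        }
        return 1.0 / (newRatio + 1);
    }

    static long youngMiB(long heapMiB, int newRatio) {
        return Math.round(heapMiB * youngFraction(newRatio));
    }

    public static void main(String[] args) {
        // With -Xmx1024m as on the slide:
        System.out.println("NewRatio=2 -> young ~" + youngMiB(1024, 2) + " MiB");
        System.out.println("NewRatio=1 -> young ~" + youngMiB(1024, 1) + " MiB");
    }
}
```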
Premature Promotion Example
• Note: ~26% of objects promoted at age 1
Premature Promotion 'Fixed'
• We dropped NewRatio to 1, Premature Promotion fell to ~4%
– Weak Generational Hypothesis is a better fit
– This gives the Young Gen ~1/2 the heap
System GC's
• User reports frequent pauses
– System GCs are Full GCs!
• Tech support reports there are more Full GCs
– With this funny System wording in the log
• System GCs often interfere with the GC subsystem
– JVM no longer resizes heap based on runtime info
• Caused by System.gc() in code or an RMI call
– Very occasionally used to solve a problem
– System.gc() is almost always honoured
– You can disable it with -XX:+DisableExplicitGC
System GC example
• NOTE: 34,000 System GCs, one every 1/2 second
– Throughput 51% - Unhappy Minecraft players!
System GC calls 'Fixed'
• -XX:+DisableExplicitGC
• Throughput went to 99.8% - Happier Minecraft players
Low Throughput
• User reports slow application
– e.g. Batch job fails to complete on time
• Tech support reports there are lots of GCs
• Lots of small GCs can also be bad!
– Your application threads aren't able to allocate objects
– i.e. Low Throughput
• Throughput increases when system is quiet
– Be careful to analyse the right period of activity
Low Throughput example 1/4
• 61 seconds in total pause time, log is only 170 seconds long
• Throughput is 64% --> Rule of thumb, should be 95%+
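The throughput figure on this slide is just the share of the logged period that was not spent paused. A minimal sketch of the arithmetic, using the slide's own numbers (61 seconds paused in a 170 second log); the class and method names are hypothetical.

```java
// Throughput from total pause time and log duration (names are hypothetical).
final class PauseThroughput {
    // Percentage of the logged period spent running the application rather than paused.
    static double throughputPercent(double pausedSeconds, double logSeconds) {
        if (logSeconds <= 0 || pausedSeconds < 0 || pausedSeconds > logSeconds) {
            throw new IllegalArgumentException("expected 0 <= paused <= log duration, log duration > 0");
        }
        return 100.0 * (logSeconds - pausedSeconds) / logSeconds;
    }

    public static void main(String[] args) {
        double throughput = throughputPercent(61, 170);
        System.out.printf("throughput: %.1f%%%n", throughput);
        // Rule of thumb from the slide: aim for 95% or better.
        System.out.println(throughput >= 95.0 ? "OK" : "below the 95% rule of thumb");
    }
}
```

The same calculation reproduces the later slides: 33 paused seconds out of 170 gives roughly 81%, and 9 out of 170 gives roughly 95%.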
Low Throughput example 2/4
• Lots of small pauses from various collectors, which ones?
Low Throughput example 3/4
• ~25% time spent in young GC & ~5-10% in Full GCs (CMFs)
Low Throughput example 4/4
• Object allocation hitting max heap size
– Able to recover memory, so no leak, needs a bigger heap!
Low Throughput 'Fixed' 1/4
• Increased footprint to -Xmx1024M
Low Throughput 'Fixed' 2/4
• Far fewer pauses from Full GCs (CMFs) - just looks nicer!
– Still lots of young gen pauses
Low Throughput 'Fixed' 3/4
• ~15% time spent in young GC & ~0% in Full GCs
Low Throughput 'Fixed' 4/4
• Note: 33 seconds out of 170, ~81% Throughput, Better!
Low Throughput 'Really Fixed' 1/2
• Switched to PSYoungGen collector (from ParNew)
– Worth trying as young gen collections are dominant
Low Throughput 'Really Fixed' 2/2
• Note: 9 seconds out of 170, ~95% Throughput, Best!
Healthy Application
• What is healthy? It depends!
• Throughput
– Typically a 95%+ throughput is good
• Pause times
– < 1sec is good for generic web apps
• Footprint
– Smaller == Fewer live objects to track == Better?
Healthy Application
• Saw tooth pattern
• Bottom of troughs trend line is flat
Healthy Minecraft Client!
• Note: JVM resizing itself, you let IT do the work!
Maxed Allocation Rate
• User reports slow application behaviour
• Tech support has no idea why!
– Normally you'd do a full performance diagnostic
– But we can look at GC cheaply
• GC logs can help with non-GC problems!
– Memory bandwidth limits are being hit
– Not a GC problem!
• More common in virtualised environments
– What else on the hardware is using bandwidth?
Not Maxed Allocation Rate Example
Max Allocation Rate Example
• 8GB/sec - could be getting close to real memory bandwidth
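An allocation rate like the one plotted here can be estimated from two consecutive GC log records: whatever the heap grew by between the end of one collection and the start of the next was allocated by the application in that interval. The sketch below shows that estimate; it is a common approach rather than necessarily what produced the chart, and the names and sample figures are hypothetical.

```java
// Estimates allocation rate between two consecutive collections (names are hypothetical).
final class AllocationRate {
    // heapAfterPrevMb: occupancy right after collection N
    // heapBeforeNextMb: occupancy right before collection N+1
    // prevSec / nextSec: log timestamps of the two collections
    static double mbPerSecond(double heapAfterPrevMb, double heapBeforeNextMb,
                              double prevSec, double nextSec) {
        double elapsed = nextSec - prevSec;
        if (elapsed <= 0) {
            throw new IllegalArgumentException("collections must be in time order");
        }
        return (heapBeforeNextMb - heapAfterPrevMb) / elapsed;
    }

    public static void main(String[] args) {
        // Made-up records: 100 MB left after one GC, 900 MB just before the next, 0.5 s apart.
        double rate = mbPerSecond(100, 900, 10.0, 10.5);
        System.out.printf("allocation rate: %.0f MB/s%n", rate);
    }
}
```

If the estimate plateaus at the same ceiling regardless of load, as on this slide, memory bandwidth rather than GC is the likely limit.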
Max Allocation Rate Example
• Hard limit at ~8GB/sec (8e+06 on graph)
Max Allocation Rate 'Fixes'
• Lots you can do!
• Stop allocating so much!
– Get out your memory profiler
– Alter the application's object allocation behaviour
• Get better hardware!
– CPU
– Faster Bus
– Faster RAM
• Don't virtualise/share
– Have your application be the only thing on that hardware
Summary
• You need to understand some basic GC theory
– Work with the Weak Generational Hypothesis
– See http://www.insightfullogic.com for blog posts
• Turn on GC logging!
– It has low overhead*
– Reading raw log files is hard
– Use tooling!
• Tradeoff: Pause Times vs Throughput vs Heap Size
– Use tools to help you tweak
– "Empirical Science Wins!"
Join our performance community
http://www.jclarity.com
Martijn Verburg (@karianna)
Dr. John Oliver (@johno_oliver)