@pingtimeout Scala.IO – 24&25 oct 13
Pimp my GC - Supersonic Scala!
/me
● Pierre Laporte
● “Java Performance Tuning” trainer
● Performance issues, GC-log-compliant eyes
http://www.pingtimeout.fr
@pingtimeout
Agenda
● 42 minutes of
– Fun (Theory)
– Fun (Practice)
– Fun (Feedback)
– Fun (Questions/Answers)
– Fun (Trolls)
● Because performance is fun!
Disclaimer
● Be critical of the information contained in this talk
● JVM tuning is always done on a case-by-case basis. There is no magic, no special set of flags that produces good results on every project.
● The resemblance of any opinion, recommendation or comment made during this presentation to performance tuning advice is merely coincidental.
Weak Generational Hypothesis 101
Theory – Weak Generational Hypothesis
● “Most objects die young”
● Possible scales:
– MB, GB, TB
– Minutes, hours, days
Examples – Weak Generational Hypothesis
[Chart: GB allocated over 3 days]
Total: 145 GB, Avg: 48 GB/day
[Chart: TB allocated over 10 days]
Total: 30 TB, Avg: 3 TB/day
● 35 GB/day
– Scala
– Play 2
– Akka
● 3 TB/day
– Java
– Tomcat
– Jax-RS / Spring / Hibernate...
Don't forget!
– Be critical
– Case-by-case analysis
– Please don't do that ----> [image]
JVM Heap 101
Theory – Memory pools
● Java Heap – 2 memory pools
(Except for G1 GC)
● Young Generation for... young objects
● Old Generation for... old objects!!!
Amazing, right?
● Young Generation = Eden + Survivors
● Every object is created in Eden*
*: except when it is too big to fit in Eden
*: except in special cases for G1 GC
Why memory pools?!
● Always 2 GCs per JVM*
*: except for G1 GC
● Young GC
– Cheap
– Duration mostly ≈ O(Live data in YG)
● Old GC
– Expensive
– Duration mostly ≈ O(Live data in OG)
Common GC name    Young Gen GC    Old Gen GC
“Parallel GC”     PSYoungGen      ParOldGen
“CMS”             ParNew          CMS
“G1 GC”           G1 (both generations)
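To check which pair of collectors a given JVM is actually running, one option is to ask the standard java.lang.management API. This is only a sketch: the class name CollectorNames is made up for illustration, and the names reported by the MXBeans usually differ from the labels printed in GC logs (Parallel GC typically reports "PS Scavenge" and "PS MarkSweep", for instance).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not from the talk): lists the collectors of the running JVM.
final class CollectorNames {

    // One entry per collector, with its collection count and accumulated time so far.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName()
                    + " (" + gc.getCollectionCount() + " collections, "
                    + gc.getCollectionTime() + " ms)");
        }
        return names;
    }
}
```

Printing CollectorNames.collectorNames() at startup and again under load gives a rough idea of how often each generation gets collected.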
GC Duration?!
Prove it!
App with small live set
App with big live set
Experiment 1
● 1st run (SmallLiveSet)
– 50 GB heap: -ms50g -mx50g
– 49.9 GB Young Gen: -Xmn49900m
– GC logs
● Result:
– 6 ms YGC pauses to free 38 GB of memory
Experiment 1 : Result
[PSYoungGen: 38329728K->6496K(44710400K)] 38329744K->6512K(46041600K), 0.0067050 secs] //...
● 38,329,728K of data in YG before GC, 6,496K after
● YG size is 44,710,400K
● 38,329,744K of data in heap before GC, 6,512K after
● Heap size is 46,041,600K
● Total pause time: 6.7 ms
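The reading above can be automated with a small parser. This is a sketch only: the class YoungGcLine and its regular expression are assumptions written to match the single PSYoungGen line shown on this slide, not a general GC log parser (the log format varies with JVM version and flags).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for the one PSYoungGen line format shown on the slide.
final class YoungGcLine {

    // [PSYoungGen: <before>K-><after>K(<size>K)] <before>K-><after>K(<size>K), <secs> secs]
    private static final Pattern LINE = Pattern.compile(
            "\\[PSYoungGen: (\\d+)K->(\\d+)K\\((\\d+)K\\)\\] "
            + "(\\d+)K->(\\d+)K\\((\\d+)K\\), ([0-9.]+) secs\\]");

    final long youngBeforeK;
    final long youngAfterK;
    final long heapBeforeK;
    final long heapAfterK;
    final double pauseMillis;

    private YoungGcLine(Matcher m) {
        youngBeforeK = Long.parseLong(m.group(1));
        youngAfterK = Long.parseLong(m.group(2));
        heapBeforeK = Long.parseLong(m.group(4));
        heapAfterK = Long.parseLong(m.group(5));
        pauseMillis = Double.parseDouble(m.group(7)) * 1000.0; // seconds to milliseconds
    }

    static YoungGcLine parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("Not a PSYoungGen line: " + line);
        }
        return new YoungGcLine(m);
    }

    // Amount of Young Gen memory released by this collection, in K.
    long youngFreedK() {
        return youngBeforeK - youngAfterK;
    }
}
```

Fed the line from this slide, youngFreedK() returns 38,323,232 (K) and pauseMillis is about 6.7.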
Experiment 2
● 2nd run (SmallLiveSet)
– 50 GB heap: -ms50g -mx50g
– 10 MB Young Gen: -Xmn10m
– GC logs
● Result:
– 322 ms Full GC pauses to free 52 GB of memory
Experiment 2 : Result
[Full GC [PSYoungGen: 3072K->0K(7168K)] [ParOldGen: 52418151K->30287K(52418560K)] 52421223K->30287K(52425728K)//... 0.3229410 secs]
● 52,418,151K of data in OG before GC, 30,287K after
● OG size is 52,418,560K
● 52,421,223K of data in heap before GC, 30,287K after
● Heap size is 52,425,728K
● Total pause time: 322.9 ms
Experiments 1->4, Wrap up
● 1st and 2nd runs with BigLiveSet
– Ran out of time* :-(
*: Stopped measuring at heap occupancy ≈ 22 GB
● GC pauses:
Live set    1st run (49.9 GB Young Gen)    2nd run (10 MB Young Gen)
Small       6 millis                       322 millis (Full GC)
Big         55 secs (Full GC)*             250 secs (Full GC)*
Immutability
Is immutability a problem?
● What does this code do? (From the GC's point of view?)
– It creates more temporary objects that die young
– It respects the Weak Generational Hypothesis
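The code shown on the original slide is not part of this text. As a stand-in, here is a minimal hypothetical sketch (the class ImmutableCounter is invented for illustration) of what an immutable update looks like to the GC: every "modification" allocates a fresh object and the previous one becomes garbage almost immediately, unless the JIT manages to optimise the allocation away.

```java
// Hypothetical immutable value class (not the code from the talk).
final class ImmutableCounter {
    private final long value;

    ImmutableCounter(long value) {
        this.value = value;
    }

    long value() {
        return value;
    }

    // Returns a new instance instead of mutating this one.
    ImmutableCounter increment() {
        return new ImmutableCounter(value + 1);
    }

    // Each iteration drops the previous instance: n short-lived objects, 1 survivor.
    static ImmutableCounter countTo(int n) {
        ImmutableCounter counter = new ImmutableCounter(0);
        for (int i = 0; i < n; i++) {
            counter = counter.increment();
        }
        return counter;
    }
}
```

Only the last instance stays reachable, so the live set seen by a Young GC stays tiny however many intermediate objects were allocated.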
● Consequences compared to mutable state
– GC will run more frequently
– GC time will be short: O(Live data in YG)
Tuning for immutability
● Reduce YGC frequency (for ParallelGC and CMS)
– Identify the allocation rate (MB/second)
– Define the GC interval (seconds between GCs)
=> Set Eden = Allocation rate * GC interval
● Example (for ParallelGC and CMS)
– Allocation rate (AR) = 200 MB/s
– Desired interval = 1 YGC every 4 seconds
=> Set Eden to 800 MB (Young to 1 GB)
-Xmn1g
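The arithmetic behind these numbers can be sketched as follows. The class EdenSizing is hypothetical, and the step from Eden to Young Gen assumes two survivor spaces of Eden/SurvivorRatio each with SurvivorRatio=8; the slide does not say how it got from 800 MB to 1 GB, and adaptive sizing policies may give different results in practice.

```java
// Hypothetical sizing helper (not from the talk); all sizes in MB.
final class EdenSizing {

    // Eden = allocation rate * desired interval between young collections.
    static long edenMb(long allocationRateMbPerSec, long gcIntervalSec) {
        return allocationRateMbPerSec * gcIntervalSec;
    }

    // Assumption: Young = Eden + 2 survivor spaces, each Eden / survivorRatio (rounded up).
    static long youngMb(long edenMb, int survivorRatio) {
        long survivorMb = (edenMb + survivorRatio - 1) / survivorRatio;
        return edenMb + 2 * survivorMb;
    }
}
```

With 200 MB/s and a 4 second interval, edenMb gives 800, and youngMb(800, 8) gives 1000, which is in line with the -Xmn1g above.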
Pony Pause
G1 GC time!
G1 GC
● Idea
– Split the heap into 2048 regions
– Assign each region to a memory pool on the fly
– Grow/shrink the memory pools at runtime
http://www.infoq.com/articles/G1-One-Garbage-Collector-To-Rule-Them-All
● Memory pools:
– Young (Eden, Survivors)
– Old
– Humongous
● Humongous:
– Objects >= 50% region
● 1 ½ GC algorithm:
– Always collect Young Gen
– Collect Old Gen if possible
    ● Best regions only
    ● Time budget large enough
    ● Preconditions
    ● “mixed” collection
● G1 is self-tuning
G1 GC Tuning
● Define GC time budget
-XX:MaxGCPauseMillis=<N>
-XX:GCPauseIntervalMillis=<M>
● Set Xms == Xmx
● Drop all other GC-related flags
-Xmn, -XX:MaxTenuringThreshold, -XX:NewRatio
-XX:InitiatingHeapOccupancyPercent, …
● Don't try to outsmart the GC
● Enable GC logs
-Xloggc:gc.log
-XX:+PrintGCDetails
-XX:+PrintTenuringDistribution
-XX:+PrintGCCause
-XX:+PrintAdaptiveSizePolicy
● Wait and see
G1 GC Tuning – Low-hanging fruit
● Eliminate Humongous allocations
– Humongous regions collected only at Full GC
– Or when empty
[G1Ergonomics (Concurrent Cycles) request concurrent cycle initiation, reason: occupancy higher than threshold, occupancy: 0 bytes, allocation request: 79012360 bytes, threshold: 47185920 bytes (45.00 %), source: concurrent humongous allocation]
[G1Ergonomics (Concurrent Cycles) request concurrent cycle initiation, reason: requested by GC cause, GC cause: G1 Humongous Allocation]
2013-10-21T19:23:48.758+0200: [GC pause (G1 Humongous Allocation) (young) (initial-mark) Desired survivor size 1572864 bytes, new threshold 15 (max 15) , 0.0015120 secs]
● Eliminate Humongous allocations
– Track your big allocations
– Kill 'em!
● Why?
– They fragment the heap
– They can cause evacuation failures
● Get rid of “mixed collections”
– Increase heap size
– Set a higher threshold for mixed collections
-XX:InitiatingHeapOccupancyPercent=<N>
● Why?
– Some phases of G1 are STW (stop-the-world, as in “baaaaad”)
– G1 goal: find the best candidates among all old regions
● Eliminate “Evacuation/Allocation failures”
– They are our good old Full GCs
[GC pause (G1 Evacuation Pause) (young)
//...
[Full GC (Allocation Failure)
5860M->2690M(7000M), 0.9824032 secs]
Summary
● Performance is fun!
● Understand what you do
● Immutability is not an issue (by itself)
– Bad code is.
● GC Duration ≈ O(Live data)
● G1 is self-tuning
– Try it :-)
Thank you for listening!
For more information:
http://www.pingtimeout.fr
@pingtimeout