Post on 04-Aug-2015
Garbage Collectors
Haim Yadid, Head of Performance and Application Infra Group
Motivation & Goals
Why does Java have GC?
Memory management is hard
malloc/free
  Memory leaks and dangling pointers
  Heap corruption (freeing an object twice)
Reference counting
  Cyclic graphs
Me: allocate
  myObject = new MyClass(2015)
GC: cleans
So everything is cool!
Goal: Minimize Memory Overhead
Garbage collector memory overhead
  Internal structures
  Additional memory required for GC
  Policy and amount of memory generated before GC
Goal: Application Throughput
No garbage collection at all --> application throughput is 1 (100% of the time)
If on average garbage collection consumes x milliseconds every y milliseconds
  Throughput is (y-x)/y
  E.g. if GC consumes 50 ms every second, throughput is 95% (GC overhead is 5%)
GC overhead = GC time / total time
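The throughput arithmetic on this slide can be checked with a few lines of Java. This is only a sketch: the class and method names (GcThroughput, throughput, overhead) are made up for illustration and are not part of any JVM API.

```java
// Illustrative helper; the names are assumptions, not a JVM API.
final class GcThroughput {
    // Fraction of time left to the application when GC consumes
    // gcMillis out of every periodMillis: (y - x) / y.
    static double throughput(double gcMillis, double periodMillis) {
        if (periodMillis <= 0 || gcMillis < 0 || gcMillis > periodMillis) {
            throw new IllegalArgumentException("need 0 <= gcMillis <= periodMillis and periodMillis > 0");
        }
        return (periodMillis - gcMillis) / periodMillis;
    }

    // GC overhead is the complement: GC time / total time.
    static double overhead(double gcMillis, double periodMillis) {
        return 1.0 - throughput(gcMillis, periodMillis);
    }

    public static void main(String[] args) {
        // The slide's example: 50 ms of GC in every 1000 ms.
        System.out.printf("throughput=%.2f overhead=%.2f%n",
                throughput(50, 1000), overhead(50, 1000));
    }
}
```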
Goal: Responsiveness
Pause time
  Garbage collector stops the application
  During that time the application is not responsive
What is the maximal delay your application can sustain?
  Batch applications: seconds…
  Web applications: ½ second?
  Swing UI: 100 ms max
  Trading application: milliseconds
  Robotics: microseconds…
Goals
Memory footprint
Throughput
Max pause time
Contradicting! Choose 2 of 3
Goal: Fast Allocation
Maintaining a free list of objects tends to lead to fragmentation
Fragmentation increases allocation time
A well-known problem of C/C++ programs
Goal: Locality
TLAB - thread-local allocation buffer
Maintaining locality makes good use of the CPU cache
Linear allocation mechanism
Per-thread buffer, allocated on the first object allocation after a new gen GC
Small objects are allocated linearly in that buffer
Fast: about 10 machine instructions
Resized automatically (ResizeTLAB=true), based on TLABWasteTargetPercent=1
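Linear allocation in a TLAB amounts to "bumping a pointer". The toy class below is a minimal sketch of that idea, assuming a fixed-size buffer addressed by byte offsets; ToyTlab and its methods are hypothetical names, and a real TLAB lives inside the JVM rather than in Java code.

```java
// Toy model of a thread-local allocation buffer (illustrative only).
final class ToyTlab {
    private final int capacity; // buffer size in bytes
    private int top;            // offset of the next free byte

    ToyTlab(int capacity) {
        this.capacity = capacity;
    }

    // Returns the offset of the new object, or -1 when the buffer is full.
    // (Assumption: a real JVM would then get a fresh TLAB or allocate elsewhere.)
    int allocate(int sizeBytes) {
        if (sizeBytes <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        if (sizeBytes > capacity - top) {
            return -1;
        }
        int offset = top;
        top += sizeBytes; // the fast path is one compare and one add
        return offset;
    }

    int remaining() {
        return capacity - top;
    }
}
```

Because each thread owns its buffer, the fast path needs no locking, which is what keeps it down to a handful of instructions.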
Terminology
Sizing: Heap
Heap - the region where objects reside
|H| - the size of the heap
#H - the number of objects in the heap
GC Root
An object which is referenced from outside the heap or heap region.
  Java local
  JNI
  Native stack
  System class
  Busy monitor
  Etc…
Sizing: Live Set
Live set: objects that are reachable from garbage collection roots
#LS - the number of objects in the live set
|LS| - the size of the live set
Mutator
Application threads
The stuff that creates new objects and changes state
The stuff that makes the garbage collector suffer
Allocation rate - how fast new objects are allocated
Mutation rate - how fast the app changes references of live objects
Collector
Pauseless collector (concurrent collector) vs stop-the-world (STW) collector
Serial collector (single threaded) vs parallel collector (multi-threaded)
Incremental vs monolithic
Conservative vs precise
GC Safepoint
A point in thread execution where it is safe for a GC to operate
  References inside thread stacks are identifiable
If a thread is at a safepoint, GC may commence
A thread will not be able to exit a safepoint until GC ends
Global safepoint: all threads enter a safepoint
Safepoints must be frequent
Building Blocks: Mark and Sweep
Mark: O(#LS)
  Every object holds a bit, initially set to 0
  Start with the GC roots
  DFS traversal; the bit of every visited object is changed to 1
Sweep: O(|H|)
  Objects with bit 0 are added to the empty (free) list
Downside: heap fragmentation
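The two phases can be sketched over an explicit object graph. This is a toy model, not how HotSpot stores objects: ToyHeap, Node and their fields are hypothetical names, and the "free list" here is just a list of reclaimed nodes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy mark-and-sweep over an explicit object graph (illustrative only).
final class ToyHeap {
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        boolean marked; // the mark bit, initially false (0)

        Node(String name) {
            this.name = name;
        }
    }

    final List<Node> objects = new ArrayList<>();  // every object in the heap
    final List<Node> freeList = new ArrayList<>(); // reclaimed objects

    Node alloc(String name) {
        Node n = new Node(name);
        objects.add(n);
        return n;
    }

    // Mark: depth-first traversal from the roots, touching only live objects.
    void mark(List<Node> roots) {
        Deque<Node> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            if (n.marked) {
                continue; // already visited, which also handles cycles
            }
            n.marked = true;
            for (Node r : n.refs) {
                stack.push(r);
            }
        }
    }

    // Sweep: walk the whole heap; unmarked objects go to the free list,
    // marked ones have their bit cleared for the next cycle.
    void sweep() {
        List<Node> survivors = new ArrayList<>();
        for (Node n : objects) {
            if (n.marked) {
                n.marked = false;
                survivors.add(n);
            } else {
                freeList.add(n);
            }
        }
        objects.clear();
        objects.addAll(survivors);
    }
}
```

Note that an unreachable cycle is reclaimed here, which is exactly the case reference counting cannot handle.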
Building Blocks: Copy Collector
Mark: O(#LS)
  Same as mark and sweep
Copy: O(|LS|) = O(#LS)
  All live objects are copied to an empty region (to space)
  No fragmentation
Downside:
  Need twice as much memory
  Copy is very expensive
  STW - mutators cannot work at the same time as the copy
[Figure: live objects A, B, C, D, F copied from "from space" to "to space"]
Building Blocks: Mark (Sweep) Compact
Mark: O(#LS)
  Same as mark & sweep
Sliding compaction: O(|H|+|LS|)
  Move objects (relocate)
  Fix pointers (remap)
Compact to the beginning of the heap
Does not need twice the memory
Copy is more delicate and may be slower
The Weak Generational Hypothesis
Most objects survive for only a short period of time.
Low number of references from old to new objects
Generational Garbage Collectors
Since JDK 1.2 all collectors are generational and take advantage of the WGH
Different collectors can be chosen for each generation
  New generation collector
  Tenured generation collector
GC roots into the young gen are maintained by a "remembered set"
Oracle Hotspot Garbage Collectors
Serial (Serial, MSC)
Train Collector (history)
Parallel Collectors (a.k.a. throughput collector)
Concurrent collector (CMS)
iCMS - incremental CMS
G1GC (Experimental)
Major Memory Regions
Young generation, further divided into:
  Eden
  A "from" survivor space
  A "to" survivor space
Tenured (old) generation
Permanent generation / Metaspace
Code cache
[Figure: memory layout with Young, Tenured, Perm and Code Cache, labelled Heap, Non Heap and Native]
Object Life Cycle
Most objects are allocated in Eden space.
When Eden fills up a minor GC occurs
  Reachable objects are copied to the "to" survivor space.
There are two survivor spaces; surviving objects are copied alternately from one to the other.
[Figure: Eden, S0 and S1 at three successive steps (1, 2, 3)]
Object Promotion
Objects are promoted to the old generation (tenured) when:
  They have survived several minor GCs
  The survivor spaces fill up
Serial Collectors
Single threaded
Stop the world
Monolithic
Efficient (no communication between threads)
The default on certain platforms
Serial New Collector
-XX:+UseSerialGC
Serial new
Single threaded young generation collector
Triggered when:
  Eden space is full.
  An explicit invocation or call to System.gc().
MSC (Serial Old)
Single threaded tenured generation collector
Mark & Sweep Compact
Events which initiate a serial collector garbage collection:
  Tenured generation space is unable to satisfy an object promotion coming from the young generation.
  An explicit invocation or call to System.gc().
Serial Collector: Suitability
Well suited for single processor core machines
CPU affinity (one-to-one JVM to processor core configuration)
Tends to work well for applications with small Java heaps, i.e. less than 128MB-256MB
Train Collector
Introduced in Java 1.3
Divides the heap into small chunks
Incremental
Experimental
Discontinued in Java 1.4
Parallel Collector
Multi-threaded
Monolithic
Stop the world
Three variants
  -XX:+UseParallelGC
  -XX:+UseParallelOldGC
The default on most platforms
Managing collector threads
Number of parallel collector threads is controlled by -XX:ParallelGCThreads=<N>
Defaults to Runtime.availableProcessors().
  In a JDK 6 update release, 5/8ths of the available processors if > 8
With multiple JVMs per machine, set -XX:ParallelGCThreads=<N> such that the sum of all threads < NCPU
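The default can be estimated in code. The sketch below assumes the commonly described HotSpot rule: all processors up to 8, then 5/8 of each processor beyond 8. That reading of the "5/8ths" bullet is an assumption, the helper name is hypothetical, and the value the JVM actually picks may differ by version and platform.

```java
// Estimates the default -XX:ParallelGCThreads value (an assumption, see above).
final class GcThreads {
    static int defaultParallelGcThreads(int cpus) {
        if (cpus < 1) {
            throw new IllegalArgumentException("cpus must be >= 1");
        }
        // Up to 8 CPUs: one GC thread per CPU. Beyond 8: add 5/8 of the extra CPUs.
        return cpus <= 8 ? cpus : 8 + (cpus - 8) * 5 / 8;
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println(cpus + " processors -> about "
                + defaultParallelGcThreads(cpus) + " parallel GC threads");
    }
}
```

When several JVMs share a machine, this estimate per JVM is what should be summed and compared against the processor count.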
Parallel GC Triggering
Same as serial collector
Events which initiate a minor garbage collection:
  Eden space is unable to satisfy an object allocation request. Results in a minor garbage collection event.
Events which might initiate a full garbage collection:
  Tenured generation space is unable to satisfy an object promotion coming from the young generation.
  An explicit invocation or call to System.gc().
Parallel Collector:Suitability
Reduce garbage collection overhead on multi-core processor systems
Reduce pause time on multi-core systems
Best throughput
Pause time may be reasonable when heap size < 1GB
CMS Collector
(Mostly) concurrent mark and sweep tenured space collector
Runs mostly concurrently with application threads
Does not compact the heap!
Enabled with -XX:+UseConcMarkSweepGC
ParNew - parallel, multi-threaded young generation collector, enabled by default when working with CMS
Alas!
Lower throughput
Requires more memory: 20-30% more
Concurrent mode failure falls back to a stop-the-world full GC. It can occur when:
  Objects are copied to the tenured space faster than the concurrent collector can collect them ("loses the race")
  Space is fragmented (diagnose with -XX:PrintFLSStatistics=1)
Concurrent Collector Phases
Concurrent collector cycle contains the following phases:
  Initial mark (*)
  Concurrent mark
  Concurrent pre-clean
  Remark (*) - second pass
  Concurrent sweep
  Concurrent reset
The Concurrent Collector
Initial mark phase (*)
  Objects in the tenured generation are "marked" as reachable, including those objects which may be reachable from the young generation.
  Pause time is typically short in duration relative to minor collection pause times.
Concurrent mark phase
  Traverses the tenured generation object graph for reachable objects concurrently while Java application threads are executing.
The Concurrent Collector Phases
Remark (*)
  Finds objects that were missed by the concurrent mark phase due to updates by Java application threads to objects after the concurrent collector had finished tracing that object.
Concurrent sweep
  Collects the objects identified as unreachable during marking phases.
Concurrent reset
  Prepares for next concurrent collection.
PermGen Collection
Classes will not be collected during CMS concurrent phases
  Only during a full (STW) collection
Unless explicitly instructed to do so using -XX:+CMSClassUnloadingEnabled and -XX:+PermGenSweepingEnabled (the 2nd switch is not needed in post HotSpot 6.0u4 JVMs).
Suitability
Application responsiveness is more important than application throughput
More than one core
Large heaps > 1GB
ExplicitGC and CMS
Explicit GC used to invoke the stop-the-world GC
This can cause a problem with large heaps
  -XX:+ExplicitGCInvokesConcurrent (Java 6)
  -XX:+ExplicitGCInvokesConcurrentAndUnloadsClasses (requires Java 6u4 or later).
CMS Minor Collection Triggering
Minor collections are triggered as with the parallel collector
The ParNew collector is built in such a way that it can work in parallel with the CMS
CMS Collection Triggering
Starts if the occupancy exceeds a percentage threshold
Default value is 92%
-XX:CMSInitiatingOccupancyFraction=n, where n is the % of the tenured space size.
iCMS
Deprecated in Java 8
CMSIncrementalMode enables the concurrent phases to be done incrementally.
Periodically gives the processor back to the application, resulting in better application responsiveness by doing the concurrent work in small chunks.
Tune iCMS
CMSIncrementalMode has a duty cycle that controls the amount of work the concurrent collector is allowed to do before giving up the processor.
Duty cycle is the % of time between minor collections the concurrent collector is allowed to run.
Duty cycle by default is automatically computed using what's called automatic pacing.
Both duty cycle and pacing can be fine tuned.
Enabling iCMS Java 6
On JDK 6, recommend using the following two switches together:
  -XX:+UseConcMarkSweepGC and -XX:+CMSIncrementalMode
Or use: -Xincgc
Enabling iCMS Java5
On JDK 5 use all of the following switches together:
  -XX:+UseConcMarkSweepGC
  -XX:+CMSIncrementalMode
  -XX:+CMSIncrementalPacing
  -XX:CMSIncrementalDutyCycleMin=0
  -XX:CMSIncrementalDutyCycle=10
JDK 5 settings mirror the default settings decided upon for JDK 6.
JDK 5's -Xincgc != CMSIncrementalMode; it enables CMS
iCMS Fine Tuning
If full collections are still occurring, then:
Increase the safety factor using -XX:CMSIncrementalSafetyFactor=n
  The default value is 10.
  Increasing the safety factor adds conservatism when computing the duty cycle.
Increase the minimum duty cycle using -XX:CMSIncrementalDutyCycleMin=n
  The default is 0 in JDK 6, 10 in JDK 5.
Disable automatic pacing and use a fixed duty cycle using -XX:-CMSIncrementalPacing and -XX:CMSIncrementalDutyCycle=n
  The default duty cycle is 10 in JDK 6, 50 in JDK 5.
G1 Garbage collector
Garbage First GC
Stop the world
Incremental
Beta stage in Java 6 and Java 7
Supported since Java 7u4
Garbage First GC
Apply new generation harvesting to the tenured gen
Achieve a soft real-time goal
  Consume no more than x ms of any y ms time slice
  While maintaining high throughput for programs with large heaps and high allocation rates, running on large multi-processor machines.
Heap Layout
Divided into equal-sized heap regions, each a contiguous range of virtual memory
Region size is 1-32MB based on heap size; the target is 2000 regions
A linked list of empty regions
Heap is divided into new generation regions and old generation regions
Each region is marked as one of:
  Eden
  Survivor space
  Old generation
  Empty
  Humongous
[Figure: grid of heap regions labelled E, S, O and H]
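How the region size follows from the heap size can be sketched as below. The rule used here is an assumption: aim for about 2048 regions (the slide rounds this to 2000), round down to a power of two, and clamp to the 1-32MB range. G1RegionSize is a hypothetical helper, and the JVM's own heuristic (or an explicit -XX:G1HeapRegionSize) takes precedence in practice.

```java
// Rough estimate of the G1 region size for a given heap size (assumed rule, see above).
final class G1RegionSize {
    static final long MB = 1024L * 1024L;

    static long regionSizeBytes(long heapBytes) {
        if (heapBytes <= 0) {
            throw new IllegalArgumentException("heap size must be positive");
        }
        long size = Math.max(heapBytes / 2048, MB); // target about 2048 regions, at least 1MB
        size = Long.highestOneBit(size);            // round down to a power of two
        return Math.min(size, 32 * MB);             // never more than 32MB
    }

    public static void main(String[] args) {
        long heap = Runtime.getRuntime().maxMemory();
        System.out.println("max heap " + heap / MB + "MB -> region size about "
                + regionSizeBytes(heap) / MB + "MB");
    }
}
```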
G1 New Gen GC
Live objects from the young generation are moved to
  Survivor space regions
  Old gen regions
STW pause
Calculate the new size of Eden and the new survivor space
G1 Concurrent Mark
Triggered when the entire heap reaches a certain threshold
Mark regions
Calculate liveness information for each region
Concurrent
Empty regions are reclaimed immediately
OldGen collection
Choose regions with low liveness
Piggyback some of them on the next young GC
Denoted "GC Pause (mixed)"
Humongous Objects
Objects larger than 1/2 of the heap region size
Allocated in dedicated (contiguous sequences of) heap regions; these regions contain only the humongous object
GC is not optimized for these objects
Command line Options
-XX:+UseG1GC
-XX:MaxGCPauseMillis=200
-XX:InitiatingHeapOccupancyPercent=45
Valid Combinations
[Figure: valid combinations of young generation collectors (Serial, ParNew, Parallel Scavenge) and tenured generation collectors (Serial Old, CMS, Parallel Old); G1GC is shown separately]
Summary: -XX flags for JDK6
Collector                                  -XX: Param
"Serial" + "Serial Old"                    UseSerialGC
"ParNew" + "Serial Old"                    UseParNewGC
"ParNew" + "CMS" + "Serial Old" *          UseConcMarkSweepGC
"Parallel Scavenge" + "Serial Old"         UseParallelGC
"Parallel Scavenge" + "Parallel Old"       UseParallelOldGC
G1 Garbage collector **                    UseG1GC
Alternatives
Azul C4
A proprietary JVM
Commercial
Pauseless
Requires changes in the OS (Linux only)
Achieves throughput and pause time
May require more memory… 32GB and above
Useful for large heaps (1TB is casual)
Shenandoah
JEP 189 for OpenJDK
Developed at Red Hat
An Ultra-Low-Pause-Time Garbage Collector
Regions, same as G1
Concurrent collection w/o memory barrier
Not a generational collector
Brooks forwarding pointer
[Figure: Ref, Object]
Weird References
Object Life Cycle
Created
Initialized
Strongly Reachable
Softly Reachable
Weakly Reachable
Finalized
Phantom Reachable
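The reachability levels in this life cycle map onto the classes in java.lang.ref. The sketch below only checks behaviour that is guaranteed while the object is still strongly reachable; what happens after it becomes unreachable depends on when the collector runs, so it is deliberately not asserted. ReferenceKinds is a hypothetical class name.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

// Shows the three reference kinds around one strongly reachable object.
final class ReferenceKinds {
    static boolean behaveAsDocumentedWhileStronglyReachable() {
        Object strong = new Object();
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        SoftReference<Object> soft = new SoftReference<>(strong);
        WeakReference<Object> weak = new WeakReference<>(strong);
        PhantomReference<Object> phantom = new PhantomReference<>(strong, queue);

        // Nothing is enqueued while the referent is strongly reachable.
        boolean nothingQueued = queue.poll() == null;
        // A phantom reference never hands out its referent.
        boolean phantomHidesReferent = phantom.get() == null;
        // Soft and weak references still return the referent; comparing against
        // 'strong' last also keeps it strongly reachable up to this point.
        return nothingQueued && phantomHidesReferent
                && soft.get() == strong && weak.get() == strong;
    }
}
```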
Finalizers
Create performance issues
For example: do not rely on a finalizer to close file descriptors.
Try to limit use of finalizers to a safety net
Use other mechanisms for releasing resources.
Keep the work being done as short as possible.
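One such "other mechanism" is try-with-resources, which releases the resource deterministically when the block exits instead of waiting for a GC cycle. A minimal sketch, with TrackedResource standing in as a hypothetical class for a file handle or socket:

```java
// A stand-in resource that records whether it has been released.
final class TrackedResource implements AutoCloseable {
    private boolean closed;

    void use() {
        if (closed) {
            throw new IllegalStateException("already closed");
        }
    }

    @Override
    public void close() {
        closed = true; // a real resource would release its descriptor here
    }

    boolean isClosed() {
        return closed;
    }

    // Uses a resource and returns it so the caller can see that it was closed.
    static TrackedResource useAndRelease() {
        TrackedResource seen = null;
        try (TrackedResource r = new TrackedResource()) {
            r.use();
            seen = r;
        } // close() runs here, even if use() throws
        return seen;
    }
}
```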
Finalizers and GC
Inside a finalizer you have a reference to your object and technically you may resurrect it.
Objects which have a finalize method need two GC cycles in order to be collected
  Can lead to OOM errors
A resurrected object may be reachable again but its finalize method will not run again.
In order to prevent this problem use phantom references…
Finalizers and Thread Safety
Finalizers are executed from a special thread
According to the Java memory model, updates to local variables may not be visible to the finalization thread
  Occurs when GC happens too soon
In order to ensure correct memory visibility one needs to use a synchronized block to force coherency
Reference Objects drawbacks
Lots of reference objects also give the garbage collector more work to do, since unreachable reference objects need to be discovered and queued during garbage collection.
Reference object processing can extend the time it takes to perform garbage collections, especially if there are consistently many unreachable reference objects to process.
GC Tuning
Tuning Memory: Tuning GC
Sizing Generations
VisualVM's Monitor tab
JConsole's Memory tab
VisualGC's heap space sizes
GCHisto
jstat's GC options
-verbose:gc heap space sizes
-XX:+PrintGCDetails heap space sizes
Logging GC activity
The most important parameters are:
  -XX:+PrintGCDetails
  -XX:+PrintGCTimeStamps
  -XX:+PrintGCDateStamps
  -Xloggc:<logfile>
If you want less data:
  -verbose:gc
Log File Rotation
-XX:+UseGCLogFileRotation
Control log file size
  -XX:GCLogFileSize=1000
Number of log files
  -XX:NumberOfGCLogFiles=10
Starting from Java 7
Verbose GC at runtime
The java.lang Memory MBean (java.lang:type=Memory)
Set its Verbose attribute to true
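The same attribute can be flipped from inside the JVM through the platform MemoryMXBean, which is the in-process view of that MBean. A minimal sketch; VerboseGcToggle is a hypothetical wrapper, and where the verbose output ends up depends on the JVM version and its logging setup.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

// Turns verbose GC output on or off at runtime via the platform Memory MXBean.
final class VerboseGcToggle {
    // Returns the value the JVM reports after the change.
    static boolean setVerboseGc(boolean enabled) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        memory.setVerbose(enabled);
        return memory.isVerbose();
    }
}
```

From a remote JMX client such as JConsole the equivalent is editing the Verbose attribute of the Memory MBean.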
The GC Log
Every GC that occurs is logged
PrintGCDetails (more verbose)
  [GC [DefNew: 960K->64K(960K), 0.0047410 secs] 3950K->3478K(5056K), 0.0047900 secs]
-verbose:gc (less verbose)
  [GC 327680K->53714K(778240K), 0.2161340 secs]
Format: Before->After(Total), time
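The Before->After(Total), time summary is regular enough to pull apart with a regular expression. The parser below is a sketch that handles only the two sample line shapes shown on this slide; GcLogLine and its fields are hypothetical names, and real logs have many more variants (which is why dedicated tools exist).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Parses the trailing "beforeK->afterK(totalK), time secs]" heap summary of a GC log line.
final class GcLogLine {
    private static final Pattern HEAP_SUMMARY = Pattern.compile(
            "(\\d+)K->(\\d+)K\\((\\d+)K\\), (\\d+\\.\\d+) secs\\]$");

    final long beforeKb;
    final long afterKb;
    final long totalKb;
    final double pauseSeconds;

    private GcLogLine(long beforeKb, long afterKb, long totalKb, double pauseSeconds) {
        this.beforeKb = beforeKb;
        this.afterKb = afterKb;
        this.totalKb = totalKb;
        this.pauseSeconds = pauseSeconds;
    }

    static GcLogLine parse(String line) {
        // The pattern is anchored at the end of the line, so for PrintGCDetails
        // output it picks the whole-heap summary, not the per-generation one.
        Matcher m = HEAP_SUMMARY.matcher(line.trim());
        if (!m.find()) {
            throw new IllegalArgumentException("not a recognised GC line: " + line);
        }
        return new GcLogLine(Long.parseLong(m.group(1)), Long.parseLong(m.group(2)),
                Long.parseLong(m.group(3)), Double.parseDouble(m.group(4)));
    }

    long freedKb() {
        return beforeKb - afterKb;
    }
}
```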
CMS Collector
Minor collections follow serial collector format.
[Full GC [CMS: 5994K->5992K(49152K),0.2584730 secs] 6834K->5992K(63936K), [CMS Perm: 10971K->10971K(18404K)], 0.2586030 secs]
[GC [1 CMS-initial-mark: 13991K(20288K)] 14103K(22400K), 0.0023781 secs]
[CMS-concurrent-preclean: 0.044/0.064 secs]
[GC [1 CMS-remark: 16090K(20288K)] 17242K(22400K), 0.0210460 secs]
Additional information
More information with the flags
  -XX:+PrintGCApplicationStoppedTime
  -XX:+PrintGCApplicationConcurrentTime
  -XX:+PrintTenuringDistribution
GC Log Quirks
CMS GC logs may get garbled
  Due to concurrency of the different GC mechanisms
Analysis tools should be able to overcome this problem
  Not easy at all; the logs may need to be fixed manually.
J9
-Xverbosegclog:<file path>
Structured XML
<gc-end id="92" type="scavenge" contextid="88" durationms="4.464" usertimems="4.000" systemtimems="0.000" timestamp="2014-01-21T16:37:18.953">
  <mem-info id="93" free="5472664" total="8388608" percent="65">
    <mem type="nursery" free="655360" total="2097152" percent="31">
      <mem type="allocate" free="655360" total="1179648" percent="55" />
      <mem type="survivor" free="0" total="917504" percent="0" />
    </mem>
    <mem type="tenure" free="4817304" total="6291456" percent="76">
      <mem type="soa" free="4502936" total="5977088" percent="75" />
      <mem type="loa" free="314368" total="314368" percent="100" />
    </mem>
    <pending-finalizers system="2" default="0" reference="24" classloader="0" />
    <remembered-set count="1770" />
  </mem-info>
</gc-end>
Visualization Tools
JClarity: Censum
Commercial product
Gives recommendations
GCViewer
Open source, originally by Tagtraum Industries
A new fork exists on GitHub, latest version 1.33
  https://github.com/chewiebug/GCViewer
Supports G1GC
Java 6/7
Not bulletproof
GCViewer
GCViewer View
  Choose which info to view
GCViewer - Summary
GCViewer - Memory
GCViewer - Pause
VisualGC
Explicit GC
Do not use System.gc() unless there is a specific use case or need to.
Disable explicit GC: -XX:+DisableExplicitGC.
Default RMI distributed GC interval is once per minute (60000 ms).
  Use: -Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000
When using JDK 6 and the concurrent collector also use -XX:+ExplicitGCInvokesConcurrent
Interpretation
The Jigsaw effect
Garbage collection makes heap diagrams over time look like a jigsaw.
Minor collections show up as small teeth
Full collections show up as large teeth
Measuring Memory Usage
Best way is a heap dump
From the GC logs, look at the lowest points of the Full GC lines.
Memory Leak
Draw a line between the full GC points
If it increases over time --> memory leak
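"Drawing the line" can be done numerically by fitting a least-squares line through the heap occupancy left after each full GC. The sketch below assumes the samples have already been extracted from the log; LeakTrend is a hypothetical helper, and a positive slope is only a hint of a leak, not proof.

```java
// Least-squares slope of heap-after-full-GC (KB) against time (seconds).
final class LeakTrend {
    static double slopeKbPerSecond(double[] seconds, double[] afterFullGcKb) {
        int n = seconds.length;
        if (n < 2 || n != afterFullGcKb.length) {
            throw new IllegalArgumentException("need at least two matching samples");
        }
        double meanT = 0;
        double meanH = 0;
        for (int i = 0; i < n; i++) {
            meanT += seconds[i];
            meanH += afterFullGcKb[i];
        }
        meanT /= n;
        meanH /= n;
        double num = 0;
        double den = 0;
        for (int i = 0; i < n; i++) {
            double dt = seconds[i] - meanT;
            num += dt * (afterFullGcKb[i] - meanH);
            den += dt * dt;
        }
        if (den == 0) {
            throw new IllegalArgumentException("timestamps must not all be equal");
        }
        return num / den; // KB retained per second; clearly above 0 suggests a leak
    }
}
```

For example, with made-up samples of 100, 110, 120, 130 and 140 KB taken at 60, 90, 120, 150 and 180 seconds, the slope is about 0.33 KB per second.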
Low Throughput
Throughput under 95%
Increase heap:
  Low throughput is usually a result of insufficient memory
  GC kicks in too frequently and frees small amounts
New gen too small
  New gen GC kicks in too fast
  Not able to release enough; as a result mid-term objects spill to old gen.
Old gen too small
  Application state spills to new gen.
  Breaks the generational hypothesis
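A running JVM can give a rough throughput figure to compare against the 95% mark, using the standard GarbageCollectorMXBean and RuntimeMXBean. This is an approximation under stated assumptions: the reported collection time is accumulated elapsed time, and for mostly concurrent collectors it may include work that did not stop the application. MeasuredThroughput is a hypothetical class name.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Approximates application throughput since JVM start as 1 - (GC time / uptime).
final class MeasuredThroughput {
    static double sinceJvmStart() {
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                gcMillis += t;
            }
        }
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        if (uptimeMillis <= 0) {
            return 1.0; // too early to measure anything
        }
        return Math.max(0.0, 1.0 - (double) gcMillis / uptimeMillis);
    }

    public static void main(String[] args) {
        System.out.printf("approximate throughput: %.1f%%%n", sinceJvmStart() * 100);
    }
}
```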
Contradicting Goals
Goal 1: Retain as many objects as possible in the survivor spaces
  Less promotion into the old generation
  Less frequent old GCs
Goal 2: Do not copy very long-lived objects between the survivors
  Unnecessary overhead on minor GCs
High Pause Time
Choose the correct GC scheme: CMS/G1 when low pause is required
Reduce footprint of user-interactive processes
Reduce memory allocations
  Do not allocate too many related temporary objects
  Chunk the work.