©2012 Azul Systems, Inc.
Understanding Java
Garbage Collection
and what you can do about it
Graham Thomas, EMEA Technical Manager, Azul Systems
A presentation at Orange11
July 5, 2012
This Talk’s Purpose / Goals
This talk is focused on GC education
This is not a “how to use flags to tune a collector” talk
This is a talk about how the “GC machine” works
Purpose: Once you understand how it works, you can
use your own brain...
You’ll learn just enough to be dangerous...
The “Azul makes the world’s greatest GC” stuff will only
come at the end, I promise...
White Papers to Accompany Slides
Listed in order of complexity:
1_ Understanding Java Garbage Collection v1.pdf
2_ Azul Pauseless Garbage Collection - wp_pgc_zing_v2.pdf
3_ C4-The Continuously Concurrent Compacting Collector - c4_paper_acm.pdf
4_ AzulVmemMetricsMRI.pdf (www.managedruntime.org/files/downloads/AzulVmemMetricsMRI.pdf)
About Azul Systems
We deal with Java performance issues on a daily
basis
Our solutions focus on consistent response time under load
We enable practical, full use of hardware resources
As a result, we often help characterize problems
In many/most cases, it’s not the database, app, or
network - it’s the JVM, or the system under it…
GC Pauses, OS or Virtualization “hiccups”, swapping, etc.
We use and provide simple tools to help discover
what’s going on in a JVM and the underlying platform
Focus on measuring JVM/Platform behavior with your app
Non-intrusive, no code changes, easy to add
About Azul
Supporting mission-critical deployments around the globe
About Azul – 2002 to Now
We make scalable Virtual Machines
Have built “whatever it takes to get the job done” since 2002
3 generations of custom SMP multi-core HW (Vega)
Now pure software for commodity x86 (Zing)
“Industry firsts” in garbage collection, elastic memory, Java virtualization, memory scale
Vega:
• 54 cores per chip
• Up to 16 chips (864 cores)
• 640 GB heaps
C4
Azul Systems Vega processor
Each Vega chip contains 54 fully independent processor cores and an integrated quad-channel memory controller.
Each appliance contains up to 16 Vega chips, providing up to 864 total processor cores (96, 192, 384, 768 core models are also available)
Each processor core is a 64-bit RISC processor with optimizations for multi-threaded VM execution
Three banks of four ECC memory modules are attached to each Vega chip, for a total of 192 memory modules in a 16-chip configuration
Heaps up to 640 GB
Cache coherent, uniform memory access through a passive, non-blocking interconnect mesh
205 GBps aggregate memory bandwidth
544 GBps aggregate interconnect bandwidth
Instruction-level support for concurrent, pauseless VM garbage collection
Dual network processors for system control and I/O communications
Zing 5.2 Requirements
Processors (dual socket preferred):
Intel Nehalem Xeon 5500, 56xx, 6500, 7500, E7-2xxx, E7-4xxx, E7-8xxx or E5-26xx (with Intel VT virtualization enabled)
AMD Opteron 2300, 2400, 4100, 6100, 8300 or 8400 (with AMD-V virtualization enabled)
If virtualised, it is critical that Zing LX is run with reserved cores and memory
Memory and CPU cores:
32GB or greater
6 or more cores
OS – 64-bit only (Linux distro specific: rpm / deb packages):
Red Hat Enterprise Linux / CentOS: RHEL 5.2 and later / RHEL 6.x | CentOS 5.2 and later / CentOS 6
SUSE Linux Enterprise Server: SLES 11 SP1 / SLES 11 SP2
Ubuntu Linux (server / desktop): LTS 10.04 / LTS 12.04
Java SE 6 and most J2SE 5.0 apps*
* J2SE 5.0 applications that use functionality not explicitly removed in Java SE 6
Java update level from Sun Microsystems: J2SE v1.6.0 update 31 (“headless mode” only – no GUI)
High level agenda
GC fundamentals and key mechanisms
Some GC terminology & metrics
Classifying currently available collectors
The “Application Memory Wall” problem
The C4 collector: What an actual solution looks like...
Memory use
• How many of you use heap sizes of:
• more than ½ GB?
• more than 1 GB?
• more than 2 GB?
• more than 4 GB?
• more than 10 GB?
• more than 20 GB?
• more than 50 GB?
• more than 100 GB?
Why should you care about GC?
The story of the good little architect
A good architect must, first and foremost, be able to
impose their architectural choices on the project...
Early in Azul’s concurrent collector days, we encountered
an application exhibiting 18 second pauses
Upon investigation, we found the collector was performing 10s of millions of object finalizations per GC cycle
Every single class written in the project had a finalizer
The only work the finalizers did was nulling every reference field
The right discipline for a C++ ref-counting environment
The wrong discipline for a precise garbage collected environment
* We have since made reference processing fully concurrent...
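The anti-pattern in that story looks roughly like this. A minimal reconstruction for illustration only: the class and field names (`Order`, `Customer`, `Item`) are invented, not from the actual project.

```java
import java.util.List;

// Anti-pattern: a finalizer whose only job is to null reference fields.
// Sensible under C++ ref-counting; pure overhead under a tracing GC,
// which finds unreachable references without any help. It also forces
// the JVM to register and later run a finalizer for every instance.
class Customer {}
class Item {}

class Order {
    private Customer customer;
    private List<Item> items;

    Order(Customer customer, List<Item> items) {
        this.customer = customer;
        this.items = items;
    }

    boolean hasReferences() {
        return customer != null && items != null;
    }

    @SuppressWarnings({"deprecation", "removal"})
    @Override
    protected void finalize() {
        customer = null;   // the collector would find this dead anyway
        items = null;
    }
}
```

With millions of such objects per cycle, the finalization queue itself becomes the bottleneck, not the marking or reclamation.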
Trying to solve GC problems in application
architecture is like throwing knives
You probably shouldn’t do it blindfolded
It takes practice and understanding to get it right
You can get very good at it, but do you really want to?
Will all the code you leverage be as good as yours?
Examples:
Object pooling
Off heap storage
Distributed heaps
...
(In most cases, you end up building your own garbage collector)
Most of what people seem to “know”
about Garbage Collection is wrong
In many cases, it’s much better than you may think
GC is extremely efficient. Much more so than malloc()
Allocating a new object in HotSpot takes ~10 machine instructions
malloc() in C averages between 60 and 100 instructions per call
(See http://www.ibm.com/developerworks/java/library/j-jtp09275/index.html)
Dead objects cost nothing to collect
GC will find all the dead objects (including cyclic graphs)
In many cases, it’s much worse than you may think
Yes, it really does stop for ~1 sec per live GB.
No, GC does not mean you can’t have memory leaks
No, those pauses you eliminated from your 20 minute test are not gone
Some GC Terminology
A Basic Terminology example:
What is a concurrent collector?
A Concurrent Collector performs garbage collection work
concurrently with the application’s own execution
Generally uses multiple collector threads; mutator threads are not stopped
A Parallel Collector uses multiple CPUs to perform
garbage collection
But stops all Mutator threads during collection (aka stop-the-world)
Classifying a collector’s operation
A Concurrent Collector performs garbage collection work concurrently with the application’s own execution
A Parallel Collector uses multiple CPUs to perform garbage collection
An Incremental collector performs a garbage collection operation or phase as a series of smaller discrete operations with (potentially long) gaps in between
A Stop-the-World collector performs garbage collection while the application is completely stopped
“Mostly” means sometimes it isn’t (usually means a different fallback mechanism exists)
Precise vs. Conservative Collection
A Collector is Conservative if it is unaware of some object
references at collection time, or is unsure about whether a
field is a reference or not (e.g., is an integer a pointer?)
A Collector is Precise if it can fully identify and process all
object references at the time of collection
A collector MUST be precise in order to move objects
The COMPILERS need to produce a lot of information (oopmaps)
All commercial server JVMs use precise collectors
All commercial server JVMs use some form of a moving collector
Safepoints
A GC Safepoint is a point or range in a thread’s execution
where the collector can identify all the references in that
thread’s execution stack
“Safepoint” and “GC Safepoint” are often used interchangeably
But there are other types of safepoints, including ones that require more
information than a GC safepoint does (e.g. deoptimization)
“Bringing a thread to a safepoint” is the act of getting a
thread to reach a safepoint and not execute past it
Close to, but not exactly the same as “stop at a safepoint”
e.g. JNI: you can keep running in, but not past the safepoint
Safepoint opportunities are (or should be) frequent
In a Global Safepoint all threads are at a Safepoint
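The cooperative polling idea behind "bringing a thread to a safepoint" can be sketched deterministically. This is a single-threaded model only: real JVMs emit a poll (typically a read of a protected page) at JIT-chosen sites such as loop back-edges, and all names below are invented.

```java
// Models a mutator that checks a "safepoint requested" flag at every
// loop back-edge (a safepoint opportunity), so it can never execute
// past a requested safepoint.
class SafepointDemo {
    private volatile boolean safepointRequested = false;
    private int stoppedAtStep = -1;

    void requestSafepoint() { safepointRequested = true; }

    // Runs up to `steps` units of work; a request raised at step
    // `requestAt` (standing in for the VM) is honored at that poll.
    void run(int steps, int requestAt) {
        for (int step = 0; step < steps; step++) {
            if (step == requestAt) requestSafepoint();
            if (safepointRequested) {      // the poll
                stoppedAtStep = step;
                return;
            }
            // ... one unit of mutator work between polls ...
        }
    }

    int stoppedAtStep() { return stoppedAtStep; }
}
```

The gap between polls is why safepoint opportunities must be frequent: a long poll-free stretch (e.g. a counted loop with no back-edge poll) delays every global safepoint.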
What’s common to all
precise GC mechanisms?
Identify the live objects in the memory heap
Reclaim resources held by dead objects
Periodically relocate live objects
Examples:
Mark/Sweep/Compact (common for Old Generations)
Copying collector (common for Young Generations)
Mark (aka “Trace”)
Start from “roots” (thread stacks, statics, etc.)
“Paint” anything you can reach as “live”
At the end of a mark pass:
all reachable objects will be marked “live”
all non-reachable objects will be marked “dead” (aka “non-live”)
Note: work is generally linear to “live set”
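The mark pass above can be sketched in a few lines. A toy model, with an invented `Node` graph standing in for heap objects:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy tracing marker: start from the roots and "paint" every reachable
// object live. Already-marked nodes are skipped, so cyclic graphs
// terminate, and total work is linear to the live set.
class MarkDemo {
    static final class Node {
        final List<Node> refs = new ArrayList<>();
        boolean live = false;
    }

    static void mark(List<Node> roots) {
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (n.live) continue;     // seen before: cycle or shared ref
            n.live = true;
            pending.addAll(n.refs);
        }
    }
}
```

Anything never reached from a root simply stays unmarked, which is how cyclic garbage is collected with no extra machinery.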
Sweep
Scan through the heap, identify “dead” objects, and track them somehow (usually in some form of free list)
Note: work is generally linear to heap size
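A sweep over a toy heap makes the cost difference from marking visible. Here the "heap" is just an invented array of mark bits, one per slot:

```java
import java.util.ArrayList;
import java.util.List;

// Toy sweep: walk EVERY slot of the heap and put dead ones on a free
// list -- work is linear to heap size, not to the live set.
class SweepDemo {
    static List<Integer> sweep(boolean[] markBits) {
        List<Integer> freeList = new ArrayList<>();
        for (int slot = 0; slot < markBits.length; slot++) {
            if (!markBits[slot]) freeList.add(slot);  // dead: reusable
            else markBits[slot] = false;              // live: clear mark for next cycle
        }
        return freeList;
    }
}
```

Note the loop visits dead and live slots alike; that is why sweep time grows with the heap even when the live set is tiny.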
Compact
Over time, heap will get “swiss cheesed”: contiguous dead
space between objects may not be large enough to fit new
objects (aka “fragmentation”)
Compaction moves live objects together to reclaim contiguous
empty space (aka “relocate”)
Compaction has to correct all object references to point to new
object locations (aka “remap”)
Remap scan must cover all references that could possibly
point to relocated objects
Note: work is generally linear to “live set”
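The relocate + remap pair above can be modeled with slot indices standing in for addresses. A sketch only: the single-reference-per-object heap layout is invented for brevity:

```java
// Toy compaction: slide live slots to the front ("relocate") and fix
// every reference through a forwarding table ("remap").
class CompactDemo {
    // live[i]: is slot i live; refTo[i]: slot that slot i points at (-1 = none).
    // Returns the compacted reference array, remapped to new slot numbers.
    static int[] compact(boolean[] live, int[] refTo) {
        int[] forwarding = new int[live.length];
        int next = 0;
        for (int i = 0; i < live.length; i++)         // relocate pass
            forwarding[i] = live[i] ? next++ : -1;
        int[] compacted = new int[next];
        next = 0;
        for (int i = 0; i < live.length; i++)         // remap pass
            if (live[i])
                compacted[next++] = (refTo[i] >= 0) ? forwarding[refTo[i]] : -1;
        return compacted;
    }
}
```

The expensive part in a real collector is exactly the remap pass: it must visit every reference that could point at a moved object, which is why compaction is hard to do without stopping the world.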
Copy
A copying collector moves all live objects from a “from” space
to a “to” space & reclaims the “from” space
At start of copy, all objects are in “from” space and all
references point to “from” space.
Start from “root” references, copy any reachable object to “to”
space, correcting references as we go
At end of copy, all objects are in “to” space, and all references
point to “to” space
Note: work generally linear to “live set”
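A Cheney-style scan is the classic way to implement the copy pass above; this sketch uses an invented one-reference `Obj` shape to keep it short:

```java
import java.util.ArrayList;
import java.util.List;

// Cheney-style copy: evacuate the roots, then scan to-space, copying
// each reachable object exactly once and correcting refs as we go.
class CopyDemo {
    static final class Obj {
        Obj ref;          // single outgoing reference, for brevity
        Obj forwarded;    // set once this object has a to-space copy
    }

    static List<Obj> collect(Obj[] roots) {
        List<Obj> toSpace = new ArrayList<>();
        for (int i = 0; i < roots.length; i++)
            roots[i] = evacuate(roots[i], toSpace);
        // Scan pointer: fix references inside already-copied objects;
        // anything still in from-space is evacuated on first sight.
        for (int scan = 0; scan < toSpace.size(); scan++) {
            Obj o = toSpace.get(scan);
            o.ref = evacuate(o.ref, toSpace);
        }
        return toSpace;
    }

    private static Obj evacuate(Obj o, List<Obj> toSpace) {
        if (o == null) return null;
        if (o.forwarded == null) {     // first visit: copy to to-space
            Obj copy = new Obj();
            copy.ref = o.ref;          // still points into from-space
            o.forwarded = copy;
            toSpace.add(copy);
        }
        return o.forwarded;
    }
}
```

Dead objects are never touched at all, which is why copying cost tracks the live set rather than the heap size.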
Mark/Sweep/Compact, Copy, Mark/Compact
Copy requires 2x the max. live set to be reliable
Mark/Compact [typically] requires 2x the max. live set in order to
fully recover garbage in each cycle
Mark/Sweep/Compact only requires 1x (plus some)
Copy and Mark/Compact are linear only to live set
Mark/Sweep/Compact linear (in sweep) to heap size
Mark/Sweep/(Compact) may be able to avoid some moving work
Copying is [typically] “monolithic”
Generational Collection
“Weak Generational Hypothesis”: most objects die young
Focus collection efforts on young generation:
Use a moving collector: work is linear to the live set
The live set in the young generation is a small % of the space
Promote objects that live long enough to older generations
Only collect older generations as they fill up
“Generational filter” reduces rate of allocation into older generations
Tends to be (order of magnitude) more efficient
Great way to keep up with high allocation rate
Practical necessity for keeping up with processor throughput
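The "generational filter" claim above is simple arithmetic: only survivors reach the old generation. The 5% survival fraction in the test below is an illustrative assumption, not a measured figure:

```java
// Back-of-envelope model: old-generation allocation pressure is the
// young-gen allocation rate scaled by the fraction of objects that
// survive long enough to be promoted.
class GenerationalFilter {
    static double promotionRateGBps(double allocRateGBps, double survivalFraction) {
        return allocRateGBps * survivalFraction;  // only survivors are promoted
    }
}
```

A weak generational hypothesis holding at, say, 95% young death means the old-generation collector faces an order of magnitude less allocation than the mutator actually performs.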
Generational Collection /2
Requires a “Remembered set”: a way to track all references into
the young generation from the outside
Remembered set is also part of “roots” for young generation
collection
No need for 2x the live set: Can “spill over” to old gen
Usually want to keep surviving objects in young generation for a
while before promoting them to the old generation
Immediate promotion can dramatically reduce gen. filter efficiency
Waiting too long to promote can dramatically increase copying work
How does the remembered set work?
Generational collectors require a “Remembered set”: a way to
track all references into the young generation from the outside
Each store of a NewGen reference into an OldGen object
needs to be intercepted and tracked
Common technique: “Card Marking”
A bit (or byte) indicating a word (or region) in OldGen is “suspect”
Write barrier used to track references
Common technique (e.g. HotSpot): blind stores on reference write
Variants: precise vs. imprecise card marking,
conditional vs. non-conditional
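The card-marking mechanism above can be sketched directly. The 512-byte card size is an illustrative choice (HotSpot also uses 512-byte cards, but the class and method names here are invented), and this shows the unconditional "blind store" variant:

```java
// Card table sketch: a write barrier marks the card covering the
// stored-into OldGen field on every reference store, so young-gen
// collection only scans dirty cards for old-to-young references.
class CardTable {
    private static final int CARD_SHIFT = 9;      // 512-byte cards
    private final byte[] cards;

    CardTable(int oldGenBytes) {
        cards = new byte[(oldGenBytes >> CARD_SHIFT) + 1];
    }

    // Invoked (conceptually) on every "oldObj.field = youngObj" store.
    void writeBarrier(int fieldAddress) {
        cards[fieldAddress >> CARD_SHIFT] = 1;    // mark card "suspect"
    }

    boolean isDirty(int address) {
        return cards[address >> CARD_SHIFT] != 0;
    }
}
```

The conditional variant would first test whether the stored value is actually a young-gen reference; the blind variant trades extra dirty cards for a cheaper barrier on every store.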
The typical combos
in commercial server JVMs
Young generation usually uses a copying collector
Young generation is usually Monolithic stop-the-world
Old generation usually uses Mark/Sweep/Compact
Old generation may be STW, or Concurrent,
or mostly-Concurrent, or Incremental-STW,
or mostly-Incremental-STW
Useful terms for discussing garbage collection
Mutator: Your program…
Parallel: Can use multiple CPUs
Concurrent: Runs concurrently with program
Pause: A time duration in which the mutator is not running any code
Stop-The-World (STW): Something that is done in a pause
Monolithic Stop-The-World: Something that must be done in its entirety in a single pause
Generational: Collects young objects and long-lived objects separately
Promotion: Allocation into old generation
Marking: Finding all live objects
Sweeping: Locating the dead objects
Compaction: Defragments the heap; moves objects in memory, remaps all affected references, frees contiguous memory regions
Useful metrics for discussing garbage collection
Cycle time: How long it takes the collector to free up memory
Marking time: How long it takes the collector to find all live objects
Sweep time: How long it takes to locate dead objects (relevant for Mark/Sweep)
Compaction time: How long it takes to free up memory by relocating objects (relevant for Mark/Compact)
Heap population (aka live set): How much of your heap is alive
Allocation rate: How fast you allocate
Mutation rate: How fast your program updates references in memory
Heap shape: The shape of the live object graph (hard to quantify as a metric...)
Object lifetime: How long objects live
Empty memory and CPU/throughput
Two Intuitive limits
If we had infinite empty memory, we would never have to
collect, and GC would take 0% of the CPU time
If we had exactly 1 byte of empty memory at all times, the
collector would have to work “very hard”, and GC would take
100% of the CPU time
GC CPU % will follow a rough 1/x curve between these two limit
points, dropping as the amount of memory increases.
Empty memory needs (empty memory == CPU power)
The amount of empty memory in the heap is the dominant
factor controlling the amount of GC work
For both Copy and Mark/Compact collectors, the amount of
work per cycle is linear to live set
The amount of memory recovered per cycle is equal to the
amount of unused memory: (heap size) - (live set)
The collector has to perform a GC cycle when the empty
memory runs out
A Copy or Mark/Compact collector’s efficiency doubles with
every doubling of the empty memory.
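The efficiency claim above follows from the two facts on this slide. A back-of-envelope model (units are GB; the specific numbers in the test are arbitrary illustrations):

```java
// Per-cycle collector work is linear to the live set, and each cycle
// recovers (heap - live) of memory. So collector work per GB allocated
// is live / (heap - live) -- and doubling the empty memory halves it,
// i.e. doubles the collector's efficiency.
class EmptyMemoryModel {
    static double gcWorkPerAllocatedGB(double liveGB, double heapGB) {
        double emptyGB = heapGB - liveGB;   // recovered per cycle
        return liveGB / emptyGB;            // cycle cost amortized over recovery
    }
}
```

This is the sense in which empty memory literally is CPU power: the same allocation rate costs half the GC work when the empty portion of the heap is doubled.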
What empty memory controls
Empty memory controls efficiency (amount of collector work
needed per amount of application work performed)
Empty memory controls the frequency of pauses (if the
collector performs any Stop-the-world operations)
Empty memory DOES NOT control pause times (only their
frequency)
In Mark/Sweep/Compact collectors that pause for sweeping,
more empty memory means less frequent but LARGER
pauses
Some non monolithic-STW stuff
Concurrent Marking
Mark all reachable objects as “live”, but object graph is
“mutating” under us.
Classic concurrent marking race: mutator may move reference
that has not yet been seen by the marker, into an object that has
already been visited
If not intercepted or prevented in some way, will corrupt the heap
Example technique: track mutations, multi-pass marking
Track reference mutations during mark (e.g. in card table)
Re-visit all mutated references (and track new mutations)
When set is “small enough”, do a STW catch up (mostly concurrent)
Note: work grows with mutation rate, may fail to finish
Incremental Compaction
Track cross-region remembered sets (which region points to
which)
To compact a single region, only need to scan regions that point
into it to remap all potential references
Identify region sets that fit in limited time
Each such set of regions is a Stop-the-World increment
Safe to run application between (but not within) increments
Note: work can grow with the square of the heap size
The number of regions pointing into a single region is generally linear
to the heap size (the number of regions in the heap)
Classifying common collectors
The typical combos
in commercial server JVMs
Young generation usually uses a copying collector
Young generation is usually Monolithic stop-the-world
Old generation usually uses a Mark/Sweep/Compact collector
Old generation may be STW, or Concurrent, or mostly-Concurrent, or
Incremental-STW, or mostly-Incremental-STW
HotSpot™ ParallelGC Collector mechanism classification
Monolithic Stop-the-world copying NewGen
Monolithic Stop-the-world Mark/Sweep/Compact OldGen
HotSpot™ CMS Collector mechanism classification
ConcMarkSweepGC (CMS)
Monolithic Stop-the-world copying NewGen (ParNew)
Mostly Concurrent, non-compacting OldGen (CMS)
Mostly Concurrent marking
Mark concurrently while mutator is running
Track mutations in card marks
Revisit mutated cards (repeat as needed)
Stop-the-world to catch up on mutations, ref processing, etc.
Concurrent Sweeping
Does not Compact (maintains free list, does not move objects)
Fallback to Full Collection (Monolithic Stop the world).
Used for Compaction, etc.
HotSpot™ G1GC Collector mechanism classification
Monolithic Stop-the-world copying NewGen
Mostly Concurrent, OldGen marker
Mostly Concurrent marking
Stop-the-world to catch up on mutations, ref processing, etc.
Tracks inter-region relationships in remembered sets
Stop-the-world mostly incremental compacting old gen
Objective: “Avoid, as much as possible, having a Full GC…”
Compact sets of regions that can be scanned in limited time
Delay compaction of popular objects, popular regions
Fallback to Full Collection (Monolithic Stop the world).
Used for compacting popular objects, popular regions, etc.
Some Collectors

Collector | Young Generation | Old Generation
Oracle HotSpot ParallelGC | Monolithic stop-the-world, copying | Monolithic stop-the-world, Mark/Sweep/Compact
Oracle HotSpot CMS (Concurrent Mark/Sweep) | Monolithic stop-the-world, copying | Mostly concurrent, non-compacting, fall back to monolithic stop-the-world
Oracle HotSpot G1 (Garbage First) | Monolithic stop-the-world, copying | Mostly concurrent marker, mostly incremental compaction, fall back to monolithic stop-the-world
Oracle JRockit Dynamic Garbage Collector * | Monolithic stop-the-world, copying | Mark/Sweep (can choose mostly concurrent or parallel), incremental compaction, fall back to monolithic stop-the-world
IBM J9 Balanced * | Monolithic stop-the-world, copying | Mostly concurrent marker, mostly incremental compaction, fall back to monolithic stop-the-world
IBM J9 optthruput * | Monolithic stop-the-world, copying | Parallel Mark/Sweep, stop-the-world compaction
Zing C4 (Continuously Concurrent Compacting Collector) | Concurrent and always compacting | Concurrent and always compacting

* Can choose a single or 2-generation collector
The “Application Memory Wall”
Reality check: servers in 2012
16 vCore, 96GB server ≈ $5K
16 vCore, 256GB server ≈ $9K
24 vCore, 384GB server ≈ $14K
32 vCore, 1TB server ≈ $35K
Retail prices, major web server store (US $, May 2012)
Cheap (< $1/GB/Month), and roughly linear to ~1TB
10s to 100s of GB/sec of memory bandwidth
The Application Memory Wall
A simple observation:
Application instances appear to be unable to
make effective use of modern server memory
capacities
The size of application instances as a % of a
server’s capacity is rapidly dropping
Maybe 1+ to 4+ GB is simply enough?
We hope not (or we’ll all have to look for new jobs soon)
Plenty of evidence of pent up demand for more heap:
Common use of lateral scale across machines
Common use of “lateral scale” within machines
Use of “external” memory with growing data sets
Databases certainly keep growing
External data caches (memcache, JCache, Data grids)
Continuous work on the never ending distribution problem
More and more reinvention of NUMA
Bring data to compute, bring compute to data
How much memory do applications need?
“640KB ought to be enough for anybody” WRONG!
“I've said some stupid things and some wrong things, but not that. No one involved in computers would ever say that a certain amount of memory is enough for all time …” - Bill Gates, 1996
So what’s the right number?
6,400K?
64,000K?
640,000K?
6,400,000K?
64,000,000K?
There is no right number
Target moves at 50x-100x per decade
“Tiny” application history
1980: 100KB apps on a ¼ to ½ MB server
1990: 10MB apps on a 32 – 64 MB server
2000: 1GB apps on a 2 – 4 GB server
2010: ??? GB apps on a 256 GB server
Assuming Moore’s Law means “transistor counts grow at ≈2x every ≈18 months”, it also means memory size grows ≈100x every 10 years
“Tiny”: would be “silly” to distribute
[Chart label: Application Memory Wall]
What is causing the
Application Memory Wall?
Garbage Collection is a clear and dominant cause
There seem to be practical heap size limits for
applications with responsiveness requirements
[Virtually] All current commercial JVMs will exhibit a multi-second pause on a normally utilized 2-4GB heap.
It’s a question of “When” and “How often”, not “If”.
GC tuning only moves the “when” and the “how often” around
Root cause: The link between scale and responsiveness
What quality of GC is responsible
for the Application Memory Wall?
It is NOT about overhead or efficiency:
CPU utilization, bottlenecks, memory consumption and utilization
It is NOT about speed
Average speeds, 90%, 99% speeds, are all perfectly fine
It is NOT about minor GC events (right now)
GC events in the 10s of msec are usually tolerable for most apps
It is NOT about the frequency of very large pauses
It is ALL about the worst observable pause behavior
People avoid building/deploying visibly broken systems
Application Characterization
Mantra:
Throughput without response time is meaningless
Sustainable throughput is all that matters
Sustainable Throughput: The throughput achieved while safely maintaining service levels
[Chart labels: “Unsustainable Throughput”, “GC Problems”]
Framing the discussion:
Garbage Collection at modern server scales
Modern Servers have 100s of GB of memory
Each modern x86 core (when actually used) produces
garbage at a rate of ¼ - ½ GB/sec +
That’s many GB/sec of allocation in a server
Monolithic stop-the-world operations are the cause of the
current Application Memory Wall
How to ignore Monolithic-STW GC events: Delaying the inevitable
Delay tactics focus on getting “easy empty space” first
This is the focus for the vast majority of GC tuning
Most objects die young [Generational]
So collect young objects only, as much as possible
But eventually, some old dead objects must be reclaimed
Most old dead space can be reclaimed without moving it
[e.g. CMS] track dead space in lists, and reuse it in place
But eventually, space gets fragmented, and needs to be moved
Much of the heap is not “popular” [e.g. G1, “Balanced”]
A non-popular region will only be pointed to from a small % of the heap
So compact non-popular regions in short stop-the-world pauses
But eventually, popular objects and regions need to be compacted
Young generation pauses are only small because heaps are tiny
A 200GB heap will regularly have several GB of live young objects
How can we break through the Application Memory Wall?
We need to solve the right problems
Focus on the causes of the Application Memory Wall
Root cause: Scale is artificially limited by responsiveness
Responsiveness must be unlinked from scale
Heap size, Live Set size, Allocation rate, Mutation rate
Responsiveness must be continually sustainable
Can’t ignore “rare” events
Eliminate all Stop-The-World Fallbacks
At modern server scales, any STW fall back is a failure
Problems that need solving (areas where the state of the art needs improvement)
Robust Concurrent Marking
In the presence of high mutation and allocation rates
Cover modern runtime semantics (e.g. weak refs)
Compaction that is not monolithic-stop-the-world
Stay responsive while compacting many-GB heaps
Must be robust: not just a tactic to delay STW compaction
[current “incremental STW” attempts fall short on robustness]
Non-monolithic-stop-the-world Generational collection
Stay responsive while promoting multi-GB data spikes
Concurrent or “incremental STW” may both be ok
[ Surprisingly little work done in this specific area]
The things that seem “hard” to do in GC
Robust concurrent marking
References keep changing
Multi-pass marking is sensitive to mutation rate
Weak, Soft, Final references “hard” to deal with concurrently
[Concurrent] Compaction…
It’s not the moving of the objects…
It’s the fixing of all those references that point to them
How do you deal with a Mutator looking at a stale reference?
If you can’t, then remapping is a [monolithic] STW operation
Young Generation collection at scale
Young Generation collection is generally Monolithic stop-the-world
Young generation pauses are only small because heaps are tiny
A 100GB heap will regularly see multi-GB of live young stuff…
Azul’s “C4” Collector: Continuously Concurrent Compacting Collector
Concurrent, compacting new generation
Concurrent, compacting old generation
Concurrent guaranteed-single-pass marker
Oblivious to mutation rate
Concurrent ref (weak, soft, final) processing
Concurrent Compactor
Objects moved without stopping mutator
References remapped without stopping mutator
Can relocate entire generation (New, Old) in every GC cycle
No stop-the-world fallback
Always compacts, and always does so concurrently
Sample responsiveness improvement
• SpecJBB + slow-churning 2GB LRU cache
• Live set is ~2.5GB across all measurements
• Allocation rate is ~1.2GB/sec across all measurements
Instance capacity test: “Fat Portal”
CMS: peaks at ~3GB / 45 concurrent users
* LifeRay portal on JBoss @ 99.9% SLA of 5 second response times
Instance capacity test: “Fat Portal”
C4: still smooth @ 800 concurrent users
Some fun with jHiccup
[jHiccup charts: idle app on a quiet system, on a busy system, on a dedicated system]
A good use for jHiccup
[jHiccup charts: Oracle HotSpot CMS, 1GB in an 8GB heap; Oracle HotSpot CMS, 4GB in an 18GB heap; Oracle HotSpot G1, 1GB in an 8GB heap; Oracle HotSpot ParallelGC, 1GB in an 8GB heap; Oracle HotSpot CMS vs. Zing 5, each with 1GB in an 8GB heap]
Java GC tuning is “hard”…
Examples of actual command line GC tuning parameters:

java -Xmx12g -XX:MaxPermSize=64M -XX:PermSize=32M -XX:MaxNewSize=2g
  -XX:NewSize=1g -XX:SurvivorRatio=128 -XX:+UseParNewGC
  -XX:+UseConcMarkSweepGC -XX:MaxTenuringThreshold=0
  -XX:CMSInitiatingOccupancyFraction=60 -XX:+CMSParallelRemarkEnabled
  -XX:+UseCMSInitiatingOccupancyOnly -XX:ParallelGCThreads=12
  -XX:LargePageSizeInBytes=256m …

java -Xms8g -Xmx8g -Xmn2g -XX:PermSize=64M -XX:MaxPermSize=256M
  -XX:-OmitStackTraceInFastThrow -XX:SurvivorRatio=2 -XX:-UseAdaptiveSizePolicy
  -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled
  -XX:+CMSParallelRemarkEnabled -XX:+CMSParallelSurvivorRemarkEnabled
  -XX:CMSMaxAbortablePrecleanTime=10000 -XX:+UseCMSInitiatingOccupancyOnly
  -XX:CMSInitiatingOccupancyFraction=63 -XX:+UseParNewGC -Xnoclassgc …
The complete guide to Zing GC tuning:
java -Xmx40g
C4 Algorithm fundamentals
C4 algorithm highlights:
Same core mechanism used for both generations
Concurrent Mark-Compact
A Loaded Value Barrier (LVB) is central to the algorithm
Every heap reference is verified as “sane” when loaded
“Non-sane” refs are caught and fixed in a self-healing barrier
Refs that have not yet been “marked through” are caught
Guaranteed single-pass concurrent marker
Refs that point to relocated objects are caught
Lazily (and concurrently) remap refs, no hurry
Relocation and remapping are both concurrent
Uses “quick release” to recycle memory
Forwarding information is kept outside of object pages
Physical memory released immediately upon relocation
“Hand-over-hand” compaction without requiring empty memory
The C4 GC Cycle
Mark Phase
Mark phase finds all live objects in the Java heap
Concurrent, predictable: always completes in a single pass
Uses LVB to defeat concurrent marking races
Tracks which object references have been traversed using an “NMT” (not marked through) metadata state in each object reference
Any access to a not-yet-traversed reference will trigger the LVB
Triggered references are queued on collector work lists, and the reference’s NMT state is corrected
“Self healing” corrects the memory location that the reference was loaded from
Marker tracks total live memory in each memory page
Compaction uses this to go after the sparse pages first
(But each cycle will tend to compact the entire heap…)
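The NMT half of the barrier can be modeled in a few lines. This is a conceptual sketch only: the real LVB is a few JIT-emitted instructions operating on metadata bits inside the reference itself, and every name below is invented.

```java
// Models the LVB's marking role: a loaded reference whose NMT state says
// "not yet marked through" is queued for the marker, its NMT state is
// corrected, and the corrected value is stored back to the slot it was
// loaded from ("self healing"), so that slot never triggers again.
class LvbDemo {
    static final class Ref {
        boolean markedThrough;        // the "NMT" metadata state
    }

    private int queuedForMarking = 0; // stands in for collector work lists

    Ref loadBarrier(Ref[] slots, int i) {
        Ref ref = slots[i];
        if (ref != null && !ref.markedThrough) {
            queuedForMarking++;       // hand the object to the marker
            ref.markedThrough = true; // correct the NMT state
            slots[i] = ref;           // heal the loaded-from location
        }
        return ref;
    }

    int queuedForMarking() { return queuedForMarking; }
}
```

The self-healing store-back is what bounds the work: each reference slot can trigger at most once per cycle, regardless of how often the mutator reads it, which is why the marker is oblivious to mutation rate.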
Relocate Phase
Compacts to reclaim heap space occupied by dead objects in “from” pages without stopping the mutator
Protects “from” pages
Uses LVB to support concurrent relocation and lazy remapping by triggering on any access to references to “from” pages
Relocates any live objects to newly allocated “to” pages
Maintains forwarding pointers outside of “from” pages
Virtual “from” space cannot be recycled until all references to relocated objects are remapped
“Quick Release”: Physical memory can be immediately reclaimed, and used to feed further compaction or allocation
Remap Phase
Scans all live objects in the heap
Looks for references to previously relocated objects, and updates (“remaps”) them to point to the new object locations
Uses LVB to support lazy remapping
Any access to a not-yet-remapped reference will trigger the LVB
Triggered references are corrected to point to the object’s new location by consulting forwarding pointers
“Self healing” corrects the memory location the reference was loaded from
Overlaps with the next mark phase’s live object scan
Mark & Remap are executed as a single pass
Sustainable Remap Rates….
Per 2MB of allocation: map… remap/protect… unmap…
Need to keep up with sustained allocation rate
A modern x86 core will happily generate ~0.5GB/sec of garbage
(m)remapping pages is only a small part of the GC cycle
Healthy GC duty cycle at ~20%, mremap is ~5% of GC cycle
So need to sustain 100s of GB/sec in mremap rate…
Linux remaps sustain <1GB/sec
Dominated by unneeded semantics
TLB invalidates, 4KB mappings, global locking, …
Zing remaps sustain >6TB/sec (through ZST memory service)
Avoids in-process implicit TLB invalidates, uses 2MB mappings
Summary
The Application Memory Wall is HERE, NOW
Driven by the detrimental link between scale and responsiveness
Solving a handful of problems can lead to a breakthrough:
Robust concurrent marking
[Concurrent] compaction
Non-monolithic-STW young generation collection
All at modern server scales
Solving it will [hopefully] allow applications to resume their natural rate of consuming computer capacity
Implications of breaking past the Application Memory Wall
Improve quality of current systems:
Better & consistent response times, stability & availability
Reduce complexity, time to market, and cost
Scale Better: Large or variable number of concurrent users
High or variable transaction rates
Large data sets
Change how things are done: Aggressive Caching, in-memory data processing
Multi-tenant, SaaS, PaaS
Cloud deployments
Build applications that were not possible before…
©2012 Azul Systems, Inc.
How can we break through the Application Memory Wall?
Simple: Deploy Zing on Linux
©2012 Azul Systems, Inc.
How is Azul’s Java Platform Different?
Same JVM standard
Licensed Sun Java-source based JVM
Enhanced HotSpot, fully Java compatible
Passes the Java Compatibility Kit (JCK) server-level compatibility suite (~53,000 tests)
A different approach
Garbage is good!
Designed with insight that worst case must eventually happen
Unique values
Highly scalable … 100s of GB with consistent low pause times – other JVMs will have longer
“stop-the-world” pauses in proportion to the size of the JVM heap and memory allocation rate
Elastic memory … insurance for JVMs to handle dynamic load – unlike other JVMs which are
rigidly tuned
Collects New Garbage and Old Garbage concurrently with running application threads …
there is no “stop-the-world” for GC purposes (you will only see extremely short pause times to
reach safepoints) – unlike other JVMs which will eventually stop-the-world.
Compacts Memory concurrently with your application threads running … Zing will move
objects without “stop-the-world” or single-threading – which is a major issue with other JVMs
Measuring pause times from FIRST thread stopped (unlike other JVMs)
Rich non-intrusive production visibility with ZVision and ZVRobot
WYTIWYG (What You Test Is What You Get)
©2012 Azul Systems, Inc.
Azul’s Proposition
Performance
Pauseless Garbage Collection
Heap sizes of 100s of GB
Concurrent Applications or Single Large Resource Pool
Assurance
Every application afforded ability to scale when needed
When spikes occur, additional resources granted in ms
Ensures performance & eliminates application crashes
Visibility
Visibility down to thread level
Quickly eliminate constraints
Reduce Development Cycles
“What You Test Is What You Get”
“WYTIWYG”
Consolidation
Lower TCO
Reduction – Power and Cooling
Reduction of Complexity
Reduction of Operations overhead
Cost Avoidance for Additional Servers
©2012 Azul Systems, Inc.
Use Cases: Zing Applications
Better & consistent response times
Greater stability & availability
Reduce complexity, time to market, and cost
Large or variable number of concurrent users
High or variable transaction rates
Large data sets
Caching, In-memory data processing
ESBs, SOA, Messaging
Multi-tenant, Platform-as-a-Service (PaaS)
Virtualized & Cloud deployments
©2012 Azul Systems, Inc.
Q & A
How can we break through the Application Memory Wall?
http://www.azulsystems.com
Simple: Deploy Zing on Linux
©2012 Azul Systems, Inc.
G. Tene, B. Iyengar and M. Wolf.
C4: The Continuously Concurrent Compacting Collector.
In Proceedings of the International Symposium on Memory
Management (ISMM ’11), ACM, pages 79–88.
Jones, Richard; Hosking, Antony; Moss, Eliot (25 July 2011).
The Garbage Collection Handbook:
The Art of Automatic Memory Management.
CRC Press. ISBN 1420082795.
©2012 Azul Systems, Inc.
ZING VISION
©2012 Azul Systems, Inc.
Problem: JVMs are Black Boxes
Java has a good ecosystem of Dev/Test profiling tools
Deep, sophisticated instrumentation
Always comes at a cost (sometimes 5%, sometimes 10x)
Production applications run into problems
This is the real world…
Some problems make it through QA
Some real world loads were never seen in the lab
Production-time visibility is poor
When problems are escalated, problem solvers lack tools for
diagnosing and resolving cause
Can’t turn on lab visibility tools
The application is already having a problem
Adding any instrumentation load will make it worse
©2012 Azul Systems, Inc.
Solution: Zing Vision
Non-intrusive, Zero-overhead Visibility
Zing Runtime ALWAYS collects instrumentation data
Side-effects of work the runtime has to do anyway
e.g. JIT Compilers need to track hot code anyway
e.g. GC needs to scan all objects anyway
May as well keep information for production-time viewing
Information used without fear of hurting application
Publishes data in accessible XML structures (can be saved to disk by ZVRobot)
No expensive polling or impacting requests, no JNI calls
Zing Vision provides deep, drill-down detail
Hot code, Hot Threads
Lock contention & deadlocks
Memory behavior, mix, live objects, allocation rates, GC stats.
Etc...
Production problems get diagnosed in 1/10th the time
Launch quickly, then optimize as-you-go
©2012 Azul Systems, Inc.
Rich, Detailed Performance Data
Lock contention & deadlocks
Hotspots & method optimization
Down to the byte code level!
Memory profiling & GC statistics
Memory leak detection
Allocation rates
Thread profiling
Ticks data
Monitors
©2012 Azul Systems, Inc.
Zing Vision Development Deployment
[Diagram: MyApp1 started with the Zing VM on Application Host 1; ZVision runs on a separate machine or a developer’s PC/laptop, connects to the ARTAPort on the Zing JVM, and polls using HTTP GET]
©2012 Azul Systems, Inc.
Zing Vision Production Deployment
[Diagram: MyApp1 and MyApp2 started with the Zing VM on Application Host 1, and MyApp3 on Application Host 2; ZVision runs on a separate machine]
©2012 Azul Systems, Inc.
Zing Vision Robot Production Deployment
[Diagram: MyApp1 and MyApp2 started with the Zing VM on Application Host 1, and MyApp3 on Application Host 2; ZVRobot runs on a separate machine]
ZVRobot connects to the ARTAPort on each Zing JVM and polls
using HTTP GET at a configured time interval
– data collected is saved as XML files.
ZVRobot can be started & stopped as required to monitor an application
©2012 Azul Systems, Inc.
Zing Vision : Always-On Visibility in
Production Systems
[Always] run with the Zing ARTAPort enabled
if you want to use Zing Vision / ZVRobot to:
Diagnose unplanned, unanticipated situations
Quickly resolve production issues as they arise
Increase application reliability and performance
Optimize performance of systems at full load
Deploy applications faster, with less tuning
Track resources in real-time at the instance level
What You Test Is What You Get
©2012 Azul Systems, Inc.
AZUL C4
More about how the Azul JVM and C4 works
Copyright ©2010-2012 Azul Systems, Inc. | Azul Company Confidential
©2012 Azul Systems, Inc.
Tiered Compilation and other optimizations
“Pauseless GC” a.k.a. C4
JVM Innovations
©2012 Azul Systems, Inc.
Tiered compilation by default
Dynamic self-sizing thread pools for
compilers
High-performance implementation of Locking
…
JVM Optimizations
©2012 Azul Systems, Inc.
Interpreter: “dictionary approach”, non-optimized
C1: 5x-10x better performance than interpretation
Quick, good for application start-up times
C2: 30%-50% better performance than C1
More profiling time/footprint, but much better performance for server applications
Tiered compilation – the best of both worlds
Using both C1 and C2
Re-use of C1-profiling, while also inserting additional performance counters
Fast code immediately, optimal code over time
Tiered Compilation
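The interpreter → C1 → C2 progression above can be seen with a trivial, self-contained probe (a sketch for illustration, not a benchmark; timings vary by machine and JVM):

```java
// Trivial warm-up probe: the same method typically gets faster over the
// first rounds as the JIT promotes it from interpreter to C1 to C2.
// Absolute numbers vary widely; only the trend is interesting.
final class WarmupProbe {
    static long work(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) sum += (long) i * i;
        return sum;
    }

    static long timeOnceNs(int n) {
        long t0 = System.nanoTime();
        work(n);
        return System.nanoTime() - t0;
    }

    public static void main(String[] args) {
        for (int round = 1; round <= 5; round++) {
            long best = Long.MAX_VALUE;
            for (int i = 0; i < 2_000; i++) {
                best = Math.min(best, timeOnceNs(10_000));
            }
            System.out.printf("round %d: best %d ns%n", round, best);
        }
    }
}
```

Running with the standard HotSpot flag `-XX:+PrintCompilation` shows the C1/C2 tier transitions as they happen.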
©2012 Azul Systems, Inc.
Remember?
Garbage is Good!
Fragmentation is Bad!
There are ways to delay the worst case, but it will
eventually happen
Compaction (moving objects together) is the only way to
deal with fragmentation
For other JVMs, moving objects is expensive, as the world
needs to stop to update references…
©2012 Azul Systems, Inc.
How Zing’s Continuous Concurrent
Compacting Collector Works…
Object
Ref A
Ref B
Virtual Address Space
Virtual Address Space
Fragmented Memory Page (From)
Empty Memory Page (To)
• Memory is Fragmented
©2012 Azul Systems, Inc.
How Zing’s Continuous Concurrent
Compacting Collector Works…
Object
Ref A
Ref B
Virtual Address Space
Virtual Address Space
Fragmented Memory Page (From)
Compacted Memory Page (To)
• Memory is compacted by
moving page contents
• Fragmented Memory Page
is now considered ‘empty’
and returned to C4 for reuse
©2012 Azul Systems, Inc.
How Zing’s Continuous Concurrent
Compacting Collector Works…
Object
Ref A
Ref B
Virtual Address Space
Virtual Address Space
Fragmented Memory Page (From)
Compacted Memory Page (To)
Check
• Ref A is accessed
• Memory reference is checked by Zing
©2012 Azul Systems, Inc.
How Zing’s Continuous Concurrent
Compacting Collector Works…
Object
Ref A
Ref B
Virtual Address Space
Virtual Address Space
Fragmented Memory Page (From)
Compacted Memory Page (To)
• This instance of Ref A is
updated or ‘self-healed’
• Other live references to Ref A
will be updated during
Mark/Relocate Phase
©2012 Azul Systems, Inc.
Mark phase finds all live objects in the Java heap
Concurrent & predictable: always completes in a single pass
Uses LVB to defeat concurrent marking races
Tracks object references that have been traversed by using an
“NMT” (not marked through) metadata bit in each object reference
Any access to a not-yet-traversed reference will trigger the LVB
Triggered references are queued on collector work lists, and reference NMT state is
corrected
“Self healing” corrects the memory location that the reference was loaded from
Marker tracks the total live memory in each memory page
Compaction uses this to go after the sparse pages first
(But each cycle will tend to compact the entire heap…)
Mark Phase
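The “go after the sparse pages first” policy at the end of this slide can be modeled in a few lines (a toy sketch; `PagePicker` and its per-page live-byte array are invented for illustration and are not Zing’s data structures):

```java
import java.util.Arrays;
import java.util.Comparator;

// Toy model of compaction candidate selection: pages with the least live
// data are relocated first, since each one frees the most address space
// per byte of live data copied.
final class PagePicker {
    static final int PAGE_BYTES = 2 * 1024 * 1024;   // 2MB pages, as in Zing

    // liveBytesPerPage is what the marker tracked during the mark phase.
    // Returns page indices in compaction order (sparsest first).
    static int[] compactionOrder(int[] liveBytesPerPage) {
        Integer[] idx = new Integer[liveBytesPerPage.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        Arrays.sort(idx, Comparator.comparingInt(i -> liveBytesPerPage[i]));
        int[] out = new int[idx.length];
        for (int i = 0; i < idx.length; i++) out[i] = idx[i];
        return out;
    }
}
```

As the slide notes, each cycle still tends to compact the entire heap; the ordering only determines which pages pay off first.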
©2012 Azul Systems, Inc.
The C4 GC Cycle
©2012 Azul Systems, Inc.
Compacts to reclaim heap space occupied
by dead objects in “from” pages without
stopping mutator
Protects “from” pages (virtual address
space)
Uses LVB to support concurrent relocation
and lazy remapping by triggering on any
access to references to “from” pages
Relocates any live objects to newly
allocated “to” pages
Maintains forwarding pointers outside of
“from” pages
Virtual “from” space cannot be recycled
until all references to relocated objects are
remapped
“Quick Release”: Physical memory can be
immediately reclaimed, and used to feed
further compaction or allocation
Relocate Phase
©2012 Azul Systems, Inc.
Scans all live objects in the heap
Looks for references to previously relocated objects, and updates (“remaps”)
them to point to the new object locations
Uses LVB to support lazy remapping
Any access to a not-yet-remapped reference will trigger the LVB
Triggered references are corrected to point to the object’s new location by consulting
forwarding pointers
“Self healing” corrects the memory location the reference was loaded from
Overlaps with the next Mark phase’s live object scan
Mark & Remap are executed as a single pass
Remap Phase
©2012 Azul Systems, Inc.
The C4 GC Cycle
©2012 Azul Systems, Inc.
High Performance Memory Functionality
High-performance functionality requires physical memory management and
control beyond that provided by standard Linux virtual memory functionality.
in-process recycling of memory and in-process memory free lists can dramatically dampen
TLB invalidate requirements on allocation or deallocation edges.
in-process physical memory free lists are necessary to sustain a high rate of new mappings
(e.g. 20GB/sec of sustained random, disjoint map/remap/unmap operations)
Support for mappings with multiple and mixed page sizes
Including transitioning of mapped addresses from large to small page mappings, or small to
large.
©2012 Azul Systems, Inc.
Virtual and Physical modules for Linux
Support for very high sustained mapping modification rates:
Allowing concurrent modifications within the same address space
Allowing user to [safely] indicate lazy TLB invalidation and thereby dramatically reduce per
change costs
Supporting fast, safe application of very large "batch" sets of mapping modifications
(remaps and mprotects), such that all changes become visible within the same, extremely
short period of time.
Support for large number of disjoint mappings with arbitrary manipulations at
high rates
See http://www.managedruntime.org/downloads
©2012 Azul Systems, Inc.
Per 2MB of allocation: map… remap/protect… unmap…
Need to keep up with sustained allocation rate
A modern x86 core will happily generate ~0.5GB/sec of garbage
(m)remapping pages is only a small part of the GC cycle
Healthy GC duty cycle at ~20%, mremap is ~5% of GC cycle
So need to sustain 100s of GB/sec in mremap rate…
Linux remaps sustain <1GB/sec
Dominated by unneeded semantics
TLB invalidates, 4KB mappings, global locking, …
Zing’s Enhanced kernel
supports >6TB/sec sustained remap rates
Avoids in-process implicit TLB invalidates, uses 2MB mappings
Sustainable Remap Rates….
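The “100s of GB/sec” figure follows directly from the arithmetic on this slide; a minimal sketch, assuming the slide’s numbers (cores generating ~0.5 GB/sec of garbage each, a ~20% GC duty cycle, and remapping taking ~5% of the cycle):

```java
// Back-of-envelope arithmetic behind the slide: if remapping may consume
// only ~5% of a GC cycle, and GC runs at a ~20% duty cycle, then remap
// bandwidth must cover the full allocation rate in ~1% of wall-clock time.
final class RemapMath {
    static double requiredRemapGBps(int cores, double garbagePerCoreGBps,
                                    double gcDutyCycle, double remapShare) {
        double allocGBps = cores * garbagePerCoreGBps;        // garbage generated
        double remapTimeFraction = gcDutyCycle * remapShare;  // wall time for remap
        return allocGBps / remapTimeFraction;
    }

    public static void main(String[] args) {
        // 8 cores x 0.5 GB/sec, 20% duty cycle, remap is 5% of the cycle:
        System.out.printf("%.0f GB/sec required%n",
                requiredRemapGBps(8, 0.5, 0.20, 0.05));
    }
}
```

With these assumptions an 8-core machine already needs ~400 GB/sec of sustained remap bandwidth, which is why Linux’s <1 GB/sec falls short while Zing’s >6 TB/sec has headroom.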
©2012 Azul Systems, Inc.
Sustained Remap Rates

Active threads   Mainline Linux   w/Azul Memory Module   Speedup
      1          3.04 GB/sec      6.50 TB/sec            >2,000x
      2          1.82 GB/sec      6.09 TB/sec            >3,000x
      4          1.19 GB/sec      6.08 TB/sec            >5,000x
      8          897.65 MB/sec    6.29 TB/sec            >7,000x
     12          736.65 MB/sec    6.39 TB/sec            >8,000x
©2012 Azul Systems, Inc.
Remap Commit Rates….
Remap/protection must be consistent across mutator threads
Each “batch” of relocated pages needs synchronization
In practical terms, we bring mutators to safe point, and flip pages
Using Linux mremap(), protecting 16GB would take ~20 sec.
Zing’s Enhanced kernel
supports >800TB/sec remap commit rates
Uses a shadow table and a batch remap/protect ops API
Accumulated batch operations are not visible until committed
Commits shadow table using ~1 pointer copy per GB
Protecting 16GB takes about ~22 usec…
©2012 Azul Systems, Inc.
Remap Commit Rates

Active threads   Mainline Linux           w/Azul Memory Module       Speedup
      0          43.58 GB/sec (360 ms)    4734.85 TB/sec ( 3 usec)   >100,000x
      1           3.04 GB/sec (5 sec)     1488.10 TB/sec (11 usec)   >480,000x
      2           1.82 GB/sec (8 sec)     1166.04 TB/sec (14 usec)   >640,000x
      4           1.19 GB/sec (13 sec)     913.74 TB/sec (18 usec)   >750,000x
      8         897.65 MB/sec (18 sec)     801.28 TB/sec (20 usec)   >890,000x
     12         736.65 MB/sec (21 sec)     740.52 TB/sec (22 usec)   >1,000,000x

* Commit rate and (time it would take to commit 16GB)
©2012 Azul Systems, Inc.
Same approach used for both generations
Concurrent Mark-Compact
A Loaded Value Barrier (LVB) is central to the algorithm
Every heap reference is verified as “sane” when loaded
“Non-sane” refs are caught and fixed in a self-healing barrier
Refs that have not yet been “marked through” are caught
Guaranteed single pass concurrent marker
Refs that point to relocated objects are caught
Lazily (and concurrently) remap refs, no hurry
Relocation and remapping are both concurrent
Uses “quick release” to recycle memory
Forwarding information is kept outside of object pages
Physical memory released immediately upon relocation
“Hand-over-hand” compaction without requiring empty memory
Pauseless GC, a.k.a. “C4”
A taste of the secret sauce
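The barrier behavior summarized above can be sketched in runnable form. All names, the bit layout, and the side tables here are illustrative assumptions, not Zing’s implementation; the actual LVB is a few JIT-emitted machine instructions on every reference load:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the Loaded Value Barrier (LVB): every loaded
// reference is verified as "sane"; a non-sane reference is fixed exactly
// once, at the memory slot it was loaded from ("self healing").
final class LvbSketch {
    static final long NMT_BIT = 1L << 62;   // "not marked through" metadata bit
    static long expectedNmt = NMT_BIT;      // expected NMT value this mark cycle
    static int markQueueDepth = 0;          // stand-in for collector work lists

    // Hypothetical forwarding table for relocated objects: from -> to address.
    static final Map<Long, Long> forwarding = new HashMap<>();

    static long loadRef(long[] heapSlots, int slot) {
        long ref = heapSlots[slot];
        if ((ref & NMT_BIT) != expectedNmt) {          // marking race caught
            markQueueDepth++;                           // queue for the marker
            ref = (ref & ~NMT_BIT) | expectedNmt;       // correct NMT state
        }
        Long to = forwarding.get(ref & ~NMT_BIT);
        if (to != null) {                               // points into a "from" page
            ref = to | (ref & NMT_BIT);                 // lazy remap via fwd pointer
        }
        heapSlots[slot] = ref;                          // self-heal the source slot
        return ref;
    }
}
```

Because the healed slot is rewritten, a second load of the same slot passes both checks without triggering the barrier again, which is what makes the single-pass marker and lazy remapping cheap in steady state.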
©2012 Azul Systems, Inc.
Use Cases: Zing Applications
Better & consistent response times
Greater stability & availability
Reduce complexity, time to market, and cost
Large or variable number of concurrent users
High or variable transaction rates
Large data sets
Caching, In-memory data processing
ESBs, SOA, Messaging
Multi-tenant, Platform-as-a-Service (PaaS)
Virtualized & Cloud deployments
©2012 Azul Systems, Inc.
For more information on…
JDK internals: http://openjdk.java.net/ (JVM source code)
Memory management:
http://java.sun.com/j2se/reference/whitepapers/memorymanagement_whitepaper.pdf
(a bit old, but very comprehensive)
Tuning:
http://download.oracle.com/docs/cd/E13150_01/jrockit_jvm/jrockit/geninfo/diagnos/tune
_stable_perf.html (watch out for increased rigidity and re-tuning pain)
Generational Pauseless Garbage Collection:
http://www.azulsystems.com/webinar/pauseless-gc (webinar by Gil Tene, 2011)
Compiler internals and optimizations:
http://www.azulsystems.com/blogs/cliff (Dr Cliff Click’s blog)
Additional Resources
©2012 Azul Systems, Inc.
Additional Material
©2012 Azul Systems, Inc.
Multimodal pauses in Financial Trading?
Mean = 10 milliseconds
Std. dev. = 1.0
you are in the market 90% of the time…
[Chart: multimodal latency distribution – x-axis Latency (milliseconds), 0–140; Mode 1 falls within tolerable latency, Mode 2 and Mode 3 lie further out]
Not in the market – can’t make money… but loss unlikely
Out of the market for too long – may have a loss or
a risk position (can’t sell) = compliance issue
©2012 Azul Systems, Inc.
Multimodal pauses in eCommerce?
Mean = 10 deciseconds
Std. dev. = 1.0
you are in the market 90% of the time…
[Chart: multimodal latency distribution – x-axis Latency (deciseconds), 0–140; Mode 1 is labeled “Normal Trading”, Mode 2 and Mode 3 lie further out]
Customer dissatisfaction … where did my transaction go?
Abandoned shopping cart, went to another
web site. Lost customer – perhaps for good
©2012 Azul Systems, Inc.
Example of naïve %’ile measurement
[Chart: response time vs. elapsed time – the system easily handles 100 requests/sec, responding to each in 1 msec, then stalls for 100 sec]
How would you characterize this system?
Naïve results: 10,000 @ 1 msec, 1 @ 100 seconds
Naïve characterization: 99.99% below 1 sec !!!
©2012 Azul Systems, Inc.
Proper measurement
[Chart: response time vs. elapsed time – same system: 100 requests/sec at 1 msec each, with a 100 sec stall; corrected results include 10,000 samples varying linearly from 100 sec down to 10 msec, plus 10,000 samples @ 1 msec each]
Proper characterization: 50% below 1 second
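The naive vs. proper numbers on these two slides can be reproduced with simple arithmetic (a sketch of the coordinated-omission correction, using the slides’ workload figures):

```java
import java.util.ArrayList;
import java.util.List;

// Numeric version of the two percentile slides: 100 requests/sec, 1 msec
// normal response time, and one 100-second stall.
final class PercentileDemo {
    static double percentBelow(List<Double> samplesMs, double thresholdMs) {
        long below = samplesMs.stream().filter(s -> s <= thresholdMs).count();
        return 100.0 * below / samplesMs.size();
    }

    static List<Double> naive() {
        List<Double> s = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) s.add(1.0);   // 100 sec of 1 msec results
        s.add(100_000.0);                               // the single stalled request
        return s;
    }

    static List<Double> corrected() {
        List<Double> s = naive();
        // The 10,000 requests that should have been issued during the stall
        // would have seen latencies falling linearly from 100 sec to 10 msec.
        for (int i = 1; i < 10_000; i++) s.add(100_000.0 - i * 10.0);
        return s;
    }

    public static void main(String[] args) {
        System.out.printf("naive:     %.2f%% below 1 sec%n",
                percentBelow(naive(), 1_000.0));
        System.out.printf("corrected: %.2f%% below 1 sec%n",
                percentBelow(corrected(), 1_000.0));
    }
}
```

The naive data set reports over 99.99% of requests below one second; once the missing requests are accounted for, only about half are.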
©2012 Azul Systems, Inc.
©2012 Azul Systems, Inc.
How Zing ZST Memory Service provisions memory
©2012 Azul Systems, Inc.
Zing Resource Controller (ZRC)
©2012 Azul Systems, Inc.
OTHER FREE TOOLS
jHiccup | Fragger | Azul Inspector
©2012 Azul Systems, Inc.
So we built some new tools
jHiccup
Fragger
Both open sourced and available on Azul Website
©2012 Azul Systems, Inc.
jHiccup and Fragger: what they do
jHiccup – characterizes response time
Measures system/platform lag, under your application load
Reports accumulated counts of delay occurrences
Can measure latency between a client and Java application node or
within the node
Fragger – heap fragmentation inducer
Generates large sets of objects of a given size
Prunes each set to a smaller, remaining live set
Increases object size between passes until compaction is inevitable
Ages data sets to avoid artificial early compaction by young generation
collectors
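jHiccup’s core measurement idea fits in a few lines (a minimal sketch in the spirit of the tool, not its actual source): sleep for a fixed interval and record how much longer the wake-up actually took; the excess is the platform “hiccup” your application would also have experienced.

```java
// Minimal hiccup-meter sketch: any GC pause, OS stall, or virtualization
// "hiccup" shows up as extra time beyond the intended sleep interval.
final class HiccupMeter {
    static long hiccupMs(long intendedSleepMs) throws InterruptedException {
        long t0 = System.nanoTime();
        Thread.sleep(intendedSleepMs);
        long actualMs = (System.nanoTime() - t0) / 1_000_000;
        return Math.max(0, actualMs - intendedSleepMs);   // excess = hiccup
    }

    public static void main(String[] args) throws InterruptedException {
        long worst = 0;
        for (int i = 0; i < 100; i++) {
            worst = Math.max(worst, hiccupMs(10));
        }
        System.out.println("worst observed hiccup: " + worst + " ms");
    }
}
```

Because the measuring thread does no work of its own, anything it observes is platform-induced delay, which is exactly the quantity jHiccup accumulates into its percentile reports.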
©2012 Azul Systems, Inc.
Summary
Common pitfalls, but easy to overcome
Very simple tools can help a lot
Platforms that run smoothly are good
©2012 Azul Systems, Inc.
How to get Fragger and jHiccup
Fragger:
http://www.azulsystems.com/resources/tools/fragger
jHiccup:
http://www.azulsystems.com/dev_resources/jhiccup
Also useful: Azul Inspector Environment Checker
Azul Inspector is a Java program designed to collect information about a
target Java application and its server environment. Developers, IT and
performance engineers can use Azul Inspector to quickly determine the
JDK version in use, maximum heap size setting and the values of a
variety of other setup variables.
©2012 Azul Systems, Inc.
Azul Inspector
©2012 Azul Systems, Inc.
Azul application
characterization mantra
Throughput without response time is meaningless
Sustainable throughput is all that matters
©2012 Azul Systems, Inc.
Q & A
©2012 Azul Systems, Inc.
For More Information: Web: www.azulsystems.com
jHiccup: www.azulsystems.com/jhiccup
Fragger: www.azulsystems.com/resources/tools/fragger
Azul Inspector:
www.azulsystems.com/dev_resources/azul_inspector
Java Developer Webinars:
http://www.azulsystems.com/resources/tools#webinars
Zing Free Trial: www.azulsystems.com/trial
©2012 Azul Systems, Inc.
The Zing Platform: Application Benefits
If you want to:
Improve response times
Increase transaction rates
Increase concurrent users
Forget about GC pauses
Eliminate daily restarts
Elastically grow during peaks
Elastically shrink when idle
Gain production visibility
The Zing™ Platform
On commodity H/W
©2012 Azul Systems, Inc.
32-bit versus 64-bit JVMs