Heap Shape Scalability: Scalable Garbage Collection on Highly Parallel Platforms
Kathy Barabash, Erez Petrank
Computer Science Department, Technion, Israel
ISMM 2010
Outline
- Is tracing GC ready for many-core platforms?
- How is the heap shape related?
- Evaluating heap shape scalability: Idealized Trace Utilization
- Improving heap shape scalability
  - Solution 1: Reshaping with Shortcut References
  - Solution 2: Tracing with Speculative Roots
- Related work and conclusion
Is Tracing GC Ready for Many-Core?
[Figure: a heap of objects a to m, traced from the roots]
- GC tracing: traverse lots of objects
- Sequential trace: each live object is touched (BFS, DFS)
- Parallel trace: needs load balancing, and 1K-core machines are coming really soon
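A minimal sketch of the sequential trace, assuming a hypothetical heap model in which a Python dict maps each object id to the ids it references. A FIFO work list gives BFS order and a LIFO one gives DFS order; either way each reachable object is marked and scanned exactly once, so the step count equals the number of live objects.

```python
from collections import deque


def sequential_trace(heap, roots, order="bfs"):
    """Mark every object reachable from `roots`; return (marked, steps).

    `heap` is a hypothetical model (object id -> list of referenced ids),
    not a real VM heap.
    """
    marked = set()
    work = deque()
    for root in roots:
        if root not in marked:
            marked.add(root)
            work.append(root)
    steps = 0
    while work:
        # BFS scans the oldest pending object, DFS the newest one.
        obj = work.popleft() if order == "bfs" else work.pop()
        steps += 1
        for ref in heap.get(obj, ()):
            if ref not in marked:
                marked.add(ref)
                work.append(ref)
    return marked, steps


heap = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "x": ["a"]}  # "x" is garbage
marked, steps = sequential_trace(heap, roots=["a"])
print(sorted(marked), steps)  # ['a', 'b', 'c', 'd'] 4
```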
Can Heaps Spoil the Scalability?
[Figure: a single linked list of live objects hanging off the roots]
- 4M live objects in a single linked list
- Sequential trace: 4M steps
- Parallel trace: not any faster
Deep Object Graphs Can Be Evil
Definitions:
- Object depth: the length of the minimal path from some root to the object
- Object-graph depth: the maximal depth of a live object
Example: [Figure: heap objects labeled with their depths, 0 to 3]
How deep are the object graphs of Java programs?
- Benchmarks: SpecJVM, DaCapo, SpecJBB
- Measured with an instrumented BFS trace
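A BFS from the roots visits every object first along a minimal path, so recording the BFS level at first visit yields the object depth defined above. The sketch below assumes the same hypothetical dict-based heap model (object id to referenced ids); it is an illustration, not the instrumented collector used for the measurements.

```python
from collections import deque


def object_depths(heap, roots):
    """Return {object: depth}; roots have depth 0."""
    depth = {}
    frontier = deque()
    for root in roots:
        if root not in depth:
            depth[root] = 0
            frontier.append(root)
    while frontier:
        obj = frontier.popleft()
        for ref in heap.get(obj, ()):
            if ref not in depth:  # first BFS visit = minimal path length
                depth[ref] = depth[obj] + 1
                frontier.append(ref)
    return depth


def object_graph_depth(heap, roots):
    """Maximal depth over all live (reachable) objects."""
    return max(object_depths(heap, roots).values(), default=0)


heap = {"r": ["a", "b"], "a": ["c"], "b": ["c"], "c": ["d"]}
print(object_depths(heap, ["r"]))       # {'r': 0, 'a': 1, 'b': 1, 'c': 2, 'd': 3}
print(object_graph_depth(heap, ["r"]))  # 3
```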
Object-Graph Depths of Java Benchmarks
Suite     Name    Description                  Heap Size (MB)   GC Cycles   Max Depth
SpecJVM   javac   Java compiler, run 3 times   32               15          1,234
SpecJVM   mtrt    3D raytracer                 32               8           1,416
DaCapo    bloat   Java byte code analyzer      48               344         1,195
DaCapo    pmd     Java code analyzer           48               59          18,482
DaCapo    xalan   Transforms XML into HTML     128              129         8,476
Other 15 benchmarks: 128
Not All Deep Object Graphs Are Evil
[Figure: the roots point to many linked lists of 4K objects each]
- Object graph: 1K same-sized linked lists of 4K objects each
- Sequential trace: 4M steps
- Parallel trace: scales well for up to 1K processors
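A back-of-the-envelope bound explains both examples, assuming the idealized model introduced later (each processor scans one object per tick, and an object can be scanned only after the object that references it). With N live objects, object-graph depth D, and P processors, the D + 1 depth levels must be scanned one after another, and at most P objects are scanned per tick:

```latex
T_1 = N, \qquad
T_P \ge \max\left(\left\lceil \frac{N}{P} \right\rceil,\; D + 1\right), \qquad
S_P = \frac{T_1}{T_P} \le \min\left(P,\; \frac{N}{D + 1}\right)
```

For the single linked list, N = D + 1 = 4M, so S_P <= 1 for every P. For 1K lists of 4K objects, N = 4M and D + 1 = 4K, so S_P <= min(P, 4M / 4K) = min(P, 1K): the speedup can grow with P up to 1K processors.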
Deep and Narrow Object Graphs Are Evil
Definition: the object depths distribution is the number of objects at each depth.
Example: [Figure: a small heap and its graphical representation (object-graph shape): # objects vs. depth, depths 1 to 5]
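The object depths distribution is a histogram over the BFS depths. In this sketch (same hypothetical dict-based heap model), a deep and narrow graph shows up as a long run of small counts, while a wide graph concentrates its objects at shallow depths.

```python
from collections import Counter, deque


def depth_distribution(heap, roots):
    """Return counts, where counts[d] is the number of live objects at depth d."""
    depth = {}
    frontier = deque()
    for root in roots:
        if root not in depth:
            depth[root] = 0
            frontier.append(root)
    while frontier:
        obj = frontier.popleft()
        for ref in heap.get(obj, ()):
            if ref not in depth:
                depth[ref] = depth[obj] + 1
                frontier.append(ref)
    histogram = Counter(depth.values())
    return [histogram[d] for d in range(max(histogram, default=-1) + 1)]


narrow = {0: [1], 1: [2], 2: [3], 3: [4]}  # a 5-object chain
wide = {"r": ["a", "b", "c", "d"], "a": ["e"], "b": ["f"], "c": ["g"], "d": ["h"]}
print(depth_distribution(narrow, [0]))   # [1, 1, 1, 1, 1]
print(depth_distribution(wide, ["r"]))   # [1, 4, 4]
```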
Object-Graph Shapes of Java Benchmarks
[Figure: object-graph shapes (# objects vs. depth) for jython and xalan]
Object-Graph Shapes of Java Benchmarks
[Figure: object-graph shapes, # objects (log 10) vs. depth (log 10), for bloat, javac, mtrt, xalan, pmd, db, hsqldb, antlr, jython, jess, jack, and lusearch]
The Idealized Trace Utilization
Simulate the idealized traversal by N threads:
- Perfect load balancing and perfect cache behavior
- BFS traversal; scanning an object takes a single time tick
During the traversal, count:
- the objects available to be scanned at every time tick
- the processor slots, some busy and some wasted
At the end, report the utilization:
$$\mathrm{ITU} = \frac{\text{Total Scanned Objects}}{\text{Total Processor Slots}} \times 100\%$$
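The simulation can be sketched in a few lines, assuming the same hypothetical dict-based heap model. Each tick, up to N available objects are scanned, one per tracer; objects discovered during a tick become available at the next one, and every idle tracer slot counts as wasted. The function name and model are illustrative, not the authors' tool.

```python
from collections import deque


def idealized_trace_utilization(heap, roots, n_tracers):
    """Return the ITU (in percent) of an idealized BFS trace by n_tracers.

    Model assumptions: perfect load balancing and cache behavior, and
    scanning one object takes exactly one time tick. Objects discovered
    during tick t become available for scanning at tick t + 1.
    """
    marked = set()
    available = deque()
    for root in roots:
        if root not in marked:
            marked.add(root)
            available.append(root)
    scanned = ticks = 0
    while available:
        ticks += 1
        # Each tracer scans at most one available object per tick; if fewer
        # objects are available than tracers, the remaining slots are wasted.
        batch = [available.popleft() for _ in range(min(n_tracers, len(available)))]
        for obj in batch:
            scanned += 1
            for ref in heap.get(obj, ()):
                if ref not in marked:
                    marked.add(ref)
                    available.append(ref)
    total_slots = ticks * n_tracers
    return 100.0 * scanned / total_slots if total_slots else 0.0


chain = {i: [i + 1] for i in range(99)}            # 100-object linked list
print(idealized_trace_utilization(chain, [0], 1))  # 100.0
print(idealized_trace_utilization(chain, [0], 4))  # 25.0
# Hypothetical 15-object heap that 4 tracers finish in 8 ticks: 15 / 32
example = {0: [2, 3], 1: [4], 2: [5, 6], 3: [7], 4: [8], 5: [9],
           6: [10], 9: [11], 11: [12], 12: [13], 13: [14]}
print(idealized_trace_utilization(example, [0, 1], 4))  # 46.875
```

Sweeping n_tracers over the powers of two from 1 to 1024 produces one utilization curve per heap snapshot.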
Idealized Trace Utilization Example
[Figure: heap objects scanned by 4 tracers (Core 1 to Core 4) over 8 time ticks; scanned objects by time tick 1 to 8: 2, 5, 9, 11, 12, 13, 14, 15]
$$\mathrm{ITU} = \frac{\text{Total Scanned Objects}}{\text{Total Processor Slots}} \times 100\% = \frac{15}{8 \times 4} \times 100\% \approx 47\%$$
Graphical Representation
1. Simulate and compute
2. Draw the graph
[Figure: an object-graph shape (# objects vs. depth) and its ITU plot: utilization (%) vs. processors (1, 2, 4, 8)]
Worst-Case ITU for Java Benchmarks
[Figure: worst-case utilization (%) vs. processors (1 to 1024) for check, compress, db, jack, javac, jess, mpegaudio, mtrt, antlr, bloat, hsqldb, jython, lusearch, pmd, and xalan]
Average ITU for Java Benchmarks
[Figure: average utilization (%) vs. processors (1 to 1024) for check, compress, db, jack, javac, jess, mpegaudio, mtrt, antlr, bloat, hsqldb, jython, lusearch, pmd, and xalan]
What’s Next?
- Problematic heaps exist: javac, mtrt, pmd, bloat, xalan
- Can we improve the trace scalability without modifying the benchmarks?
  - Reshape with shortcut references
  - Trace with speculative roots
Reshape with Shortcut References
[Figure: a linked list of 16K objects reachable from the roots, with shortcut references added (labels: 4K, 16K)]
- Sequential trace: 16K steps
- New references are added: invisible to the program, useful for the tracers
- Parallel trace: scales for 4 processors
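A minimal sketch of why shortcuts help, assuming a hypothetical placement that is not necessarily the one in the figure: the list head gets extra references to the first object of each of p equal segments. BFS depth then drops from n - 1 to n / p, so p tracers can work on separate segments at once. The numbers are scaled down to a 16-object list with 4 segments, and the helper names are illustrative.

```python
from collections import deque


def bfs_depths(heap, roots):
    """Shortest reference distance from the roots to every reachable object."""
    depth = {r: 0 for r in roots}
    frontier = deque(depth)
    while frontier:
        obj = frontier.popleft()
        for ref in heap.get(obj, ()):
            if ref not in depth:
                depth[ref] = depth[obj] + 1
                frontier.append(ref)
    return depth


def linked_list(n):
    """Objects 0..n-1, where object i references object i + 1."""
    return {i: [i + 1] for i in range(n - 1)}


def add_head_shortcuts(heap, n, segments):
    """Hypothetical placement: object 0 gets extra references to the first
    object of each later segment. The program never follows them; only
    the tracers do."""
    seg_len = n // segments
    shortcuts = [k * seg_len for k in range(1, segments)]
    reshaped = {obj: list(refs) for obj, refs in heap.items()}
    reshaped.setdefault(0, []).extend(shortcuts)
    return reshaped, len(shortcuts)


heap = linked_list(16)                             # stand-in for the 16K-object list
reshaped, added = add_head_shortcuts(heap, 16, 4)
print(max(bfs_depths(heap, [0]).values()))         # 15
print(max(bfs_depths(reshaped, [0]).values()))     # 4
print(added)                                       # 3
```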
Evaluation Prototype
- Devise a shortcut strategy: where shortcuts are needed
- When the program is stopped for GC:
  - compute the Idealized Trace Utilization
  - run the shortcut-adding algorithm
  - compute the ITU for the modified heap
- Report the ITU improvement and the number of shortcuts added
Shortcut Strategy and Parameters
- Identify candidate subgraphs:
  - with at least Size objects
  - with a depth-to-size ratio no less than Ratio
- Add a shortcut to the root of the subgraph, leading to the objects Length pointers away
- Introduce the next shortcut no closer than Distance pointers away
[Figure: a chain of objects 1 to 9 with Distance (2) and Length (4); Size=5, Depth=4, Ratio=0.8]
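The candidate test can be sketched as follows, assuming the hypothetical dict-based heap model and taking a subgraph to be everything reachable from its root, with depth measured as the longest BFS distance from that root. It covers only the candidate identification step (Size and Ratio), not shortcut placement with Length and Distance. The 5-object chain reproduces the figure's numbers: Size=5, Depth=4, Ratio=0.8.

```python
from collections import deque


def subgraph_size_and_depth(heap, root):
    """Size and depth (longest BFS distance) of the subgraph reachable from root."""
    depth = {root: 0}
    frontier = deque([root])
    while frontier:
        obj = frontier.popleft()
        for ref in heap.get(obj, ()):
            if ref not in depth:
                depth[ref] = depth[obj] + 1
                frontier.append(ref)
    return len(depth), max(depth.values())


def is_shortcut_candidate(heap, root, min_size, min_ratio):
    """True if the subgraph has at least min_size objects and a
    depth-to-size ratio of at least min_ratio."""
    size, depth = subgraph_size_and_depth(heap, root)
    return size >= min_size and depth / size >= min_ratio


chain5 = {1: [2], 2: [3], 3: [4], 4: [5]}
bushy = {1: [2, 3, 4, 5]}
print(subgraph_size_and_depth(chain5, 1))                            # (5, 4)
print(is_shortcut_candidate(chain5, 1, min_size=5, min_ratio=0.8))  # True
print(is_shortcut_candidate(bushy, 1, min_size=5, min_ratio=0.8))   # False
```

A bushy subgraph of the same size has depth 1 (ratio 0.2), so it is already easy to trace in parallel and gets no shortcut.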
Results for SpecJVM mtrt
[Figure: utilization (%) vs. processors (1 to 1024): worst before, worst after, average before, average after]
- ~500K live objects
- Max shortcuts: 110; avg shortcuts: 94
- Parameters: Size=50, Ratio=0.2, Length=50, Distance=25
Results for DaCapo xalan
[Figure: utilization (%) vs. processors (1 to 1024): worst before, worst after, average before, average after]
- ~400K live objects
- Max shortcuts: 888; avg shortcuts: 536
- Parameters: Size=50, Ratio=0.2, Length=50, Distance=25
Results for DaCapo bloat
[Figure: utilization (%) vs. processors (1 to 1024): worst before, worst after, average before, average after]
- ~400K live objects
- Max shortcuts: 940; avg shortcuts: 378
- Parameters: Size=50, Ratio=0.2, Length=50, Distance=25
Results for DaCapo pmd
[Figure: utilization (%) vs. processors (1 to 1024): worst before, worst after, average before, average after]
- ~434K live objects
- Max shortcuts: 5,874; avg shortcuts: 432
- Parameters: Size=600, Ratio=0.1, Length=120, Distance=40
Results for SpecJVM javac
[Figure: utilization (%) vs. processors (1 to 1024): worst before, worst after, average before, average after]
- ~383K live objects
- Max shortcuts: 292; avg shortcuts: 16
- Parameters: Size=500, Ratio=0.1, Length=100, Distance=50
Trace with Speculative Roots
[Figure: a heap reachable from the roots (labels: 4K, 4M)]
- Sequential trace: 16M steps
- Helper tracers: pick random roots and trace using custom colors
- Parallel trace: scales for 4 processors
Speculative Trace
- Helper tracer: pick up a root; pick up a color, e.g., red; trace; if a blue object is discovered, mark blue as reachable from red
- Regular trace: trace from the roots; if a blue object is discovered, mark blue as live
- Complete trace: all colors reachable from live colors are marked live; all objects marked by live colors survive the collection
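The three phases can be simulated sequentially as a sketch, assuming the hypothetical dict-based heap model and using integer colors (helper i gets color i). Real helpers run concurrently with the regular tracers; here they simply run first. A helper that meets another color's object records the edge and stops there; the regular trace stops at colored objects and makes their colors live; completion takes the transitive closure over the recorded edges.

```python
from collections import deque


def speculative_trace(heap, roots, speculative_roots):
    """Return (survivors, live_colors, color) for a simulated colored trace."""
    color = {}    # object -> color of the helper that traced it
    reaches = {}  # color -> colors discovered by that helper
    # 1. Helper tracers (one after another here, not concurrently).
    for c, start in enumerate(speculative_roots):
        reaches[c] = set()
        if start in color:                  # already traced by an earlier helper
            reaches[c].add(color[start])
            continue
        color[start] = c
        work = deque([start])
        while work:
            obj = work.popleft()
            for ref in heap.get(obj, ()):
                if ref not in color:
                    color[ref] = c
                    work.append(ref)
                elif color[ref] != c:       # e.g., red discovers blue
                    reaches[c].add(color[ref])
    # 2. Regular trace from the real roots; it stops at colored objects.
    marked, live_colors, work = set(), set(), deque()

    def visit(obj):
        if obj in color:
            live_colors.add(color[obj])     # a colored object is reachable
        elif obj not in marked:
            marked.add(obj)
            work.append(obj)

    for r in roots:
        visit(r)
    while work:
        for ref in heap.get(work.popleft(), ()):
            visit(ref)
    # 3. Completion: every color reachable from a live color is live too.
    pending = deque(live_colors)
    while pending:
        for other in reaches[pending.popleft()]:
            if other not in live_colors:
                live_colors.add(other)
                pending.append(other)
    survivors = marked | {o for o, c in color.items() if c in live_colors}
    return survivors, live_colors, color


heap = {"r": ["a"], "a": ["b"], "b": ["c"], "c": ["d"], "g": ["h"], "x": ["c"]}
survivors, live, _ = speculative_trace(heap, ["r"], ["c", "g", "x"])
print(sorted(survivors), live)  # ['a', 'b', 'c', 'd', 'r'] {0}
```

If a helper happens to start at the dead object "x", it colors the live objects "c" and "d" too, so its color becomes live and "x" survives as floating garbage.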
Evaluation Prototype
- Useful helper work: live objects colored by live colors
- Wasted helper work: dead objects colored by dead colors
- Floating garbage: dead objects colored by live colors
[Figure: a heap of objects a to m]
- 4 regular tracers and 4 helper tracers
- Speculative roots: random unmarked objects
- ITU measured before and after the colored trace
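Given the outcome of a colored trace, the three categories above follow directly from two facts per object: whether its color ended up live and whether the object is truly reachable. A minimal sketch with hypothetical inputs (the object-to-color map, the set of truly live objects, and the set of live colors); it treats every object of a dead color as wasted work, on the assumption that a complete trace never leaves a reachable object under a dead color.

```python
def classify_helper_work(color, live_objects, live_colors):
    """Split helper-colored objects into (useful, wasted, floating)."""
    useful = {o for o, c in color.items() if c in live_colors and o in live_objects}
    floating = {o for o, c in color.items() if c in live_colors and o not in live_objects}
    wasted = {o for o, c in color.items() if c not in live_colors}
    return useful, wasted, floating


# Hypothetical outcome: helper 0 started at dead "x" and reached live "c", "d";
# helper 1 traced only the dead objects "g" and "h".
color = {"x": 0, "c": 0, "d": 0, "g": 1, "h": 1}
useful, wasted, floating = classify_helper_work(color, {"r", "a", "b", "c", "d"}, {0})
print(sorted(useful), sorted(wasted), sorted(floating))  # ['c', 'd'] ['g', 'h'] ['x']
```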
- Limit the floating garbage:
  - cap the number of objects colored by a single color
  - helpers must save discovered but not-yet-traced objects
  - the trace completion phase takes care of the saved fronts
- Make the random root choices smarter:
  - to avoid choosing dead objects
  - to reach deeper parts of the live object graph
- Filter for recursive objects: objects with referents of their own type
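The recursive-object filter can be sketched with a hypothetical object model that carries a type name and outgoing references. An object qualifies if at least one referent has its own type, as a linked-list node pointing to another node does; such objects are typical building blocks of deep structures, which is presumably why they make better speculative roots.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity equality; avoids recursive __eq__ on cyclic graphs
class Obj:
    """Hypothetical heap object: a type name plus outgoing references."""
    type_name: str
    refs: list = field(default_factory=list)


def is_recursive(obj):
    """True if the object references at least one object of its own type."""
    return any(ref.type_name == obj.type_name for ref in obj.refs)


def filter_speculative_candidates(objects):
    """Keep only recursive objects as speculative-root candidates."""
    return [o for o in objects if is_recursive(o)]


node2 = Obj("Node")
node1 = Obj("Node", [node2])  # a Node that points to another Node
text = Obj("String")
holder = Obj("Holder", [text])
candidates = filter_speculative_candidates([node1, node2, holder, text])
print([o.type_name for o in candidates])  # ['Node']
```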
Results
- Lots of floating garbage, even with the filter
- Hard to find good roots: progressively harder as more live objects get marked
- The trace completion phase is complex and can defeat the purpose
- Modest improvement in the Idealized Trace Utilization scores
Results for DaCapo xalan: Worst-Case ITU Improvement, with the Random Choices Filter
[Figure: utilization (%) vs. processors (1 to 1024), before and after]
Results for DaCapo bloat: Worst-Case ITU Improvement, with the Random Choices Filter
[Figure: utilization (%) vs. processors (1 to 1024), before and after]
Related Work
- Parallel garbage collection folklore: there are heap structures that can foil any clever load balancing scheme
- Siebert (ISMM'08): reported object-graph depths for SpecJVM benchmarks and proposed an upper bound on the worst-case scalability as a way to compute real-time (RT) guarantees for GC tracing
- Random tracing was originally proposed by Click
Summary
- Studied the heap shape properties of Java benchmarks: out of twenty considered benchmarks, five had non-scalable heap shapes during the run
- Devised a measure to quantify heap shape scalability: the Idealized Trace Utilization
- Proposed, prototyped, and evaluated two approaches to improve tracing scalability: Reshaping with Shortcuts appears to be more promising than Tracing from Speculative Roots
Thank You!