Date posted: 21-Jan-2017 | Category: Software | Uploaded by: valeriia-maliarenko
JAVA JIT: Compilation and Optimization
AGENDA
o What is JIT
o Types: Client, Server, Tiered
o Main optimization approaches
o JIT tuning
o Conclusions
WHAT IS JIT
o Just-In-Time compiler
o Compilation is done during execution of a program (at run time) rather than prior to execution
o First presented in 1960 in LISP
o Java, .NET, JS…
o Oracle HotSpot, IBM J9, Azul…
WHAT IS JIT
o JIT separates optimization from SD (just update the JVM instead of improving the code; the JIT tunes it for your platform)
o JIT'ing requires profiling
  • Because you don't want to JIT everything
o Profiling allows better code generation
  • Inline what's hot
  • Loop unrolling, range-check elimination, etc.
  • Branch prediction, spill-code generation, scheduling
HOTSPOT JIT CLIENT (C1) WORKFLOW
[Diagram: Java source → bytecode compiler → bytecode → at run time, after 1.5K invocations, the JIT compiler → optimized code]
JIT CLIENT (C1)
o Produces compilations quickly
o Generated code runs relatively slowly
HOTSPOT JIT SERVER (C2) WORKFLOW
[Diagram: Java source → bytecode compiler → bytecode; at run time a profiler collects HotSpot info, and after 10K invocations the JIT compiler (optimization) produces optimized native code; the JIT compiler (deoptimization) can fall back from that code]
HOTSPOT JIT SERVER (C2)
o Produces compilations slowly (long warm-up)
o Generated code runs fast
o Profiler-guided
o Speculative
HOTSPOT JIT TIERED (C1 + C2)
o Available from Java 7
o Default in Java 8
o Best of the C1 and C2 approaches
o Level 0 = interpreter
o Levels 1-3 = C1
  • #1: C1 without profiling
  • #2: C1 with basic profiling (invocations)
  • #3: C1 with full profiling (~35% overhead)
o Level 4 = C2
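The level scheme above can be pictured as a small state machine. The following is a toy model, not HotSpot's real policy. The class `TieredModel`, its thresholds (200 and 5,000 calls) and the reset-on-deoptimization rule are all hypothetical simplifications. It only shows the common path 0 → 3 → 4 and the fall back to the interpreter.

```java
// A deliberately simplified, hypothetical model of tiered compilation.
// Real HotSpot uses separate invocation and back-edge counters, several
// per-tier thresholds and compile-queue feedback; none of that is modeled.
class TieredModel {

    // Compilation levels as listed on the slide.
    enum Level { INTERPRETER_0, C1_NO_PROFILE_1, C1_LIMITED_PROFILE_2, C1_FULL_PROFILE_3, C2_4 }

    // Hypothetical thresholds chosen only for illustration.
    static final int C1_THRESHOLD = 200;
    static final int C2_THRESHOLD = 5_000;

    private Level level = Level.INTERPRETER_0;
    private int invocations = 0;

    /** Records one call and returns the level the method runs at afterwards. */
    Level onInvoke() {
        invocations++;
        if (level == Level.INTERPRETER_0 && invocations >= C1_THRESHOLD) {
            level = Level.C1_FULL_PROFILE_3;   // common path: 0 -> 3
        } else if (level == Level.C1_FULL_PROFILE_3 && invocations >= C2_THRESHOLD) {
            level = Level.C2_4;                // enough profile collected: 3 -> 4
        }
        return level;
    }

    /** In this model, a failed speculation sends the method back to the interpreter. */
    void deoptimize() {
        level = Level.INTERPRETER_0;
        invocations = 0;
    }
}
```

Levels 1 and 2 are listed only for completeness. The model never enters them, because real HotSpot uses them in special situations that this sketch does not try to reproduce.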
KEYS FOR JIT VERSION
o -client
o -server (-d64)
o -server (-d64) -XX:+TieredCompilation
DEFAULT JIT VERSION
Install bits | -client | -server | -d64
Linux 32-bit | 32-bit client compiler | 32-bit server compiler | Error
Linux 64-bit | 64-bit server compiler | 64-bit server compiler | 64-bit server compiler
Mac OS X | 64-bit server compiler | 64-bit server compiler | 64-bit server compiler
Windows 32-bit | 32-bit client compiler | 32-bit server compiler | Error
Windows 64-bit | 64-bit server compiler | 64-bit server compiler | 64-bit server compiler

OS | Default compiler
Windows, 32-bit, any number of CPUs | -client
Windows, 64-bit, any number of CPUs | -server
MacOS, any number of CPUs | -server
Linux/Solaris, 32-bit, 1 CPU | -client
Linux/Solaris, 32-bit, 2 or more CPUs | -server
Linux, 64-bit, any number of CPUs | -server
*In Java 8 the server compiler is the default in any of these cases
Information about the default compiler:
% java -version
java version "1.7.0"
Java(TM) SE Runtime Environment (build 1.7.0-b147)
Java HotSpot(TM) Server VM (build 21.0-b17, mixed mode)
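A running program can also retrieve this information through the standard `java.lang.management` API, without reading the `java -version` banner. This is a small sketch, and the class name `JitInfo` is hypothetical. The exact name string depends on the JVM. A tiered 64-bit HotSpot typically reports something like "HotSpot 64-Bit Tiered Compilers".

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

// Asks the running JVM which JIT it uses and how long it has spent compiling.
class JitInfo {

    /** Returns the JIT compiler name, or null if the JVM reports no compilation system. */
    static String jitName() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        return jit == null ? null : jit.getName();
    }

    /** Accumulated JIT compilation time in ms, or -1 if the JVM does not report it. */
    static long jitTimeMillis() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null || !jit.isCompilationTimeMonitoringSupported()) {
            return -1;
        }
        return jit.getTotalCompilationTime();
    }

    public static void main(String[] args) {
        System.out.println("JIT: " + jitName());
        System.out.println("Time spent compiling (ms): " + jitTimeMillis());
    }
}
```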
OPTIMIZATIONS IN HOTSPOT JVM
• compiler tactics
  – delayed compilation
  – tiered compilation
  – on-stack replacement
  – delayed reoptimization
  – program dependence graph representation
  – static single assignment representation
• proof-based techniques
  – exact type inference
  – memory value inference
  – memory value tracking
  – constant folding
  – reassociation
  – operator strength reduction
  – null check elimination
  – type test strength reduction
  – type test elimination
  – algebraic simplification
  – common subexpression elimination
  – integer range typing
• flow-sensitive rewrites
  – conditional constant propagation
  – dominating test detection
  – flow-carried type narrowing
  – dead code elimination
• language-specific techniques
  – class hierarchy analysis
  – devirtualization
  – symbolic constant propagation
  – autobox elimination
  – escape analysis
  – lock elision
  – lock fusion
  – de-reflection
• speculative (profile-based) techniques
  – optimistic nullness assertions
  – optimistic type assertions
  – optimistic type strengthening
  – optimistic array length strengthening
  – untaken branch pruning
  – optimistic N-morphic inlining
  – branch frequency prediction
  – call frequency prediction
• memory and placement transformation
  – expression hoisting
  – expression sinking
  – redundant store elimination
  – adjacent store fusion
  – card-mark elimination
  – merge-point splitting
• loop transformations
  – loop unrolling
  – loop peeling
  – safepoint elimination
  – iteration range splitting
  – range check elimination
  – loop vectorization
• global code shaping
  – inlining (graph integration)
  – global code motion
  – heat-based code layout
  – switch balancing
  – throw inlining
• control flow graph transformation
  – local code scheduling
  – local code bundling
  – delay slot filling
  – graph-coloring register allocation
  – linear scan register allocation
  – live range splitting
  – copy coalescing
  – constant splitting
  – copy removal
  – address mode matching
  – instruction peepholing
  – DFA-based code generator
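To give a sense of what two of the loop transformations above do, the example below writes the transformed loop out in Java source. This is only an illustration. The JIT applies these transformations to its own intermediate representation, and rewriting loops by hand like this is not recommended. The class `LoopTransforms` and the unroll factor of 4 are assumptions chosen for readability.

```java
// Source-level illustration of loop unrolling and range check elimination.
class LoopTransforms {

    // What you write: one range check and one back-branch per element.
    static int sum(int[] a) {
        int s = 0;
        for (int i = 0; i < a.length; i++) {
            s += a[i];
        }
        return s;
    }

    // Roughly the shape of the result after unrolling by 4: a main loop that
    // handles four elements per back-branch, plus a post-loop for leftovers.
    // The main-loop condition guarantees i + 3 < a.length, so the JIT can
    // prove the per-element range checks redundant and drop them.
    static int sumUnrolled(int[] a) {
        int s = 0;
        int i = 0;
        int limit = a.length - 3;
        for (; i < limit; i += 4) {
            s += a[i] + a[i + 1] + a[i + 2] + a[i + 3];
        }
        for (; i < a.length; i++) {   // post-loop for the remaining 0-3 elements
            s += a[i];
        }
        return s;
    }
}
```

Both methods return the same result for every input, including on int overflow, because two's-complement addition wraps identically in any order.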
INLINING – MOTHER OF OPTIMIZATION
*Uses JVM devirtualization if needed. Frequency and size matter.

Before:
int addAll(int max) {
    int accum = 0;
    for (int i = 0; i < max; i++) {
        accum = add(accum, i);
    }
    return accum;
}
int add(int a, int b) { return a + b; }

After:
int addAll(int max) {
    int accum = 0;
    for (int i = 0; i < max; i++) {
        accum = accum + i;
    }
    return accum;
}
int add(int a, int b) { return a + b; }
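To see the inlining decision from the slide in practice, make the loop hot and run it with the diagnostic inlining log turned on. This is a sketch. The class name `InliningDemo` and the iteration counts are arbitrary choices, and the exact log wording differs between JDK versions.

```java
// Run with (HotSpot diagnostic flags):
//   java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining InliningDemo
// Once addAll gets compiled, the log should report add being inlined into it.
class InliningDemo {

    // Small and hot: its bytecode is far below the default 35-byte MaxInlineSize.
    static int add(int a, int b) {
        return a + b;
    }

    static int addAll(int max) {
        int accum = 0;
        for (int i = 0; i < max; i++) {
            accum = add(accum, i);   // the call site the JIT can inline
        }
        return accum;
    }

    public static void main(String[] args) {
        long sink = 0;
        // Call addAll many times so it crosses the compile thresholds.
        for (int round = 0; round < 20_000; round++) {
            sink += addAll(1_000);
        }
        System.out.println(sink);
    }
}
```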
OSR – ON-STACK REPLACEMENT
o Running method never exits?
o But it's getting really hot?
o Generally means loops, back-branching
o Compile and replace while running
o Not typically useful in large systems
o Looks great on benchmarks!
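A minimal way to trigger OSR is a method that is called only once but loops for a long time. In this sketch (the class and method names are hypothetical), `sumUpTo` is entered a single time, so only its loop back-edge counter can make it hot. Running it with `-XX:+PrintCompilation` should show an OSR compilation, which that log marks with a `%`.

```java
// Run with: java -XX:+PrintCompilation OsrDemo
// Look for a line containing '%' next to OsrDemo::sumUpTo (an OSR compile).
class OsrDemo {

    static long sumUpTo(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += i;   // hot back-branch inside a single invocation
        }
        return total;
    }

    public static void main(String[] args) {
        // Called once: the invocation counter alone would never trigger a compile,
        // so the JIT has to replace the running interpreted frame on the stack.
        System.out.println(sumUpTo(100_000_000));
    }
}
```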
ESCAPE ANALYSIS
o The object is referenced only inside some loop, and no other code can ever access it?
o Then the JVM needn't take a synchronization lock when calling methods on that object
o It needn't store the object's fields in memory; it can keep their values in registers
o Similarly, it can keep the object reference itself in a register
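The escape-analysis conditions above can be tried on a small example. In this sketch (the `Point` class and the method names are hypothetical), each `Point` lives only within a single loop iteration. Once `lengthSquared()` is inlined, the JIT may apply scalar replacement and skip the heap allocation. `-XX:+DoEscapeAnalysis` is on by default in HotSpot. To see the difference, compare a run with `-XX:-DoEscapeAnalysis` under a GC log or an allocation profiler.

```java
// Compare: java EscapeDemo   vs   java -XX:-DoEscapeAnalysis EscapeDemo
class EscapeDemo {

    // Hypothetical value class used only for illustration.
    static final class Point {
        final int x;
        final int y;
        Point(int x, int y) { this.x = x; this.y = y; }
        int lengthSquared() { return x * x + y * y; }
    }

    static long sumLengths(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            // p is never stored in a field, returned, or passed to non-inlined
            // code, so the JIT may replace it with two scalars (x, y) in registers.
            Point p = new Point(i, i + 1);
            total += p.lengthSquared();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumLengths(10_000));
    }
}
```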
ESCAPE ANALYSIS
public class Factorial {
    private BigInteger factorial;
    private int n;

    public Factorial(int n) {
        this.n = n;
    }

    public synchronized BigInteger getFactorial() {
        if (factorial == null)
            factorial = ...;
        return factorial;
    }
}

ArrayList<BigInteger> list = new ArrayList<BigInteger>();
for (int i = 0; i < 100; i++) {
    Factorial factorial = new Factorial(i);
    list.add(factorial.getFactorial());
}
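The slide leaves out how the factorial is computed. The runnable version below fills that in with a plain `BigInteger` loop. That computation, the helper `firstFactorials` and the `final` on `n` are assumptions of this sketch and are not part of the original example. The important property is unchanged: each `Factorial` object is used by one thread and never escapes the loop body. Only the `BigInteger` result is stored in the list.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

class Factorial {
    private BigInteger factorial;
    private final int n;

    Factorial(int n) {
        this.n = n;
    }

    // synchronized, although each instance below is only ever used by one thread.
    synchronized BigInteger getFactorial() {
        if (factorial == null) {
            // Assumption: the slide elides this computation; a simple loop is used here.
            BigInteger result = BigInteger.ONE;
            for (int k = 2; k <= n; k++) {
                result = result.multiply(BigInteger.valueOf(k));
            }
            factorial = result;
        }
        return factorial;
    }

    // Hypothetical helper wrapping the loop from the slide.
    static List<BigInteger> firstFactorials(int count) {
        List<BigInteger> list = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            // The Factorial object does not escape: after inlining getFactorial(),
            // the JIT may elide the lock and keep n in a register.
            Factorial f = new Factorial(i);
            list.add(f.getFactorial());
        }
        return list;
    }
}
```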
ESCAPE ANALYSIS (SIMPLE CASE)
o It needn't get a synchronization lock when calling the getFactorial() method
o It needn't store the field n in memory; it can keep that value in a register
o It can just keep track of the individual fields of the object
o Sometimes it needn't allocate the object at all
JIT TUNING (THESE MIGHT SAVE YOU)
o -client, -server or -XX:+TieredCompilation
o -XX:ReservedCodeCacheSize=, -XX:InitialCodeCacheSize=
JIT TUNING
o -XX:CompileThreshold= number of invocations before a method is compiled
o -XX:CICompilerCount= number of compiler threads
o -XX:FreqInlineSize= maximum bytecode size of a hot method that may be inlined (default 325 bytes)
o -XX:MaxInlineSize= methods smaller than this are inlined regardless of how hot they are (default 35 bytes)
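A running HotSpot JVM can report the current values of the flags above through the HotSpot-specific `com.sun.management.HotSpotDiagnosticMXBean`. `java -XX:+PrintFlagsFinal -version` gives the same information from the command line. This is a minimal sketch. The class name `JitFlags` is hypothetical, and the flag list uses HotSpot's own option names.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

class JitFlags {

    // HotSpot option names for the tuning knobs discussed on the slides.
    static final String[] FLAGS = {
        "CompileThreshold", "CICompilerCount", "FreqInlineSize",
        "MaxInlineSize", "ReservedCodeCacheSize", "InitialCodeCacheSize"
    };

    /** Returns the flag's current value, or null if this JVM does not know it. */
    static String valueOf(String flag) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            VMOption opt = bean.getVMOption(flag);
            return opt.getValue();
        } catch (IllegalArgumentException e) {
            return null;   // unknown flag name, or bean not available on this JVM
        }
    }

    public static void main(String[] args) {
        for (String f : FLAGS) {
            System.out.println(f + " = " + valueOf(f));
        }
    }
}
```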
WANT TO GET MORE DETAILS? (BE CAREFUL USING THESE IN PRODUCTION)
o -XX:+UnlockDiagnosticVMOptions (required by diagnostic flags such as LogCompilation, PrintAssembly and PrintInlining)
o -XX:+TraceClassLoading
o -XX:+LogCompilation
o -XX:+PrintAssembly
o -XX:+PrintCompilation – info about compiled methods
o -XX:+PrintInlining – info about inlining decisions
o -XX:CompileCommand=… – to control the compilation policy
WANT TO GET MORE DETAILS? – LOGS
WANT TO GET MORE DETAILS? – JITWATCH, JSTAT
CONCLUSIONS
o KISS, SOLID, DRY, YAGNI: the well-known principles are a perfect fit for helping the JIT do its job
o Your code will be compiled, optimized and sometimes deoptimized
o There are many different algorithms for this inside the JVM
o You need to reserve memory for compiled code (the code cache, a native memory area separate from Metaspace/PermGen)
o To reach full-throttle performance, the JVM needs to warm up
o Micro benchmarks lie to you. All the time
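Both the warm-up and the micro-benchmark points can be shown by timing the same work over several rounds. This is a sketch. The class `WarmUpDemo`, the workload and the round counts are arbitrary. On a typical HotSpot run the early rounds, which execute in the interpreter or in C1 code, are noticeably slower than the later ones, although the actual numbers vary by machine and JVM. For real measurements, a harness such as JMH (from the OpenJDK project) handles warm-up, dead-code elimination and similar pitfalls.

```java
// Naive timing loop: good for seeing warm-up, bad as a real benchmark.
class WarmUpDemo {

    // Arbitrary CPU-bound workload.
    static long work(int n) {
        long h = 0;
        for (int i = 0; i < n; i++) {
            h = 31 * h + (i ^ (i >>> 3));
        }
        return h;
    }

    /** Returns the elapsed nanoseconds for each of the given number of runs of work(n). */
    static long[] timeRounds(int rounds, int n) {
        long[] times = new long[rounds];
        long sink = 0;
        for (int r = 0; r < rounds; r++) {
            long start = System.nanoTime();
            sink += work(n);
            times[r] = System.nanoTime() - start;
        }
        // Consume the result so the JIT cannot discard work() as dead code.
        if (sink == 42) {
            System.out.println("unlikely");
        }
        return times;
    }

    public static void main(String[] args) {
        long[] t = timeRounds(10, 1_000_000);
        for (int r = 0; r < t.length; r++) {
            System.out.printf("round %d: %,d ns%n", r, t[r]);
        }
    }
}
```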
WHAT WE DIDN’T TOUCH
o Deoptimization
o Specific benchmarks for compilers
o Specific compiled code examples
o …
Q&A