HKG15-200: OpenJDK under the hood

Date post: 15-Jul-2015
Page 1: HKG15-200: OpenJDK under the hood

OpenJDK under the Hood: all you never wanted to know

Presented by Edward Nevill

Date: Tue, Feb 10th, 2015

Page 2: HKG15-200: OpenJDK under the hood

OpenJDK UTH - Topics
● Where do I get OpenJDK?
● How do I build OpenJDK?
● How do I install/run OpenJDK?
● What secret options are there?
● Source tree overview
● From Java to machine code
● A look inside the Template Interpreter
● A look inside the C1 Compiler
● A look inside the C2 Compiler
● OpenJDK Performance Optimisations
● Further Resources

Page 3: HKG15-200: OpenJDK under the hood

Where do I get OpenJDK?

"The nice thing about standards is that you have so many to choose from" - Andrew S. Tanenbaum

"The nice thing about OpenJDK is that there are so many places to get it" - Edward C Nevill

1) From your distro

$ apt-get source
$ cd openjdk-7-7u51-2.4.6/
$ dpkg-buildpackage
...
dpkg-checkbuilddeps: Unmet build dependencies: ant ant-optional
...
$ sudo apt-get install ant ant-optional
…
$ dpkg-buildpackage
…
(two hours later)
$

Page 4: HKG15-200: OpenJDK under the hood

Where do I get OpenJDK (JDK7)?
2) JDK7: From IcedTea - http://icedtea.classpath.org

- IcedTea is the repo used by most (all?) distros to build JDK7.
- The aarch64 port has been backported into IcedTea.

IcedTea has 2 main components maintained in Mercurial:

http://icedtea.classpath.org/hg/icedtea7/
The IcedTea build environment, which provides ./configure & make

http://icedtea.classpath.org/hg/icedtea7-forest/
A clone of the upstream jdk7 (http://hg.openjdk.java.net/jdk7u) with additions (arm microJIT, aarch64 port)

Page 5: HKG15-200: OpenJDK under the hood

There are six sub repos within the icedtea7-forest:
http://icedtea.classpath.org/hg/icedtea7-forest/corba/
http://icedtea.classpath.org/hg/icedtea7-forest/hotspot/
http://icedtea.classpath.org/hg/icedtea7-forest/jaxp/
http://icedtea.classpath.org/hg/icedtea7-forest/jaxws/
http://icedtea.classpath.org/hg/icedtea7-forest/jdk/
http://icedtea.classpath.org/hg/icedtea7-forest/langtools/

To clone everything, clone the top-level (root) repository and use the ‘get_source.sh’ script to clone the remainder. Eg.

$ hg clone http://icedtea.classpath.org/hg/icedtea7-forest/
$ cd icedtea7-forest
$ bash get_source.sh
(clones corba, hotspot, jaxp, jaxws, jdk, langtools)

Page 6: HKG15-200: OpenJDK under the hood

IcedTea also maintains tarballs of every release. Eg.
http://icedtea.wildebeest.org/download/source/icedtea-2.5.4.tar.xz

The source tarball does not contain a copy of the OpenJDK forest but will pull in tarballs of all trees in the forest during the build.

This is most convenient if you just want to build OpenJDK7 and are not interested in any of the Mercurial data. In this case downloading and building IcedTea should just consist of doing

$ wget http://icedtea.wildebeest.org/download/source/icedtea-2.5.4.tar.xz
$ tar xJf icedtea-2.5.4.tar.xz
$ cd icedtea-2.5.4
$ bash ./configure
$ make

Page 7: HKG15-200: OpenJDK under the hood

Where do I get OpenJDK (JDK8)?
3) JDK8: From the aarch64 port http://hg.openjdk.java.net/aarch64-port/jdk8/

This is a branch of the jdk8 tree maintained under the aarch64 port on openjdk.java.net.

There are 7 sub trees: corba, hotspot, jaxp, jaxws, jdk, langtools, nashorn

As with JDK7, clone the top-level (root) repository and use the ‘get_source.sh’ script to pull in the remainder of the forest.

The aarch64 JDK8 port supports cross compilation and compilation for the aarch64 builtin sim.

Page 8: HKG15-200: OpenJDK under the hood

● Cross Compilation
  ○ Supported via 2 scripts ‘cross_configure’ and ‘cross_compile’
  ○ In order to cross compile you will need a ‘sysroots’
    ■ sysroots is a product of the OE leg-java build
    ■ it also contains all the cross compilation tools, gcc etc
  ○ Here is one I prepared earlier!
    ■ http://openjdk.linaro.org/sysroots/sysroots_140918.tar.xz

● Compilation for the builtin sim.
  ○ Supported via 2 scripts ‘sim_configure’ and ‘sim_compile’
  ○ The builtin sim is a small aarch64 simulator which is linked into openjdk.
  ○ When JIT code is called it uses trampoline functions to call the JIT code using the aarch64 simulator. Otherwise all code executes in x86.
  ○ Source code of builtin sim is on sourceforge
    ■ http://hg.code.sourceforge.net/p/smallaarch64sim/code
  ○ Downloaded automagically by the ‘sim_configure’ script.

Page 9: HKG15-200: OpenJDK under the hood

Where do I get OpenJDK (JDK9)?
4) JDK9: From the jdk9 trunk http://hg.openjdk.java.net/jdk9/jdk9

● aarch64 port fully merged into JDK9 trunk
  ○ Upsides:
    ■ No more merging, single code base for distros
  ○ Downsides:
    ■ No support for builtin sim
    ■ Cross compilation not working
      (can be made to work using patches and scripts from jdk8)
    ■ Client build not supported out of box
      (again can be made to work with patches from jdk8)
    ■ Every change however small requires a JIRA bug id
● JTreg test suite not fully merged into JDK9 trunk

Page 10: HKG15-200: OpenJDK under the hood

How do I build OpenJDK?
A good starting point for building OpenJDK is the “Build OpenJDK” page at http://openjdk.linaro.org

In order to build OpenJDK you must have a working Java on your system, and that Java must be no more than 1 version older than the version you are trying to build. Eg. to build JDK8 you need either JDK7 or JDK8 installed on your system.

Download an engineering build from http://openjdk.linaro.org/releases.htm

Aside: The first build for a system is either built using cross compilation (eg using OE) or using ecj.

Page 11: HKG15-200: OpenJDK under the hood

The standard build for JDK7 (IcedTea), JDK8 and JDK9 is

$ bash ./configure
$ make          (or “make images” for JDK8/JDK9)

However the configure scripts and options are completely different between IcedTea (JDK7) and JDK8/JDK9. In both cases ./configure --help will list the options available.

The first thing we must do is tell it to use the JDK we just downloaded.

For IcedTea (JDK7) do
$ bash ./configure --with-jdk-home=<location of the jdk>

For JDK8/JDK9 do
$ bash ./configure --with-boot-jdk=<location of the jdk>

Page 12: HKG15-200: OpenJDK under the hood

If you are building IcedTea (JDK7) from the mercurial repos rather than a downloaded tarball then you also need to run autogen. If you download a source tarball this is not necessary.

$ cd icedtea7
$ bash ./autogen.sh

The IcedTea build environment will automatically download all the OpenJDK source components it needs (eg corba, hotspot, jdk, …). If you are hacking on the sources this is not what you want. Instead you want to point the IcedTea build environment at your own source forest. Use the --with-openjdk-src-dir option to do this.

$ bash ./configure --with-openjdk-src-dir=<location of my icedtea7 forest>

Never, ever, put the source tree you are working on inside the icedtea build environment directory.

Page 13: HKG15-200: OpenJDK under the hood

The jdk8/jdk9 builds maintain a number of different configurations within the build directory. You can for example have builds configured for debug or release, for client or server, or for different archs all in the same build directory.

For example, a snapshot of my build directory for jdk8 is

$ ls build
linux-aarch64-normal-server-release
linux-aarch64-normal-server-slowdebug
linux-x86_64-normal-server-release

To build a specific configuration, specify it in the CONF environment variable before the make command. Eg

CONF=linux-aarch64-normal-server-release make images
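The configuration names above appear to follow a fixed pattern. As a sketch, assuming the five '-' separated fields are OS, CPU, JDK variant ("normal"), JVM variant and debug level, and that only one JVM variant is configured, a hypothetical helper `parse_conf` could split a name like this:

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split a JDK8/JDK9 build configuration name such as
// "linux-aarch64-normal-server-release" into its '-' separated fields.
// Assumed field order: os, cpu, jdk variant, jvm variant, debug level
// (a single JVM variant is assumed).
struct BuildConf {
  std::string os, cpu, jdk_variant, jvm_variant, debug_level;
};

bool parse_conf(const std::string& name, BuildConf& out) {
  std::vector<std::string> fields;
  std::stringstream ss(name);
  std::string field;
  while (std::getline(ss, field, '-')) fields.push_back(field);
  if (fields.size() != 5) return false;  // not in the assumed five-field form
  out = BuildConf{fields[0], fields[1], fields[2], fields[3], fields[4]};
  return true;
}
```

For example, parse_conf("linux-aarch64-normal-server-slowdebug", c) sets c.debug_level to "slowdebug". Note that "x86_64" uses an underscore, so it stays a single field.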

Page 14: HKG15-200: OpenJDK under the hood

The following are some useful options to the IcedTea configure:-

--enable-zero - Enable a pure C++ interpreter build
--disable-docs - Disable doc generation (speeds up build)
--disable-hotspot-test-in-build - By default, HotSpot does a sanity check of itself during the build. If you have broken HotSpot, this check fails and stops the build. Disable it with this option.
--disable-system-(zlib/jpeg/png/gif/lcms/gtk/gio/fontconfig) - Disable linking against the specified system library and use built-in libraries instead. Useful if there is a problem with system libs.
--with-parallel-jobs=N - Do as it says
--disable-bootstrap - Disable bootstrap building (speeds up build)
--with-jar=/usr/bin/fastjar - Speed up build using fastjar

Page 15: HKG15-200: OpenJDK under the hood

Useful JDK8/JDK9 configure options:-

--with-jvm-variants=<list> - Comma separated list of variants to build from (client, server, zero)
--with-debug-level=<level> - One of (release, fastdebug, slowdebug)
--with-jobs=N - No. of parallel build jobs
--with-sys-root=<dir> - Pointer to the sysroot for cross compilation
--with-jvm-interpreter=<type> - One of (cpp, template)

Useful make targets

make hotspot - Rebuild only hotspot
make LOG=debug - Print commands executed.

Page 16: HKG15-200: OpenJDK under the hood

How do I install/run OpenJDK?
In general the tarball can just be untarred and run directly, no need to untar as root or for any special install. However OpenJDK depends on many packages. The easiest way to ensure these packages are installed is to ensure the default java for your distro is installed.

yum install java-1.7.0 / apt-get install openjdk-7-jdk

OpenJDK comes in 2 flavours, the JDK (eg openjdk-7-jdk), and the JRE (openjdk-7-jre). The JRE contains only those components necessary to run Java. The JDK also contains the development tools (eg javac, etc).

In general, if you are doing any hacking with java you will want the JDK.

Page 17: HKG15-200: OpenJDK under the hood

The following is a list of all the commands provided by the OpenJDK package

appletviewer javac jdb jstack policytool tnameserv apt javadoc jhat jstat rmic unpack200 extcheck javah jinfo jstatd rmid wsgen idlj javap jmap keytool rmiregistry wsimport jar java-rmi.cgi jps native2ascii schemagen xjc jarsigner jcmd jrunscript orbd serialver java jconsole jsadebugd pack200 servertool

The most interesting of these are probably

java <class> - Run my Java class
appletviewer <html> - Run a Java applet embedded in an HTML file
javac <source> - Compile my Java source
javap <class> - Disassemble java class file

Page 18: HKG15-200: OpenJDK under the hood

What secret options are there?
All ‘secret’ options start with -XX: and to the best of my knowledge are only documented in the source code. Boolean options are specified as -XX:+<Option> or -XX:-<Option>, others (string, int) are specified as -XX:<Option>=<Value>

The following web sites give a reasonably complete list of options but they are rather dated and seem to be based on OpenJDK 6.

http://reins.altervista.org/java/A_Collection_of_JVM_Options_MP.html

http://stas-blogspot.blogspot.co.uk/2011/07/most-complete-list-of-xx-options-for.html

The only way to find all the options is to do
$ find hotspot -name \*globals\*.hpp
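To make the three -XX forms concrete, here is a minimal sketch of a parser for them. It is not HotSpot's own argument parser; `XXOption` and `parse_xx` are hypothetical names, booleans are normalised to "true"/"false", and other values are kept as the raw text after '='.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Sketch of the three -XX syntaxes (C++17 for std::optional):
//   -XX:+Name        boolean option set to true
//   -XX:-Name        boolean option set to false
//   -XX:Name=Value   string/integer option
struct XXOption {
  std::string name;
  std::string value;  // "true"/"false" for booleans, otherwise the raw text after '='
};

std::optional<XXOption> parse_xx(const std::string& arg) {
  const std::string prefix = "-XX:";
  if (arg.compare(0, prefix.size(), prefix) != 0) return std::nullopt;  // not an -XX option
  const std::string body = arg.substr(prefix.size());
  if (body.empty()) return std::nullopt;
  if (body[0] == '+' || body[0] == '-') {            // boolean form
    std::string name = body.substr(1);
    if (name.empty()) return std::nullopt;
    return XXOption{name, body[0] == '+' ? "true" : "false"};
  }
  const std::size_t eq = body.find('=');             // Name=Value form
  if (eq == std::string::npos || eq == 0) return std::nullopt;
  return XXOption{body.substr(0, eq), body.substr(eq + 1)};
}
```

A bare -XX:Name with neither a sign nor a value is rejected here, since the text above only describes the three forms.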

Page 19: HKG15-200: OpenJDK under the hood

The following is a list of some useful -XX options

-XX:+UnlockDiagnosticVMOptions - Enable use of diagnostic options such as PrintAssembly
-XX:+PrintCompilation - Print the name of each method as it is compiled
-XX:+PrintAssembly - Disassemble each method after compilation
-XX:-UseCompressedOops - Do not compress 64 bit pointers into 32 bits
-XX:+UseLargePages - Use large pages if available
-XX:-UseAES - Do not generate AES instructions
-XX:-UseSHA - Do not generate SHA instructions
-XX:-UseBiasedLocking - Disable Biased locking
-XX:-EnableContended - Disable use of @Contended annotation
-XX:+UseG1GC - Use the G1 Garbage Collector
-XX:+UseParallelOldGC - Use the Parallel Old collector (parallel compaction of the old generation)
-XX:+UseConcMarkSweepGC - Use Concurrent Mark-Sweep GC in the old generation
-XX:-UseTLAB - Disable use of Thread Local Allocation Buffers
-XX:-Inline - Disable Inlining
-XX:+PrintInlining - Print Inlining optimisations
-XX:+PrintInterpreter - Print the generated interpreter code
-XX:-UseOnStackReplacement - Disable On Stack Replacement

Page 20: HKG15-200: OpenJDK under the hood

The following are some more compiler specific -XX options

-XX:-UseCompiler - Disable JIT compilation
-XX:AllocatePrefetchStyle=N - 0 => no prefetch, 1 => prefetch for each allocation
-XX:MaxInlineLevel=N - Max no. of nested calls that are inlined
-XX:MaxInlineSize=N - Max bytecode size of a method to be inlined
-XX:ReservedCodeCacheSize=<size> - Max code cache size
-XX:CompileCommand=<cmd> - Issue a single compile command
-XX:CompileCommandFile=<file> - Read compile commands from <file>
-XX:CIStart=N - Start compiling at this method id
-XX:CIStop=N - Stop compiling at this method id
-XX:CompileThreshold=N - No. of interpreted method invocations before compilation
-XX:BackEdgeThreshold=N - Back edge threshold before OSR compilation
-XX:-TieredCompilation - Disable Tiered compilation
-XX:+UseSuperWord - Transform scalar operations into superword ops
-XX:+UseJumpTables - Use JumpTables instead of binary search
-XX:+PrintIntrinsics - Print all inlined intrinsics
-XX:LoopUnrollLimit=N - Unroll loop bodies with node count < N
-XX:+UseNeon - Enable use of Neon for CRC32 (aarch64 only)
-XX:+UseCRC32 - Enable use of crc32 instructions (aarch64 only)

Page 21: HKG15-200: OpenJDK under the hood

Source tree overview
7 main components

corba - Provides CORBA support in Java
jaxp - Java API for XML Processing
jaxws - Java API for XML Web Services
nashorn - JavaScript engine

langtools - Language tools (eg javac, javah, javap etc)

jdk - The JDK API
hotspot - The Java execution engine

Page 22: HKG15-200: OpenJDK under the hood

hotspot/agent - The hotspot serviceability agent
hotspot/test - The hotspot JTreg test suite
hotspot/src - The main source for hotspot

hotspot/src/share - Shared code (vm, tools)
hotspot/src/cpu - CPU specific code (aarch64, ppc, sparc, x86, zero)
hotspot/src/os - OS specific code (aix, bsd, linux, posix, solaris, windows)
hotspot/src/os_cpu - OS and CPU specific code (linux_x86, windows_x86, linux_aarch64, …)

hotspot/src/share/vm - Shared VM components
  /adlc - The Architecture Description Language Compiler
  /asm - The assembler and macroAssembler interface
  /c1 - The C1 implementation
  /classfile - Classfile loading and verification
  /gc_implementation - The 4 gc implementations (CMS, g1, parallelScavenge, parNew)
  /interpreter - The CPP and Template interpreters
  /opto - The C2 compiler
  /runtime - Java runtime support

hotspot/src/cpu/aarch64/vm/* - The bulk of the aarch64 port (95 source files)
hotspot/src/cpu/zero/vm/* - The ‘Zero’ CPU port

Page 23: HKG15-200: OpenJDK under the hood

From Java to machine code
● The next few slides look at how the following Java program is run in OpenJDK

class Fib {
    static int fib(int x) {
        if ((x == 1) || (x == 2)) return 1;
        else return (fib(x-1) + fib(x-2));
    }

    public static void main(String args[]) {
        int arg = Integer.parseInt(args[0]);
        System.out.println("Fib of " + arg + " = " + fib(arg));
    }
}

Page 25: HKG15-200: OpenJDK under the hood

The Template Interpreter
● The Template Interpreter is constructed dynamically at runtime.
● The template interpreter can be disassembled using the -XX:+PrintInterpreter option.
● Interpreter size is approx 120K on aarch64
● The following is the entry for iadd

    ldr  w0, [x20],#8            ; Load TOS
    ldr  w1, [x20],#8            ; Load TOS-1
    add  w0, w1, w0              ; Do the add - exit in itos state

    ldrb w8, [x22,#1]!           ; Load next bytecode
    add  w9, w8, #0x300          ; Index to the itos table
    ldr  x9, [x21,w9,uxtw #3]    ; Load next routine
    br   x9                      ; & branch

● The TOS is cached in a register: w0/x0 for 32/64 bit, s0/d0 for float/double
● There are 9 TOS states (btos, ctos, stos, itos, ltos, ftos, dtos, atos, vtos)
● Each 32 bit value takes 8 bytes on the Java stack, 64 bit values take 16 bytes!
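The constants in the dispatch sequence follow from the state ordering above. As a sketch, assuming one 256-entry block of 8-byte code addresses per TOS state, laid out in the order listed (so itos is state 3 and atos is state 7), the index and byte offset of a dispatch entry can be computed as below. The names echo HotSpot's, but this is a standalone model rather than HotSpot code.

```cpp
#include <cassert>
#include <cstdint>

// TOS states in the order listed above (assumed to be their numeric order).
enum TosState { btos, ctos, stos, itos, ltos, ftos, dtos, atos, vtos, number_of_states };

constexpr uint32_t number_of_codes = 256;  // one table entry per possible bytecode value

// Value added to the next bytecode before the table load ("add w9, w8, #0x300").
constexpr uint32_t distance_from_dispatch_table(TosState s) {
  return uint32_t(s) * number_of_codes;
}

// Byte offset from the table base (x21): "ldr x9, [x21, w9, uxtw #3]" scales by 8.
constexpr uint64_t dispatch_entry_offset(TosState s, uint8_t next_bytecode) {
  return uint64_t(distance_from_dispatch_table(s) + next_bytecode) << 3;
}

// Java expression stack bytes per value: 8 for 32-bit values, 16 for long/double.
constexpr int stack_bytes(TosState s) { return (s == ltos || s == dtos) ? 16 : 8; }

static_assert(distance_from_dispatch_table(itos) == 0x300, "itos offset in the iadd listing");
static_assert(distance_from_dispatch_table(atos) == 0x700, "atos offset in the aconst_null listing");
```

Under this layout, dispatching in atos to a bytecode b reads the 8-byte entry at byte offset (0x700 + b) * 8 from the table base.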

Page 26: HKG15-200: OpenJDK under the hood

● The following is the entry for aconst_null

    str  x0, [x20,#-8]!          ; btos/ctos/stos/itos entry
    b    do_operation
    str  s0, [x20,#-8]!          ; ftos entry
    b    do_operation
    str  d0, [x20,#-16]!         ; dtos entry
    b    do_operation
    str  x0, [x20,#-16]!         ; ltos entry
    b    do_operation
    str  x0, [x20,#-8]!          ; atos entry
do_operation:
    movz x0, #0x0                ; 0 to TOS, exit in atos state
dispatch:
    ldrb w8, [x22,#1]!
    add  w9, w8, #0x700
    ldr  x9, [x21,w9,uxtw #3]
    br   x9

Page 27: HKG15-200: OpenJDK under the hood

● The code for the templates is generated from templateTable_aarch64.cpp. The following is the code which generates the ‘iadd’ code shown previously.

void TemplateTable::iop2(Operation op)
{
  transition(itos, itos);   // Enter in itos state, leave in itos state
  // r0 <== r1 op r0
  __ pop_i(r1);             // pop additional arg off java stack
  switch (op) {
  case add  : __ addw(r0, r1, r0); break;   // do the add
  case sub  : __ subw(r0, r1, r0); break;
  case mul  : __ mulw(r0, r1, r0); break;
  case _and : __ andw(r0, r1, r0); break;
  case _or  : __ orrw(r0, r1, r0); break;
  case _xor : __ eorw(r0, r1, r0); break;
  case shl  : __ lslvw(r0, r1, r0); break;
  case shr  : __ asrvw(r0, r1, r0); break;
  case ushr : __ lsrvw(r0, r1, r0); break;
  default   : ShouldNotReachHere();
  }
  // dispatch to next bytecode done by caller
}

Page 28: HKG15-200: OpenJDK under the hood

● The code for the bytecode dispatch is found in interp_masm_aarch64.cpp

void InterpreterMacroAssembler::dispatch_next(TosState state, int step) {
  // load next bytecode
  ldrb(rscratch1, Address(pre(rbcp, step)));
  dispatch_base(state, Interpreter::dispatch_table(state));
}

void InterpreterMacroAssembler::dispatch_base(TosState state, address* table, bool verifyoop) {
  ...
  if (table == Interpreter::dispatch_table(state)) {
    addw(rscratch2, rscratch1, Interpreter::distance_from_dispatch_table(state));
    ldr(rscratch2, Address(rdispatch, rscratch2, Address::uxtw(3)));
  } else {
    mov(rscratch2, (address)table);
    ldr(rscratch2, Address(rscratch2, rscratch1, Address::uxtw(3)));
  }
  br(rscratch2);
}

Page 29: HKG15-200: OpenJDK under the hood

● Finally, the code for our second example, aconst_null, is simply

void TemplateTable::aconst_null()
{
  transition(vtos, atos);
  __ mov(r0, 0);
}

Here, because the transition is from vtos -> atos, aconst_null can be entered in any state: its template starts with nothing cached, so any cached TOS value is simply pushed to the Java stack first. Whereas iadd could only be entered in vtos or itos state (ie. it would be illegal to enter iadd in, say, dtos state, where we have a double on top of the stack).

Hence, as we saw, the transition code has to generate code for entering aconst_null from any state.
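The effect of transition() can be modelled in a few lines. The toy interpreter below is a sketch, not HotSpot code: it assumes a single cached register r0, only the vtos/itos/atos states, and a hypothetical `iconst` operation standing in for the iconst_<n> bytecodes. Entering a template that starts in vtos spills any cached value to the stack, as the str entries of aconst_null do; entering iadd in vtos first loads TOS into r0, and entering it in atos is treated as illegal.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

enum class Tos { vtos, itos, atos };

struct ToyInterp {
  std::vector<int64_t> stack;  // expression stack in memory, one slot per value
  int64_t r0 = 0;              // cached top of stack; meaningless when state == vtos
  Tos state = Tos::vtos;

  // Entry for templates that begin in vtos: spill any cached value
  // (the "str x0, [x20,#-8]!" entries before do_operation).
  void enter_vtos() {
    if (state != Tos::vtos) { stack.push_back(r0); state = Tos::vtos; }
  }
  // Entry for templates that begin in itos: in vtos, load TOS into r0
  // (the "ldr w0, [x20],#8" at the start of iadd); any other state is illegal.
  void enter_itos() {
    if (state == Tos::itos) return;
    if (state != Tos::vtos) throw std::logic_error("illegal entry state for an itos template");
    r0 = stack.back(); stack.pop_back(); state = Tos::itos;
  }

  void iconst(int32_t v) { enter_vtos(); r0 = v; state = Tos::itos; }  // hypothetical; transition(vtos, itos)
  void aconst_null()     { enter_vtos(); r0 = 0; state = Tos::atos; }  // transition(vtos, atos)
  void iadd() {                                                       // transition(itos, itos)
    enter_itos();
    int64_t r1 = stack.back(); stack.pop_back();                      // __ pop_i(r1)
    r0 = int32_t(uint32_t(r1) + uint32_t(r0));                        // addw: 32-bit wrap-around
  }
};
```

Running iconst(2), iconst(3), iadd() leaves 5 cached in r0 with an empty stack; a following aconst_null() spills the 5 to the stack before caching the null.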

Page 30: HKG15-200: OpenJDK under the hood

The C1 Compiler
● The aarch64 specific components of the c1 compiler are contained in the files c1_* in

hotspot/src/cpu/aarch64/vm

c1_CodeStubs_aarch64.cpp     c1_FrameMap_aarch64.hpp      c1_LIRAssembler_aarch64.hpp
c1_Defs_aarch64.hpp          c1_globals_aarch64.hpp       c1_LIRGenerator_aarch64.cpp
c1_FpuStackSim_aarch64.cpp   c1_LinearScan_aarch64.cpp    c1_MacroAssembler_aarch64.cpp
c1_FpuStackSim_aarch64.hpp   c1_LinearScan_aarch64.hpp    c1_MacroAssembler_aarch64.hpp
c1_FrameMap_aarch64.cpp      c1_LIRAssembler_aarch64.cpp  c1_Runtime1_aarch64.cpp

● The file c1_LIRAssembler_aarch64.cpp contains the bulk of the code for converting the LIR into aarch64 assembler.

● The following code section shows the code for handling ‘lir_add’

Page 31: HKG15-200: OpenJDK under the hood

void LIR_Assembler::arith_op(LIR_Code code, LIR_Opr left, LIR_Opr right, LIR_Opr dest,
                             CodeEmitInfo* info, bool pop_fpu_stack) {
  assert(info == NULL, "should never be used, idiv/irem and ldiv/lrem not handled by this method");

  if (left->is_single_cpu()) {
    Register lreg = left->as_register();
    Register dreg = as_reg(dest);

    if (right->is_single_cpu()) {
      // cpu register - cpu register

      assert(left->type() == T_INT && right->type() == T_INT && dest->type() == T_INT,
             "should be");
      Register rreg = right->as_register();
      switch (code) {
      case lir_add: __ addw (dest->as_register(), lreg, rreg); break;
      case lir_sub: __ subw (dest->as_register(), lreg, rreg); break;
      case lir_mul: __ mulw (dest->as_register(), lreg, rreg); break;
      default:      ShouldNotReachHere();
      }

    } else if (right->is_double_cpu()) {
      ….

Page 32: HKG15-200: OpenJDK under the hood

The C2 Compiler
● The bulk of the aarch64 C2 compiler is contained in a single file aarch64.ad (about 12000 lines)
● .ad stands for Architecture Description and is compiled to C++ by adlc
● There is some limited documentation on the syntax for adlc in src/share/vm/adlc/Doc/Syntax.doc; however this file is dated 1997, so is quite out of date. Mostly it is a case of looking at x86_64.ad and doing your own experimentation to see what works.

● The ad file matches patterns from a sea of nodes to generate aarch64 output. For example:-

instruct addI_reg_reg(iRegINoSp dst, iRegIorL2I src1, iRegIorL2I src2) %{
  match(Set dst (AddI src1 src2));          // The pattern to match
  ins_cost(INSN_COST);                      // Basic cost, 1*INSN, for instruction selection
  format %{ "addw $dst, $src1, $src2" %}    // Format for opto disassembly
  ins_encode %{
    __ addw(as_Register($dst$$reg),         // encode it in aarch64
            as_Register($src1$$reg),
            as_Register($src2$$reg));
  %}
  ins_pipe(ialu_reg_reg);                   // pipeline is ALU reg/reg, for instruction scheduling
%}

Page 33: HKG15-200: OpenJDK under the hood

● More complex patterns can be used to handle specific cases. For example, the following pattern matches the rounding adjustment used in a signed divide by constant 2.

instruct div2Round(iRegINoSp dst, iRegI src, immI_31 div1, immI_31 div2) %{
  // The pattern matched here is
  //   src + ((unsigned)(src >> 31) >> 31)
  match(Set dst (AddI src (URShiftI (RShiftI src div1) div2)));
  ins_cost(INSN_COST);
  format %{ "addw $dst, $src, LSR $div1" %}

  ins_encode %{
    __ addw(as_Register($dst$$reg),         // generate addw dst, src, src, lsr #31
            as_Register($src$$reg),
            as_Register($src$$reg),
            Assembler::LSR, 31);
  %}
  ins_pipe(ialu_reg);
%}
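A quick way to convince yourself that the single addw is equivalent to the matched subtree, and that it is the rounding step of a signed divide by 2, is to check the identities directly. This sketch assumes that `>>` on a negative int32_t is an arithmetic shift (guaranteed since C++20 and true of mainstream compilers before that); the function names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Subtree matched by div2Round: src + ((unsigned)(src >> 31) >> 31).
int32_t round_term_pattern(int32_t src) {
  uint32_t sign = uint32_t(src >> 31) >> 31;  // (src >> 31) is 0 or -1, so this is 0 or 1
  return int32_t(uint32_t(src) + sign);       // 32-bit wrap-around add, like addw
}

// What the single instruction computes: addw dst, src, src, LSR #31.
int32_t round_term_addw(int32_t src) {
  return int32_t(uint32_t(src) + (uint32_t(src) >> 31));
}

// Arithmetic shift of the rounded value gives Java's (truncating) int / 2.
int32_t div2(int32_t x) { return round_term_addw(x) >> 1; }
```

Negative values get +1 before the shift, which turns the shift's rounding towards minus infinity into the truncation towards zero that Java requires (eg. -7 becomes -6, then -3).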

Page 34: HKG15-200: OpenJDK under the hood

● The .ad file also contains a pipeline description
● This pipeline model consists of Resources, Pipeline Stages and Pipeline classes, Eg

resources( INS0, INS1, INS01 = INS0 | INS1,
           ALU0, ALU1, ALU = ALU0 | ALU1,
           MAC,
           DIV,
           BRANCH,
           LDST,
           NEON_FP);

pipe_desc(ISS, EX1, EX2, WR);

// Integer ALU reg-reg operation
// Operands needed in EX1, result generated in EX2
// Eg. ADD x0, x1, x2
pipe_class ialu_reg_reg(iRegI dst, iRegI src1, iRegI src2)
%{
  single_instruction;
  dst   : EX2(write);
  src1  : EX1(read);
  src2  : EX1(read);
  INS01 : ISS;   // Dual issue as instruction 0 or 1
  ALU   : EX2;
%}

Page 38: HKG15-200: OpenJDK under the hood

OpenJDK Performance Optimisations
● CRC Intrinsics using SIMD and CRC instructions
  ○ 2 separate implementations, 1st using built in CRC32 instructions, 2nd using basic SIMD.
● AES Intrinsics using AArch64 Crypto extensions
● SHA Intrinsics using SHA1 and SHA256 extensions
● Allocation Prefetch
  ○ Add prefetch immediately after allocation to speed up subsequent accesses
● String Intrinsics
  ○ Support for AryEq, encodeISOArray and String.indexOf
● Pipeline Scheduling Optimisation for in-order cores (A53)
● MathExact Intrinsics
  ○ Use Overflow flag to detect overflow on integer operations
  ○ MultiplyExact uses smull (or mul / smulh) and cmp with NE to detect overflow
● Other Intrinsics
  ○ Sqrt, CountLeadingZeros, CountTrailingZeros
● Minor Optimisations
  ○ Generate optimal code for int / 2
  ○ Use adrp instead of MOVs for safepoint polling
  ○ Optimise the C2 entry point verification
  ○ Optimise addressing of card table
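The MultiplyExact overflow checks can be sketched in C++. The int case mirrors the smull approach (an exact 64-bit product compared with its own sign-extended low 32 bits) and the long case mirrors mul/smulh (the high half compared with the sign of the low half). Assumptions: `__int128`, a GCC/Clang extension, stands in for smulh, and `std::overflow_error` stands in for Java's ArithmeticException; this is not the intrinsic's actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// int: smull gives the exact 64-bit product; overflow iff it differs from
// its own low 32 bits sign-extended (a compare followed by a NE check).
int32_t multiply_exact_int(int32_t a, int32_t b) {
  int64_t p = int64_t(a) * int64_t(b);              // cannot overflow 64 bits
  if (p != int64_t(int32_t(p))) throw std::overflow_error("integer overflow");
  return int32_t(p);
}

// long: mul gives the low 64 bits, smulh the high 64 bits; overflow iff the
// high half is not just the sign extension of the low half.
int64_t multiply_exact_long(int64_t a, int64_t b) {
  __int128 full = static_cast<__int128>(a) * b;     // stands in for mul + smulh
  int64_t lo = int64_t(full);
  int64_t hi = int64_t(full >> 64);
  if (hi != (lo >> 63)) throw std::overflow_error("long overflow");
  return lo;
}
```

The same test shape (compare the wide result against the sign-extension of the narrow one) is what lets a single cmp plus a NE branch detect overflow after the multiply.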

Page 39: HKG15-200: OpenJDK under the hood

● Instruction Selection vs Pipeline Scheduling
  ○ Instruction selection based on out-of-order (OoO) cores (eg A57)
  ○ Implemented using Instruction cost model in C2, eg

// Integer Subtraction
instruct subI_reg_reg(iRegINoSp dst, iRegIorL2I src1, iRegIorL2I src2) %{
  match(Set dst (SubI src1 src2));

  ins_cost(INSN_COST);        // Subtract is 1 x INSN_COST
  …
%}

// Load Byte (8 bit signed)
instruct loadB(iRegINoSp dst, memory mem) %{
  match(Set dst (LoadB mem));
  predicate(n->as_Load()->is_unordered());

  ins_cost(4 * INSN_COST);    // Load is 4 x INSN_COST
  …
%}
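Cost-driven selection can be sketched as a bottom-up matcher that, for each node, takes the cheapest rule whose pattern fits and adds the cost of the inputs that rule still needs. The toy below is a heavy simplification of what C2's matcher does, under stated assumptions: only a few node types, an assumed INSN_COST of 100, and just two competing rules for AddI (addI_reg_reg and div2Round); `select_insn` and `Node` are hypothetical names.

```cpp
#include <cassert>
#include <string>

constexpr int INSN_COST = 100;  // assumed unit cost for this sketch

enum class Op { Reg, Con, AddI, RShiftI, URShiftI };

struct Node {
  Op op;
  int con = 0;                  // value when op == Con
  const Node* l = nullptr;
  const Node* r = nullptr;
};

struct Choice { int cost; std::string rule; };

static bool is_con(const Node* n, int v) { return n && n->op == Op::Con && n->con == v; }

// Minimum cost to compute node n, and the rule chosen at its root.
Choice select_insn(const Node* n) {
  switch (n->op) {
    case Op::Reg:
    case Op::Con:
      return {0, "operand"};                                  // already available
    case Op::RShiftI:                                         // asrw dst, src, #imm
      return {INSN_COST + select_insn(n->l).cost, "asrw"};
    case Op::URShiftI:                                        // lsrw dst, src, #imm
      return {INSN_COST + select_insn(n->l).cost, "lsrw"};
    case Op::AddI: {
      // Generic addI_reg_reg: one addw plus the cost of computing both inputs.
      Choice best{INSN_COST + select_insn(n->l).cost + select_insn(n->r).cost, "addI_reg_reg"};
      // div2Round: AddI(x, URShiftI(RShiftI(x, 31), 31)) covered by a single addw.
      const Node* u = n->r;
      if (u && u->op == Op::URShiftI && is_con(u->r, 31) && u->l &&
          u->l->op == Op::RShiftI && is_con(u->l->r, 31) && u->l->l == n->l) {
        int cost = INSN_COST + select_insn(n->l).cost;
        if (cost < best.cost) best = {cost, "div2Round"};
      }
      return best;
    }
  }
  return {0, "unreachable"};
}
```

For the divide-by-2 rounding tree, the generic rules cost 3 x INSN_COST (asrw, lsrw, addw) while div2Round costs 1 x INSN_COST, so the cheaper single instruction wins; a load rule at 4 x INSN_COST would be added the same way.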

Page 40: HKG15-200: OpenJDK under the hood

● Instruction Selection vs Pipeline Scheduling
  ○ Pipeline scheduling based on in-order (IO) cores (eg A53)
  ○ Implemented using pipeline classes in C2, eg

// Integer ALU reg-reg operation
// Operands needed in EX1, result generated in EX2
// Eg. ADD x0, x1, x2
pipe_class ialu_reg_reg(iRegI dst, iRegI src1, iRegI src2)
%{
  single_instruction;
  dst   : EX2(write);
  src1  : EX1(read);
  src2  : EX1(read);
  INS01 : ISS;   // Dual issue as instruction 0 or 1
  ALU   : EX2;
%}

Page 41: HKG15-200: OpenJDK under the hood

● Instruction Selection vs Pipeline Scheduling

// Integer ALU reg-reg operation with constant shift
// Shifted register must be available in LATE_ISS instead of EX1
// Eg. ADD x0, x1, x2, LSL #2
pipe_class ialu_reg_reg_shift(iRegI dst, iRegI src1, iRegI src2, immI shift)
%{
  single_instruction;
  dst   : EX2(write);
  src1  : EX1(read);
  src2  : ISS(read);
  INS01 : ISS;
  ALU   : EX2;
%}
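The difference between the two pipeline classes can be read as a latency rule. This is a sketch under a simplified assumption: the extra cycles a dependent instruction must wait equal the producer's write stage minus the stage in which the consumer reads that operand, clamped at zero. ADLC's real latency computation may differ in detail; the stage numbers come from pipe_desc(ISS, EX1, EX2, WR).

```cpp
#include <cassert>

// Stages from pipe_desc(ISS, EX1, EX2, WR), numbered in pipeline order.
enum Stage { ISS = 0, EX1 = 1, EX2 = 2, WR = 3 };

// Only the operand timings of the two pipe classes above are modelled.
struct PipeClass {
  Stage dst_write;  // stage in which the result is written
  Stage src1_read;  // stage in which src1 must be available
  Stage src2_read;  // stage in which src2 must be available
};

constexpr PipeClass ialu_reg_reg{EX2, EX1, EX1};
constexpr PipeClass ialu_reg_reg_shift{EX2, EX1, ISS};

// Assumed simplified rule: extra cycles before a dependent instruction can use a
// result = producer's write stage - consumer's read stage, clamped at zero.
constexpr int operand_latency(const PipeClass& producer, Stage consumer_read) {
  return producer.dst_write > consumer_read ? producer.dst_write - consumer_read : 0;
}

static_assert(operand_latency(ialu_reg_reg, ialu_reg_reg.src2_read) == 1, "plain add");
static_assert(operand_latency(ialu_reg_reg, ialu_reg_reg_shift.src2_read) == 2, "shifted operand");
```

Under this model, feeding an ADD result into the shifted operand of a following ADD costs one cycle more than feeding it into a plain operand, which is the kind of difference an in-order core such as the A53 exposes and the scheduler can hide.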

Page 42: HKG15-200: OpenJDK under the hood

● Instruction Selection vs Pipeline Scheduling
  ○ Pipeline classes are then referenced from instruction patterns via ins_pipe, alongside the INSN_COST used for selection.

instruct addI_reg_reg(iRegINoSp dst, iRegIorL2I src1, iRegIorL2I src2) %{
  match(Set dst (AddI src1 src2));

  ins_cost(INSN_COST);
  format %{ "addw $dst, $src1, $src2" %}

  ins_encode %{
    __ addw(as_Register($dst$$reg),
            as_Register($src1$$reg),
            as_Register($src2$$reg));
  %}

  ins_pipe(ialu_reg_reg);
%}

Page 43: HKG15-200: OpenJDK under the hood

● CRC optimisation implemented using SIMD & CRC instructions
  ○ Implementation using the CRC instructions is fairly straightforward

  align(CodeEntryAlignment);
BIND(CRC_by64_loop);
  subs(len, len, 64);
  ldp(tmp, tmp3, Address(post(buf, 16)));
  crc32x(crc, crc, tmp);
  crc32x(crc, crc, tmp3);
  ldp(tmp, tmp3, Address(post(buf, 16)));
  crc32x(crc, crc, tmp);
  crc32x(crc, crc, tmp3);
  ldp(tmp, tmp3, Address(post(buf, 16)));
  crc32x(crc, crc, tmp);
  crc32x(crc, crc, tmp3);
  ldp(tmp, tmp3, Address(post(buf, 16)));
  crc32x(crc, crc, tmp);
  crc32x(crc, crc, tmp3);
  br(Assembler::GE, CRC_by64_loop);

● Scope for further optimisation using parallel CRC

Page 44: HKG15-200: OpenJDK under the hood

● Implementation using SIMD is more complex
● And the SIMD implementation was found to be slower than a straight table lookup

BIND(L_by16_loop);
  subs(len, len, 16);
  ldp(tmp, tmp3, Address(post(buf, 16)));
  update_word_crc32(crc, tmp, tmp2, table0, table1, table2, table3, false);
  update_word_crc32(crc, tmp, tmp2, table0, table1, table2, table3, true);
  update_word_crc32(crc, tmp3, tmp2, table0, table1, table2, table3, false);
  update_word_crc32(crc, tmp3, tmp2, table0, table1, table2, table3, true);
  br(Assembler::GE, L_by16_loop);

void MacroAssembler::update_word_crc32(Register crc, Register v, Register tmp,
        Register table0, Register table1, Register table2, Register table3,
        bool upper) {
  eor(v, crc, v, upper ? LSR:LSL, upper ? 32:0);
  uxtb(tmp, v);
  ldrw(crc, Address(table3, tmp, Address::lsl(2)));
  ubfx(tmp, v, 8, 8);
  ldrw(tmp, Address(table2, tmp, Address::lsl(2)));
  eor(crc, crc, tmp);
  ubfx(tmp, v, 16, 8);
  ldrw(tmp, Address(table1, tmp, Address::lsl(2)));
  eor(crc, crc, tmp);
  ubfx(tmp, v, 24, 8);
  ldrw(tmp, Address(table0, tmp, Address::lsl(2)));
  eor(crc, crc, tmp);
}
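The table-lookup loop processes each 32-bit half of a 64-bit load with four table lookups. The sketch below mirrors update_word_crc32 in portable C++ for the standard reflected CRC-32 polynomial (0xEDB88320) that java.util.zip.CRC32 implements. The four tables are built with zlib's slice-by-4 construction; whether HotSpot's tables are generated identically is an assumption, and the function names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// t[0] is the standard byte-wise CRC-32 table; t[k] advances t[k-1] by one more byte.
struct Crc32Tables {
  uint32_t t[4][256];
  Crc32Tables() {
    for (uint32_t n = 0; n < 256; ++n) {
      uint32_t c = n;
      for (int k = 0; k < 8; ++k) c = (c & 1) ? 0xEDB88320u ^ (c >> 1) : c >> 1;
      t[0][n] = c;
    }
    for (uint32_t n = 0; n < 256; ++n)
      for (int k = 1; k < 4; ++k)
        t[k][n] = t[0][t[k - 1][n] & 0xff] ^ (t[k - 1][n] >> 8);
  }
};

// Mirrors update_word_crc32: xor the word into the crc, then one lookup per byte,
// with table3 used for the lowest byte and table0 for the highest.
uint32_t update_word(const Crc32Tables& T, uint32_t crc, uint32_t v) {
  v ^= crc;
  return T.t[3][v & 0xff] ^ T.t[2][(v >> 8) & 0xff] ^
         T.t[1][(v >> 16) & 0xff] ^ T.t[0][v >> 24];
}

// Full CRC-32: 4 bytes at a time, then a byte-at-a-time tail.
uint32_t crc32_slice4(const uint8_t* buf, size_t len) {
  static const Crc32Tables T;
  uint32_t crc = 0xFFFFFFFFu;
  while (len >= 4) {
    uint32_t w = uint32_t(buf[0]) | uint32_t(buf[1]) << 8 |
                 uint32_t(buf[2]) << 16 | uint32_t(buf[3]) << 24;  // little-endian load
    crc = update_word(T, crc, w);
    buf += 4;
    len -= 4;
  }
  while (len--) crc = T.t[0][(crc ^ *buf++) & 0xff] ^ (crc >> 8);
  return crc ^ 0xFFFFFFFFu;
}
```

The standard check value applies: the CRC-32 of the ASCII string "123456789" is 0xCBF43926.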

Page 45: HKG15-200: OpenJDK under the hood

● String Intrinsic Optimisations
  ○ AryEq - Basic optimisation to process 8 bytes / loop

next: ldr  tmp1, [ary1], #8
      ldr  tmp2, [ary2], #8
      subs cnt1, cnt1, #4
      eor  tmp1, tmp1, tmp2
      cbnz tmp1, differ
      bge  next

  ○ EncodeISOArray - Optimise using SIMD and the UQXTN instruction to process 64 bytes / loop

next: ld1    Vtmp1, Vtmp2, Vtmp3, Vtmp4, [src]
      uqxtn  Vtmp1, Vtmp1        // Write bottom half
      uqxtn2 Vtmp1, Vtmp2        // Write top half
      uqxtn  Vtmp2, Vtmp3
      uqxtn2 Vtmp2, Vtmp4
      get_fpsr tmp1
      cbnzw  tmp1, loop_8
      st1    Vtmp1, Vtmp2, [dst], #32
      subs   len, len, 32
      add    src, src, 64
      bge    next
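Both loops can be sketched in scalar C++ to show what they compute. For AryEq, 4 chars (8 bytes) are compared per iteration by XOR-ing two 64-bit loads. For EncodeISOArray the assumed contract is: copy chars to bytes until a char above 0xFF is met and return how many were encoded; a 32-char block is narrowed with saturation and, if anything saturated (the FPSR check in the listing), the code falls back to a per-char loop. Function names are illustrative, and the length and null checks done by the real intrinsics are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// AryEq sketch: compare cnt chars, 4 chars (8 bytes) per iteration like the
// ldr/ldr/eor/cbnz loop; a scalar loop handles the last 0-3 chars.
bool array_equals(const char16_t* a, const char16_t* b, size_t cnt) {
  size_t i = 0;
  for (; i + 4 <= cnt; i += 4) {
    uint64_t x, y;
    std::memcpy(&x, a + i, 8);          // ldr tmp1, [ary1], #8
    std::memcpy(&y, b + i, 8);          // ldr tmp2, [ary2], #8
    if ((x ^ y) != 0) return false;     // eor + cbnz differ
  }
  for (; i < cnt; ++i)
    if (a[i] != b[i]) return false;
  return true;
}

// EncodeISOArray sketch (assumed contract: stop at the first char > 0xFF and
// return the number of chars encoded).
size_t encode_iso_array(const char16_t* src, uint8_t* dst, size_t len) {
  size_t i = 0;
  while (len - i >= 32) {
    uint8_t block[32];
    bool saturated = false;
    for (size_t k = 0; k < 32; ++k) {   // uqxtn / uqxtn2: narrow with saturation
      char16_t c = src[i + k];
      if (c > 0xFF) saturated = true;
      block[k] = c > 0xFF ? 0xFF : uint8_t(c);
    }
    if (saturated) break;               // get_fpsr + cbnzw -> per-char loop
    std::memcpy(dst + i, block, 32);    // st1 ..., [dst], #32
    i += 32;
  }
  for (; i < len; ++i) {                // per-char loop: stops at the first unencodable char
    if (src[i] > 0xFF) break;
    dst[i] = uint8_t(src[i]);
  }
  return i;
}
```

Checking the saturation flag once per block keeps the common all-Latin-1 case branch-free inside the block and only pays for the slow path when a wide char is actually present.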

Page 46: HKG15-200: OpenJDK under the hood

● String.indexOf
  ○ Uses a combination of 2 algorithms, a simplified Boyer Moore, and a linear scan
  ○ Boyer Moore used when pattern length >= 8 and source length >= 4 * pattern length
  ○ Special case for 1, 2, 3 or 4 chars in pattern
  ○ 2 x performance improvement on string searching

● Basic inner loop of Boyer Moore looks like the following:-

bmloopstr2:
      sub  cnt1tmp, cnt1, #1
      ldrh ch1, [str1, cnt1tmp, lsl #1]
      ldrh skipch, [str2, cnt1tmp, lsl #1]
      cmp  ch1, skipch
      bne  bmskip
      ….                                 // last ch matched, check last-1, last-2 to first

bmskip:
      cmp  skipch, #128
      bhs  bmadv
      ldrb ch2, [sp, skipch]
      add  str2, str2, cnt1, lsl #1      // skip whole pattern length
      sub  str2, str2, ch2, lsl #1       // adjust if ch occurs elsewhere in pattern
      cmp  str2, str2end
      ble  bmloopstr2
      b    nomatch
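The skip logic above is essentially a Boyer-Moore-Horspool bad-character shift. A sketch, assuming a 128-entry table as in the listing and, for chars >= 128 (which the table cannot index), a conservative shift of one char, since what bmadv does is not shown here. Where the listing stores a per-char adjustment and subtracts it from the pattern length, this sketch stores the shift directly; `bm_index_of` is an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Horspool-style sketch of the indexOf inner loop: compare the last pattern char
// first, then the rest from last-1 down to first; after each attempt, shift the
// window by a bad-character distance from a 128-entry table.
long bm_index_of(const std::u16string& text, const std::u16string& pat) {
  const size_t m = pat.size(), n = text.size();
  if (m == 0) return 0;
  if (m > n) return -1;
  size_t shift[128];
  for (size_t& s : shift) s = m;                  // char not in pattern: skip whole pattern
  for (size_t j = 0; j + 1 < m; ++j)
    if (pat[j] < 128) shift[pat[j]] = m - 1 - j;  // distance from last occurrence to the end
  size_t i = 0;
  while (i + m <= n) {
    char16_t last = text[i + m - 1];
    if (last == pat[m - 1]) {                     // last ch matched
      size_t j = m - 1;
      while (j > 0 && text[i + j - 1] == pat[j - 1]) --j;  // check last-1, last-2 to first
      if (j == 0) return long(i);
    }
    // Assumption: chars >= 128 are outside the table, so advance by one (always safe).
    i += last < 128 ? shift[last] : 1;
  }
  return -1;
}
```

Per the slide, the real intrinsic only takes this path for patterns of at least 8 chars in sources at least 4 times longer, and uses a linear scan (with special cases for 1 to 4 char patterns) otherwise.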

Page 47: HKG15-200: OpenJDK under the hood

Further resources
● The main OpenJDK site is http://openjdk.java.net. From here you can access the source, documentation and subscribe to mailing lists.
● The IcedTea main page http://icedtea.classpath.org/wiki/Main_Page also contains lots of useful information and access to the source and documentation for IcedTea.
● The Linaro OpenJDK development page http://openjdk.linaro.org contains AArch64 specific information and binary/source downloads in addition to the CI test loop results.
● Mailing lists. There are many mailing lists which can be subscribed to from the main OpenJDK site
  ○ [email protected] - for discussions relating to the aarch64 port
  ○ [email protected] - the main IcedTea discussion list

● IRC - the #openjdk channel on irc.oftc.net is a low volume channel where you can get expert assistance or at least opinions.

● Contribute - In order to contribute you must sign the Oracle Contributors Agreement. Linaro has already signed this on behalf of its employees, so if you are a Linaro employee you do not need to sign it again.

Thank You
