Translating Java Programs into C++



James Peterson (1,2,4), Glenn Downing (2,3), and Ron Rockhold (1,2)

(1) IBM Austin Research Laboratory, Austin, Texas 78758

(2) Department of Computer Sciences, The University of Texas at Austin, Austin, Texas 78712

(3) Have OOPL, Will Travel, 15707 Racine Cove, Austin, Texas 78717

(4) Netpliance, 7600A Capital of Texas Highway, Austin, Texas 78731

July 1998

Abstract

Java has become a very popular system for creating object-oriented applications. C++ is the more traditional language for object-oriented programs. We have investigated the problem of converting programs from Java to C++. While the two languages are in many ways similar, there are some interesting semantic problems in translating from Java to C++.

A translator has been written to automatically convert Java programs into C++. Once this is done, a runtime system must be provided. The provision of a runtime system allows programs written in Java to be converted to C++ and executed.

Introduction

Java [1,2] has become very popular in a short period of time. With its combination of object-orientedness and portability across platforms, it would seem to provide a solution to the code reuse problem. Being able to effectively reuse code should result in higher programmer productivity.

The portability of Java across platforms is accomplished by compiling Java programs into a machine-independent byte code which is stored in a class file. The resulting code is then interpreted by a Java Virtual Machine (JVM). While there are many applications that may execute satisfactorily in an interpreted mode, computationally intensive applications tend to suffer poor performance when they are interpreted. In this case, a Just-In-Time (JIT) compiler can be used to translate, on the fly, the Java byte code instructions into the native instruction set of the current processor. Compiling the Java program into native code should result in faster execution.

The quality of the native code, however, depends upon the algorithms, memory, and time spent first in the Java compiler, and then the algorithms, memory, and time spent in the JIT. While more sophisticated Java and JIT compilers may produce better native code, the total execution time (which includes both the time for running the JIT compiler as well as the time to run the native code) may increase or decrease relative to the time for the JVM to interpret the Java byte codes, or as compared to a different JIT compiler.

But while Java was designed to allow portability across many platforms, there are still significant platforms for which Java is not available: no JVM has been written nor is a JIT available. In addition, some systems are not designed to run a JVM, but only more conventional programs.

For a language such as Java, an alternative language would seem to be C or C++ [3]. C is generally a mature language with compilers which can produce very good performance. However, given the object-oriented nature of Java, it would seem that C++ might be a closer match. Programs in Java are designed as object-oriented programs; C++ is also an object-oriented programming language.

Indeed, it has been suggested that Java is a simpler and cleaner version of C++, leaving out complex or troublesome portions of the language (such as pointers, multiple inheritance, and operator overloading) while providing useful features which are not in C++ (such as garbage collection) but avoiding the complexity and obscurity of the more difficult parts of C++. For example, C++ allows multiple inheritance, while Java restricts itself to single inheritance of classes. Seeing Java as a well-designed subset of C++, it would seem reasonable to believe that a Java program could be easily translated into C++, while a more complex and sophisticated C++ program would have difficulty being translated directly into Java. It seems plausible that a Java program can be automatically converted into C++.



Several other projects have attempted to convert Java into C or C++ [4]. Toba [5] and j2c [6] convert Java into C. However both start, not with the Java source file, but with the Java class files. In some sense then, they are more like JIT compilers, converting a Java byte code sequence into a C program which is then compiled to produce native code. The C program is not a useful program for its own sake, but only as a means of creating native code. Similarly jcc [7] converts Java source code to C source code, but since C does not support objects directly, includes code to effectively interpret the Java program in C.

Our goal is to convert Java programs into equivalent C++ programs which closely match the original Java program at the source level, to translate (not to compile) a Java program into a semantically equivalent C++ program. Translation requires examining the various language features of Java and determining equivalent language features in C++.

Translating from Java to C++ would allow an application programmer to begin developing in Java and, if necessary, convert the Java program to C++ and continue development from then on in C++. Since the C++ program is a faithful replica of the Java program, we can continue with its implementation in C++ to determine how much the various features of Java (as represented in C++) cost in performance, space or complexity. For example, for a correctly executing program, we could disable the runtime check for array subscript errors or the check for dereferencing a NULL pointer (both required parts of Java), allowing us to quantify the cost of such a runtime check.

In the interests of full discussion, we would note that conversion of Java to C++ would not be interesting for Java applets which are remotely downloaded over a network. In fact, generally, C++ systems do not support the dynamic loading capabilities of Java systems. In addition, translating programs which use reflection and introspection would probably not yield the expected result. Java programs which use reflection and introspection examine their own class files to determine their own capabilities; the translated C++ program would also examine the Java class files to determine the capability of the Java classes -- functionally equivalent, but not what you might expect. None of these are really aspects of the Java language, but rather of the Java runtime classes. We are interested only in the Java and C++ languages.

Language Translation

To translate from Java to C++, we must consider each syntactic and semantic part of the Java language and define the equivalent C++ construct. We begin by examining the Java data types, then the control structures, and finally, the class structure.

Data Types

The basic data types in Java are a set of builtin types (byte, short, int, long, float, double, boolean, char), objects, and arrays. The builtin types would seem to mostly translate directly to C++, but there are important differences. Java has well-defined statements of the number of bits and the representation of these types, while C++, following C, is more vague.

In Java, an int is exactly a 32-bit two's complement integer. In most C++ implementations, an int is 32 bits, but it could be only 16 bits or maybe 64. We know only that an int in C (and C++) is at least as large as a short (at least 16 bits) and no larger than a long (at least 32 bits). While most machines use a two's complement representation of integers, C or C++ could also use ones' complement.

Accordingly, it is not, in general, possible to translate a Java int into a C++ int. However while it may not be universally possible to translate directly from Java ints to C++ ints, it is practically possible for each platform to translate a Java int into some basic C++ type which is a guaranteed 32-bit integer for that platform. The specific mapping of a Java int may vary from platform to platform, but there should be some mapping for each platform. To allow the C++ definitions to be varied as necessary, we translate the Java builtin types to named types in C++, and assume that every C++ file will #include a general header file (JavaSupport.h) which will define them as necessary.

Java Type    C++ Type      Typical definition
byte         _j_byte       typedef signed char _j_byte;
short        _j_short      typedef short _j_short;
int          _j_int        typedef int _j_int;
long         _j_long       typedef long long _j_long;
float        _j_float      typedef float _j_float;
double       _j_double     typedef double _j_double;
boolean      _j_boolean    typedef bool _j_boolean;
char         _j_char       typedef unsigned short _j_char;

The inclusion of JavaSupport.h also gives us a place to define bool if it is not supported by the compiler, as well as other declarations that may be needed to support the translation from Java to C++.

The major extensions in Java are the long (64-bit) type and the char (16-bit Unicode), but these are easily supported on most C++ compilers. In the worst case, it might be necessary to provide a class, say for 64-bit arithmetic, and overload the C++ operators for that class for a system which did not provide direct support for the needed type.
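As an illustration of what JavaSupport.h might contain on a current compiler, the sketch below defines the same _j_ names through the fixed-width types of <cstdint> and checks the widths at compile time. The use of <cstdint> and static_assert is an assumption of this sketch (both postdate this paper); only the type names come from the table above.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Hypothetical JavaSupport.h: the _j_ names from the table, defined through
// fixed-width types so that the mapping does not depend on the platform.
typedef std::int8_t   _j_byte;     // Java byte:  8-bit signed
typedef std::int16_t  _j_short;    // Java short: 16-bit signed
typedef std::int32_t  _j_int;      // Java int:   32-bit signed
typedef std::int64_t  _j_long;     // Java long:  64-bit signed
typedef float         _j_float;
typedef double        _j_double;
typedef bool          _j_boolean;
typedef std::uint16_t _j_char;     // Java char:  16-bit unsigned code unit

// Compile-time checks: a platform with a wrong mapping fails to build
// instead of silently computing with the wrong width.
static_assert(sizeof(_j_int)  * CHAR_BIT == 32, "Java int must be 32 bits");
static_assert(sizeof(_j_long) * CHAR_BIT == 64, "Java long must be 64 bits");
static_assert(sizeof(_j_char) * CHAR_BIT == 16, "Java char must be 16 bits");
```

A platform that lacks one of these exact-width types would fail to compile this header, which is the point at which a hand-written class for the missing type would be substituted.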

Objects

Object variables in Java are not really objects, but are always references to objects. Since all Java objects are references, Java automatically dereferences as necessary. In C++, we can do the same, but since we can have both objects and references to objects in C++, it is necessary in C++ to explicitly reference and dereference. So Java objects are translated to C++ pointers to objects:

Java                  C++
money m;              money *m;
loan mortgage;        loan *mortgage;
mortgage.m.value      mortgage->m->value
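The following minimal sketch shows why pointers preserve Java's reference semantics: assigning one variable to another copies the reference, not the object. The classes money and loan mirror the names in the table; their fields and the helper read_value are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical classes named after the example in the table.
class money {
public:
    int value;
    money() : value(0) {}
};

class loan {
public:
    money *m;              // Java: money m;  (a reference, not an embedded object)
    loan() : m(NULL) {}
};

// Java: return mortgage.m.value;
int read_value(loan *mortgage) {
    return mortgage->m->value;
}
```

With these definitions, `loan *alias = mortgage;` makes both variables refer to the same loan, exactly as `loan alias = mortgage;` does in Java.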

Names



Most objects in both Java and C++ are referenced through variables. The names of the variables have both type and scope. In general we want to keep the same names in C++ as are used in Java.

Class names in Java are identified by a complete package name. Alternatively, using the Java import statement, it is possible to refer to classes with partial names. For example, since an import java.lang.* is implicit to each Java program, the java.lang.Object and java.lang.String classes can be referred to in Java as simply Object and String.

C++ has no corresponding notion of a package, and so we always use the complete package name for all types in our translation from Java to C++. Since we cannot use the period to delimit parts of the name, for syntactic reasons, we transliterate the period separator to an underbar. Thus, a reference to Object in Java is first translated to the complete java.lang.Object and then to java_lang_Object in C++.

Although C++ has no concept of a package, it does have namespaces. We use a separate namespace for each class to provide appropriate isolation of names between the classes.

The use of namespaces also shows an interesting difference between the overloaded use of the period (or dot) as syntactic separator in Java and the differing delimiters used in C++. The Java name java.lang.System.out.println is composed of three parts.

java.lang.System is the complete name of the class; we translate that to a corresponding C++ namespace.
out is then a field within that class.
println is the invocation of a method on that field.

Although the C++ equivalent has the same parts, it uses different delimiters to separate them.

Java                            C++
java.lang.System.out.println    java_lang_System::out->println
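The three delimiters can be seen at work in a small sketch. It uses only a namespace for java_lang_System, and the class java_io_PrintStream is a hypothetical stand-in that records its output in a string; neither detail is taken from the translator itself.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for java.io.PrintStream; it records what is
// printed so that the behaviour can be checked.
class java_io_PrintStream {
public:
    std::string written;
    void println(const char *s) {
        written += s;
        written += '\n';
    }
};

// The dots inside the class name java.lang.System become underbars.
namespace java_lang_System {
    java_io_PrintStream *out = new java_io_PrintStream();
}

// Java: java.lang.System.out.println("hello");
void greet() {
    // "::" selects the field from the namespace, "->" invokes the method
    // through the pointer that represents the Java reference.
    java_lang_System::out->println("hello");
}
```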

Java has another troublesome situation with names. In Java, the compiler uses the syntactic type of the name to allow different syntactic tokens to use the same name. Thus, in Java, you can use the same name for a local variable, a method, a field, and a label. In C++ each of these names must be unique. From a programmer's point of view, the use of the same name for different purposes serves only to make programs more difficult to understand, edit, and maintain. However, since it is not necessary to write bad programs in Java, the ability to overload names is not commonly used. (It is deliberately used in some Java obfuscation programs, to deter reverse engineering.)

It would be easy to prepend a "type of name" identifier to a Java name when converting it to C++. All labels could be of the form "l_xxx", all local variables "v_xxx", all methods "m_xxx", and so on. This would assure that each use of a name is matched only with the correct declaration of the name, but it would make the resulting C++ program more obscure. Accordingly, we try to transfer Java names directly to C++ unless there is a clear conflict.

Notice that, in general, we cannot translate names only when a conflict arises unless we have global knowledge of all uses of a class. For example, assume a Java class A has a field X while an interface B has a method X. Neither of these has a name conflict since each name is local to its declaring class. However if a class C then extends A and implements B (which is to say becomes a sub-class of both the original class and the interface), it inherits both the method X (from B) and the field X (from A). At least one of these will need to be renamed both for C, and to maintain the inheritance, for either A or B.

For complete correctness, it would be necessary to translate names into unique names before any name conflict arises. In our translator, we rename only when an actual conflict arises, and expect the programmer to rename things at the Java level, if need be, to avoid conflict. This allows names in Java and names in C++ to be the same.

We also must rename those Java names which are reserved words in C++, such as extern, auto, inline, union, and so on.

Classes and Interfaces

Java requires, and C++ supports, programs structured as a set of classes. Inheritance among classes defines a class hierarchy. In addition, Java also defines the concept of an interface which defines a set of constants and methods, but provides no implementation for the methods. Classes which inherit (implement) the interface are required to provide the actual code for the methods.

Java allows a class to inherit from only one superclass and implement any number of interfaces, while C++ allows multiple inheritance. Java interfaces can only inherit from other interfaces, with one exception. All classes, interfaces and arrays effectively inherit from java.lang.Object.

An initial possibility in translating Java to C++ is to completely ignore interfaces. Interfaces provide no code to translate or run, but only a compile time requirement that those classes which implement an interface must provide the bodies for the interface methods. If we assume the class is complete and correct in Java (and use the Java compiler to check or enforce this requirement), then the methods are provided by the classes; the interfaces merely create a requirement for the existence of the methods, and the requirement is met.

However, since we are converting from Java source to C++ source, we would like to maintain the equivalent structure and understanding in the C++ source. Discarding interfaces might create significant holes in the class structure in the C++ code.

In addition, there are two properties of interfaces which require that they be part of the C++ code:

First, an interface may define static constants which are used by the Java program. These are generally explained as being the Java equivalent of C preprocessor variables, but can also be seen as the values of enumerated types or const declarations in C++.

Another reason for retaining interfaces is that, while they cannot be instantiated, they are sometimes used as the type of a parameter. If we have two classes, A and B, which implement an interface C, there may be methods which do not care if their parameters are of class A or class B, but only that they obey the interface C. These parameters may be declared to be of type C, restricting the methods that can be used on the parameter to the methods of the interface C, that is, the common methods of A and B.

Accordingly both classes and interfaces in Java are converted to classes in C++. Interfaces, since they have methods with no code, are abstract and contain pure virtual functions. A class in C++ is defined to inherit from all of the classes corresponding to its Java superclass and Java superinterfaces. If a class or interface in Java does not explicitly declare a superclass or a superinterface, it implicitly inherits from java.lang.Object. We make this inheritance explicit when we translate from Java to C++. So every class in C++ (except java_lang_Object of course) either directly or indirectly inherits from java_lang_Object.

Since C++ supports multiple inheritance, you might expect that inheritance from both classes and interfaces would be no problem in C++. However, there is one difficulty. Since interfaces can inherit from other interfaces, a class may end up inheriting from a given interface, either directly or indirectly, multiple times. In particular, every class inherits from java_lang_Object. Classes which inherit from both a superclass and some superinterface inherit from java_lang_Object multiple times. This is resolved by making java_lang_Object and all interfaces virtual base classes [3, page 396].

The use of virtual base classes resolves the difficulties with multiple inheritance (especially of java_lang_Object). It also means that explicit casts may be necessary in C++ to give the compiler the ability to coerce types. Static casting can be used to cast up the class hierarchy; dynamic casts are needed to cast down the class hierarchy.
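A small sketch of this arrangement follows. The classes Shape, Named and Square are hypothetical; only java_lang_Object and the use of virtual base classes come from the text above.

```cpp
#include <cassert>
#include <cstddef>

class java_lang_Object {
public:
    virtual ~java_lang_Object() {}
};

// Java: interface Shape { int sides(); }
// An interface becomes an abstract class with pure virtual functions.
class Shape : public virtual java_lang_Object {
public:
    virtual int sides() = 0;
};

// Java: class Named { ... }   (implicitly extends java.lang.Object)
class Named : public virtual java_lang_Object {
public:
    int id;
    Named() : id(0) {}
};

// Java: class Square extends Named implements Shape { ... }
// Because java_lang_Object is a virtual base, Square holds a single
// java_lang_Object subobject although it is reached along two paths.
class Square : public Named, public virtual Shape {
public:
    int sides() { return 4; }
};

// Casting down from a virtual base requires dynamic_cast; a static_cast
// in this direction is rejected by the compiler.
int sides_of(java_lang_Object *o) {
    Shape *s = dynamic_cast<Shape *>(o);
    return s != NULL ? s->sides() : -1;
}
```

Casting up, as in `java_lang_Object *o = static_cast<java_lang_Object *>(&square);`, is unambiguous here only because the base is virtual; with ordinary inheritance the same conversion would be ambiguous.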

Arrays

Arrays in Java are a unique construct having the properties of both classes and built-in types. Arrays have a field (length) as well as memory for the array elements themselves. Arrays can be represented in C++ by a class containing two fields: the array length and a pointer to the dynamically allocated memory for the array.

Java arrays are thus translated into a class in C++. The array class inherits from java_lang_Object, although java_lang_Object need not be a virtual base class in this case, since arrays inherit from no other class. Since the type of the array element can vary, a template class [6, Chapter 13] is used in C++, creating a separate C++ array class for each Java array type. The template class defines constructors and overloads the subscript operator. To correctly reflect Java semantics, the subscript operator checks the subscript index against the length of the array for every reference to an element of the array, and throws an exception if the index is out of bounds.
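Such a template might look like the sketch below. The name _j_array is hypothetical, std::out_of_range stands in for Java's ArrayIndexOutOfBoundsException, and the sketch assumes a non-negative length; none of these details is taken from the translator's actual runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

class java_lang_Object {
public:
    virtual ~java_lang_Object() {}
};

// Hypothetical array template: one instantiation per Java array type.
template <class T>
class _j_array : public java_lang_Object {
public:
    const int length;                       // Java: array.length

    explicit _j_array(int n)
        : length(n), data(new T[n]()) {     // "()" zero-initializes, as Java does
        assert(n >= 0);
    }

    ~_j_array() { delete[] data; }

    // Every element access is checked against the length.
    T &operator[](int i) {
        if (i < 0 || i >= length)
            throw std::out_of_range("array index out of bounds");
        return data[i];
    }

private:
    T *data;                                // dynamically allocated elements
    _j_array(const _j_array &);             // copying is not part of the sketch
    _j_array &operator=(const _j_array &);
};
```

Since Java array variables become pointers, an element of an array `a` would be written `(*a)[i]` in the generated code under this sketch. Removing the test in operator[] is the kind of experiment mentioned in the Introduction for measuring the cost of the bounds check.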

Multiply-dimensioned arrays are implemented as an array of pointers to arrays (a dope vector). Since each array has its own length, we can support non-regular multiple dimensioned arrays, just as in Java.

Java Strings

Literal Java strings create their own difficulties in C++. The overriding problem is that literal strings in Java are Unicode, while literal strings in C++ are ASCII. In converting from Java to C++, we simply wrap all literal strings with a function (or macro) call. The function is to take a literal ASCII string and return a pointer to a Java String, an instance of class java_lang_String. At the moment this is supported by a conversion function at runtime, although this has a minor negative performance impact.

A complete solution to this problem requires either C++ compiler support for literal Unicode strings, or moving the conversion from an ASCII string to a Java String to be part of program initialization rather than during actual execution. The use of a tagging wrapper function on all literal strings makes either of these possible.
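A runtime conversion function of this kind could be sketched as follows. The name _j_toString is the wrapper that appears in the generated code later in this paper; the layout of java_lang_String shown here (a vector of 16-bit code units) is purely an assumption made for the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

typedef unsigned short _j_char;    // 16-bit code unit, as in JavaSupport.h

// Hypothetical minimal java_lang_String holding 16-bit code units.
class java_lang_String {
public:
    std::vector<_j_char> chars;
    int length() const { return static_cast<int>(chars.size()); }
    _j_char charAt(int i) const { return chars[static_cast<std::size_t>(i)]; }
};

// Widen each ASCII character of a C++ literal to a 16-bit code unit.
java_lang_String *_j_toString(const char *ascii) {
    java_lang_String *s = new java_lang_String();
    std::size_t n = std::strlen(ascii);
    s->chars.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        // Go through unsigned char so that no sign extension can occur.
        s->chars.push_back(
            static_cast<_j_char>(static_cast<unsigned char>(ascii[i])));
    }
    return s;
}
```

Because every literal passes through this one function, it is also the single place where a cache or an initialization-time table could later be introduced.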



A developer of a Java program which is translated to C++ must decide if the translated program is to continue to use Unicode or should be implemented with simple ASCII. This would allow the cost of supporting Unicode to be explicitly defined.

Another difficulty in translation is the use of the plus (+) operator for string concatenation. Java defines the actual meaning of a + b, where a and b are Strings to be:

new StringBuffer().append(a).append(b).toString()

Converting this to C++ then produces:

(new java_lang_StringBuffer())->append(a)->append(b)->toString()

By converting the "syntactic sugar" of the plus operator for strings to its explicit implementation, in terms of appending to a StringBuffer, the normal conversion to C++ can handle this too.
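The chained form works in C++ because each append returns the buffer itself. The sketch below shows this with hypothetical, heavily simplified versions of the two classes; std::string replaces Unicode storage for brevity, and the helper concat exists only for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified stand-ins for the runtime classes.
class java_lang_String {
public:
    std::string text;
    explicit java_lang_String(const std::string &t) : text(t) {}
};

class java_lang_StringBuffer {
public:
    java_lang_StringBuffer *append(java_lang_String *s) {
        buffer += s->text;
        return this;            // returning this permits the chained -> calls
    }
    java_lang_String *toString() {
        return new java_lang_String(buffer);
    }
private:
    std::string buffer;
};

// Java: a + b
java_lang_String *concat(java_lang_String *a, java_lang_String *b) {
    return (new java_lang_StringBuffer())->append(a)->append(b)->toString();
}
```

The temporary buffer is never freed in this sketch, since it has no garbage collector; reclaiming it is a matter for the runtime system.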

A runtime switch in the translation program leaves the C++ code with a plus operator for strings. This would allow the C++ programmer to overload the plus operator on strings to be string concatenation. Java does not allow programmers to overload operators.

Operators and Statements

Most of the operators and statement types of Java are directly equivalent to the corresponding operators in C++, and so can be translated without difficulty. The similarity between Java and C++ in operators and statements is what led to the idea of automatic translation, at the source level, from Java to C++.

The >>> and instanceof operators require translation.

Java               C++
a instanceof t     (dynamic_cast<t *>(a) != NULL)
a >>> b            ((unsigned int)a) >> b
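Both translations can be written out as small functions. In this sketch the classes Animal and Dog and the helper names is_dog and ushr are hypothetical, and <cstdint> is assumed in order to pin the operand width at 32 bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

class java_lang_Object {
public:
    virtual ~java_lang_Object() {}    // a virtual function enables dynamic_cast
};
class Animal : public java_lang_Object {};
class Dog : public Animal {};

// Java: a instanceof Dog
// dynamic_cast yields NULL for a null pointer or an unrelated type,
// which matches instanceof returning false in both cases.
bool is_dog(java_lang_Object *a) {
    return dynamic_cast<Dog *>(a) != NULL;
}

// Java: a >>> b   (int operands)
// The shift is done on the unsigned value so that zeros are shifted in.
// Java uses only the low five bits of the shift distance for an int,
// whereas shifting by 32 or more is undefined in C++, hence the mask.
std::int32_t ushr(std::int32_t a, std::int32_t b) {
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(a) >> (b & 31));
}
```

The cast in the table covers int operands only; a Java long operand would need a 64-bit unsigned type and a six-bit mask instead.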

The use of super to identify methods in Java is resolved by substituting the actual class name, resulting in a minor loss in generality.

Statements in Java also match statements in C++ pretty well. The labeled break and continue statements can be easily replaced by a goto to an appropriately placed (and uniquely named) label. An extra semicolon may be needed at the end of the body of a switch statement if there are trailing switch labels.
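As an example of the goto replacement, the sketch below searches a small grid and leaves both loops at once. The function find_first and the label name outer_break are hypothetical; the Java original is shown in the comments.

```cpp
#include <cassert>

// Java:
//   outer:
//   for (int i = 0; i < 3; i++)
//       for (int j = 0; j < 3; j++)
//           if (grid[i][j] == target) { found = i * 3 + j; break outer; }
int find_first(const int grid[3][3], int target) {
    int found = -1;
    for (int i = 0; i < 3; i++) {
        for (int j = 0; j < 3; j++) {
            if (grid[i][j] == target) {
                found = i * 3 + j;
                goto outer_break;        // was: break outer;
            }
        }
    }
outer_break:                             // placed just after the labeled loop
    return found;
}
```

A labeled continue would be handled the same way, with the label placed at the end of the body of the labeled loop rather than after it.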

A synchronized block is converted to an explicit call to _j_enter_monitor, followed by the block contents, and a call to _j_exit_monitor. Synchronized methods are handled the same way. Both monitor calls have to pass the object on which to synchronize as a parameter. Threads and synchronization have no impact on the syntax or semantics of the Java and C++ programs. The influence is in the runtime support; the programs simply call functions and procedures, which do the actual work. No compiler or language support is needed in C++ for threads or synchronization (only runtime support).
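The shape of the generated code can be sketched with stub versions of the two runtime calls. The text gives only their names, so the signatures below, the monitor_depth counter, and the class Counter are all assumptions; a real runtime would block competing threads rather than count.

```cpp
#include <cassert>

class java_lang_Object {
public:
    int monitor_depth;                 // hypothetical: stands in for a real lock
    java_lang_Object() : monitor_depth(0) {}
    virtual ~java_lang_Object() {}
};

// Stub runtime calls with assumed signatures.
void _j_enter_monitor(java_lang_Object *o) { ++o->monitor_depth; }
void _j_exit_monitor(java_lang_Object *o)  { --o->monitor_depth; }

class Counter : public java_lang_Object {
public:
    int n;
    Counter() : n(0) {}

    // Java: synchronized void increment() { n++; }
    void increment() {
        _j_enter_monitor(this);        // the object to lock is passed explicitly
        n++;
        _j_exit_monitor(this);
    }
};
```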

Exception handling in Java is much the same as exception handling in C++ with the addition of the finally clause. The finally clause in a try statement is always to be executed after all other processing, either normal execution or if an exception is caught. Since there is no equivalent in C++, it becomes necessary to determine how to construct the equivalent C++ code from the Java code.

One difficulty is that the finally clause must be executed before every exit from the try block. There are a surprising number of potential exits from a general try block: a normal exit, throwing an exception, a return statement, a break statement, a continue statement, or failing to catch an exception. The finally clause must be executed before each of these (if they are used in the try block or a catch block).

For example, the following Java code shows most of the ways to exit from a try block:

    try {
        switch (x) {
            case 0: continue next;
            case 1: break out;
            case 2: throw new TException();
            case 3: return(0);
        }
    } catch (Error e) {
        throw(e);
    } finally {
        System.out.println("finally");
    }

We considered making the finally clause a procedure (or technically, a method). However, the execution context of the finally block (the set of variables it can access and the meaning of names) is the execution context of the try block. Moving the finally block to a procedure would require either that there be no use of the try block execution context, or that that context be passed to the method as a parameter.

Another alternative would have been to set a local variable followed by a goto statement to transfer control to the finally block, followed by a switch statement to select a goto to transfer control back to the right place after the finally block.

Our solution then is to simply duplicate the text of the finally block in all places where it must be executed. If the finally block is significant in size, this might suggest that the Java programmer might want to make it a method, in order to minimize the code repetition in C++.

The following C++ code shows how the finally block must be replicated for all the exits from a try block:

    try {
        switch(x) {
            case 0:
                /* finally clause, continue from try block */
                java_lang_System::out->println(_j_toString("finally"));
                goto next;

            case 1:
                /* finally clause, break from try block */
                java_lang_System::out->println(_j_toString("finally"));
                goto out;

            case 2:
                /* finally clause, throw exception from try block */
                java_lang_System::out->println(_j_toString("finally"));
                throw(new TException());

            case 3:
                /* finally clause, return from try block */
                java_lang_System::out->println(_j_toString("finally"));
                return(0);
        }
    } catch(java_lang_Error *e) {
        /* finally clause, caught exception from try block */
        java_lang_System::out->println(_j_toString("finally"));
        throw(e);
    } catch(...) {
        /* finally clause, uncaught exception from try block */
        java_lang_System::out->println(_j_toString("finally"));
        throw;
    }
    /* finally clause, normal exit from try block */
    java_lang_System::out->println(_j_toString("finally"));

Constructors

A major difference between Java and C++ is the specification of constructors. Like C++, Java constructors are methods with the name of the class. However, the body of Java constructors can have three different forms.

Simple constructors simply assign values to the fields of the class as necessary. These can be directly converted to equivalent C++ constructors.

Super constructors begin with an explicit call to the constructor of the super class for this class. This can be supported in C++ by the member initialization list of the C++ constructor.

this constructors begin by calling another (overloaded) constructor of this same class, with different parameters. Typically, this is used to compensate for a lack of optional and default parameters in Java constructors. Constructors in C++ cannot call another constructor in the same class.

The first two types of constructors in Java can be easily converted to C++, but the last is more difficult.



Our solution is to separate the memory allocation aspect of construction from the initialization of the object's fields. All constructors in Java are converted to initialization methods in C++. Thus a Java class foo, with constructors foo(int) and foo(float), is translated into a C++ class foo with methods foo_init(int) and foo_init(float). All Java constructors are invoked by the new operator. A call to new foo is translated to (1) a call to allocate memory (using an undefined, and hence default, constructor for foo), followed by (2) an explicit call to initialize this object with the current parameters.

Java          C++
new foo(p)    (new foo())->foo_init(p)

The init methods explicitly return this.

With this construction, each this constructor for foo is converted into a simple init method in C++. Init methods, being just methods to the C++ compiler, can freely call other init methods, either of their superclass or of the same class.

In addition, since Java explicitly defines default values (zero) for all otherwise uninitialized fields, we add a method to initialize all fields to their default (zero) values.

Notice that the separation of constructors into memory allocation and initialization portions is possible only because Java classes contain only references to objects, not objects themselves. The inclusion of objects would have required the use of the member initialization list in C++ as part of the constructor.
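The scheme can be sketched for a class with a this constructor. The fields, the two-argument constructor, the name foo_zero for the zeroing method, and the point at which it is called are all assumptions of this sketch; the foo_init naming and the returned this follow the text.

```cpp
#include <cassert>

class foo {
public:
    int count;
    float scale;

    // Allocation only: the C++ constructor deliberately does nothing.
    foo() {}

    // Hypothetical name for the method that gives every field its
    // Java default (zero) value.
    void foo_zero() {
        count = 0;
        scale = 0.0f;
    }

    // Java: foo(int n, float s) { count = n; scale = s; }
    foo *foo_init(int n, float s) {
        foo_zero();
        count = n;
        scale = s;
        return this;                   // init methods explicitly return this
    }

    // Java: foo(int n) { this(n, 1.0f); }
    // An init method is an ordinary method, so it may call another one.
    foo *foo_init(int n) {
        return foo_init(n, 1.0f);
    }
};

// Java: new foo(n)
foo *make_foo(int n) {
    return (new foo())->foo_init(n);
}
```

Returning this from each init method is what lets the translated expression `(new foo())->foo_init(p)` keep the value of the original new expression.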

Class Initialization

A final problem involves the issue of when classes are initialized. Classes in Java may have initialized static fields, static blocks, and other code to initialize the class (as distinct from the initialization of the instances of the class by the constructors and instance initializers). C++ also allows class initialization, but the approaches of the two are quite different.

C++ gives no specification of the order in which multiple classes are initialized, only that all class initialization will be complete before the main program is started. This is useless in implementing class initialization for translated Java programs, since Java defines and expects a very specific initialization sequence.

Java initialization is defined for and by the order of loading. A class in Java is dynamically loaded when the first reference is made to it, and the class is initialized when it is loaded. Since in our implementation, the equivalent C++ program will be statically bound and loaded, initialization must be programmed to happen at a time different than the loading and linking time.

Our solution is to create in C++ a specific class initialization method for each Java (and C++) class (or interface). All static field initialization and static blocks are collected in the order given, into the class initialization method. This collected code is called the core initialization code for this class. The problem then remains to get this code executed at the right time.

One approach would be to add a test in every method (actually only in every static method and constructor and before every reference to a static field from a class different than the current one), to see if the class has been initialized. If it has not been initialized, a call would be made to the class initialization method. While this would be a simple test of a class static boolean variable, we felt that the runtime performance impact of this constant testing would be a problem.

Depending upon the initialization only at load time produces very fragile Java programs. If the order of initialization of two unrelated classes is implicitly expected to be in a particular order, a small change in the program, an additional reference to a class or class static variable can change the order of loading and hence the initialization without warning. It would be dangerous to write Java programs whose class initialization depended on other than the explicit class dependencies.

In the case where the order of initialization is only determined by explicit dependencies, a different initialization strategy is possible. Basically, Java requires that:

superclasses are initialized before their sub-classes, and
all classes are initialized before their first use.

Our preferred approach is to add a call to the class initialization method of the main class (the class with the main method which is the entry point for the Java program) before we call the main method. This one call must initialize not only the main class but any classes that it needs to initialize, in the correct order.

Looking at the class initialization method, we can examine the core initialization code. The superclass of this class and any classes used by the core initialization code must be initialized before we execute the core initialization code. Thus, if the core initialization code uses the java.lang.Math.sqrt function, we must call the class initialization method for java_lang_Math before we can execute the core initialization code for this class. A scan of the core initialization code is used to create a list of all classes used in that core code. Calls to the class initialization methods for these classes are inserted into the class initialization method before the core initialization code itself.

After the core, we also need to be sure that any other class used by any method of this class is initialized. That is, if some other method of this class will want to call a constructor or reference a static variable of java_lang_String, then the initialization of this class is not complete unless java_lang_String is initialized. Accordingly, we form another list of all the classes used by any method in this class that were not initialized before the core initialization code. Calls to the class initialization methods for this second list of used classes are added to the class initialization method after the core class initialization code.

Thus, the class initialization method for the main class calls the class initialization methods of anything which it uses. These classes in turn call the class initialization methods of anything they use, flooding the entire C++ class structure until every reachable class has been initialized. If a class is not initialized, it will be because it is unreachable, and so does not need to be initialized.

A private class static boolean variable is used to prevent initializing a class twice.
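
The flooding scheme with its guard can be sketched in a few lines. This is a minimal illustration under assumptions, not the translator's output: the classes `Main`, `Helper`, and `Util`, the method name `class_initialization`, and the `init_log` vector are all hypothetical, the guard is a function-local static rather than a private class member, and the generated code shown later in this paper tracks an initialization level rather than a plain boolean.

```cpp
#include <string>
#include <vector>

// Records the order in which core initialization code runs (illustration only).
static std::vector<std::string> init_log;

struct Util {
    static void class_initialization() {
        static bool initialized = false;   // guard against initializing twice
        if (initialized) return;
        initialized = true;
        init_log.push_back("Util");        // core initialization code
    }
};

struct Helper {
    static void class_initialization();    // defined below, after Main
};

struct Main {
    static void class_initialization() {
        static bool initialized = false;
        if (initialized) return;
        initialized = true;
        Util::class_initialization();      // used by the core initialization code
        init_log.push_back("Main");        // core initialization code
        Helper::class_initialization();    // used only by other methods of Main
    }
};

void Helper::class_initialization() {
    static bool initialized = false;
    if (initialized) return;
    initialized = true;
    Util::class_initialization();          // already initialized: returns at once
    init_log.push_back("Helper");          // core initialization code
    Main::class_initialization();          // cyclic reference is cut by the guard
}
```

Setting the guard before calling any other class is what keeps a cycle such as Main and Helper from recursing forever.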

The Translator

With this analysis of the Java programming language, we have written a program to automatically translate Java source application programs to C++ source programs. The translator assumes its input is a correctly compiling Java file, freeing us from detecting syntax errors and from enforcing compile time constraints on the input Java file.

The Java source for each Java class is converted into three C++ files: a *.G, *.H, and *.C file. The declaration of a class (or interface) foo is used to generate a foo.H header file, while the actual code for the methods is put in a foo.C code file. The foo.C code file #includes the foo.H header file to get the class declaration.

The foo.H file must #include the definition of the classes which are used in its definition. For example, if a class has a field which is a string, it might have a line like java_lang_String *name. This means that the foo.H file must also provide the compiler with a declaration for java_lang_String. However, circularity soon arises, since, for example, everything must include java_lang_Object, which includes java_lang_Class and java_lang_String, which include java_lang_Object.

Our solution is to notice that, other than for superclasses (where we must #include the *.H file, but there is known to be no circularity), we do not need the complete definition of other classes that we use; we need only the fact that they are classes. Thus, we produce for every class foo, a file foo.G (a "pre-foo.H" file) which simply declares that foo is a class:

foo.G: class foo;

The class declaration header files (*.H) then #include the *.G files for all classes that they reference. This is sufficient to be able to declare the classes.

In addition, the foo.C code file #includes the class definition header files (the *.H files) for every class that it uses. In some sense, these #include lines replace the import statements in the Java file. Including the *.H declaration files gives a complete definition of each class to the compiler before it generates any actual code.

So the *.C files include the *.H files, which include the *.G files. This structure avoids circularity in the include files. (Once this structure was in place and working, we dropped the #include foo.G statements, replacing each #include directly with the contents of the file, the declaration of foo as a class: class foo;, since it takes fewer characters and is somewhat simpler to compile.)
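
The way forward declarations break the cycle can be shown in a single file. The following is a minimal sketch, assuming two hypothetical mutually referencing classes `Tree` and `Node`; the comments mark which of the three generated files each part would live in.

```cpp
// --- Tree.G and Node.G: state only that these names are classes ---
class Tree;
class Node;

// --- Tree.H: needs only the fact that Node is a class ---
class Tree {
public:
    Node *root;                 // a pointer to an incomplete type is allowed
    Tree() : root(nullptr) {}
    int root_value() const;     // body must wait until Node is complete
};

// --- Node.H: refers back to Tree, closing the cycle ---
class Node {
public:
    Tree *owner;
    int value;
    Node(Tree *t, int v) : owner(t), value(v) {}
};

// --- Tree.C: includes both *.H files, so Node is complete here ---
int Tree::root_value() const {
    return root ? root->value : -1;   // -1 marks an empty tree in this sketch
}
```

Only the method bodies need complete class definitions, which is why the full *.H files are pulled in by the *.C files and not by each other.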

The translator program uses the LALR(1) grammar for Java of Chapter 19 of The Java Language Specification [2], suitably rewritten as an input to yacc. Combined with a lexical analyzer from lex, we are able to parse Java programs.

The output of the lex/yacc code is a complete parse tree, including, for each token, information about comments and line breaks, to allow the output C++ code to reflect the comments and line breaks of the input Java code. The parse tree is normalized to eliminate some Java language variations. For example, the archaic declaration type name[] is converted to the more general type[] name.

The parse tree is then scanned to produce a symbol table, and to identify and propagate type information, producing a type-annotated parse tree, which is used to generate the output C++ files.

While the parse tree is being scanned, we may encounter references to other classes. Since we need the symbol table information for these classes, we recursively parse and scan these classes as they are encountered. The symbol table information can be obtained either from Java source code or from the class file form of these classes.



When the output C++ is generated, we attempt to meet the peculiarities of various C++ compilers. For example, the xlC compiler (from IBM) cannot handle repeated declarations of the same index in for-loops, so we generate an extra set of braces to create a block for each for-loop.

{ for (int i = 0; i < n; i++) process(i); }

We are able to successfully translate Java source programs into C++ and compile these C++ programs without error with the IBM xlC compiler, the GNU g++ compiler, and Microsoft's Visual C++ compiler.

An Example

As an example, consider the simple class Point from page 120 of The Java Handbook [1]:

class Point {
    int x, y;
    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }
    double distance(int x, int y) {
        int dx = this.x - x;
        int dy = this.y - y;
        return Math.sqrt(dx*dx + dy*dy);
    }
    double distance(Point p) {
        return distance(p.x, p.y);
    }
}

This is converted into the following Point.G, Point.H, and Point.C files:

Point.G:

class Point;

Point.H:

#ifndef __POINT_H_
#define __POINT_H_

/* **************************************************** */
/*                                                      */
/*    Header file Point.H created from Point.java       */
/*                                                      */
/* **************************************************** */

#include "JavaSupport.h"

#include "Point.G"

/* Import #include's */
#include "java/lang/Object.G"
#include "java/lang/Math.G"
#include "java/lang/Class.G"
#include "java/lang/String.G"

/* Inheritance #include's */
#include "java/lang/Object.H"

class Point: public virtual java_lang_Object
{
  public:
    static void _j_class_initialization(_j_int level);
  private:
    void _j_instance_initialization();
  public:
    static java_lang_Class *TYPE;
    _j_int x;
    _j_int y;
    Point *Point_init(_j_int x, _j_int y);
    virtual _j_double distance(_j_int x, _j_int y);
    virtual _j_double distance(Point *p);
};
#endif

Point.C:

/* **************************************************** */
/*                                                      */
/*    C++ file Point.C created from Point.java          */
/*                                                      */
/* **************************************************** */

#include "JavaSupport.h"

#include "Point.H"

/* Import #include's */
#include "java/lang/Object.H"
#include "java/lang/Math.H"
#include "java/lang/Class.H"
#include "java/lang/String.H"

java_lang_Class *Point::TYPE;

void Point::_j_class_initialization(_j_int level)
{
    static _j_int _j_class_init_level = 0;
    while (_j_class_init_level < level)
    {
        _j_class_init_level += 1;

        /* Initialize super class */
        java_lang_Object::_j_class_initialization(_j_class_init_level);

        /* Initialize classes used in class initialization. */
        java_lang_Class::_j_class_initialization(_j_class_init_level);
        java_lang_String::_j_class_initialization(_j_class_init_level);

        if (_j_class_init_level == 1)
        {
            TYPE = java_lang_Class::forName(_j_toString("Point"));
        }
        else
        {
            /* Initialize classes used in this class. */
            java_lang_Math::_j_class_initialization(_j_class_init_level);
        }
    }
}

void Point::_j_instance_initialization()
{
    this->x = 0;
    this->y = 0;
}

Point *Point::Point_init(_j_int x, _j_int y)
{
    java_lang_Object::java_lang_Object_init();
    this->_j_instance_initialization();
    this->x = x;
    this->y = y;
    return(this);
}

_j_double Point::distance(_j_int x, _j_int y)
{
    _j_int dx = this->x - x;
    _j_int dy = this->y - y;
    return(java_lang_Math::sqrt((_j_double )(dx * dx + dy * dy)));
}

_j_double Point::distance(Point *p)
{
    return(this->distance(p->x, p->y));
}

Runtime Support

To test the translation, we chose the _210_si test that was considered during the preliminary definition of the SPECjvm98 benchmark [8]. This test is a "small interpreter" written in Java which reads a small program to compute an approximation of pi and a mortgage amortization table. The translation and compilation of _210_si went without error, converting 3 Java files (with 6 classes) into 6 *.G files, 6 *.H files, and 6 *.C files. We added a small main program to call the class initialization method, and then to convert the normal argc, argv parameters to a Java array of strings, and call the main method of class Si.
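
The job of that small main program can be sketched as follows. This is an illustration under assumptions, not the program we wrote: the `Si` struct here is a hypothetical stand-in for the translated class, a `std::vector<std::string>` stands in for a Java array of java_lang_String, and the helper names `to_java_args` and `run_translated_program` are invented for the sketch.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the translated class Si.
struct Si {
    static bool initialized;
    static std::vector<std::string> seen_args;
    static void class_initialization() { initialized = true; }
    static void main(const std::vector<std::string>& args) { seen_args = args; }
};
bool Si::initialized = false;
std::vector<std::string> Si::seen_args;

// Convert argc/argv to the argument list a Java main method expects.
// Java's args array does not include the program name, so argv[0] is skipped.
std::vector<std::string> to_java_args(int argc, const char* const* argv) {
    std::vector<std::string> args;
    for (int i = 1; i < argc; ++i) args.push_back(argv[i]);
    return args;
}

// The steps the hand-written C++ main program performs, in order.
int run_translated_program(int argc, const char* const* argv) {
    Si::class_initialization();          // initialize Si and whatever it uses
    Si::main(to_java_args(argc, argv));  // then enter the translated main method
    return 0;
}
```

A real `main(int argc, char **argv)` would simply return `run_translated_program(argc, argv)`.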



The _210_si test uses other SPECjvm98 support routines, increasing the test from 6 classes to 22 classes.

Once these classes were translated from Java to C++, and compiled without error, we linked them together to create an executable. This link step produced a list of 394 undefined symbols. These 394 undefined symbols were easily recognized as being from the standard Java library. References were made to 35 different classes, including java_lang_Object, java_lang_String, java_lang_Math, and so on.

These link errors made very clear that a Java application consists not only of the code for the application itself, but also assumes the existence of a large supporting library. It is reasonable, in fact, to argue about the best way to convert Java to C++ with respect to library support. For example, the Java statement:

System.out.println("Principal: " + Principal_Amount.CommaFormat());

will be translated by our converter to

java_lang_System::out->println(
    ((new java_lang_StringBuffer())->java_lang_StringBuffer_init())
        ->append(_j_toString("Principal: "))
        ->append(this->Principal_Amount->CommaFormat())
        ->toString());

while a C++ programmer would probably expect it to be translated to:

cout << "Principal: " << Principal_Amount.CommaFormat() << endl;
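
The shape of the translated concatenation can be reproduced with a much-reduced buffer class. This is a sketch only: `StringBuffer` below is a hypothetical stand-in for the translated java_lang_StringBuffer, `init` plays the role of the generated *_init method, and `concat_like_translator` is an invented helper name.

```cpp
#include <string>

// Hypothetical, much-reduced stand-in for a translated java_lang_StringBuffer.
class StringBuffer {
    std::string contents;
public:
    // Plays the role of the generated *_init method: returns this for chaining.
    StringBuffer *init() { contents.clear(); return this; }
    StringBuffer *append(const std::string& s) { contents += s; return this; }
    std::string toString() const { return contents; }
};

// Concatenation in the translator's style: allocate, init, chain the appends.
std::string concat_like_translator(const std::string& a, const std::string& b) {
    StringBuffer *sb = (new StringBuffer())->init();
    std::string result = sb->append(a)->append(b)->toString();
    delete sb;   // the translated code would leave this to the garbage collector
    return result;
}
```

Each `+` in the Java source becomes one `append` call on a freshly allocated buffer, which is why the translated statement looks so unlike the idiomatic stream insertion.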

Converting a Java program to C++ thus raises two possible approaches:

1. The calls to the Java library can be converted to calls to an equivalent C++ library, or

2. The calls to the Java library can be left as they are and a C++ implementation of the Java library can be provided.

The Java library consists of about 1500 different Java classes. To do (1) means we must find (or write) semantically equivalent standard C++ library classes and methods for all the methods of these classes. For the problem at hand, just running _210_si, we need to find the C++ library calls for the 394 undefined symbols that are used in _210_si.

On the other hand, most of the classes and methods for the Java library are written in Java, and with our Java to C++ translator, we can easily convert the Java library source code into C++ source code to provide the support needed for (2). Thus, we have decided to use approach (2) to provide the library support needed as a result of converting a Java program into a C++ program.

The original 22 SPECjvm98 classes for _210_si referenced 35 classes from the Java library. These 35 classes, in turn, referenced other Java Library classes, and so on, until closure is reached with 221 classes from the Java Library. (One can look at this increase from 22 classes to 22 classes plus 221 supporting library classes as either a triumph of code reuse or an example of code bloat.)
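
Reaching closure is an ordinary reachability computation over the "references" relation between classes. A minimal worklist sketch follows; the `UsesMap` type and the `closure` function are hypothetical names, and how the translator actually records references is not shown here.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// For each class name, the names of the classes it references.
typedef std::map<std::string, std::vector<std::string> > UsesMap;

// Return every class reachable from the roots by following references.
std::set<std::string> closure(const UsesMap& uses,
                              const std::vector<std::string>& roots) {
    std::set<std::string> reached;
    std::vector<std::string> worklist(roots);
    while (!worklist.empty()) {
        std::string cls = worklist.back();
        worklist.pop_back();
        if (!reached.insert(cls).second) continue;   // already visited
        UsesMap::const_iterator it = uses.find(cls);
        if (it == uses.end()) continue;              // no recorded references
        for (std::size_t i = 0; i < it->second.size(); ++i)
            worklist.push_back(it->second[i]);
    }
    return reached;
}
```

Classes that are never reached from the application's own classes need not be translated or linked at all.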

Native Methods



With the mechanical translation of the Java Library from Java to C++, we have almost everything that is needed to run our test case. We are still missing native methods. Java allows methods to be implemented by code written in a language other than Java (typically C). These are called native methods. Native methods are required to provide the interface from Java to the underlying operating system or platform. There are 409 native methods in 54 classes of the Java Library. Providing C++ code for these 409 native methods will provide a complete support library for converting Java programs to C++.

Writing these 409 native methods is essentially a port of Java to a C++ platform. The implementation of them will depend, in part, on the target C++ system. For example, the support of threads will depend upon the underlying operating system more than on the fact that it is written in C or C++.

For the specific _210_si test of SPECjvm98, only 110 native methods are referenced and only 42 of these are actually called during execution. Implementing these 42 native methods allows the C++ version of _210_si to be executed. As expected, the printed outputs of _210_si in Java and in C++ are identical; the two implementations, in Java and in C++, are semantically equivalent.

Garbage Collection

One additional aspect of supporting the C++ translation of Java programs is the problem of garbage collection. Wilson [9] has written about the difficulties of providing garbage collection for C++. For this environment, however, all objects inherit from java_lang_Object. By adding a variant of the Boehm-Demers-Weiser conservative garbage collector [10] as a superclass of java_lang_Object, garbage collection can be added to the C++ code. This reduced the memory needed to run _210_si from about 110 M-bytes of malloced memory to about 500 K-bytes of garbage collected memory.

Runtime Comparisons

After converting the _210_si SPECjvm98 program from Java to C++, we converted and compared a larger subset of the preliminary SPECjvm98 programs. The following times are the result of running Java 1.1.0 on a RISC System/6000, model 560F running AIX 4.1.2 with version 3.6 of xlC as the C++ compiler. The specific times are not of great interest, since they will vary from system to system with processors, compilers, compile options, and runtime systems.

Running the _210_si test in Java took 101 seconds under Java 1.1.0. Moving to Java 1.1.4 with a JIT reduced the time to 68 seconds. Converting the test to C++ and compiling with -g reduced the time to 57 seconds. Compiling with -O reduced the time to 41 seconds.

Examining the memory usage of the test, we find that 110,231,136 bytes are being dynamically allocated. In particular, 190,339 of these allocations are one particular array of 512 bytes which is allocated inside a parsing method of the test, which is called 190,339 times. This particular array is allocated every time the parsing method is called, used strictly locally as a temporary buffer. The reference to it is lost when the parsing method exits.

Java has no local objects or arrays -- all memory is allocated from the heap and garbage collected. C++, on the other hand, supports both heap-based memory management and automatic stack-based memory management. If we change the C++ version to allocate this array, whose use is strictly local to this method, on the stack as a local automatic variable, we reduce the total heap memory used to approximately 8 M-bytes (instead of 110 M-bytes), and execution time (with -O) drops to 24 seconds. This Java program spends a lot of its time allocating memory which is then immediately thrown away and later collected for reuse.
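
The two allocation strategies look like this side by side. This is a sketch under assumptions: the function names and the digit-counting work are invented for illustration and are not taken from _210_si; only the 512-byte temporary buffer mirrors the case described above.

```cpp
#include <cstddef>
#include <cstring>

const std::size_t kBufferSize = 512;

// Count the digit characters in a NUL-terminated buffer.
static std::size_t count_digits(const char *buffer) {
    std::size_t n = 0;
    for (const char *p = buffer; *p != '\0'; ++p)
        if (*p >= '0' && *p <= '9') ++n;
    return n;
}

// Direct translation: the temporary array lives on the heap, as in Java.
std::size_t parse_with_heap_buffer(const char *text) {
    char *buffer = new char[kBufferSize];
    std::strncpy(buffer, text, kBufferSize - 1);
    buffer[kBufferSize - 1] = '\0';
    std::size_t result = count_digits(buffer);
    delete[] buffer;   // with a collector, the buffer would just become garbage
    return result;
}

// Hand-modified version: the same array as an automatic (stack) variable.
std::size_t parse_with_stack_buffer(const char *text) {
    char buffer[kBufferSize];
    std::strncpy(buffer, text, kBufferSize - 1);
    buffer[kBufferSize - 1] = '\0';
    return count_digits(buffer);
}
```

Both functions compute the same result; the second performs no heap allocation per call, which is the change responsible for the drop in heap usage reported above.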

Use of advanced data flow techniques might be able to detect that Java objects are effectively local and put them on the stack rather than on the heap.

Another performance problem is caused by the use of virtual base classes for java_lang_Object. Going back and changing the translator to not use virtual base classes requires that interfaces not inherit from java_lang_Object. This requires that objects declared to be an interface type are explicitly cast to a java_lang_Object when we execute methods from the java_lang_Object class. Thus java_lang_Object is no longer a virtual base class. Since interfaces may have multiple inheritance, it is still necessary, at times, to use virtual base classes when the same interface is inherited in multiple ways. Removing the virtual base class for java_lang_Object provides some speed-up. For the best case, the C++ code dropped from 223 seconds to 86 seconds.
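
The two inheritance schemes can be contrasted in a small sketch. All names here (`Object`, `RunnableV`, `TaskV`, `Runnable`, `Task`, and the two helper functions) are hypothetical, and the use of `dynamic_cast` for the explicit cast in the second scheme is an assumption about one way to do it, not a description of the translator's output.

```cpp
struct Object {
    virtual ~Object() {}
    virtual int hashCode() { return 42; }
};

// Scheme 1: interfaces inherit from Object, so Object must be a virtual base
// to keep a single Object subobject in classes that implement interfaces.
struct RunnableV : public virtual Object {
    virtual void run() = 0;
};
struct TaskV : public virtual Object, public RunnableV {
    bool ran = false;
    void run() override { ran = true; }
};

// Scheme 2: interfaces stand alone and Object is an ordinary base class.
struct Runnable {
    virtual ~Runnable() {}
    virtual void run() = 0;
};
struct Task : public Object, public Runnable {
    bool ran = false;
    void run() override { ran = true; }
};

// With scheme 1, Object's methods are reachable directly through the interface.
int hash_via_interface_v(RunnableV *r) { return r->hashCode(); }

// With scheme 2, the interface pointer must first be cast to Object.
int hash_via_interface(Runnable *r) {
    Object *o = dynamic_cast<Object*>(r);   // explicit cross-cast
    return o ? o->hashCode() : 0;
}
```

The second scheme trades an explicit cast at the few places where an interface-typed value is used as an Object for cheaper member access everywhere else.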

Looking at another test case, a profile of the C++ code shows that array references use the bulk of the time. Java arrays, when translated to C++, are implemented as a template class and the array subscript operator is not being in-lined. Thus, each array reference in C++ is a method call.

Replacing the use of the template array subscript operator with special macros to in-line the array references in one file drops the time from 334 seconds to 118 seconds. Replacing the array macros with real C++ arrays (with no array bounds checking) reduces the time to 42 seconds, compared to the Java time of 54 seconds.
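
The difference between the checked template subscript and a raw C++ array can be sketched as follows. `JavaArray` is a hypothetical, reduced stand-in for the runtime's array template, and the two summing functions are invented for illustration.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the runtime's array template class.
template <class T>
class JavaArray {
    std::vector<T> elements;
public:
    // Elements are value-initialized (zero for int), as in Java.
    explicit JavaArray(std::size_t length) : elements(length) {}
    std::size_t length() const { return elements.size(); }

    // Checked subscript: every a[i] is a call to this unless it is in-lined.
    T& operator[](long index) {
        if (index < 0 || static_cast<std::size_t>(index) >= elements.size())
            throw std::out_of_range("array index out of bounds");
        return elements[static_cast<std::size_t>(index)];
    }

    // Unchecked access, comparable to a real C++ array.
    T *raw() { return elements.data(); }
};

// Every element access goes through the bounds-checked operator.
long sum_checked(JavaArray<int>& a) {
    long total = 0;
    for (long i = 0; i < static_cast<long>(a.length()); ++i) total += a[i];
    return total;
}

// Same loop over the underlying storage, with no checks at all.
long sum_unchecked(JavaArray<int>& a) {
    long total = 0;
    int *p = a.raw();
    for (std::size_t i = 0; i < a.length(); ++i) total += p[i];
    return total;
}
```

The unchecked version is only equivalent to the Java program when every index is known to be in range, which is exactly the guarantee the bounds check exists to enforce.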

Another test case has a different problem. Profiling shows that the most time-consuming activity for it was a dynamic cast. There are some 12,797,913 dynamic casts in total. One of these, inside a loop in an important method, is unnecessary -- it casts a return value of type Constraint to the same type.

public Constraint constraintAt(int index) ...

public void execute()
{
    for (int i = 0; i < size(); ++i)
    {
        Constraint c = (Constraint) constraintAt(i);
        c.execute();
    }
}

This is translated to C++ as

void spec_benchmarks__214_deltablue_Plan::execute()
{
    for (_j_int i = 0; i < this->size(); i++)
    {
        spec_benchmarks__214_deltablue_Constraint *c =
            _j_dynamic_cast(spec_benchmarks__214_deltablue_Constraint *, (this->constraintAt(i)));
        c->execute();
    }
}



Removing this dynamic cast drops the total number of dynamic casts to 7,390,713, and drops the execution time of the C++ version from 42 seconds to 26 seconds. This compares with the Java time of 21 seconds. The Java time does not drop with the removal of the unneeded cast; the Java compiler has already removed the unneeded cast from the Java execution.

All of these changes suggest the difficulty of defining an equivalent C++ and Java program. Starting from a Java program, it is necessary to support the complete semantics of Java in a translated C++ program. Thus, converting an array reference in Java to an array reference in C++ requires that a bounds check be made and an exception thrown if the array reference is out of bounds. Since the array index may be either self-modifying (such as i++), or dynamically evaluated (such as a call to a method), the C++ program must capture the value of the index and evaluate it only once. This can be done by a method or operator call, which passes the value as a parameter, or, in a macro, by storing the value in a temporary location. However, temporary values cannot be created on-the-fly within C++ macros, nor can exceptions be thrown within an expression (since throwing an exception is a statement and not an expression, thus you have to make a procedure call to throw an exception from within an expression). These constraints tend to prevent a C++ compiler from in-lining the array reference.
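
The single-evaluation requirement is easy to see with a self-modifying index. The sketch below is illustrative only: `NAIVE_AT`, `checked_at`, and `fail_bounds` are hypothetical names, and the macro is deliberately the naive form that repeats its index argument, not one of the macros used in our runtime.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// The throw is wrapped in a function so it can be called inside an expression.
inline int& fail_bounds() {
    throw std::out_of_range("array index out of bounds");
}

// Naive macro: the index expression appears twice, so it is evaluated twice.
#define NAIVE_AT(vec, idx) \
    ((static_cast<std::size_t>(idx) < (vec).size()) ? (vec)[(idx)] : fail_bounds())

// Function form: the index is evaluated once, when the argument is passed.
inline int& checked_at(std::vector<int>& vec, long idx) {
    if (idx < 0 || static_cast<std::size_t>(idx) >= vec.size())
        return fail_bounds();
    return vec[static_cast<std::size_t>(idx)];
}
```

With an index variable starting at 0, `checked_at(v, i++)` reads element 0 and leaves the index at 1, as Java requires, while `NAIVE_AT(v, i++)` checks index 0, reads element 1, and leaves the index at 2.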

A further problem in optimizing array references is the need, in some cases, to dynamically check the type of an assigned value. Java programs commonly pass an array to a method which types it only as an array of Object. Assignment to this array requires a run-time check that the type of the assigned value is compatible with the actual run-time type of the array parameter.
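
One way such a store check might look in C++ is sketched below. Everything here is hypothetical: the `ObjectArray` and `TypedArray` classes, the `accepts` hook, the `store_first` function, and the choice of `std::runtime_error` in place of a translated Java exception class.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

struct Object { virtual ~Object() {} };
struct Text : Object {};
struct Number : Object {};

// View of "an array of Object": the element type is known only at run time.
class ObjectArray {
    std::vector<Object*> elements;   // pointers start out null
protected:
    virtual bool accepts(Object *value) const = 0;   // run-time element type test
public:
    explicit ObjectArray(std::size_t length) : elements(length) {}
    virtual ~ObjectArray() {}
    Object *get(std::size_t i) const { return elements.at(i); }
    void set(std::size_t i, Object *value) {
        // A null reference may be stored in an array of any element type.
        if (value != nullptr && !accepts(value))
            throw std::runtime_error("array store: incompatible element type");
        elements.at(i) = value;
    }
};

// The actual array, which remembers its element type T.
template <class T>
class TypedArray : public ObjectArray {
protected:
    bool accepts(Object *value) const override {
        return dynamic_cast<T*>(value) != nullptr;
    }
public:
    explicit TypedArray(std::size_t length) : ObjectArray(length) {}
};

// A method that knows its parameter only as an array of Object.
void store_first(ObjectArray& array, Object *value) { array.set(0, value); }
```

The check costs a virtual call and a dynamic cast on every store, on top of the bounds check, which is part of why these assignments are hard to in-line.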

In addition, in many cases, a global review of the use of the array can show that the array bounds checking is unnecessary -- the array is allocated with a size n, and all indexing is clearly in the range 0 to n-1. This level of optimization is possible for a Java compiler or JIT compiler which knows that the purpose is to safely reference an array element, but is unlikely for the equivalent C++ code where the various checks and exceptions have had to be made explicit, and the "intent" of the code is not as obvious to the compiler.

It is possible to do these optimizations when converting from Java to C++, but this raises the issue of translating from Java to C++ versus compiling from Java into C++. We have chosen to constrain our translation to a relatively simple source level translation from Java to C++. This exposes the ability to create a more optimized version of the C++ program by data and control flow analysis and other compilation optimizations, the same sorts of optimizations that are needed from a JIT compiler.

Summary

By analyzing the Java programming language, we have been able to develop C++ programming constructs which are equivalent to those in Java. From this research, we then developed a translator which automatically converts Java source files into equivalent C++ source files. This shows that Java programs can be converted into equivalent C++ programs; Java is programmatically a subset of C++.

By providing suitable runtime support, we can then run both Java programs and equivalent C++ versions of these same programs, allowing us to investigate the performance differences between the two implementations of the same programs, with the aim of improving Java performance to the level of C++.

Our early results suggest that some aspects of Java programs (such as array references and casting) can be very expensive in performance, suggesting that these are places to apply optimizations to improve performance.

References

1. Patrick Naughton, The Java Handbook, Osborne McGraw-Hill, 1996, 424 pages.
2. James Gosling, Bill Joy, and Guy Steele, The Java Language Specification, Addison-Wesley, 1996, 825 pages.
3. Bjarne Stroustrup, The C++ Programming Language, Third Edition, Addison-Wesley, 1997, 910 pages.
4. The Java Oasis Team, Java to C Compilers, http://www.oasis.leo.org/java/development/java-to-c/00-index.html, June, 1998.
5. The Sumatra Project, Toba: A Java-to-C Translator, http://www.cs.arizona.edu/sumatra/toba/, April, 1998.
6. Yukio Andoh, j2c/CafeBabe java .class to C translator, http://www.webcity.co.jp/info/andoh/java/j2c.html, Oct, 1996.
7. Nik Shaylor, JCC - A Java to C converter, http://www.geocities.com/CapeCanaveral/Hangar/4040/jcc.html, May, 1997.
8. SPEC JVM98 Benchmarks, http://www.spec.org/osg/jvm98/, August, 1998.
9. Paul R. Wilson, Uniprocessor Garbage Collection Techniques, ftp://ftp.cs.utexas.edu/pub/garbage/bigsurv.ps.
10. Hans-J. Boehm, A garbage collector for C and C++, http://reality.sgi.com/employees/boehm_mti/gc.html, April, 1998.
