JEP 348: Compiler Intrinsics for Java SE APIs
Author | Brian Goetz |
Owner | Vicente Arturo Romero Zaldivar |
Type | Feature |
Scope | SE |
Status | Candidate |
Component | tools |
Discussion | amber dash dev at openjdk dot java dot net |
Reviewed by | Alex Buckley, Brian Goetz, Vicente Arturo Romero Zaldivar |
Endorsed by | Alex Buckley |
Created | 2018/06/25 21:23 |
Updated | 2023/02/10 08:36 |
Issue | 8205637 |
Summary
Enable Java compilers to use novel code generation strategies (intrinsification) in order to improve the performance of certain Java SE methods.
Motivation
In modern JVM implementations, Just-In-Time (JIT) compilers do an excellent job of optimizing bytecode at run time. A considerable amount of bytecode is "clerical" in nature -- shuffling data from the stack to the heap and back again -- and can be optimized with techniques such as box elimination and method inlining. However, there are limits to the analysis that a JIT compiler can perform in a reasonable time and space, so it might miss some opportunities for optimization. Unfortunately, the way that method invocations in source code are compiled to bytecode tends to increase the chances of a miss.
For example, consider an invocation of the method String::format. The first argument is a format string such as %s %d, followed by varargs of any type. A Java compiler generates bytecode that boxes primitive varargs, creates an array, initializes it, and invokes the method; the bytecode of the method's body reverses these steps to obtain values to interpolate according to the format string. Unfortunately, the method's body is too large to inline, so the JIT compiler cannot eliminate the boxing-and-unboxing of primitive varargs, nor the shuffling of varargs into an array and out again. Even more unfortunately, the format string is usually a constant expression, so without inlining it will be parsed every time the method's body runs.
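The boxing and array creation described above can be written out by hand. The following is a minimal sketch, not compiler output: the class and method names are hypothetical, and desugared() only approximates in source form what the compiler emits as bytecode for sugared().

```java
class VarargsDesugarSketch {
    // What the developer writes.
    static String sugared(String name, int age) {
        return String.format("%s %d", name, age);
    }

    // Roughly what the compiler generates: box the int, allocate the
    // varargs array, fill it, then invoke the method.
    static String desugared(String name, int age) {
        Object[] args = new Object[2];
        args[0] = name;
        args[1] = Integer.valueOf(age); // boxing of the primitive argument
        return String.format("%s %d", args);
    }

    public static void main(String[] unused) {
        System.out.println(sugared("John", 30));
        System.out.println(desugared("John", 30));
    }
}
```

Both methods return the same string; the second only makes the hidden allocation and boxing visible.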
String::format is important because it is a concise and reliable way to implement toString. However, some developers shy away from using it purely out of performance considerations, and instead use more verbose and error-prone mechanisms. By optimizing the invocation of String::format, the most readable and maintainable way to implement toString also becomes the most performant way.
JEP 280 replaced the translation of string concatenation with invokedynamic, resulting in faster bytecode, less allocation churn, and more uniform optimizability. We can apply the same technique to String::format (and closely related methods such as java.util.Formatter::format) by compiling the invocation using an alternate translation strategy that customizes the bytecode for each specific invocation based on information available at compile time, such as the static types and values of the actual arguments.
Goals
Enable JDK developers to (i) tag methods as candidates for intrinsification by a Java compiler, and (ii) for those candidate methods, implement alternate translations of invocations that result in behavior which conforms to the specification of the method.
Non-Goals
It is not a goal to allow intrinsification of methods declared outside the core Java SE modules.
Description
Traditionally, a Java compiler translates a method invocation in source code to one of the bytecodes invokevirtual, invokeinterface, invokespecial, or invokestatic. This JEP allows the compiler to use an alternate translation when certain designated methods of the Java SE API are invoked. The use of an alternate translation is called intrinsification; the invocation is said to be intrinsified.
For the compiler to intrinsify a specific invocation of a given method, all of the following have to happen:
- The method opts in to intrinsification at its declaration site, as part of its specification;
- The compiler identifies this invocation as intrinsifiable;
- The compiler knows of an intrinsic processor for the method;
- The intrinsic processor indicates an alternate translation strategy; and
- The compiler generates the bytecode corresponding to the indicated strategy.
Opting in to intrinsification
For a method of the Java SE API to opt in to intrinsification, it must be designated as an intrinsic candidate, via the annotation @IntrinsicCandidate. A compiler can thus recognize an invocation of such a method as intrinsifiable, and may (but is not required to) delegate the translation decision to an intrinsic processor.
The space of methods that can opt in to intrinsification is restricted, out of an abundance of concern for the broad impact of generating novel bytecode. Only a method exported by the java.base module may be designated as an intrinsic candidate, and only if it is either (i) an instance method in a final class, or (ii) a static method, so that the compiler can be sure of its behavior. The designation of any other method as an intrinsic candidate is ignored.
(It might seem that a final instance method in a non-final class is suitable, but the body of such a method may invoke non-final instance methods in the same class; those methods may be overridden at runtime, so the behavior of the final instance method is not sufficiently predictable for intrinsification. Even less predictable is the behavior of a non-final method in a non-final class, which is why java.io.PrintStream::format is not mentioned in this JEP despite its clear similarities with String::format.)
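The eligibility rules above can be approximated with reflection. This is a sketch under assumptions, not how a compiler would check them: the class and method names are hypothetical, and "exported by java.base" is interpreted here as a public method of a public class in an unconditionally exported package.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

class CandidateRulesSketch {
    // Hypothetical check: true if the method satisfies the structural rules
    // stated in the text (exported by java.base, and either static or an
    // instance method of a final class).
    static boolean mayBeCandidate(Method m) {
        Class<?> owner = m.getDeclaringClass();
        Module module = owner.getModule();
        boolean exportedByJavaBase = "java.base".equals(module.getName())
                && module.isExported(owner.getPackageName())
                && Modifier.isPublic(owner.getModifiers())
                && Modifier.isPublic(m.getModifiers());
        boolean isStatic = Modifier.isStatic(m.getModifiers());
        boolean inFinalClass = Modifier.isFinal(owner.getModifiers());
        return exportedByJavaBase && (isStatic || inFinalClass);
    }

    // Helper that looks up a public method and hides the checked exception.
    static Method find(Class<?> owner, String name, Class<?>... params) {
        try {
            return owner.getMethod(name, params);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] unused) {
        // static method in java.base
        System.out.println(mayBeCandidate(find(String.class, "format", String.class, Object[].class)));
        // instance method in a final class
        System.out.println(mayBeCandidate(find(java.util.Formatter.class, "format", String.class, Object[].class)));
        // instance method in a non-final class
        System.out.println(mayBeCandidate(find(java.io.PrintStream.class, "format", String.class, Object[].class)));
    }
}
```

String::format and java.util.Formatter::format pass the check, while java.io.PrintStream::format does not, because PrintStream is not a final class.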
The annotation type IntrinsicCandidate is part of the Java SE API, and is meta-annotated with @Documented to flag the significance of applying the annotation.
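For illustration only, such an annotation type might be declared as follows. The text states only the name and the @Documented meta-annotation; the retention policy, the target, and the absence of a package are assumptions made for this sketch, and Greeter is a hypothetical class used solely to show the syntax (a method outside java.base would not actually be eligible).

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

class IntrinsicCandidateDemo {
    // Reports whether the named method carries the hypothetical annotation.
    static boolean isTagged(Class<?> owner, String methodName, Class<?>... params) {
        try {
            return owner.getDeclaredMethod(methodName, params)
                        .isAnnotationPresent(IntrinsicCandidate.class);
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] unused) {
        System.out.println(isTagged(Greeter.class, "greet", String.class));
    }
}

// Hypothetical declaration. RUNTIME retention is assumed only so that the
// demo can read the annotation reflectively; METHOD target is also assumed.
@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface IntrinsicCandidate {}

// Hypothetical class that shows the annotation applied to a static method.
final class Greeter {
    @IntrinsicCandidate
    static String greet(String name) {
        return "Hello, " + name;
    }
}
```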
Intrinsic processors
A Java compiler may provide a mechanism for the discovery of intrinsic processors. An intrinsic processor specifies which method or methods it is able to process; if no intrinsic processor for a given method is known to the compiler, then invocations of that method are not intrinsified. For predictability, all intrinsic processors are disabled by default, and may be enabled with the javac command-line option -XDintrinsify=all. If no alternate translation is indicated to the compiler by an intrinsic processor, or if the compiler decides to ignore such an indication, then it must generate bytecode according to JLS 15.12.3.
Generation of alternate bytecode
An intrinsic processor may indicate an alternate translation for a specific invocation of a given method, e.g., replace with invokedynamic using a given bootstrap, replace with another method call, replace with a constant load, etc. The compiler may then generate precise bytecode for that translation, rather than the traditional bytecode.
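To make the decision step concrete, here is a toy sketch of what an intrinsic processor for String::format might decide. The JEP does not define a processor API, so the class, the method, and the choice of a JEP 280-style concatenation recipe as the result are all assumptions. The sketch accepts only plain %s with a String argument and plain %d with an int or long argument; anything else returns an empty result, meaning the compiler keeps the traditional translation.

```java
import java.util.List;
import java.util.Optional;

class FormatProcessorSketch {
    // Hypothetical processor: maps a constant format string and the static
    // argument types to a concatenation recipe in which '\1' marks an
    // argument slot, or returns empty to request the traditional bytecode.
    static Optional<String> process(String format, List<Class<?>> argTypes) {
        StringBuilder recipe = new StringBuilder();
        int arg = 0;
        for (int i = 0; i < format.length(); i++) {
            char c = format.charAt(i);
            if (c == '\1' || c == '\2') {
                return Optional.empty(); // would clash with recipe markers
            }
            if (c != '%') {
                recipe.append(c); // literal text is copied into the recipe
                continue;
            }
            if (i + 1 >= format.length() || arg >= argTypes.size()) {
                return Optional.empty();
            }
            char conversion = format.charAt(++i);
            Class<?> type = argTypes.get(arg++);
            boolean supported = (conversion == 's' && type == String.class)
                    || (conversion == 'd' && (type == int.class || type == long.class));
            if (!supported) {
                return Optional.empty(); // flags, widths, %n, %% and so on
            }
            recipe.append('\1');
        }
        // Unused trailing arguments are left to the traditional translation.
        return arg == argTypes.size() ? Optional.of(recipe.toString()) : Optional.empty();
    }

    public static void main(String[] unused) {
        System.out.println(process("%s: %d", List.of(String.class, int.class)).isPresent());
        System.out.println(process("%5d", List.of(int.class)).isPresent());
    }
}
```

The conservative fallback matters because %s and %d have behavior that plain concatenation does not reproduce in every case, such as Formattable arguments and locale-specific digits.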
Example
Let's analyze the benefits of intrinsifying String::format to avoid the boxing overhead, varargs overhead, and the repeated analysis of constant format specifiers (the first argument). Consider the following invocation:
String name = ...
int age = ...
String s = String.format("%s: %d", name, age);
Traditionally, this results in boxing age to an Integer, allocating a varargs array, storing name and the boxed age into the varargs array, and then parsing and interpreting the format string -- on every invocation. The bytecode is lengthy:
0: ldc #2 // String John
2: astore_1
3: bipush 30
5: istore_2
6: ldc #3 // String %s: %d
8: iconst_2
9: anewarray #4 // class java/lang/Object
12: dup
13: iconst_0
14: aload_1
15: aastore
16: dup
17: iconst_1
18: iload_2
19: invokestatic #5 // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
22: aastore
23: invokestatic #6 // Method java/lang/String.format:(Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/String;
26: astore_3
27: return
When the format string is constant, which it almost always is, an intrinsic processor can select an alternate translation. (Note that neither name nor age needs to be a constant variable.)
String s = name + ": " + Integer.toString(age);
Given this translation, the compiler can optimize it to an invokedynamic using the mechanics of JEP 280, resulting in the following bytecode:
0: ldc #2 // String John
2: astore_1
3: bipush 30
5: istore_2
6: aload_1
7: iload_2
8: invokedynamic #3, 0 // InvokeDynamic #0:format:(Ljava/lang/String;I)Ljava/lang/String;
13: astore_3
14: return
As well as the evident simplification, this bytecode runs between 30 and 50 times faster than traditional bytecode.
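The effect of such a call site can be imitated in plain Java with the JEP 280 bootstrap, java.lang.invoke.StringConcatFactory. This is a sketch under assumptions: the bootstrap an actual implementation of this JEP would use is not specified here, the class and method names are hypothetical, and the recipe "\1: \1" stands in for the format string "%s: %d" after it has been analyzed once.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.StringConcatFactory;

class IndyFormatSketch {
    // Linked once, like an invokedynamic call site; '\1' marks an argument slot.
    private static final MethodHandle NAME_AGE = link("\1: \1", String.class, int.class);

    // Calls the JEP 280 bootstrap by hand and returns the call site's target.
    static MethodHandle link(String recipe, Class<?>... argTypes) {
        try {
            CallSite site = StringConcatFactory.makeConcatWithConstants(
                    MethodHandles.lookup(),
                    "format", // the name is arbitrary for this bootstrap
                    MethodType.methodType(String.class, argTypes),
                    recipe);
            return site.dynamicInvoker();
        } catch (Exception e) { // StringConcatException is a checked exception
            throw new IllegalStateException(e);
        }
    }

    // Takes (String, int) directly: no boxing, no varargs array, no parsing per call.
    static String nameAndAge(String name, int age) {
        try {
            return (String) NAME_AGE.invokeExact(name, age);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] unused) {
        System.out.println(nameAndAge("John", 30));
    }
}
```

The method handle has the same (String, int) to String shape as the invokedynamic instruction in the bytecode above, which is why the arguments are passed unboxed.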
Risks and Assumptions
If not properly implemented, the alternate translation may not be perfectly behaviorally compatible with the specification or original implementation.
Even if properly implemented, an alternate implementation may not properly track changes made to the original implementation in the future.
Even if properly implemented and tracked, the maintenance of intrinsic candidate methods and their alternate translations is made more difficult, since changes may need to be made in two places and must be behaviorally identical.
There is no guarantee that the performance of an alternate implementation will be superior, for every execution of every program on every machine, to the performance that would have been achieved by the original implementation.
(As an example of the difficulties of predicting performance, consider the Objects::hash method. An earlier version of this JEP praised Objects::hash for similar reasons to String::format, namely that Objects::hash is a concise and reliable way to implement hashCode. Objects::hash has a similar signature to String::format, so the bytecode generated for its invocation has the same performance problems as for String::format. However, the semantics of hashing and string formatting are quite different, and experiments showed the performance gains from intrinsifying Objects::hash to be far less than the gains from intrinsifying String::format. The gains were also far more sensitive to the number and values of actual arguments. Consequently, the efforts to intrinsify Objects::hash were discontinued.)