JEP 196: Nashorn Optimistic Typing
Authors | Marcus Lagergren, Attila Szegedi |
Owner | Marcus Lagergren |
Type | Feature |
Scope | JDK |
Status | Closed / Delivered |
Release | 8u40 |
Component | core-libs / jdk.nashorn |
Discussion | nashorn dash dev at openjdk dot java dot net |
Effort | L |
Duration | L |
Reviewed by | Brian Goetz |
Endorsed by | Brian Goetz |
Created | 2014/05/12 14:53 |
Updated | 2014/12/05 22:14 |
Issue | 8042946 |
Summary
Improve Nashorn performance by making assumptions about specific types used in arithmetic and array indexing operations, by being prepared to recover when those assumptions prove incorrect, and by improving the HotSpot JVM's ability to optimize non-Java bytecode.
Goals
- Ensure, by guessing types, that bytecode generated by Nashorn contains as many primitive unboxed operations as possible, and that Nashorn's runtime libraries exploit that fact.
- Ensure, by guessing types, that bytecode generated by Nashorn uses as much Java integer arithmetic as possible.
- The resulting performance must be stable: it must not degrade over time, and we cannot measure only peak performance. Instability would block Nashorn as an industrial-strength product.
- Ensure that bytecode generated by Nashorn can revert optimistic assumptions of the above kinds through a continuation-passing/exception-throwing mechanism. This is necessary because an integer instruction cannot simply be switched out for, e.g., a double instruction in bytecode without causing verify errors; the method must be modified on the fly.
- Ensure that the built-in library functions in Nashorn are implemented in optimal Java and use primitive types well.
- Ensure that the JVM does a much better job of optimizing "alien" (non-Java) bytecode, for example from Nashorn or JRuby, which has similar problems.
Non-goals
This is not an effort that can be targeted by "we must be X% of the performance of a native JavaScript runtime like V8", as the areas of optimization are many and varied. The initial phase of the implementation will be considered successful if it adds no more than acceptable overhead to warmup, speeds up common JavaScript benchmarks by orders of magnitude (as the technology promises), and causes no regressions on benchmarks targeting unexplored performance areas.
Motivation
The Nashorn JavaScript runtime needs to cooperate with the JVM to produce higher-performance code. Native JavaScript runtimes typically outperform Nashorn on many tasks, such as pure number crunching. We need to do our best to bring our on-top-of-the-JVM, bytecode-based solution as close to that performance as possible. This effort should take Nashorn performance closer to (but probably not past, in its first iteration) native JavaScript runtime performance.
Using only Object operations ("a"-prefixed bytecodes) and invokedynamic produces bytecode that runs too slowly on HotSpot, even though it conservatively implements JavaScript correctly. We are not sure that a world containing only boxed objects and invokedynamic will ever be fast, but there needs to be some exploration of this in the JIT as well.
We need to attack this from two directions: generating bytecode that contains as many primitive operations as possible, and being able to revert code when an assumption about primitive types fails. HotSpot also needs to get better at optimizing non-Java bytecode, mainly constructs around invokedynamic and in java.lang.invoke.
Description
We propose the following two-fold solution, for Nashorn and the JVM respectively. Note that this is in part speculation about what is feasible to do in the bytecode space and what the JVM will be able to optimize. Thus, there is still some research scope in this proposal, both for Nashorn and the JVM.
For Nashorn
We know that hardcoding explicit types at compile time significantly boosts the performance of the resulting bytecode. An interesting thesis project would be to code up a TypeScript frontend for Nashorn to demonstrate this concept. This is something that should be done anyway in the interest of dynamic language implementations on the JVM.
Creating optimistic methods
We need to generate optimistic methods based on callsite types known at runtime (code for this is in place already but disabled). That is, if a method is called with integer parameters, we can generate a specialized version for that method.
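As a standalone illustration of the concept (the class name, method names, and the simplified ToNumber conversion below are ours, not Nashorn's actual generated code), a JavaScript function such as function square(x) { return x * x; } could get an int-specialized version alongside the always-correct generic one:

final class SquareSpecializations {
    // Generic fallback: always correct, fully boxed.
    static Object square(Object x) {
        double d = ((Number) x).doubleValue(); // simplified stand-in for ToNumber
        return d * d;
    }
    // Specialized version, linked when the callsite passes an int.
    static int square(int x) {
        return Math.multiplyExact(x, x); // overflow would force relinking to a wider version
    }
}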
Gradual deoptimization
We also need to generate intra-method code that is optimistic, i.e., use integer additions even when the compiler can't statically prove that we are dealing with ints, assuming that in fact we are. If we are wrong, we need to be able to generate a new version of the code that no longer holds the wrong assumption (and is thus somewhat deoptimized compared to the previous one). We also need to interrupt the execution of the wrong code and continue execution from that point in the new code. For that, we need a continuation-based on-stack-code-replacement mechanism implemented purely using existing JVM capabilities (bytecode and method handles).
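To make this concrete, here is a standalone sketch (illustrative Java, not Nashorn's actual output) of the two generated versions of a method whose int assumption fails:

// Version 1: optimistic; assumes int arithmetic suffices.
static int sumOptimistic(int[] values) {
    int sum = 0;
    for (int v : values) {
        sum = Math.addExact(sum, v); // overflow invalidates the int assumption
    }
    return sum;
}

// Version 2: regenerated after the assumption failed; uses doubles, as
// JavaScript number semantics require. Execution resumes in this version
// at the point where version 1 was interrupted.
static double sumDeoptimized(int[] values) {
    double sum = 0;
    for (int v : values) {
        sum += v; // cannot overflow in double arithmetic
    }
    return sum;
}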
For a detailed explanation of how optimistic types would work, please refer to Lagergren/JVMLS 2013.
Rationale
With JavaScript, it is more often than not impossible to precisely infer the data type of many expressions statically. Such expressions can conservatively be treated as Objects, but such treatment severely reduces the speed of execution. By starting out with the computationally fastest types and doing just-in-time gradual deoptimization to slower types, we end up with the fastest code shape that can operate on the given input.
Primitive field representation
Field representation in Nashorn (objects in the scope are represented as fields in Java classes) needs to be modified to be something other than just Object. We have experimented with a pair of long/Object fields, using the long for all primitive types and the Object for non-primitives. Microbenchmarks have shown that this gives us significant performance improvements where primitive types are statically provable, but in the general case we again have too little information for a speedup. This will most likely work much better when mated with the optimistic approach, and early experiments have confirmed this.
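A minimal sketch of the dual-field idea, with illustrative names rather than Nashorn's actual field layout:

final class DualField {
    long primitiveValue;  // holds int and long directly, and double as raw bits
    Object objectValue;   // used only when the value is not primitive

    void setInt(int v)       { primitiveValue = v; }
    void setDouble(double v) { primitiveValue = Double.doubleToRawLongBits(v); }
    void setObject(Object v) { objectValue = v; }

    int getInt()       { return (int) primitiveValue; }
    double getDouble() { return Double.longBitsToDouble(primitiveValue); }
    Object getObject() { return objectValue; }
}

Reads and writes of primitive values never box; which slot and which type are live would be tracked per property (in Nashorn's case, by its property metadata).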
We should also, in the interest of memory overhead, investigate the earlier POC for sun.misc.TaggedArray, which is a "both reference and primitive, either/or" shortcut to get around the Object constraints.
Warmup is slow
For warmup we might need to do lazy code generation, which is already implemented but not enabled. For some projects, like "npm" in the node.js-to-Java port, we have shown that lazy code generation indeed improves startup performance significantly. The JVM needs to do a lot of warmup work too, though: see below.
Classic IR optimizations
We can do some classic code optimizations in the bytecode, because we know more than the VM does. For example, in the loop
var sum = 0;
for (i = 0; i < 10; i++) {
    sum += y;
}
y might have side effects when evaluated, for example through an overridden valueOf, but if we check that before the loop, we can turn the code into
var sum = 0;
var ytmp = y; // will throw UnwarrantedOptimismException if y is not a nice primitive
for (i = 0; i < 10; i++) {
    sum += ytmp;
}
Use def analysis
We also need to superimpose a CFG on top of the AST to be able to build better use/def chains, which would let us get rid of overly conservative assumptions in code like
if (x) {
    z = y & f;
    operation_on_z(z);
} else {
    z = "string";
}
so that we can statically use z as an int in the true branch, which we could; right now, we conservatively assume z to be an Object throughout the entire method, due to the else branch.
See Söderberg et al., who describe a similar methodology.
For the JVM
Intrinsics for exact math
The various java.lang.Math operations for exact arithmetic (which throw ArithmeticException on overflow) must be intrinsified. Every optimistic JavaScript addition will have to check for overflow using this mechanism, and unless it compiles down to just an arithmetic operation and a jump-on-overflow instruction, the overhead will be too great. This is needed to preserve JavaScript semantics.
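To see why an intrinsic is essential, consider a pure-Java equivalent of what Math.addExact does; unless the JIT reduces all of this to a single add plus a jump-on-overflow, every optimistic JavaScript addition pays for the full sequence as an ordinary call:

static int addExactEquivalent(int a, int b) {
    int r = a + b;
    // Signed overflow occurred iff both operands have the sign
    // opposite to the result's sign.
    if (((a ^ r) & (b ^ r)) < 0) {
        throw new ArithmeticException("integer overflow");
    }
    return r;
}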
Invoke
The MethodHandle.invoke (not invokeExact) case, used to implement, e.g., apply, needs to be faster. According to John Rose, no optimization has taken place there so far, and it is very common that boxing gets in the way.
The JRockit implementation of invokedynamic could turn the test method in the following example:
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class Test {
    private static final MethodHandle CALC;
    static {
        try {
            CALC = MethodHandles.publicLookup().findStatic(Test.class, "calc",
                    MethodType.methodType(int.class, int.class, Object.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }
    static int test() throws Throwable {
        MethodHandle mh = CALC;
        Object aString = "A";
        int a = (int) mh.invoke(1, aString);
        int b = (int) mh.invoke(2, "B");
        Integer c = (Integer) mh.invoke((Integer) 3, 3);
        return a + b + c;
    }
    public static int calc(int x, Object o) { // public, so publicLookup can find it
        return x + o.hashCode();
    }
}
into a simple return 140. For HotSpot to do the same, an improvement to at least the generic invoke case is required.
Source: Fredrik Öhrström's talk from JavaOne 2010
Source: Fredrik Öhrström's blog
java.lang.invoke and LambdaForms
The LambdaForms implementation needs to be improved. LambdaForms contribute a lot of bytecode generation, which makes warmup an issue (even more so than having a bootstrap method per invokedynamic, which is unavoidable). This can probably be alleviated with lambda form caching; JIT profile information must not be polluted by such sharing, though.
Lambda form interpretation overhead is another issue that we frequently run into. Once Nashorn reaches a steady state, no more MethodHandles should be created and no more lambda forms should be interpreted.
Cached lambda forms should be retrieved as their compiled versions when they are about to be reinterpreted, so that a lambda form, once compiled, is never interpreted again.
Furthermore, we have to ensure that all combinators in java.lang.invoke have a fast path that avoids boxing or apply-like semantics. Currently, the catchException combinator is an example of something we commonly use that needs this optimization. Other examples probably exist, especially among the combinators that take many parameters.
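As a self-contained illustration (our own demo code, with a placeholder fallback where Nashorn would deoptimize), this is the kind of catchException use that needs a boxing-free fast path:

import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class CatchExceptionDemo {
    public static int addExact(int a, int b) { return Math.addExact(a, b); }
    // The handler receives the caught exception first, then the target's arguments.
    public static int onOverflow(ArithmeticException e, int a, int b) {
        return Integer.MAX_VALUE; // placeholder; Nashorn would trigger deoptimizing recompilation
    }
    public static void main(String[] args) throws Throwable {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodHandle target = lookup.findStatic(CatchExceptionDemo.class, "addExact",
                MethodType.methodType(int.class, int.class, int.class));
        MethodHandle handler = lookup.findStatic(CatchExceptionDemo.class, "onOverflow",
                MethodType.methodType(int.class, ArithmeticException.class, int.class, int.class));
        // The combinator must not box the int arguments or spread them apply-style.
        MethodHandle guarded = MethodHandles.catchException(target, ArithmeticException.class, handler);
        System.out.println((int) guarded.invokeExact(1, 2));                 // prints 3
        System.out.println((int) guarded.invokeExact(Integer.MAX_VALUE, 1)); // takes the fallback path
    }
}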
Better type analysis is needed
Early experiments show that we miss opportunities to inline virtual dispatch when type analysis is too conservative. As JavaScript uses plenty of type guards due to its dynamic nature, this must be addressed.
What we need to do in the JVM has been less explored, and we will require at least one full-time compiler-team resource to help us here.
Testing
For semantic correctness, the standard Nashorn test suites will do.
For performance verification, existing benchmarks will do; they should be augmented with a microbenchmark suite that tests invalidation of optimistic assumptions.
Any torture tests that are run for a long time by SQE should still be run, to ensure that no resource leakage is introduced.
Risks and Assumptions
The main risk is that the JIT still won't be able to generate efficient code from our optimistic type information.
A risk is that bytecode might prove too narrow a representation for efficient implementations of dynamic languages. We might need to be a lot more explicit with our generated code, possibly inventing a mechanism to communicate more information to HotSpot's JITs than the current bytecode format allows. An existing example of a system that talks more directly with the JIT is the Truffle project at Oracle Labs, which tells the Graal compiler, through annotations in its Java-based interpreters, which assumptions to make. Experiments have shown that this can indeed produce very good peak performance.
In the long run, LambdaForms may have to be rewritten entirely and replaced with something else. The JRockit JVM basically just generated IR for a callsite and fed it back to the JIT compiler, which enabled all its optimizations, such as constant propagation and unboxing removal, to run transparently on the callsite. This is not as simple in C2, due to its various issues with graph splicing, platform dependencies, and global dependencies.
Finally, as code generation becomes optimistic, our warmup time will increase.
Dependences
- The HotSpot JITs.
- HotSpot code generation improvements.
- An efficient MethodHandles implementation.
Impact
There are no compatibility problems, as far as we can tell. Provided that footprint and warmup are kept down as described above, this should be a drop-in replacement for current Nashorn versions.