JEP 196: Nashorn Optimistic Typing

AuthorsMarcus Lagergren, Attila Szegedi
OwnerMarcus Lagergren
TypeFeature
ScopeJDK
StatusClosed / Delivered
Release8u40
Componentcore-libs / jdk.nashorn
Discussionnashorn dash dev at openjdk dot java dot net
EffortL
DurationL
Reviewed byBrian Goetz
Endorsed byBrian Goetz
Created2014/05/12 14:53
Updated2014/12/05 22:14
Issue8042946

Summary

Improve Nashorn performance by making assumptions about specific types used in arithmetic and array indexing operations, by being prepared to recover when those assumptions prove incorrect, and by improving the HotSpot JVM's ability to optimize non-Java bytecode.

Goals

Non-goals

This is not an effort that can be targeted by "we must be X% of the performance of a native JavaScript runtime like v8" as the areas of optimization are many an varying. The initial phase of the implementation will be considered successful if it adds no more than acceptable overhead to warmup and speeds up common JavaScript benchmarks orders of magnitude, as the technology promises, and causes no regressions on benchmarks targeting unexplored performance areas.

Motivation

The Nashorn JavaScript runtime needs to cooperate with the JVM to produce more highly performant code. Native JavaScript runtimes typically outperform Nashorn on many tasks, such as pure number crunching. We need to do our best to make our on-top-of-the-JVM bytecode based solution approach that performance to as large an extent as possible. This effort should take Nashorn performance closer to (but probably not past, in its first iteration) native JavaScript runtime performance.

Using only Object operations ("a"-prefixed-bytecodes) and invokedynamic produces bytecode that runs too slowly on HotSpot, even though it conservatively implements JavaScript correctly. We are not sure that a world containing only boxed objects and invokedynamics will ever be fast, but there needs to be some exploration of this in the JIT as well.

We need to attack this from two ways - generating bytecode that contains as much primitives as possible and being able to revert code when an assumption about primitive types fails. HotSpot also needs to get better at optimizing non-Java bytecode, mainly constructs around invokedynamic and in java.lang.invoke.

Description

We propose the following solution, a two-fold one, for Nashorn and the JVM respectively. Note that this is to some part speculation of what is feasible to do in the bytecode space and what the JVM will be able to optimize. Thus, there is still some research scope, both for Nashorn and the JVM in this proposal.

For Nashorn

We know that hardcoding explicit types at compile time gives us a significant performance boost of the resulting bytecode. An interesting thesis project would be to code up a TypeScript frontend for Nashorn to show this concept. This is something that should be done anyway in the interest of dynamic language implementations on the JVM.

Creating optimistic methods

We need to generate optimistic methods based on callsite types known at runtime (code for this is in place already but disabled). That is, if a method is called with integer parameters, we can generate a specialized version for that method.

Gradual deoptimization

We also need to generate intra-method code that is optimistic, i.e., use integer additions even when the compiler can't statically prove that we are dealing with ints, assuming that in fact we are. If we are wrong we need to be able to generate a new version of the code that no longer holds the wrong assumption (and is thus somewhat deoptimized compared to the previous one). We also need to interrupt the execution of the wrong code and continue execution from that point in the new code. For that, we need a continuation-based on-stack-code-replacement mechanism implemented purely using existing JVM capabilities (bytecode and method handles).

For a detailed explanation of how optimistic types would work, please refer to Lagergren/JVMLS 2013

Rationale

With JavaScript, it is more often than not impossible to precisely infer the data type of many expressions statically. Such expressions can conservatively be treated as Objects, but such treatment severely reduces the speed of execution. By starting out with the computationally fastest types and doing just-in-time gradual deoptimization to slower types, we end up with the fastest code shape that can operate on the given input.

Primitive field representation

Field representation in Nashorn (objects in the scope are represented as fields in Java classes) needs to be modified to be something else than just "Object". We have experimented with a pair of long/Object fields, using the long for all primitive types, and the Objects for non primitives. Microbenchmarks have shown that this gives us significant performance improvements in statically provable primitive types, but in the general case, again we have too little information for speedup. This will most likely be much, much better if we mate it with the optimistic approach, and early experiments have confirmed this.

We should also, in the interest of memory overhead, investigate the earlier POC for sun.misc.TaggedArray that is a "both reference and primitive, either / or" shortcut to get around the Object constraints.

Warmup is slow

For warmup we might need to do lazy code generation, which is already implemented but not enabled. For some projects, like "npm" in the node.js to Java port, we have shown that lazy code generation indeed gives us significant startup performance. The JVM needs to do a lot of warmup work too, though: see below.

Classic IR optimizations

We can do some classic code optimizations in the byte code, because we know more than the VM. For example, in the loop

var sum = 0;
for (i = 0; i < 10; x++) {
    sum += y;
}

y might be a method with side effects and valueOf overridden, but if we check that before the loop we can turn the code into

var sum = 0;
 var ytmp = y; //which will throw UnwarrantedOptimismException i y is not a nice primitive
 for (i = 0; i < 10; x++) {
     sum += ytmp;
 }

Use def analysis

We also need to superimpose a CFG on top of the AST to be able to do better use/def chains, which would enable us to get rid of assumptions like

if (x) {
   z = y & f;
   operation_on_z(z)
} else {
   z = "string";
}

so that we can use z as an int in the true case statically, which we can, but right now, we conservatively assume z to be an Object due to the else case throughout the entire method.

See Söderberg et al, who describe a similar methodology

For the JVM

Intrinsics for exact math

The various java.lang.Math operations for exact arithmetic (which throw ArithmethicException on overflow) must be intrinsified. Every optimistic JavaScript addition will have to check for overflow using this mechanism, and unless it compiles to just an arithmetic operation and a jump-on-overflow instruction, the overhead will be too great.

This is needed to preserve JavaScript semantics.

Invoke

The MethodHandle.invoke (not invokeExact) case (used to implement, e.g., apply) needs to be faster. According to John Rose, no optimization has taken place there so far, but it is very common that boxing gets in the way.

The JRockit implementation of invokedynamic could turn the test method from the example:

public class Test {
      private final static MethodHandle CALC = MethodHandles.publicLookup().findStatic(Test.class, "calc", int.class, int.class, Object.class);

      static int test() throws Throwable {
          MethodHandle mh = CALC;
          Object aString = "A";
          int a = mh.invoke(1, aString);
          int b = mh.invoke(2, "B");
          Integer c = mh.invoke((Integer)3, 3);
          return a+b+c;
      }

      static int calc(int x, Object o) {
          return x + o.hashCode();
      }
}

into a simple return 140. For HotSpot to do the same, an improvement at least to the generic invoke case is required.

Source: Fredrik Öhrström's talk from JavaOne 2010

Source: Fredrik Öhrström's blog

java.lang.invoke and LambdaForms

The LambdaForms implementation needs to be improved. They contribute to a lot of bytecode generation which makes warmup an issue (even more than just having a bootstrap method per invokedynamic, which is unavoidable). This can probably be alleviated with lambda form caching. JIT profile information must not be polluted by this, though.

Lambda form interpretation overhead is another issue that we frequently bump into. Given that Nashorn reaches a steady state, there should be no more MethodHandles created and no more lambda forms interpreted.

Cached lambda forms should be retrieved as their compiled versions upon trying to reinterpret them so that they never are interpreted again if they have once been compiled.

Furthermore, we have to ensure that all combinators in java.lang.invoke have a fast path that avoids boxing or apply-like semantics. Currently, the catchException combinator is an example of something we commonly use that needs that required optimization. Other examples probably exist, especially in the cases that take many parameters.

Better type analysis is needed

Early experiments show that me miss optimization opportunities to inline virtual dispatch if type analysis is too conservative. As JavaScript uses plenty of type guards to due to its dynamic nature, this must be addressed.

What we need to do in the JVM has been less explored, and we will require at least one full time compiler team resource to help us here.

Testing

For semantic correctness the standard Nashorn test suites will do.

For performance verification existing benchmarks will do, and should be augmented with a microbenchmark suite that tests invalidations of optimistic assumptions.

Any torture tests that are run for a long time by SQE should still run to ensure that no reasource leakage is introduced.

Risks and Assumptions

The main risk is that the JIT still won't be able to generate efficient code from our optimistic type information.

A risk is that bytecode might prove to be a too narrow representation for efficient implementations of dynamic languages. We might need to be a lot more explicit with our generated code, possibly inventing a mechanism to communicate more information to HotSpot's JITs than can be done with the current bytecode format. An existing example of a system that can talk more directly with the JIT is the Truffle project at Oracle Labs that tells the Graal compiler through annotations in the Java based interpreter which assumptions to make. Experiments have shown that this indeed can produce very good peak performance.

In the long run LambdaForms may have to be totally rewritten and replaced with something else. The JRockit JVM basically just generated IR for a callsite and fed it back to the JIT compiler, which enabled all its optimizations like constant propagation and unboxing removal to run transparently on the callsite. This is not as simple in C2, due to its various issues with graph splicing, platform dependencies and global dependencies.

Finally, as code generation becomes optimistic, our warmup time will increase.

Dependences

Impact

There are no compatibility problems, as far as we can tell. Given that footprint and warmup are kept down according to the above, this should be a drop in replacement for current Nashorn versions.