JEP draft: Isolated Methods

Author	Michael Haupt
Owner	Maurizio Cimadamore
Type	Feature
Scope	JDK
Status	Draft
Component	core-libs / java.lang.invoke
Discussion	mlvm dash dev at openjdk dot java dot net
Effort	L
Duration	XL
Reviewed by	Alex Buckley, Brian Goetz, Jim Laskey, Paul Sandoz, Vladimir Ivanov
Created	2016/06/06 14:00
Updated	2018/04/16 21:06
Issue	8158765

Summary

Extend the MethodHandles.Lookup class of the java.lang.invoke package to support loading method bytecodes without an attached class, and to represent such methods as method handles.

Goals

In the MethodHandles.Lookup class of the java.lang.invoke package, provide a new method loadCode to load a bytecode array plus constants as an isolated method and return a MethodHandle representing that method.
At the level of the JVM, provide an optimised means to store isolated methods.

Non-Goals

A new compilation strategy for lambdas is not in the scope of this JEP.
Extensions at the Java language level are explicitly out of scope.
Extensions to the Java Virtual Machine instruction set (bytecodes) are likewise out of scope.

Success Metrics

Improved performance of method handle infrastructure where it makes use of bytecode generation (also at startup).
Reduced memory footprint of method handle infrastructure (specifically, LambdaForms, BoundMethodHandles, and invokers).
Observable similar effects on dynamic language implementations once they adopt the new API.

Motivation

Both in the JDK core libraries and in language implementations running atop the JVM, it is a common pattern to generate stateless classes with a single static method. These classes are used to represent what is logically a method without a class, or an "isolated method". Generating them is cumbersome as it requires the generation of a full class, and imposes a certain load on the VM in terms of class loading and maintenance.

To enable a more lightweight solution for this scenario, there should be a way to express and load an isolated method directly, to obtain it in a form that can be invoked, and to ensure that it cannot be used to commit access violations.

Method handles are usable abstractions for representing code that can be called. Moreover, a means for controlling lookup and access contexts already exists in the form of the MethodHandles.Lookup class. What is missing is a means to load a method in isolation. By adding a single API entry point to the MethodHandles.Lookup class that accepts the representation of an isolated method, such a method can be loaded with the lookup context implied by the Lookup instance at hand.

There are several settings in the JDK core libraries, most notably in the low-level method handles infrastructure, where a new abstraction for isolated methods can be used to reduce code size and memory footprint, and to improve loading performance. Moreover, the Nashorn JavaScript engine can make use of the feature in a similar way, as it generates bytecode from JavaScript sources. Finally, all language implementations that run atop the JVM and generate bytecode may be clients of the isolated method loading capability.

Note that the aforementioned scenarios all require access to the internal Unsafe API. Offering a disciplined and secure way of defining custom code in the form of isolated methods will render many of these uses of Unsafe unnecessary, and thereby reduce the dependence on internal API that guest language implementations on the JVM often have.

Description

The MethodHandles.Lookup class of the java.lang.invoke package is to be extended with a method like this:

MethodHandle loadCode(String name, MethodType type, byte[] instructions, Object[] constants)

The name parameter is optional. It denotes the name the isolated method should be identifiable by in stack traces.

The type parameter determines the method's return type and parameter types. The instructions array contains the method's bytecode instructions, as they would occur in a normal class file. A notable difference is that all indices into the class' constant pool that the bytecode would normally contain are now indices into the accompanying constants array. This serves as a method-local constant pool substitute.

The loadCode method creates a method from the passed bytecode instructions and constants and returns a MethodHandle that can be used to call the method. The implementation of loadCode will take care of verification of the code to load.

This method is isolated from any class and behaves largely like a static method. The method handle resulting from a loadCode invocation is of the REF_invokeStatic kind. It cannot be cracked via MethodHandles.Lookup.revealDirect().
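For contrast, the following sketch uses only existing java.lang.invoke API to show what "kind" and "cracking" mean for an ordinary static method today. The class name StaticHandleKind and its helper methods are made up for illustration. A handle returned by loadCode would report the same reference kind but, as stated above, would refuse to be cracked.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandleInfo;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class StaticHandleKind {
    // An ordinary static method, standing in for the code an isolated method would carry.
    static int length(String s) {
        return s.length();
    }

    // Looks up the static method and cracks the resulting direct method handle.
    static MethodHandleInfo crack() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle mh = lookup.findStatic(
                StaticHandleKind.class, "length",
                MethodType.methodType(int.class, String.class));
            return lookup.revealDirect(mh);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        MethodHandleInfo info = crack();
        // Prints the symbolic reference kind, declaring class, and method name.
        System.out.println(MethodHandleInfo.referenceKindToString(info.getReferenceKind())
            + " " + info.getDeclaringClass().getName() + "." + info.getName());
    }
}
```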

The context for a method defined in this way is determined by the Lookup instance receiving the loadCode call. In case the lookup privileges are not sufficient, an exception will be thrown.

The constants Array

The constants array, meant to contain constants referenced from the bytecode, deserves some attention. First and foremost, it should not be misunderstood as a constant pool. It rather provides a higher level of abstraction over constant pool contents, and adds convenience for clients.

The array of constant pool patches that can be passed to invocations of Unsafe.defineAnonymousClass plays a similar role. For instance, the constant pool patches array allows a String to be passed where a CONSTANT_Utf8_info entry is to be patched, even though that entry actually consists of a tag byte, a two-byte length, and a byte array. Unsafe.defineAnonymousClass supports similar convenience for other constant pool entries too.

For the constants array passed to loadCode, similar convenience should be possible. For instance, where the method instructions reference a Java class, the constants array can contain a Class instance, rather than lower-level structures encountered in constant pools. Likewise, an INVOKEVIRTUAL instruction can reference a constants array entry that itself is a MethodHandle representing the method in question.

The following table lists the different forms of possible constant pool entries and the Java classes that can be used to represent them in the constants array.

CONSTANT_Utf8_info: java.lang.String
CONSTANT_Integer_info: int, java.lang.Integer
CONSTANT_Float_info: float, java.lang.Float
CONSTANT_Long_info: long, java.lang.Long
CONSTANT_Double_info: double, java.lang.Double
CONSTANT_Class_info: java.lang.Class
CONSTANT_String_info: java.lang.String
CONSTANT_Fieldref_info: a java.lang.invoke.DirectMethodHandle of the right kind, obtained via the appropriate API in java.lang.invoke.MethodHandles.Lookup
CONSTANT_Methodref_info: a java.lang.invoke.DirectMethodHandle of the right kind, obtained via the appropriate API in java.lang.invoke.MethodHandles.Lookup
CONSTANT_InterfaceMethodref_info: a java.lang.invoke.DirectMethodHandle of the right kind, obtained via the appropriate API in java.lang.invoke.MethodHandles.Lookup
CONSTANT_NameAndType_info: (should not be required)
CONSTANT_MethodHandle_info: java.lang.invoke.MethodHandle
CONSTANT_MethodType_info: java.lang.invoke.MethodType
CONSTANT_InvokeDynamic_info: either a tuple of (java.lang.invoke.MethodType,java.lang.invoke.MethodHandle), where the MethodType describes the call site's signature, and the MethodHandle represents the bootstrap method with already bound static arguments; or an already initialized java.lang.invoke.CallSite
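The mapping above can be read in reverse, as a check that a loadCode implementation might apply to each element of the constants array. The sketch below is only an illustration under that assumption: the class ConstantKinds and its methods are hypothetical, and the tuple form for CONSTANT_InvokeDynamic_info is not modelled. It also shows that some Java types are ambiguous on their own (a String or a MethodHandle can stand for more than one entry form), so the referencing instruction would presumably have to decide.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodType;

class ConstantKinds {
    // Returns the constant pool entry forms that the given object could stand for,
    // following the table above. Hypothetical helper, not part of any JDK API.
    static String entryKinds(Object constant) {
        if (constant instanceof String) return "CONSTANT_Utf8_info or CONSTANT_String_info";
        if (constant instanceof Integer) return "CONSTANT_Integer_info";
        if (constant instanceof Float) return "CONSTANT_Float_info";
        if (constant instanceof Long) return "CONSTANT_Long_info";
        if (constant instanceof Double) return "CONSTANT_Double_info";
        if (constant instanceof Class) return "CONSTANT_Class_info";
        if (constant instanceof MethodHandle) {
            return "CONSTANT_Fieldref_info, CONSTANT_Methodref_info, "
                + "CONSTANT_InterfaceMethodref_info or CONSTANT_MethodHandle_info";
        }
        if (constant instanceof MethodType) return "CONSTANT_MethodType_info";
        if (constant instanceof CallSite) return "CONSTANT_InvokeDynamic_info";
        throw new IllegalArgumentException("unsupported constant: " + constant);
    }

    // Rejects a constants array that contains an element outside the table.
    static void validate(Object[] constants) {
        for (Object constant : constants) {
            entryKinds(constant);
        }
    }

    public static void main(String[] args) {
        Object[] constants = {String.class, 42, MethodType.methodType(int.class)};
        for (Object constant : constants) {
            System.out.println(constant + " -> " + entryKinds(constant));
        }
    }
}
```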

In addition, the Valhalla project proposes several new constant pool entry types, for which the substitutions in constants arrays can be as follows. Note that the table assumes tuples, which may be introduced with Valhalla, to be existent in the language.

CONSTANT_ArrayType_info: tuple of (byte,java.lang.Class)
CONSTANT_MethodDescriptor_info: array of java.lang.Class (Class instances may have to offer some additional information as Valhalla progresses)
CONSTANT_ParameterizedType_info: tuple of (java.lang.Class,java.lang.Class[])
CONSTANT_TypeVar_info: tuple of (java.lang.String,java.lang.Class)

As a further addition, the new constant pool entry types discussed in the general data in constant pools proposal can be represented as follows.

CONSTANT_Dynamic: tuple of (java.lang.Class,java.lang.invoke.MethodHandle), where the Class represents the expected type, and the MethodHandle describes a bootstrap method with the static parameters already bound
CONSTANT_Group: an array or java.util.List
CONSTANT_Bytes: byte[]
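A bootstrap method "with the static parameters already bound" can be produced with existing API. The sketch below assumes a made-up bootstrap method whose three leading parameters follow the usual lookup, name, type shape and whose trailing parameter is a static argument; MethodHandles.insertArguments binds that argument ahead of time. The class BoundBootstrap and its methods are illustrative only.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class BoundBootstrap {
    // Hypothetical bootstrap method: lookup, name, and type come first,
    // followed by one static argument.
    static Object bootstrap(MethodHandles.Lookup lookup, String name,
                            Class<?> type, String greeting) {
        return greeting + ", " + name;
    }

    // Returns a handle of type (Lookup, String, Class)Object with the static argument bound.
    static MethodHandle boundBootstrap(String greeting) {
        try {
            MethodHandle bsm = MethodHandles.lookup().findStatic(
                BoundBootstrap.class, "bootstrap",
                MethodType.methodType(Object.class, MethodHandles.Lookup.class,
                    String.class, Class.class, String.class));
            return MethodHandles.insertArguments(bsm, 3, greeting);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Invokes the bound bootstrap method the way a resolver might.
    static Object resolve(MethodHandle bound, String name, Class<?> type) {
        try {
            return bound.invoke(MethodHandles.lookup(), name, type);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve(boundBootstrap("hello"), "world", String.class));
    }
}
```

The tuple described above would then pair the expected type (here String.class) with the bound handle.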

Regarding generic methods, note that an isolated method has no enclosing class that could define type variables. Instead, all type variables mentioned in the signature of a generic isolated method belong to that method alone.

The constants array can also contain all kinds of objects that can be loaded using an LDC instruction. This can be used to bind certain specific data that are known at compile time.

It will be up to the implementation of loadCode to turn these convenience objects into proper lower-level representations resembling those in a constant pool. The details of this depend on the implementation choices that will be made for the internal representation of isolated methods.

Implementation

The loadCode functionality can be implemented in several stages, each integrating more deeply with the existing system than the previous one.

Stage 1: Internal Use for LambdaForm and Invoker Generation

The initial version of loadCode should be provided as part of the non-public API for Invokedynamic, e.g., as a non-public method in the MethodHandles.Lookup class, or in the MethodHandleImpl class. There, it can be used to generate LambdaForms and other invokers in the java.lang.invoke implementation. The implementation should not treat isolated methods as such, but wrap the LambdaForm methods in a class as usual.
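To make "wrap the methods in a class as usual" concrete, the following sketch spins a holder class with a single static method by hand and loads it with the existing Lookup.defineClass API (available since Java 9). It is an illustration under simplifying assumptions, not the proposed implementation: the constant pool is written out manually instead of being derived from a constants array, the method is fixed to the String-length example, and the names HolderClassSpinner, IsoHolder, and run are invented.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class HolderClassSpinner {
    private static int counter; // makes each holder class name unique

    private static void utf8(DataOutputStream out, String s) throws IOException {
        out.writeByte(1); // CONSTANT_Utf8 tag
        out.writeUTF(s);  // two-byte length followed by modified UTF-8 bytes
    }

    // Writes a class file containing: public static int run(String) { return arg.length(); }
    static byte[] spin(String internalName) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(0xCAFEBABE);
        out.writeShort(0);                        // minor version
        out.writeShort(52);                       // major version (Java 8)
        out.writeShort(14);                       // constant_pool_count = 13 entries + 1
        utf8(out, internalName);                  // #1
        out.writeByte(7); out.writeShort(1);      // #2  Class: the holder itself
        utf8(out, "java/lang/Object");            // #3
        out.writeByte(7); out.writeShort(3);      // #4  Class: java/lang/Object
        utf8(out, "run");                         // #5
        utf8(out, "(Ljava/lang/String;)I");       // #6
        utf8(out, "Code");                        // #7
        utf8(out, "java/lang/String");            // #8
        out.writeByte(7); out.writeShort(8);      // #9  Class: java/lang/String
        utf8(out, "length");                      // #10
        utf8(out, "()I");                         // #11
        out.writeByte(12); out.writeShort(10); out.writeShort(11); // #12 NameAndType
        out.writeByte(10); out.writeShort(9); out.writeShort(12);  // #13 Methodref String.length
        out.writeShort(0x0001 | 0x0010 | 0x0020); // ACC_PUBLIC | ACC_FINAL | ACC_SUPER
        out.writeShort(2);                        // this_class
        out.writeShort(4);                        // super_class
        out.writeShort(0);                        // interfaces_count
        out.writeShort(0);                        // fields_count
        out.writeShort(1);                        // methods_count
        out.writeShort(0x0001 | 0x0008);          // ACC_PUBLIC | ACC_STATIC
        out.writeShort(5);                        // method name: run
        out.writeShort(6);                        // method descriptor
        out.writeShort(1);                        // method attributes_count
        out.writeShort(7);                        // attribute name: Code
        out.writeInt(17);                         // attribute length: 2 + 2 + 4 + 5 + 2 + 2
        out.writeShort(1);                        // max_stack
        out.writeShort(1);                        // max_locals
        out.writeInt(5);                          // code_length
        out.write(new byte[] {42, (byte) 182, 0, 13, (byte) 172}); // aload_0; invokevirtual #13; ireturn
        out.writeShort(0);                        // exception_table_length
        out.writeShort(0);                        // Code attributes_count
        out.writeShort(0);                        // class attributes_count
        return bytes.toByteArray();
    }

    // Defines a fresh holder class in this class's package and returns a handle to its method.
    static synchronized MethodHandle loadStringLength() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            String pkg = lookup.lookupClass().getPackageName().replace('.', '/');
            String holderName = (pkg.isEmpty() ? "" : pkg + "/") + "IsoHolder" + counter++;
            Class<?> holder = lookup.defineClass(spin(holderName));
            return lookup.findStatic(holder, "run",
                MethodType.methodType(int.class, String.class));
        } catch (IOException | ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    static int call(MethodHandle mh, String s) {
        try {
            return (int) mh.invokeExact(s);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(call(loadStringLength(), "hello"));
    }
}
```

Of the thirteen constant pool entries, only the last five serve the method body; the rest exist solely to describe the holder class and the method's own name, descriptor, and Code attribute, which is the overhead an isolated method representation would aim to avoid.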

Stage 2: Optimised Internal Use for LambdaForm and Invoker Generation

The internal stage 1 loadCode implementation can, while the API remains stable, be optimised at the level of HotSpot. At this time, there are two design ideas that can be explored.

Internally, represent each isolated method as a method plus constant pool. The class an isolated method belongs to, to make it fit into the overall expectations of the VM, is a pseudo class that cannot be instantiated. This resembles the way single static method classes are currently built.
Add to HotSpot the notion of a pseudo class (dubbed Gargantuan) that will be the holder of all methods defined through the loadCode interface. This will be an all-static class invisible from the outside (support for getCallerClass notwithstanding).

Gargantuan is a class that is intended to grow as new methods are defined. Methods can be collected when there are no more MethodHandles referencing them. Each method in Gargantuan can have a context different from all other methods, depending on the lookup context at hand in a loadCode invocation. This lookup context is preserved in Gargantuan and associated with the isolated method during its lifetime.

The constants arrays of several isolated methods will very likely contain common constants. The loadCode VM-level implementation will make sure to only add those constants to the Gargantuan constant pool that are not already present, and to patch the bytecode instructions array accordingly. This elision of duplicate constant pool entries can also take place upon garbage collection to facilitate faster loading of isolated methods. Either way, all isolated methods share a common constant pool.
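The deduplication and patching described above can be sketched in plain Java, independent of how HotSpot would actually do it. The class SharedConstants is hypothetical. It also assumes that the caller supplies the offsets of all two-byte constant indices in the instructions array, whereas a real implementation would have to find them by decoding the instructions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SharedConstants {
    private final List<Object> pool = new ArrayList<>();
    private final Map<Object, Integer> indexOf = new HashMap<>();

    // Returns the shared index of the constant, adding it only if it is not yet present.
    int intern(Object constant) {
        Integer existing = indexOf.get(constant);
        if (existing != null) {
            return existing;
        }
        pool.add(constant);
        indexOf.put(constant, pool.size() - 1);
        return pool.size() - 1;
    }

    // Interns every referenced constant and rewrites the two-byte (big-endian) indices
    // found at the given offsets so that they point into the shared pool.
    byte[] add(byte[] instructions, int[] indexOffsets, Object[] constants) {
        byte[] patched = instructions.clone();
        for (int offset : indexOffsets) {
            int local = ((patched[offset] & 0xFF) << 8) | (patched[offset + 1] & 0xFF);
            int shared = intern(constants[local]);
            patched[offset] = (byte) (shared >>> 8);
            patched[offset + 1] = (byte) shared;
        }
        return patched;
    }

    int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        SharedConstants shared = new SharedConstants();
        byte[] code = {42, (byte) 182, 0, 1, (byte) 172};
        shared.add(code, new int[] {2}, new Object[] {"unused", String.class});
        shared.add(code, new int[] {2}, new Object[] {Object.class, String.class});
        System.out.println("shared constants: " + shared.size());
    }
}
```

Deduplication here relies on equals and hashCode. MethodHandle does not override them, so two separately obtained handles for the same method would not be merged by this sketch; a real implementation would need a notion of equivalence for such constants.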

The Gargantuan class can also exist once per module, which will enable efficient collection of constants stored for an isolated method, and possibly collection of other structures, as a module is unloaded.

Stage 3: Public API

Eventually, the loadCode method should be public in MethodHandles.Lookup, to support its more widespread usage. In the meantime, availability via the MLVM repository will allow for applying the loadCode feature to existing language implementations for experimentation.

Usage Examples

The examples below serve to point out possible future shapes of the infrastructure needed to generate the instructions array. All examples describe the generation and loading of a method that has the signature (Ljava/lang/String;)I and retrieves the length of its argument. It consists of these instructions:

ALOAD_0
INVOKEVIRTUAL #0 <String.length()>
IRETURN
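As a small aid for reading the byte values used in the examples, the sketch below encodes these three instructions from their JVM opcodes (aload_0 is 0x2A, invokevirtual is 0xB6, ireturn is 0xAC). The class LengthBytecode is made up for illustration. The two bytes after invokevirtual are the big-endian index, which under this proposal would point into the constants array.

```java
import java.util.Arrays;

class LengthBytecode {
    static final int ALOAD_0 = 0x2A;       // 42
    static final int INVOKEVIRTUAL = 0xB6; // 182
    static final int IRETURN = 0xAC;       // 172

    // Encodes: ALOAD_0; INVOKEVIRTUAL #constantIndex; IRETURN
    static byte[] instructions(int constantIndex) {
        return new byte[] {
            (byte) ALOAD_0,
            (byte) INVOKEVIRTUAL,
            (byte) (constantIndex >>> 8), // high byte of the index
            (byte) constantIndex,         // low byte of the index
            (byte) IRETURN
        };
    }

    public static void main(String[] args) {
        // Opcodes above 127 appear as negative values because Java bytes are signed.
        System.out.println(Arrays.toString(instructions(0)));
    }
}
```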

The first example adopts the higher level of abstraction over the constant pool, as suggested above:

MethodHandle stringLength = lookup.loadCode("isoToString",
    methodType(int.class, String.class),
    new byte[]{42, (byte) 182, 0, 0, (byte) 172},
    new Object[]{
        lookup.findVirtual(String.class, "length", methodType(int.class))});

In the above example, the instructions array has been provided as immediate constants. To generate such arrays and the constants arrays they reference more conveniently, a generator for isolated methods is conceivable (its API is inspired by the ASM GeneratorAdapter):

MethodHandle stringLength =
    new IsolatedMethodBuilder("isoToString", methodType(int.class, String.class)).
        loadArg(0).
        invokeVirtual(lookup.findVirtual(String.class, "length", methodType(int.class))).
        returnValue().
        load();
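One possible shape for such a builder is sketched below, purely as input to that discussion. The class IsolatedMethodBuilderSketch is hypothetical and deliberately narrow: it supports only reference arguments in local slots 0 to 3, invokeVirtual, and an int return. Since loadCode does not exist yet, the sketch exposes the generated instructions and constants arrays instead of offering load().

```java
import java.io.ByteArrayOutputStream;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class IsolatedMethodBuilderSketch {
    private final String name;
    private final MethodType type;
    private final ByteArrayOutputStream code = new ByteArrayOutputStream();
    private final List<Object> constants = new ArrayList<>();

    IsolatedMethodBuilderSketch(String name, MethodType type) {
        this.name = name;
        this.type = type;
    }

    // Emits aload_<slot> for a reference argument; long and double arguments take two slots.
    IsolatedMethodBuilderSketch loadArg(int n) {
        int slot = 0;
        for (int i = 0; i < n; i++) {
            Class<?> p = type.parameterType(i);
            slot += (p == long.class || p == double.class) ? 2 : 1;
        }
        if (type.parameterType(n).isPrimitive() || slot > 3) {
            throw new UnsupportedOperationException("sketch supports reference arguments in slots 0..3");
        }
        code.write(0x2A + slot); // aload_0 .. aload_3
        return this;
    }

    // Appends the target to the constants array and emits invokevirtual with its index.
    IsolatedMethodBuilderSketch invokeVirtual(MethodHandle target) {
        constants.add(target);
        int index = constants.size() - 1;
        code.write(0xB6);        // invokevirtual
        code.write(index >>> 8); // high byte of the constants index
        code.write(index);       // low byte of the constants index
        return this;
    }

    // Emits the return instruction matching the method type; only int is handled here.
    IsolatedMethodBuilderSketch returnValue() {
        if (type.returnType() != int.class) {
            throw new UnsupportedOperationException("sketch supports int returns only");
        }
        code.write(0xAC); // ireturn
        return this;
    }

    String name() { return name; }
    byte[] instructions() { return code.toByteArray(); }
    Object[] constants() { return constants.toArray(); }

    // Builds the String-length example used throughout this section.
    static IsolatedMethodBuilderSketch stringLength() {
        try {
            MethodHandle length = MethodHandles.lookup().findVirtual(
                String.class, "length", MethodType.methodType(int.class));
            return new IsolatedMethodBuilderSketch("isoToString",
                    MethodType.methodType(int.class, String.class))
                .loadArg(0)
                .invokeVirtual(length)
                .returnValue();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        IsolatedMethodBuilderSketch builder = stringLength();
        System.out.println(Arrays.toString(builder.instructions()));
        System.out.println(builder.constants().length + " constant(s)");
    }
}
```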

The above examples serve to kick off a discussion about how the isolated methods loading API and possible supporting API can be shaped.

Alternatives

Anonymous classes (obtained via Unsafe.defineAnonymousClass()) are an already existing way to dynamically define classes. They specifically support use cases where an anonymous method needs to access state associated with it, e.g., in case of lambda expressions that close over local state. Isolated methods can substitute those uses of anonymous classes that fall into the "single static method, no state" category. Isolated methods share with anonymous classes the characteristic that they cannot be looked up by name.

An alternative approach to speed up bytecode spinning in the JDK core libraries is to use bytecode templates with predefined constant pools where only method bytecodes are inserted. This approach has two main drawbacks: it would be easily adoptable only in the core libraries but would not scale out to guest language implementations; and it would still require the bytecodes in question to be generated, which is hard to separate from ASM's notion of a class.

Testing

There are no special platform or hardware requirements for testing. As the JDK core libraries themselves make use of method handles, and as especially the module system relies on lambdas and the ensuing bytecode spinning, the JDK itself is an excellent test bed. The existing tests for the method handle functionality will be valuable as well.

In terms of testing by guest language implementations, all such implementations that already utilize the method handles API will implicitly be available for testing with their respective test suites. Experimental extensions of such guest language implementations can adopt an implementation scheme based on isolated methods for ongoing testing. The Nashorn JavaScript engine, for instance, is capable of running a large body of standard JavaScript code, including benchmarks.

Risks and Assumptions

Introducing a new API to load code into the VM is risky per se. If this feature is deemed too risky, it can be moved to the Unsafe API.

Dependencies

This JEP depends on the presence of a bytecode generation framework that provides easy access to the constant pool and allows method generation to be decoupled from class generation.