JEP draft: Isolated Methods
Author | Michael Haupt |
Owner | Maurizio Cimadamore |
Type | Feature |
Scope | JDK |
Status | Draft |
Component | core-libs / java.lang.invoke |
Discussion | mlvm dash dev at openjdk dot java dot net |
Effort | L |
Duration | XL |
Reviewed by | Alex Buckley, Brian Goetz, Jim Laskey, Paul Sandoz, Vladimir Ivanov |
Created | 2016/06/06 14:00 |
Updated | 2018/04/16 21:06 |
Issue | 8158765 |
Summary
Extend the MethodHandles.Lookup class of the java.lang.invoke package to support loading method bytecodes without an attached class, and to represent such methods as method handles.
Goals
- In the MethodHandles.Lookup class of the java.lang.invoke package, provide a new method loadCode to load a bytecode array plus constants as an isolated method and return a MethodHandle representing that method.
- At the level of the JVM, provide an optimised means to store isolated methods.
Non-Goals
- A new compilation strategy for lambdas is not in the scope of this JEP.
- Extensions at the Java language level are explicitly out of scope.
- Extensions to the Java Virtual Machine instruction set (bytecodes) are likewise out of scope.
Success Metrics
- Improved performance of method handle infrastructure where it makes use of bytecode generation (also at startup).
- Reduced memory footprint of method handle infrastructure (specifically, LambdaForms, BoundMethodHandles, and invokers).
- Similar observable effects on dynamic language implementations once they adopt the new API.
Motivation
Both in the JDK core libraries and in language implementations running atop the JVM, it is a common pattern to generate stateless classes with a single static method. These classes are used to represent what is logically a method without a class, or an "isolated method". Generating them is cumbersome as it requires the generation of a full class, and imposes a certain load on the VM in terms of class loading and maintenance.
To enable a more lightweight solution for this scenario, there should be a way to express and load an isolated method directly, to obtain it in a form that can be invoked, and to ensure that it cannot be used to commit access violations.
Method handles are usable abstractions for representing code that can be called. Moreover, a means for controlling lookup and access contexts already exists in the form of the MethodHandles.Lookup class. What is missing is a means to load a method in isolation. By adding a single API entry point to the MethodHandles.Lookup class that accepts the representation of an isolated method, such a method can be loaded with the lookup context implied by the Lookup instance at hand.
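As a point of reference, the following minimal sketch uses only the existing java.lang.invoke API to obtain a method handle through a Lookup instance and call it. The class name LookupDemo and the helper lengthViaHandle are illustrative and not part of this proposal.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class LookupDemo {
    /** Looks up String.length() in this class's lookup context and invokes it. */
    static int lengthViaHandle(String s) {
        try {
            // The Lookup object carries the access rights of the class that created it.
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle length = lookup.findVirtual(
                    String.class, "length", MethodType.methodType(int.class));
            // The receiver becomes the first argument of the handle: (String)int.
            return (int) length.invokeExact(s);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(lengthViaHandle("isolated")); // prints 8
    }
}
```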
There are several settings in the JDK core libraries, most notably in the low-level method handles infrastructure, where a new abstraction for isolated methods can be used to reduce code size and memory footprint, and to improve loading performance. Moreover, the Nashorn JavaScript engine can make use of the feature in a similar way, as it generates bytecode from JavaScript sources. Finally, all language implementations that run atop the JVM and generate bytecode may be clients of the isolated method loading capability.
It should be noted that the aforementioned scenarios all require access to the internal Unsafe API. Offering a disciplined and secure way of defining custom code in the form of isolated methods will make many of these uses of Unsafe unnecessary. This reduces the dependence on internal API that guest language implementations on the JVM often have.
Description
The MethodHandles.Lookup class of the java.lang.invoke package is to be extended with a method like this:
MethodHandle loadCode(String name, MethodType type, byte[] instructions, Object[] constants)
The name parameter is optional. It denotes the name by which the isolated method should be identifiable in stack traces.
The type parameter determines the method's return type and parameter types.
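For illustration, a MethodType for a method that takes a String and returns an int can be built with the existing API as sketched below. The class name TypeDemo is illustrative.

```java
import java.lang.invoke.MethodType;

final class TypeDemo {
    /** Builds the type (String)int and returns its JVM method descriptor. */
    static String descriptor() {
        // Return type first, then the parameter types.
        MethodType type = MethodType.methodType(int.class, String.class);
        return type.toMethodDescriptorString();
    }

    public static void main(String[] args) {
        System.out.println(descriptor()); // prints (Ljava/lang/String;)I
    }
}
```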
The instructions array contains the method's bytecode instructions, as they would occur in a normal class file. A notable difference is that all indices into the class's constant pool that the bytecode would normally contain are now indices into the accompanying constants array. This serves as a method-local constant pool substitute.
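The index substitution can be sketched as follows, assuming the usual class-file encoding in which an instruction such as invokevirtual is followed by an unsigned 16-bit big-endian operand. The class OperandDemo and its methods are hypothetical helpers for illustration, not proposed API.

```java
final class OperandDemo {
    /** Reads the unsigned 16-bit big-endian operand that follows the opcode at pc. */
    static int operandIndex(byte[] code, int pc) {
        return ((code[pc + 1] & 0xFF) << 8) | (code[pc + 2] & 0xFF);
    }

    /** Resolves the operand against the method-local constants array. */
    static Object resolve(byte[] code, int pc, Object[] constants) {
        return constants[operandIndex(code, pc)];
    }

    public static void main(String[] args) {
        // aload_0; invokevirtual #0; ireturn
        byte[] code = {42, (byte) 182, 0, 0, (byte) 172};
        Object[] constants = {"placeholder for a String.length() handle"};
        System.out.println(resolve(code, 1, constants));
    }
}
```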
The loadCode method creates a method from the passed bytecode instructions and constants and returns a MethodHandle that can be used to call the method. The implementation of loadCode will take care of verifying the code to be loaded. This method is isolated from any class and behaves largely like a static method.
The method handle resulting from a loadCode invocation is of the REF_invokeStatic kind. It cannot be cracked via MethodHandles.Lookup.revealDirect().
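For comparison, the sketch below shows what cracking means for handles obtained through the existing API: a direct handle to a static method reports the REF_invokeStatic reference kind, while a handle that is not direct is rejected by revealDirect. How a handle returned by loadCode would be represented internally is not shown; the class CrackDemo and its methods are illustrative.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandleInfo;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class CrackDemo {
    /** A direct handle to the static method Integer.parseInt(String). */
    static MethodHandle parseInt() {
        try {
            return MethodHandles.lookup().findStatic(Integer.class, "parseInt",
                    MethodType.methodType(int.class, String.class));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Cracks the direct handle and returns its reference kind. */
    static int directKind() {
        MethodHandleInfo info = MethodHandles.lookup().revealDirect(parseInt());
        return info.getReferenceKind();
    }

    /** A handle with a bound argument is no longer direct and cannot be cracked. */
    static boolean boundIsCrackable() {
        MethodHandle bound = MethodHandles.insertArguments(parseInt(), 0, "7");
        try {
            MethodHandles.lookup().revealDirect(bound);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(MethodHandleInfo.referenceKindToString(directKind()));
        System.out.println(boundIsCrackable());
    }
}
```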
The context for a method defined in this way is determined by the Lookup instance receiving the loadCode call. If the lookup privileges are not sufficient, an exception will be thrown.
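The existing lookup methods already behave this way, which the following sketch demonstrates with a private static method: a Lookup created inside the declaring class can find it, whereas the public lookup cannot. Which exception loadCode itself would throw is not specified in this draft; the class AccessDemo is illustrative.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class AccessDemo {
    private static int secret() {
        return 42;
    }

    /** Returns true if the given lookup context may access secret(). */
    static boolean canFind(MethodHandles.Lookup lookup) {
        try {
            lookup.findStatic(AccessDemo.class, "secret", MethodType.methodType(int.class));
            return true;
        } catch (IllegalAccessException e) {
            return false; // insufficient lookup privileges
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Lookup created inside AccessDemo: has private access to its members. */
    static boolean ownLookupCanFind() {
        return canFind(MethodHandles.lookup());
    }

    /** Public lookup: only sees unconditionally accessible members. */
    static boolean publicLookupCanFind() {
        return canFind(MethodHandles.publicLookup());
    }

    public static void main(String[] args) {
        System.out.println(ownLookupCanFind());
        System.out.println(publicLookupCanFind());
    }
}
```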
The constants Array
The constants array, meant to contain constants referenced from the bytecode, deserves some attention. First and foremost, it should not be misunderstood as a constant pool. Rather, it provides a higher level of abstraction over constant pool contents, and adds convenience for clients.
The array of constant pool patches that can be passed to invocations of Unsafe.defineAnonymousClass plays a similar role. For instance, the constant pool patches array allows passing a String where a CONSTANT_Utf8_info entry is to be patched, even though that entry actually consists of a tag byte, a two-byte length, and a character array. Unsafe.defineAnonymousClass supports similar convenience for other constant pool entries too.
For the constants array passed to loadCode, similar convenience should be possible. For instance, where the method instructions reference a Java class, the constants array can contain a Class instance, rather than lower-level structures encountered in constant pools. Likewise, an INVOKEVIRTUAL instruction can reference a constants array entry that itself is a MethodHandle representing the method in question.
The following list shows the different forms of possible constant pool entries and the Java classes that can be used to represent them in the constants array.
- CONSTANT_Utf8_info: java.lang.String
- CONSTANT_Integer_info: int, java.lang.Integer
- CONSTANT_Float_info: float, java.lang.Float
- CONSTANT_Long_info: long, java.lang.Long
- CONSTANT_Double_info: double, java.lang.Double
- CONSTANT_Class_info: java.lang.Class
- CONSTANT_String_info: java.lang.String
- CONSTANT_Fieldref_info: a java.lang.invoke.DirectMethodHandle of the right kind, obtained via the appropriate API in java.lang.invoke.MethodHandles.Lookup
- CONSTANT_Methodref_info: a java.lang.invoke.DirectMethodHandle of the right kind, obtained via the appropriate API in java.lang.invoke.MethodHandles.Lookup
- CONSTANT_InterfaceMethodref_info: a java.lang.invoke.DirectMethodHandle of the right kind, obtained via the appropriate API in java.lang.invoke.MethodHandles.Lookup
- CONSTANT_NameAndType_info: (should not be required)
- CONSTANT_MethodHandle_info: java.lang.invoke.MethodHandle
- CONSTANT_MethodType_info: java.lang.invoke.MethodType
- CONSTANT_InvokeDynamic_info: either a tuple of (java.lang.invoke.MethodType, java.lang.invoke.MethodHandle), where the MethodType describes the call site's signature, and the MethodHandle represents the bootstrap method with already bound static arguments; or an already initialized java.lang.invoke.CallSite
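One way an implementation might classify the entries of a constants array is sketched below. This is purely an illustration of the mapping listed above, not a proposed design: the class ConstantKindsSketch and the strings it returns are hypothetical, and an actual implementation would have to use the referencing instruction to disambiguate the cases that share a Java type.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodType;

final class ConstantKindsSketch {
    /** Names the constant pool entry form(s) that a constants array element may stand for. */
    static String kindOf(Object constant) {
        if (constant instanceof String)       return "CONSTANT_Utf8_info or CONSTANT_String_info";
        if (constant instanceof Integer)      return "CONSTANT_Integer_info";
        if (constant instanceof Float)        return "CONSTANT_Float_info";
        if (constant instanceof Long)         return "CONSTANT_Long_info";
        if (constant instanceof Double)       return "CONSTANT_Double_info";
        if (constant instanceof Class)        return "CONSTANT_Class_info";
        // Field and method references are also represented by method handles.
        if (constant instanceof MethodHandle) return "CONSTANT_MethodHandle_info or a member reference";
        if (constant instanceof MethodType)   return "CONSTANT_MethodType_info";
        if (constant instanceof CallSite)     return "CONSTANT_InvokeDynamic_info";
        throw new IllegalArgumentException("unsupported constant: " + constant);
    }

    public static void main(String[] args) {
        Object[] constants = {String.class, 42, 1.5f, MethodType.methodType(int.class)};
        for (Object c : constants) {
            System.out.println(c + " -> " + kindOf(c));
        }
    }
}
```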
In addition, the Valhalla project proposes several new constant pool entry types, for which the substitutions in constants arrays can be as follows. Note that the list assumes that tuples, which may be introduced with Valhalla, exist in the language.
- CONSTANT_ArrayType_info: tuple of (byte, java.lang.Class)
- CONSTANT_MethodDescriptor_info: array of java.lang.Class (Class instances may have to offer some additional information as Valhalla progresses)
- CONSTANT_ParameterizedType_info: tuple of (java.lang.Class, java.lang.Class[])
- CONSTANT_TypeVar_info: tuple of (java.lang.String, java.lang.Class)
As a further addition, the new constant pool entry types discussed in the general data in constant pools proposal can be represented as follows.
- CONSTANT_Dynamic: tuple of (java.lang.Class, java.lang.invoke.MethodHandle), where the Class represents the expected type, and the MethodHandle describes a bootstrap method with the static parameters already bound
- CONSTANT_Group: an array or java.util.List
- CONSTANT_Bytes: byte[]
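The notion of a bootstrap method whose static arguments are already bound, used above for CONSTANT_InvokeDynamic_info and CONSTANT_Dynamic, can be approximated with the existing MethodHandles.insertArguments API. The sketch below is an assumption about how a client might prepare such a handle; the class BoundBootstrapDemo, its bootstrap method, and the manual linking step are illustrative only.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class BoundBootstrapDemo {
    /** Bootstrap-style method: three standard leading arguments plus one static argument. */
    static CallSite bootstrap(MethodHandles.Lookup lookup, String name,
                              MethodType type, String greeting) {
        MethodHandle target = MethodHandles.constant(String.class, greeting + ", " + name);
        return new ConstantCallSite(target.asType(type));
    }

    static String demo() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle bsm = lookup.findStatic(BoundBootstrapDemo.class, "bootstrap",
                    MethodType.methodType(CallSite.class, MethodHandles.Lookup.class,
                            String.class, MethodType.class, String.class));
            // Bind the static argument (position 3) ahead of time.
            MethodHandle bound = MethodHandles.insertArguments(bsm, 3, "Hello");
            // Simulate linkage: call the bound bootstrap with lookup, name, and call site type.
            CallSite site = (CallSite) bound.invoke(
                    lookup, "world", MethodType.methodType(String.class));
            return (String) site.dynamicInvoker().invoke();
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```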
Regarding generic methods, note that an isolated method does not have an enclosing class that could define type variables. Instead, all type variables mentioned in the signature of a generic isolated method belong to that method alone.
The constants array can also contain all kinds of objects that can be loaded using an LDC instruction. This can be used to bind certain specific data that are known at compile time.
It will be up to the implementation of loadCode to turn these convenience objects into proper lower-level representations resembling those in a constant pool. The details of this depend on the implementation choices that will be made for the internal representation of isolated methods.
Implementation
The loadCode functionality can be implemented in several stages, each integrating more deeply with the present system than the one before.
Stage 1: Internal Use for LambdaForm and Invoker Generation
The initial version of loadCode should be provided as part of the non-public API for Invokedynamic, e.g., as a non-public method in the MethodHandles.Lookup class, or in the MethodHandleImpl class. There, it can be used to generate LambdaForms and other invokers in the java.lang.invoke implementation. At this stage, the implementation should not yet give isolated methods any special internal representation, but wrap the LambdaForm methods in a class as usual.
Stage 2: Optimised Internal Use for LambdaForm and Invoker Generation
The internal stage 1 loadCode implementation can, while the API remains stable, be optimised at the level of HotSpot. At this time, there are two design ideas that can be explored.
- Internally, represent each isolated method as a method plus constant pool. The class an isolated method belongs to, to make it fit into the overall expectations of the VM, is a pseudo class that cannot be instantiated. This resembles the way single static method classes are currently built.
- Add to HotSpot the notion of a pseudo class (dubbed Gargantuan) that will be the holder of all methods defined through the loadCode interface. This will be an all-static class invisible from the outside (support for getCallerClass notwithstanding). Gargantuan is a class that is intended to grow as new methods are defined. Methods can be collected when there are no more MethodHandles referencing them. Each method in Gargantuan can have a context different from all other methods, depending on the lookup context at hand in a loadCode invocation. This lookup context is preserved in Gargantuan and associated with the isolated method during its lifetime.
  The constants arrays of several isolated methods will very likely contain common constants. The loadCode VM-level implementation will make sure to only add those constants to the Gargantuan constant pool that are not already present, and to patch the bytecode instructions array accordingly. This elision of duplicate constant pool entries can also take place upon garbage collection to facilitate faster loading of isolated methods. Either way, all isolated methods share a common constant pool.
  The Gargantuan class can also exist once per module, which will enable efficient collection of constants stored for an isolated method, and possibly collection of other structures, as a module is unloaded.
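The elision of duplicate constants and the accompanying patching of instruction operands can be sketched at the Java level as follows. This is a simplified model under stated assumptions, not the HotSpot design: the class SharedPoolSketch is hypothetical, constants are compared with equals, and the caller supplies the offsets of the 16-bit operands instead of the sketch parsing the bytecode.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SharedPoolSketch {
    private final List<Object> pool = new ArrayList<>();
    private final Map<Object, Integer> indexOf = new HashMap<>();

    /** Returns the shared-pool index of a constant, adding it only if absent. */
    int intern(Object constant) {
        Integer existing = indexOf.get(constant);
        if (existing != null) {
            return existing;
        }
        pool.add(constant);
        int index = pool.size() - 1;
        indexOf.put(constant, index);
        return index;
    }

    /**
     * Copies the instructions and rewrites the 16-bit operand at each given
     * offset from a method-local constants index to a shared-pool index.
     */
    byte[] patch(byte[] instructions, int[] operandOffsets, Object[] constants) {
        byte[] patched = instructions.clone();
        for (int off : operandOffsets) {
            int local = ((patched[off] & 0xFF) << 8) | (patched[off + 1] & 0xFF);
            int shared = intern(constants[local]);
            patched[off] = (byte) (shared >>> 8);
            patched[off + 1] = (byte) shared;
        }
        return patched;
    }

    int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        SharedPoolSketch shared = new SharedPoolSketch();
        byte[] code = {42, (byte) 182, 0, 0, (byte) 172};
        shared.patch(code, new int[]{2}, new Object[]{"a"});
        shared.patch(code, new int[]{2}, new Object[]{"b"});
        shared.patch(code, new int[]{2}, new Object[]{"a"}); // reuses the entry for "a"
        System.out.println(shared.size());
    }
}
```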
Stage 3: Public API
Eventually, the loadCode method should be public in MethodHandles.Lookup, to support its more widespread usage. In the meantime, availability via the MLVM repository will allow for applying the loadCode feature to existing language implementations for experimentation.
Usage Examples
The examples below serve to point out possible future shapes of the infrastructure needed to generate the instructions array. All examples describe the generation and loading of a method that has the signature (Ljava/lang/String;)I and retrieves the length of its argument. It consists of these instructions:
ALOAD_0
INVOKEVIRTUAL #0 <String.length()>
IRETURN
The first example adopts the higher level of abstraction over the constant pool, as suggested above:
MethodHandle stringLength = lookup.loadCode("isoToString",
    methodType(int.class, String.class),
    // aload_0, invokevirtual #0, ireturn; values above 127 need a byte cast
    new byte[]{42, (byte) 182, 0, 0, (byte) 172},
    new Object[]{
        lookup.findVirtual(String.class, "length", methodType(int.class))});
In the above example, the instructions array has been provided as immediate constants. To make it easier to generate such arrays and the constants arrays they reference, a generator for isolated methods is conceivable (its API is inspired by the ASM GeneratorAdapter):
MethodHandle stringLength =
    new IsolatedMethodBuilder("isoToString", methodType(int.class, String.class)).
        loadArg(0).
        invokeVirtual(lookup.findVirtual(String.class, "length", methodType(int.class))).
        returnValue().
        load();
The above examples serve to kick off a discussion about how the isolated methods loading API and possible supporting API can be shaped.
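To make the builder idea more concrete, here is a minimal sketch of how such a generator might assemble the instructions and constants arrays. It is only an assumption about one possible shape: the class IsolatedMethodBuilderSketch is hypothetical, it supports just the three operations used above for reference-typed arguments and int returns, and it stops short of load() because loadCode does not exist yet.

```java
import java.io.ByteArrayOutputStream;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.ArrayList;
import java.util.List;

final class IsolatedMethodBuilderSketch {
    private final ByteArrayOutputStream code = new ByteArrayOutputStream();
    private final List<Object> constants = new ArrayList<>();

    /** Emits aload_n for a reference-typed argument in local slot n (0..3). */
    IsolatedMethodBuilderSketch loadArg(int n) {
        if (n < 0 || n > 3) {
            throw new IllegalArgumentException("sketch supports aload_0..aload_3 only");
        }
        code.write(42 + n); // aload_0 is opcode 42
        return this;
    }

    /** Emits invokevirtual with an operand that indexes the constants array. */
    IsolatedMethodBuilderSketch invokeVirtual(MethodHandle target) {
        int index = constants.size();
        constants.add(target);
        code.write(182);          // invokevirtual
        code.write(index >>> 8);  // high byte of the index
        code.write(index & 0xFF); // low byte of the index
        return this;
    }

    /** Emits ireturn; other return types are omitted from this sketch. */
    IsolatedMethodBuilderSketch returnValue() {
        code.write(172); // ireturn
        return this;
    }

    byte[] instructions() {
        return code.toByteArray();
    }

    Object[] constants() {
        return constants.toArray();
    }

    /** Builds the String.length() example from the text. */
    static IsolatedMethodBuilderSketch stringLength() {
        try {
            MethodHandle length = MethodHandles.lookup().findVirtual(
                    String.class, "length", MethodType.methodType(int.class));
            return new IsolatedMethodBuilderSketch()
                    .loadArg(0).invokeVirtual(length).returnValue();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(stringLength().instructions()));
    }
}
```

The resulting arrays are the ones a load() step would hand to loadCode together with the name and the method type.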
Alternatives
Anonymous classes (obtained via Unsafe.defineAnonymousClass()) are an already existing way to dynamically define classes. They specifically support use cases where an anonymous method needs to access state associated with it, e.g., in case of lambda expressions that close over local state. Isolated methods can substitute those uses of anonymous classes that fall into the "single static method, no state" category. Isolated methods share with anonymous classes the characteristic that they cannot be looked up by name.
An alternative approach to speed up bytecode spinning in the JDK core libraries is to use bytecode templates with predefined constant pools where only method bytecodes are inserted. This approach has two main drawbacks: it would be easily adoptable only in the core libraries and would not scale out to guest language implementations; and it would still require the bytecodes in question to be generated, which is hard to separate from ASM's class-centric model.
Testing
There are no special platform or hardware requirements for testing. As the JDK core libraries themselves make use of method handles, and as especially the module system relies on lambdas and the ensuing bytecode spinning, the JDK itself is an excellent test bed. The existing tests for the method handle functionality will be valuable as well.
In terms of testing by guest language implementations, all such implementations that already utilize the method handles API will implicitly be available for testing with their respective test suites. Experimental extensions of such guest language implementations can adopt an implementation scheme based on isolated methods for ongoing testing. The Nashorn JavaScript engine, for instance, is capable of running a large body of standard JavaScript code, including benchmarks.
Risks and Assumptions
Introducing a new API to load code into the VM is risky per se. If this feature is deemed too risky, it can be moved to the Unsafe API.
Dependences
This JEP depends on the presence of a bytecode generation framework that provides easy access to the constant pool and allows method generation to be decoupled from class generation.