JEP draft: Code reflection (Incubator)
| Owner | Paul Sandoz |
| Type | Feature |
| Scope | JDK |
| Status | Draft |
| Component | core-libs |
| Effort | L |
| Duration | L |
| Reviewed by | Adam Sotona, Gary Frost, Juan Fumero, Maurizio Cimadamore |
| Created | 2025/06/30 19:54 |
| Updated | 2026/02/19 00:16 |
| Issue | 8361105 |
Summary
Enhance the core reflection API to model Java code, build and transform models of Java code, and access models of Java code in methods and lambda expressions. Libraries can use this enhancement to analyze Java code and extend its reach, such as executing it as code on GPUs. This is an incubating API.
Goals
- Enable Java developers to interface with non-Java (foreign) programming models using familiar Java language constructs, such as lambda expressions and static typing.
- Encourage libraries to expose novel programming models to Java developers without requiring developers to embed non-Java code inside Java code, or to write tedious Java code that builds data structures to model Java code or other (foreign) code.
Non-Goals
- It is not a goal to change the meaning of Java programs as specified by the Java Language Specification, to compile Java source code to anything other than the instruction set specified by the Java Virtual Machine Specification, to change the JVM's instruction set, or to change HotSpot to support instruction sets of specialized processing units. For example, it is not a goal to make such changes to the Java platform in order to execute Java methods on GPUs.
- It is not a goal to standardize the internal Abstract Syntax Tree of javac to serve as the model for Java code.
- It is not a goal to enable access at run time to bytecode and for it to serve as the model for Java code.
- It is not a goal to devise a general metaprogramming or macro facility for the Java language.
- It is not a goal to introduce language constructs, like class literals, to concisely express access to a model of code.
Motivation
Many Java programs need to process large amounts of data in parallel, and Java libraries make it easy to implement parallel computations. For example, in a face detection algorithm, we need to convert RGB pixels to grayscale; here is simplified code to do that using a lambda expression and the parallel streams built into the JDK:
IntConsumer rgbToGray = i -> {
byte r = rgbImage[i * 3 + 0];
byte g = rgbImage[i * 3 + 1];
byte b = rgbImage[i * 3 + 2];
grayImage[i] = gray(r, g, b);
};
IntStream.range(0, N)
.parallel()
.forEach(rgbToGray);
If the number of pixels N is sufficiently large and/or the work to compute
each pixel is sufficiently demanding, then the stream will compute the result
faster than a single-threaded for loop, even with the overhead of starting and
coordinating multiple threads.
for (int i = 0; i < N; i++) {
byte r = rgbImage[i * 3 + 0];
byte g = rgbImage[i * 3 + 1];
byte b = rgbImage[i * 3 + 2];
grayImage[i] = gray(r, g, b);
}
Gustafson's Law states that as we increase the number of
threads M, each working on a sufficiently large number of pixels, the
estimated speedup of a program will approach M as the fraction of time spent
on parallel tasks grows.
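As a back-of-envelope illustration, Gustafson's law predicts a scaled speedup of S(M) = s + (1 - s) * M, where s is the serial fraction of the work. The sketch below is illustrative only; the 5% serial fraction and the class name are arbitrary assumptions, not figures from this JEP.

```java
// Gustafson's law: scaled speedup S(M) = s + (1 - s) * M,
// where s is the fraction of work that remains serial.
public class Gustafson {
    static double speedup(double serialFraction, int threads) {
        return serialFraction + (1 - serialFraction) * threads;
    }

    public static void main(String[] args) {
        double s = 0.05; // assumed serial fraction of 5%
        for (int m : new int[] {8, 64, 384}) {
            System.out.printf("M = %3d -> estimated speedup %.2f%n",
                    m, speedup(s, m));
        }
        // As s approaches 0, the estimated speedup approaches M itself.
    }
}
```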
Unfortunately, the number of threads that can run compute-intensive tasks is limited by the CPU, e.g., an AMD EPYC 9005 Zen 5c has 384 threads. Java 21 introduced virtual threads to run large numbers of I/O-intensive tasks in parallel, but virtual threads do not create new compute resources and cannot speed up code that is already CPU-bound.
General-purpose computing with Graphics Processing Units
There is a class of computing device, the Graphics Processing Unit (GPU), whose architecture is very different to the CPU: rather than a few hundred threads, a modern GPU such as an NVIDIA Blackwell B200 GPU can simultaneously execute a few hundred thousand threads.
Originally GPUs were designed for rendering images and video games, but now we can use them for general-purpose computations such as face detection, General Matrix Multiplication (GEMM), or Fast Fourier Transformation (FFT).
If we could run parallel tasks on a GPU instead of a CPU, with orders of magnitude more threads, we could either greatly reduce the execution time or compute more in the same execution time.
Historically, one approach was to write the multithreaded computation in a language supported by the GPU, e.g., CUDA C, and embed it as a string in a Java program. We could use JNI to run the CUDA C compiler and transfer the compiled code to the GPU for execution.
static void gpuComputation(int N, byte[] rgbImage, byte[] grayImage) {
var cudaCCode = """
__device__
char gray(char r, char g, char b) { return ...; }
__global__
void computeGrayImage(int N, char* rgbImage, char* grayImage) {
int i = blockIdx.x * blockDim.x + threadIdx.x;
if (i < N) {
char r = rgbImage[i * 3 + 0];
char g = rgbImage[i * 3 + 1];
char b = rgbImage[i * 3 + 2];
grayImage[i] = gray(r, g, b);
}
}
""";
var gpuCode = compileGpuCode(cudaCCode);
runGpuCode(gpuCode, N, rgbImage, grayImage);
}
This approach is problematic: it presents a leaky abstraction that forces
developers to be familiar with CUDA artifacts, and the code is no longer
hardware independent. Expecting developers to write Java code that merely
carries non-Java code is misguided, since javac can neither compile nor check
that code; indeed, any language, not just Java, could serve as the carrier for
CUDA C.
Exploring Better Abstractions
In the 2010s, OpenJDK Project Sumatra aimed to let Java developers take advantage of GPUs by enhancing the JVM and parallel streams. The Sumatra JVM could generate code for AMD GPUs and place parts of the JVM’s heap in GPU memory. This approach, where the Java Platform obscures the presence of a GPU from Java code, is in stark contrast to manually embedding GPU code into a Java program. Neither approach provides the right abstraction, and this is why we abandoned Sumatra.
Obscuring the GPU is particularly challenging. First, memory is split between CPU and GPU; managing the JVM’s heap across the CPU and GPU can be a continual drag on performance. Second, the idiomatic Java code in lambda expressions is polymorphic: methods are commonly invoked on interfaces rather than classes and each invocation triggers virtual method lookup and possibly class loading, initialization, etc. It is counterproductive to bring this highly variable behavior, where each thread may run different code, to the GPU, where each thread is intended to run identical code in lock step on different data elements using a Single Instruction Multiple Thread (SIMT) execution model.
Empowering libraries
We believe the best way to support GPUs in the Java Platform is to introduce primitives that enable the creation of libraries which, in turn, introduce novel programming models and APIs that harness the unique memory and execution capabilities of GPUs. With these primitives we design the Java platform for growth. Libraries can introduce their novel programming models that feel part of the Java platform and yet are free to deviate from the semantics of Java code. They are not held back waiting, at vast expense, for the addition of questionable features to the Java platform. (See the Foreign programming models section for more detail.)
One such primitive is the Foreign Function & Memory (FFM) API, introduced in Java 22. While the FFM API has no built-in knowledge of GPUs, it allows libraries to interact efficiently with native device drivers on the CPU and thereby control the GPU indirectly.
If libraries are to translate the parallel parts of Java programs to GPU code, they need access to Java code. Fortunately, the Java Platform has a longstanding primitive, core reflection, which allows a library to inspect the structure of a Java program. Java 1.1 introduced reflection at run time, the core reflection API, kick-starting an ecosystem of libraries for data access, unit testing, messaging, etc. Java 5 introduced reflection at compile time, allowing annotation processors to generate code that extends the application with no maintenance overhead.
Unfortunately, the core reflection API is limited and does not provide access to the code in methods or lambda expressions. We can use the core reflection API to inspect what methods a class declares, but we can go no deeper and inspect the code of those methods.
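To make the limit concrete, here is a plain core-reflection sketch: we can recover a method's name, parameter types, and return type, but there is no API that goes further into its body. (The class and method below are illustrative, echoing the gray method of the example; they are not part of any JDK API.)

```java
import java.lang.reflect.Method;

public class ReflectionLimit {
    static int gray(int r, int g, int b) {
        return (29 * r + 60 * g + 11 * b) / 100;
    }

    public static void main(String[] args) throws Exception {
        // Core reflection exposes the method's declaration...
        Method m = ReflectionLimit.class.getDeclaredMethod(
                "gray", int.class, int.class, int.class);
        System.out.println(m.getReturnType().getName() + " " + m.getName()
                + " with " + m.getParameterCount() + " parameters");
        // ...but offers no way to go deeper and inspect the code inside gray.
    }
}
```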
A library can access the source code of methods and lambdas with internal APIs
of javac, but this is only available at compile time and is too complex since
it contains many extraneous syntactic details. At run time, a library can access
the bytecode of methods (but not lambdas) with the Class-File API, but
this is a poor substitute for source code and class files are not always
available.
Code Reflection: The Missing Primitive
To support libraries effectively, we propose to enhance reflection to expose not just classes, fields, and methods but also the code of methods and lambda expressions. With this enhancement, we can develop libraries that translate Java code, e.g., the lambda expression used in a parallel stream, into GPU code, eliminating the need to manually write CUDA C code. With knowledge of both Java code and GPU code, libraries can model data dependencies and optimize data transfer between CPU and GPU for better performance.
Foreign programming models
Just as the FFM API is not specific to GPUs, an API providing access to Java code is not specific to GPUs. Libraries could use it to, e.g., automatically differentiate Java code, pass translated Java code to native machine learning runtimes, or translate Java code to SQL statements. These are all examples of foreign programming models.
A library that translates Java code to code of some other foreign programming
model does not, in general, preserve the semantics of that Java code. For
example, a GPU library will not, in general, preserve the semantics of Java code
when it translates it to C code conforming to the CUDA C programming model
specified by NVIDIA. The GPU library specifies the rules as to what constitutes
translatable Java code. For example, it may reject try statements and not
preserve accuracy of floating point operations. Those rules are foreign to the
Java programming model as specified by the Java Language Specification, which
specifies nothing about programming models for GPUs, nor those for automatic
differentiation etc.
So with code reflection we can leverage a foreign programming model and with the Foreign Function & Memory API we can leverage a foreign runtime. Using both together we can embrace a foreign world and orchestrate complex activity between the two.
Enabling the incubating API
Code reflection is an incubating API, disabled by default, and is offered in
the incubator module jdk.incubator.code. To try out code reflection you must
request that the incubator module jdk.incubator.code be resolved:
- Compile the program with javac --add-modules jdk.incubator.code Main.java and run it with java --add-modules jdk.incubator.code Main; or
- When using the source code launcher, run the program with java --add-modules jdk.incubator.code Main.java; or
- When using jshell, start it with jshell --add-modules jdk.incubator.code.
Description
We propose to enhance the core reflection API with code reflection. Code reflection supports run time access to a model of the code in a method or lambda expression, a code model, that is suited for analysis and transformation.
With code reflection, a library can generate CUDA C code from Java code. Recall the lambda- and stream-based example.
IntConsumer rgbToGray = i -> {
byte r = rgbImage[i * 3 + 0];
byte g = rgbImage[i * 3 + 1];
byte b = rgbImage[i * 3 + 2];
grayImage[i] = gray(r, g, b);
};
IntStream.range(0, N)
.parallel()
.forEach(rgbToGray);
First, we declare that the lambda expression is reflectable and thereby grant
access to its code. We do so by casting our lambda expression to the target
interface annotated with @Reflect.
final byte[] rgbImage = ...
final float[] grayImage = ...
IntConsumer rgbToGray = (@Reflect IntConsumer) i -> {
byte r = rgbImage[i * 3 + 0];
byte g = rgbImage[i * 3 + 1];
byte b = rgbImage[i * 3 + 2];
grayImage[i] = gray(r, g, b);
};
When the lambda expression is compiled by javac it translates its internal
model of the lambda expression to a standard model, a code model, and stores
the code model in a class file related to the class file containing the bytecode
of the compiled lambda expression. A code model is an immutable tree of
code elements, where in general each element models some Java statement or
expression.
We use code reflection to access the lambda expression’s code model, which loads the corresponding code model that was stored in the related class file.
var rgbToGrayModel = Op.ofLambda(rgbToGray).orElseThrow();
(Since code reflection is an incubating enhancement to the core reflection API
we cannot add new APIs in packages of other modules, such as in the
java.lang.reflect package of the java.base module. For now, we must provide
such methods in the incubating code reflection module.)
The method Op.ofLambda returns the code model for the result of a reflectable
lambda expression, in this case an instance of IntConsumer. By default, lambda
expressions are not reflectable, so we return an optional value. (For more
details see the Declaring reflectable code
section.)
What does the lambda expression's code model look like? To get some comprehension
we can convert the code model to a string and print it out. Below we show part
of that string with embedded comments associating code elements with the
language elements they model. Later in the Code models section
we explain further by looking in more detail at the code model of the gray
method.
%0 : java.type:"java.util.function.IntConsumer" = lambda
@lambda.isReflectable=true
(%1 : java.type:"int")java.type:"void" -> {
// declaration of method parameter i
%2 : Var<java.type:"int"> = var %1 @"i";
// access to captured rgbImage
%3 : java.type:"byte[]" = var.load %4;
// access to i
%5 : java.type:"int" = var.load %2;
// 3
%6 : java.type:"int" = constant @3;
// i * 3
%7 : java.type:"int" = mul %5 %6;
// 0
%8 : java.type:"int" = constant @0;
// i * 3 + 0
%9 : java.type:"int" = add %7 %8;
// rgbImage[i * 3 + 0]
%10 : java.type:"byte" = array.load %3 %9;
// byte r = rgbImage[i * 3 + 0]
%11 : Var<java.type:"byte"> = var %10 @"r";
...
// access to r
%31 : java.type:"byte" = var.load %11;
// conversion of byte to int
%32 : java.type:"int" = conv %31;
...
// gray(r, g, b)
%37 : java.type:"int" = invoke %32 %34 %36 @java.ref:"GPUExample::gray(int, int, int):int";
// conversion of int to float
%38 : java.type:"float" = conv %37;
// grayImage[i] = gray(r, g, b)
array.store %28 %30 %38;
// implicit return
return;
};
Once we have the Java code model we can pass it to our GPU library.
String cudaCCode = translateJavaCodeToCudaCCode(rgbToGrayModel);
The GPU library uses code reflection to traverse the code model and translate it to CUDA C code embedded in a string, similar to what we previously wrote by hand, after which the example proceeds as before to compile the CUDA C code and run it.
As the GPU library traverses the code model it will encounter an element that
models the invocation expression to the gray method. This method also needs to
be translated to CUDA C code, otherwise we will generate an incomplete CUDA
program. However, the library has no intrinsic support for this method. The
library requires the code model of this method so that it can traverse and
translate it, as was done with the lambda expression’s code model.
To achieve this we must declare that the gray method is also reflectable and
thereby grant access to its code. We do so by also annotating our method with
the @Reflect annotation.
@Reflect
static int gray(int r, int g, int b) {
return ...;
}
Ordinarily the GPU library would call the methods to translate, compile, and run on behalf of the user, so the user calls just one method.
compileAndRun((@Reflect IntConsumer) i -> {
byte r = rgbImage[i * 3 + 0];
byte g = rgbImage[i * 3 + 1];
byte b = rgbImage[i * 3 + 2];
grayImage[i] = gray(r, g, b);
});
This enables the GPU library to hide the details of translation and compilation. For example, the library may, instead of translating to CUDA C, translate to an intermediate representation called SPIR-V, a standard binary interchange format for cross-platform computation on GPUs.
More importantly, the GPU library can choose where to run the code. When performance is not a concern the library could be configured to directly invoke the lambda expression on the JVM using just a few platform threads, e.g., internally using a sequential or parallel stream. The developer can then use existing Java tools to debug their code and write unit tests against their code. When suitably debugged and tested, the code can be run with increased confidence and performance on the GPU, where the code is much harder to debug and test.
Declaring reflectable code
We have previously shown how to declare a reflectable lambda expression, using
the @Reflect annotation, and access its code model using code reflection.
However, we think declaring reflectable code would best be done via a new
keyword in the Java language; an incubating module cannot introduce language
features, so an annotation is good enough for now. In some future
non-incubating JEP we might devise the required language feature.
Declaration serves two purposes. First, we explicitly grant that other parts of
our Java application may have run time access to the code, such as a library we
may not be directly responsible for. Not all code needs to be reflected over,
and not all code should, so we limit access to only the code that needs to be
shared. Second, it informs javac that it needs to perform additional tasks so
that a code model can be built and made accessible at run time.
In total there are four syntactic locations where @Reflect can appear that
govern, in increasing scope, what is declared reflectable.
- If the annotation appears in a cast expression of a lambda expression (or method reference), annotating the use of the type in the cast operator of the cast expression, then the lambda expression is declared reflectable. For example,
compileAndRun((@Reflect IntConsumer) i -> { ... });
- If the annotation appears as a modifier for a field declaration or a local variable declaration, annotating the field or local variable, then any lambda expressions (or method references) in the variable initializer expression (if present) are declared reflectable. This is useful when cast expressions become verbose and/or types become hard to reason about, for example with fluent stream-like expressions where many reflectable lambda expressions are passed as arguments. For example,
@Reflect IntConsumer rgbToGray = i -> { ... };
compileAndRun(rgbToGray);
- Finally, if the annotation appears as a modifier for a non-abstract method declaration, annotating the method, then the method and any lambda expressions (or method references) it contains are declared reflectable. For example,
@Reflect
static int gray(int r, int g, int b) { ... }
The annotation is ignored if it appears in any other valid syntactic location.
Declaring a reflectable lambda expression or method does not implicitly broaden
the scope of what is reflectable to the methods they invoke. (In our GPU example
we needed to annotate the gray method.) However, declaring a reflectable
lambda expression does broaden the scope to the surrounding code's final, or
effectively final, variables that are used but not declared in the lambda
expression (captured variables).
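The capture rules at play here are the ordinary Java ones: a lambda expression may use a surrounding local variable only if it is final or effectively final, and it is those captured values that code reflection can later associate with the model. A plain-Java reminder, with no incubator API involved (class and method names are illustrative):

```java
import java.util.function.IntSupplier;

public class CaptureExample {
    static IntSupplier makeSupplier() {
        int base = 40;                  // effectively final: never reassigned
        IntSupplier s = () -> base + 2; // captures the value of base
        // base = 41; // uncommenting this would be a compile-time error:
        //            // captured variables must be final or effectively final
        return s;
    }

    public static void main(String[] args) {
        System.out.println(makeSupplier().getAsInt()); // prints 42
    }
}
```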
We access the code model of a reflectable method by invoking the method
Op.ofMethod with a given Method instance, which returns an optional instance
of the code model, a code element modeling the method declaration (see
the Code models section).
We access the code model of a reflectable lambda expression by invoking the
method Op.ofLambda with a given instance of a functional interface associated
with the lambda expression, which returns an optional instance of
Quoted<JavaOp.LambdaOp>. From the Quoted instance we can obtain the code
model, a code element that models the lambda expression. In addition, we can
obtain a mapping of run time values to items in the code model that model final,
or effectively final, variables used but not declared in the lambda expression.
Code Models
A code model is an immutable instance of data structures that can, in general,
model many kinds of code, be it Java code or foreign code. It has some
properties like an Abstract Syntax Tree (AST) used by a source compiler
like javac, such as modeling code as a tree of arbitrary depth, and some
properties like an intermediate representation used by an optimizing
compiler like HotSpot, such as modeling control flow and data flow as graphs.
These properties ensure code models can preserve many important details of code
they model and ensure code models are suited for analysis and transformation.
A code model of Java code produced by javac will be similar in fidelity to
javac's AST of that Java code. That model can be progressively lowered, via
transformation (while preserving the meaning of the code it models), to a code
model that is similar in fidelity to bytecode or HotSpot's C2 IR. Code
reflection provides one set of features to support code models across the
spectrum of high fidelity to low fidelity.
The primary data structure of a code model is a tree of code elements. There are three kinds of code elements, operation, body, and block. The root of a code model is an operation, and descendant operations form a tree of arbitrary depth.
Code reflection supports representing the data structures of a code model, code elements for modeling Java language constructs and behavior, traversing code models, building code models, and transforming code models. We shall explain with further examples.
Traversing code models
Continuing with our GPU example, we shall reflect over the gray method
(presented again with its implementation), access its code model, and traverse
it to print the model’s tree structure.
@Reflect
static int gray(int r, int g, int b) {
return (29 * r + 60 * g + 11 * b) / 100;
}
var grayMethod = GPUExample.class.getDeclaredMethod("gray",
        int.class, int.class, int.class);
FuncOp grayModel = Op.ofMethod(grayMethod).orElseThrow();
(To simplify we assume all bytes of the rgbImage array are normalized to be
within the range of 0 to 100.)
First we obtain the core reflection Method instance for the method declaration
of the gray method, then we go deeper and access the method’s code model
using Op.ofMethod (as previously explained). The root of the code model is an
operation, an instance of FuncOp that is a function declaration operation
modeling the method.
We can then stream over elements of the code model, sorted topologically in
pre-order traversal, using the CodeElement.elements method.
grayModel.elements().forEach((CodeElement<?, ?> e) -> {
int depth = 0;
var parent = e;
while ((parent = parent.parent()) != null) depth++;
IO.println(" ".repeat(depth) + e.getClass());
});
This code prints out the class of each code element it encounters and prefixes that with white space proportionate to the depth of the element in the code model tree. (We compute the depth for each code element by traversing back up the code model tree until the root element is reached. So, it is possible to traverse up and down the code model tree.)
class jdk.incubator.code.dialect.core.CoreOp$FuncOp
class jdk.incubator.code.Body
class jdk.incubator.code.Block
class jdk.incubator.code.dialect.core.CoreOp$VarOp
class jdk.incubator.code.dialect.core.CoreOp$VarOp
class jdk.incubator.code.dialect.core.CoreOp$VarOp
class jdk.incubator.code.dialect.core.CoreOp$ConstantOp
class jdk.incubator.code.dialect.core.CoreOp$VarAccessOp$VarLoadOp
class jdk.incubator.code.dialect.java.JavaOp$MulOp
class jdk.incubator.code.dialect.core.CoreOp$ConstantOp
class jdk.incubator.code.dialect.core.CoreOp$VarAccessOp$VarLoadOp
class jdk.incubator.code.dialect.java.JavaOp$MulOp
class jdk.incubator.code.dialect.java.JavaOp$AddOp
class jdk.incubator.code.dialect.core.CoreOp$ConstantOp
class jdk.incubator.code.dialect.core.CoreOp$VarAccessOp$VarLoadOp
class jdk.incubator.code.dialect.java.JavaOp$MulOp
class jdk.incubator.code.dialect.java.JavaOp$AddOp
class jdk.incubator.code.dialect.core.CoreOp$ConstantOp
class jdk.incubator.code.dialect.java.JavaOp$DivOp
class jdk.incubator.code.dialect.core.CoreOp$ReturnOp
We can observe that the top of the tree is the FuncOp which contains one
child, a Body, which in turn contains one child, a Block, which in turn
contains a sequence of operations. Bodies and blocks provide additional
structure for modeling code. Each operation models some part of the method's
code, for example variable declaration operations (instances of VarOp)
model Java variable declarations, in this case the method parameters, and the
add operation (instance of AddOp) models the Java + operator.
Now that we know how to access code models and traverse them we can combine both to traverse from the lambda expression's code model to the gray method's code model (as needed by the GPU library).
final byte[] rgbImage = ...
final float[] grayImage = ...
@Reflect
IntConsumer rgbToGray = i -> {
byte r = rgbImage[i * 3 + 0];
byte g = rgbImage[i * 3 + 1];
byte b = rgbImage[i * 3 + 2];
grayImage[i] = gray(r, g, b);
};
Quoted<LambdaOp> rgbToGrayQuotedModel = Op.ofLambda(rgbToGray).orElseThrow();
LambdaOp rgbToGrayModel = rgbToGrayQuotedModel.op();
FuncOp grayModel = rgbToGrayModel.elements()
    .flatMap((e) -> switch (e) {
        case JavaOp.InvokeOp iop -> {
            Method m;
            try {
                m = iop.invokeDescriptor().resolveToMethod(MethodHandles.lookup());
            } catch (ReflectiveOperationException ex) {
                yield Stream.empty();
            }
            yield Op.ofMethod(m).stream();
        }
        default -> Stream.empty();
    }).findFirst().orElseThrow();
We access the lambda expression’s code model using Op.ofLambda (as previously
explained), which returns an instance of Quoted<LambdaOp> holding the code
model. The code model is an operation, an instance of LambdaOp, that is a
lambda expression operation modeling the lambda expression. The quoted instance
also holds run time values for the final local variables rgbImage and
grayImage, and associates them with items in the code model that model those
variables. (If the variables were instead instance fields then the quoted
instance would hold the value of this.)
The stream expression maps an operation modeling an invocation expression, an
instance of InvokeOp, to the code model of the method it invokes, and returns
the first model it encounters. We do this by resolving an invoke operation's
description of the method it invokes to a Method instance from which we can
obtain its code model.
(We assume the MethodHandles.lookup() instance, passed as an argument to
resolveToMethod, grants permission to resolve.)
To increase our comprehension of grayModel's code model we can convert it to a
string and print it out.
IO.println(grayModel.toText());
(The toText method will traverse the code elements in a similar manner as we
did previously when using the elements method.)
36 ....@Reflect
37 ....static int gray(int r, int g, int b) {
38 .... return (29 * r + 60 * g + 11 * b) / 100;
39 ....}
func @loc="36:5:file:/.../GPUExample.java" @"gray" (
%0 : java.type:"int", %1 : java.type:"int", %2 : java.type:"int")java.type:"int" -> {
%3 : Var<java.type:"int"> = var %0 @loc="36:5" @"r";
%4 : Var<java.type:"int"> = var %1 @loc="36:5" @"g";
%5 : Var<java.type:"int"> = var %2 @loc="36:5" @"b";
%6 : java.type:"int" = constant @loc="38:17" @29;
%7 : java.type:"int" = var.load %3 @loc="38:22";
%8 : java.type:"int" = mul %6 %7 @loc="38:17";
%9 : java.type:"int" = constant @loc="38:26" @60;
%10 : java.type:"int" = var.load %4 @loc="38:31";
%11 : java.type:"int" = mul %9 %10 @loc="38:26";
%12 : java.type:"int" = add %8 %11 @loc="38:17";
%13 : java.type:"int" = constant @loc="38:35" @11;
%14 : java.type:"int" = var.load %5 @loc="38:40";
%15 : java.type:"int" = mul %13 %14 @loc="38:35";
%16 : java.type:"int" = add %12 %15 @loc="38:17";
%17 : java.type:"int" = constant @loc="38:45" @100;
%18 : java.type:"int" = div %16 %17 @loc="38:16";
return %18 @loc="38:9";
};
A code model’s text is designed to be human-readable. Its format is unspecified and is intended for debugging, testing, and comprehension. To further aid debugging each operation has line number information, and the root operation also has source information from where the code model originated.
The code model text shows the code model’s root element is a function
declaration (func) operation. The lambda-like expression represents the fusion
of the function declaration operation’s single body and the body’s first and
only block, called the entry block. Then there is a sequence of operations in
the entry block. For each operation there is an instance of a corresponding Java
class, all of which extend from the abstract class jdk.incubator.code.Op and
which we have already seen when we printed out the classes. Unsurprisingly, the
printed operations and printed operation classes occur in the same order, since
the toText method traverses the model in the same order as we explicitly
traversed.
The entry block declares three values called block parameters, %0, %1, and
%2, which model the method’s initial values for parameters r, g and b.
Focusing on the block parameter %0 we can track its transitive dependencies as
it is used as an operand of an operation that produces a value called an
operation result, which is used as an operand of a subsequent operation and so
on until we reach the return operation.
func @loc="36:5:file:/.../GPUExample.java" @"gray" (
%0 : java.type:"int", %1 : java.type:"int", %2 : java.type:"int")java.type:"int" -> {
// declaration of method parameter r
%3 : Var<java.type:"int"> = var %0 @loc="36:5" @"r";
...
// access to r
%7 : java.type:"int" = var.load %3 @loc="38:22";
// 29 * r
%8 : java.type:"int" = mul %6 %7 @loc="38:17";
...
// (29 * r + 60 * g)
%12 : java.type:"int" = add %8 %11 @loc="38:17";
...
// (29 * r + 60 * g + 11 * b)
%16 : java.type:"int" = add %12 %15 @loc="38:17";
...
// (29 * r + 60 * g + 11 * b) / 100
%18 : java.type:"int" = div %16 %17 @loc="38:16";
return %18 @loc="38:9";
};
The declaration of parameter r is modeled as an embedded var operation,
initialized with block parameter %0 used as the var operation’s single
operand. The operation result, %3, models the parameter as a variable value.
A variable value can be loaded from or stored to using variable access
operations, respectively modeling an expression that denotes a variable and
assignment to a variable. The expression denoting parameter r is modeled as
a var.load operation that uses %3 as an operand. The operation result, %7
modeling the current value of r, is used by the mul operation and so on.
Finally, the result of the div operation modeling the / operator, %18,
is used by the return operation modeling the return statement.
The source code of our method might contain all sorts of syntactic details
that javac will represent in its internal model but are extraneous details
for code reflection. This complexity is not present in the code model. For
example, the same code model would be produced if subexpressions in the return
statement were explicitly grouped e.g.,
(((29 * r) + (60 * g)) + (11 * b)) / (100).
In addition to the code elements forming a tree, a code model contains other
code items: the values (block parameters or operation results) we previously
introduced, which form bidirectional dependency graphs between their
declaration and their uses. A value also has a type element, another code item,
modeling the set of all possible values. In our example many of the type
elements model Java types, and some model the type of variable values (the type
element of the operation result of a var operation).
Astute readers may observe that code models are in Static Single-Assignment (SSA) form, and there is no explicit distinction, as there is in the source code, between statements and expressions. Block parameters and operation results are declared before they are used and cannot be reassigned (and we therefore require special operations and type elements to model variables as we previously showed).
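To see why variables need special modeling in SSA form, consider a reassignment such as i = i + 1. An SSA value cannot be reassigned, so the model introduces a variable value and routes reads and writes through var.load and var.store operations. The following is an illustrative sketch in the spirit of the model text shown earlier, not verbatim tool output:

```
// int i = 0;
%0 : java.type:"int" = constant @0;
%1 : Var<java.type:"int"> = var %0 @"i";   // the variable value for i
// i = i + 1;
%2 : java.type:"int" = var.load %1;        // read the current value of i
%3 : java.type:"int" = constant @1;
%4 : java.type:"int" = add %2 %3;
var.store %1 %4;                           // write the new value back to i
```

Every SSA value (%0 through %4) is assigned exactly once; only the variable value %1 is mutated, and only through the variable access operations.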
Finally, we can execute the code model by transforming it to bytecode, wrapping it in a method handle, and invoking the handle.
var handle = BytecodeGenerator.generate(MethodHandles.lookup(), grayModel);
assert GPUExample.gray(32, 32, 32) == (int) handle.invokeExact(32, 32, 32);
Building code models
Building code models is an important feature of code reflection that underpins
many other areas. For example, javac uses this feature to build code models
of reflectable code, and the run time uses it when those same models are
accessed. Later we shall see how building is composed to support transformation
of code models.
We can write Java code to build a code model equivalent to the one we previously accessed and traversed (which was built for us at compile time and run time).
var builtGrayModel = func(
"gray",
CoreType.functionType(INT, INT, INT, INT))
.body((Block.Builder bldr) -> {
// int r
Op.Result varR = bldr.op(var("r", bldr.parameters().get(0)));
// int g
Op.Result varG = bldr.op(var("g", bldr.parameters().get(1)));
// int b
Op.Result varB = bldr.op(var("b", bldr.parameters().get(2)));
// ((29 * r) + (60 * g)) + (11 * b)
var sum = bldr.op(add(
bldr.op(add(
bldr.op(mul(
bldr.op(constant(INT, 29)),
bldr.op(varLoad(varR)))),
bldr.op(mul(
bldr.op(constant(INT, 60)),
bldr.op(varLoad(varG)))))),
bldr.op(mul(
bldr.op(constant(INT, 11)),
bldr.op(varLoad(varB))))));
// return (...) / 100;
bldr.op(return_(
bldr.op(div(
sum,
bldr.op(constant(INT, 100))))));
});
The consuming lambda expression passed to the body method operates on a block
builder, an instance of Block.Builder, representing the entry block being built.
We use that to append operations to the entry block. When an operation is
appended it produces an operation result that can be used as an operand of a
further operation and so on. It is possible to fluently build complete
expressions as expression trees, or build separate subexpressions. In the
above example we separated out the building of the subexpression corresponding
to the numerator, and used the result of that subexpression, sum, as the first
operand of the div operation. The same code model will be built regardless
of whether expressions are built fluently or separately as distinct statements
(assuming the left-to-right evaluation order of the return statement's
expression is preserved).
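The equivalence of fluent and separated building can be illustrated with a toy append-only builder. ToyBuilder below is a hypothetical stand-in for Block.Builder, not the incubator API; its op method appends an instruction and returns a result id, so nested (fluent) calls and separated statements append the same sequence:

```java
import java.util.ArrayList;
import java.util.List;

// Toy analogue of an append-only block builder (hypothetical;
// NOT the jdk.incubator.code API).
final class ToyBuilder {
    final List<String> ops = new ArrayList<>();
    int op(String desc) {                        // append an op, return its result id
        ops.add("%" + ops.size() + " = " + desc);
        return ops.size() - 1;
    }
}

public class BuildOrder {
    // Fluent style: nested argument expressions evaluate left to right,
    // so operations are appended in dependency order.
    static List<String> fluent() {
        var b = new ToyBuilder();
        b.op("div %" + b.op("add ...") + " %" + b.op("const 100"));
        return b.ops;
    }

    // Separate style: the numerator is built as its own statement first.
    static List<String> split() {
        var b = new ToyBuilder();
        int sum = b.op("add ...");
        int hundred = b.op("const 100");
        b.op("div %" + sum + " %" + hundred);
        return b.ops;
    }

    public static void main(String[] args) {
        System.out.println(fluent().equals(split())); // prints true
    }
}
```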
When the body method returns, the body element and the entry block element it
contains will be fully built. Building is carefully designed so that
structurally invalid models cannot be built.
We don’t anticipate that most users will commonly build complete models of Java
code, since it is a rather verbose and tedious process, although potentially
less so than other approaches, e.g., building bytecode or using method handle
combinators. Building complete models is more likely to be performed by tooling
such as javac, which, since it is very good at building code models, can be
employed to do so for many purposes. Instead, we anticipate many users will
build parts of models when they transform them.
Transforming code models
Code reflection supports the transformation of code models by combining traversing and building. A code model transformation is represented by a function that takes an operation, encountered in the (input) model being transformed, and a code model builder for the resulting transformed (output) model, and mediates how, if at all, that operation is transformed into other code elements that are built. We were inspired by the functional transformation approach devised by the Class-File API and adapted that design to work on the tree structure of (immutable) code models.
We can write a simple code model transform that transforms our gray method’s
code model, replacing the operation modeling the + operator with an invocation
operation modeling an invocation expression to the method Integer.sum.
MethodRef SUM = MethodRef.method(Integer.class, "sum", int.class,
int.class, int.class);
CodeTransformer grayToMethodTransformer = CodeTransformer.opTransformer((
Function<Op, Op.Result> builder,
Op inputOp,
List<Value> outputOperands) -> {
switch (inputOp) {
// Replace a + b; with Integer.sum(a, b);
case AddOp _ -> builder.apply(invoke(SUM, outputOperands));
// Copy operation
default -> builder.apply(inputOp);
}
});
The code transformation function, passed as lambda expression to
CodeTransformer.opTransformer, accepts as parameters a block builder function,
builder, an operation encountered when traversing the input code model,
inputOp, and a list of values in the output model being built that are
associated with the input operation’s operands, outputOperands. We must have
previously encountered and transformed the input operations whose results are
associated with those values, since values can only be used after they have been
declared.
In the code transformation we switch over the input operation; in this case
we match on an add operation, with a default case for any other operation. In the
latter case we apply the input operation to the builder function, which creates
a new output operation that is a copy of the input operation, appends the new
operation to the block being built, and associates the new operation’s result
with the input operation’s result. When we match on an add operation we
replace it by building part of a code model, a method invoke operation to the
Integer.sum method constructed with the given output operands. The result of
the output invoke operation is automatically associated with the result of the
input add operation.
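The rewrite is meaning-preserving because Integer.sum(a, b) is specified to return a + b. A quick plain-Java check of this equivalence on the gray computation (this evaluates the before and after expressions directly; it does not use the transformer):

```java
public class SumEquivalence {
    // The gray computation as written, with + operators.
    static int grayPlus(int r, int g, int b) {
        return (29 * r + 60 * g + 11 * b) / 100;
    }

    // The same computation with each + replaced by Integer.sum,
    // as the transformed code model computes it.
    static int graySum(int r, int g, int b) {
        return Integer.sum(Integer.sum(29 * r, 60 * g), 11 * b) / 100;
    }

    public static void main(String[] args) {
        System.out.println(grayPlus(32, 32, 32) == graySum(32, 32, 32)); // prints true
    }
}
```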
We can then transform the method’s code model by invoking the FuncOp.transform
method with the code transformer as an argument.
FuncOp transformedGrayModel = grayModel.transform(grayToMethodTransformer);
IO.println(transformedGrayModel.toText());
The transformed code model is naturally very similar to the input code model.
func @loc="36:5:file:/.../GPUExample.java" @"gray" (
%0 : java.type:"int", %1 : java.type:"int", %2 : java.type:"int")java.type:"int" -> {
%3 : Var<java.type:"int"> = var %0 @loc="36:5" @"r";
%4 : Var<java.type:"int"> = var %1 @loc="36:5" @"g";
%5 : Var<java.type:"int"> = var %2 @loc="36:5" @"b";
%6 : java.type:"int" = constant @loc="38:17" @29;
%7 : java.type:"int" = var.load %3 @loc="38:22";
%8 : java.type:"int" = mul %6 %7 @loc="38:17";
%9 : java.type:"int" = constant @loc="38:26" @60;
%10 : java.type:"int" = var.load %4 @loc="38:31";
%11 : java.type:"int" = mul %9 %10 @loc="38:26";
%12 : java.type:"int" = invoke %8 %11 @java.ref:"java.lang.Integer::sum(int, int):int";
%13 : java.type:"int" = constant @loc="38:35" @11;
%14 : java.type:"int" = var.load %5 @loc="38:40";
%15 : java.type:"int" = mul %13 %14 @loc="38:35";
%16 : java.type:"int" = invoke %12 %15 @java.ref:"java.lang.Integer::sum(int, int):int";
%17 : java.type:"int" = constant @loc="38:45" @100;
%18 : java.type:"int" = div %16 %17 @loc="38:16";
return %18 @loc="38:9";
};
We can observe that the two add operations have been replaced with two invoke
operations. Also, by default, each operation that was copied preserves line
number information. This code transformation can also be applied unmodified to
the code model of our lambda expression or to more complex models containing
many + operators in arbitrarily nested positions.
The code transformation function is not a direct implementation of the
functional interface CodeTransformer. Instead, it is adapted from another
functional interface that is easier to implement for simpler transformations on
operations. Direct implementations of CodeTransformer are more complex but are
also capable of more complex transformations, such as building new blocks and
retaining more control over associating items in the input and output models.
Code reflection provides many complex code transformers, such as those
for progressively lowering code models, converting models into pure SSA-form,
and inlining models into other models. We will continue to explore the code
model transformation design to better understand how we can improve it across
the spectrum of simple to complex transformations.
Future work
We shall explore access to code models at compile time. Code reflection provides very basic support for annotation processors to access code models of program elements, and while useful for advanced experimentation it needs more consideration.
As the language evolves we shall look for opportunities to enhance code reflection to take advantage of new language features, especially features related to pattern matching and data-oriented programming. We anticipate pattern matching will strongly influence code reflection and enhance the querying of code models. Furthermore, this is an opportunity to provide feedback on language features.
We need to explore a language feature for the declaration of reflectable code.
Use of the @Reflect annotation is a temporary solution that is good enough for
incubation but insufficient for preview.
We need to ensure that a library using code reflection can operate on code models produced by a JDK version greater than the version it was compiled against. Such forward compatibility is challenging. We shall explore solutions, such as the library declaring an upper bound on the JDK versions of reflective code it supports, or enabling the lowering of a modeled language feature that a library cannot process into modeled features it can process (potentially compromising high fidelity but still preserving program meaning).
Alternatives
Compiler Tree API
The com.sun.source package of the jdk.compiler module contains the javac
API for accessing the abstract syntax trees (ASTs) representing Java source
code. Javac uses an implementation of this API when parsing source code. This
API is not suitable for standardization as it is too intertwined with javac’s
implementation, since javac reserves the right to make breaking changes to
this API as the language evolves. More generally, ASTs can be difficult to
analyze and transform. For example, a modern optimizing compiler will transform
the AST representing source code into another, slightly lower-level form, an
intermediate representation, that is easier to analyze and transform into
executable code.
Bytecode
Bytecode is not easily accessible, nor guaranteed to be accessible at run
time, and even if we made it so it would not be ideal. The translation of Java
source to bytecode by javac results in numerous Java language features
being translated away, making them hard to recover; e.g., lambda expressions
are translated into invokedynamic instructions and synthetic methods. Bytecode
is also, by default, too low-level, which makes it difficult to analyze and
transform. For example, the HotSpot C2 compiler will transform bytecode into
another, higher-level form, an intermediate representation, that is easier to
analyze and transform into executable code.
C# expression trees
C# expression trees represent code in a tree-like data structure, where each node is an expression. The C# compiler supports the declaration of expressible C# code, enabling access to the code's expression tree. Expression trees can also be built directly.
C# expression trees have many similarities to code reflection. However, expression trees are limited in the set of C# statements and expressions they can model, and are therefore limited in what C# code can be accessed as expression trees. Code reflection can model nearly all Java statements and expressions and can therefore access a wider variety of Java code as code models. Further, code models are better suited to analysis and transformation since they combine the properties of models used by source compilers and optimizing compilers.
C#'s language feature for declaration of expressible code is superior to
code reflection's use of the @Reflect annotation to declare reflectable code.
We can learn from C# when devising the equivalent Java language feature.
Testing
Testing will focus on a suite of unit tests for the compiler and runtime that give high modeling coverage and code coverage. Where possible we try to operationally reuse code reflection features, such as when storing and loading models in class files.
We need to ensure that Java code models produced by the compiler preserve Java
program meaning. We will select an existing suite of Java tests and recompile
the source they test, using a special javac internal flag, such that the
bytecode is generated from code models produced by the compiler. Testing against
these specially compiled sources must yield the same results as testing against
the ordinarily compiled sources.
Risks and Assumptions
While incubating we will strive to keep the number of changes required to code
in the java.base and jdk.compiler modules to a minimum, thereby reducing the
burden on maintainers and reviewers. So far the changes are modest.
Introduction of a new language feature, even a modest one, is a significant effort with numerous tasks to update many areas of the platform. Code reflection will add to that list of tasks, since the language feature will need to be modeled and supported like existing modeled features. There is a risk it will require significant effort to model, especially with high fidelity. We think this risk is mitigated by the generic modeling capabilities of code models, and that we can currently model all Java statements and expressions with high fidelity.