JEP draft: Code reflection (Incubator)
| Owner | Paul Sandoz |
| Type | Feature |
| Scope | JDK |
| Status | Draft |
| Component | core-libs |
| Effort | L |
| Duration | L |
| Reviewed by | Maurizio Cimadamore |
| Created | 2025/06/30 19:54 |
| Updated | 2026/01/29 09:11 |
| Issue | 8361105 |
Summary
Enhance core reflection with a standard API to model Java code, build and transform models of Java code, and access models of Java code in methods and lambda expressions. Libraries can use this API to analyze Java code and extend its reach, such as executing it as code on GPUs. This is an incubating API.
Goals
- Enable Java developers to interface with non-Java (foreign) programming models using familiar Java language constructs, such as lambda expressions and static typing.
- Encourage libraries to expose novel programming models to Java developers without requiring developers to embed non-Java code inside Java code, or to write tedious Java code that builds data structures to model Java code or other (foreign) code.
- Enable access at run time to a high fidelity model of Java code, specifically code in a method or lambda expression.
- Provide APIs for building models of Java code and transforming them to Java code or other (foreign) code.
Non-Goals
- It is not a goal to change the meaning of Java programs as specified by the Java Language Specification, to compile Java source code to anything other than the instruction set specified by the Java Virtual Machine Specification, to change the JVM’s instruction set, or to change HotSpot to support instruction sets of specialized processing units. For example, it is not a goal to make such changes to the Java platform to execute Java methods on GPUs.
- It is not a goal to standardize the internal Abstract Syntax Tree of javac to serve as the model for Java code.
- It is not a goal to enable access at run time to bytecode and for it to serve as the model for Java code.
- It is not a goal to devise a general metaprogramming or macro facility for the Java language.
- It is not a goal to introduce language constructs, like class literals, to concisely express access to a model of code.
Enabling the incubating API
Code reflection is an incubating API, disabled by default. The code
reflection API is offered in the incubator module jdk.incubator.code. To try
out code reflection you must request that the incubator module
jdk.incubator.code be resolved:
- Compile the program with javac --add-modules jdk.incubator.code Main.java and run it with java --add-modules jdk.incubator.code Main; or
- When using the source code launcher, run the program with java --add-modules jdk.incubator.code Main.java; or
- When using jshell, start it with jshell --add-modules jdk.incubator.code.
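For a modular application, the incubator module can instead be resolved by requiring it in the module declaration; a minimal sketch (the module name is illustrative, and javac typically warns about the use of incubating modules):
module com.example.app {
    // Resolves the incubating code reflection module for this application module.
    requires jdk.incubator.code;
}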
For the benefit of readers wishing to follow along in more detail it is
possible to start jshell with
jshell -R-ea --enable-preview --add-modules jdk.incubator.code
and copy code snippets associated with the Example class, in order, into the
jshell session.
Motivation
Core reflection is a powerful feature that enables inspection of Java code at run time. For example, consider the following Java code that we want to inspect: a class containing a field and a method, and a nested class also containing a field and a method.
static class Example {
static Runnable R = () -> IO.println("Example:field:R");
static int add(int a, int b) {
IO.println("Example:method:add");
return a + b;
}
static class Nested {
static Runnable R = () -> IO.println("Example.Nested:field:R");
void m() { IO.println("Example.Nested:method:m"); }
}
}
We can write a simple stream that uses core reflection and traverses program structure, a tree of annotated elements, starting from a given class and reporting elements in a topological order.
static Stream<AnnotatedElement> elements(Class<?> c) {
return Stream.of(c).mapMulti((e, mapper) -> traverse(e, mapper));
}
private static void traverse(AnnotatedElement e,
Consumer<? super AnnotatedElement> mapper) {
mapper.accept(e);
if (e instanceof Class<?> c) {
for (Field df : c.getDeclaredFields()) { traverse(df, mapper); }
for (Method dm : c.getDeclaredMethods()) { traverse(dm, mapper); }
for (Class<?> dc : c.getDeclaredClasses()) { traverse(dc, mapper); }
}
}
(AnnotatedElement is the common supertype of Class, Field, and Method.)
The traverse method recursively traverses a class’s declared fields, methods
and classes. Starting from Example, using a class literal expression, we can
print out the classes, fields, and methods we encounter.
elements(Example.class)
.forEach(IO::println);
More interestingly we can perform some simple analysis, such as counting the
number of static fields whose type is Runnable.
static boolean isStaticRunnableField(Field f) {
return f.accessFlags().contains(AccessFlag.STATIC)
&& Runnable.class.isAssignableFrom(f.getType());
}
assert 2 == elements(Example.class)
.filter(e -> e instanceof Field f && isStaticRunnableField(f))
.count();
However, if we want to perform some analysis of the code in the lambda expressions and methods we are out of luck. Core reflection can only inspect the classes, fields, and methods – it provides no facility to go deeper and inspect code. This can severely limit what Java libraries can do, such as a library that wants to expose a novel parallel programming model and execute parallel programs on specialized hardware.
Parallel programming
Many Java programs need to process large amounts of data in parallel, and Java libraries make it easy to implement parallel computations. For example, in a face detection algorithm, we need to convert RGB pixels to grayscale; here is simplified code to do that using a lambda expression and the parallel streams built into the JDK:
IntConsumer rgbToGray = i -> {
byte r = rgbImage[i * 3 + 0];
byte g = rgbImage[i * 3 + 1];
byte b = rgbImage[i * 3 + 2];
grayImage[i] = gray(r, g, b);
};
IntStream.range(0, N)
.parallel()
.forEach(rgbToGray);
If the number of pixels N is sufficiently large and/or the work to compute
each pixel is sufficiently demanding, then the stream will compute the result
faster than a single-threaded for loop, even with the overhead of starting and
coordinating multiple threads.
for (int i = 0; i < N; i++) {
byte r = rgbImage[i * 3 + 0];
byte g = rgbImage[i * 3 + 1];
byte b = rgbImage[i * 3 + 2];
grayImage[i] = gray(r, g, b);
}
Gustafson's Law states that as we increase the number of threads M, each working on a sufficiently large number of pixels, the estimated speedup of a program will approach M as the fraction of time spent on parallel tasks grows.
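For reference, Gustafson's law can be written as S(M) = s + p × M, where s is the fraction of time spent on serial work, p = 1 - s is the fraction spent on parallel work, and S(M) is the estimated speedup with M threads; as p approaches 1, the speedup approaches M.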
Unfortunately, the number of threads that can run compute-intensive tasks is limited by the CPU, e.g., an AMD EPYC 9005 Zen 5c has 384 threads. Java 21 introduced virtual threads to run large numbers of I/O-intensive tasks in parallel, but virtual threads do not create new compute resources and cannot speed up code that is already CPU-bound.
General-purpose computing with Graphics Processing Units
There is a class of computing device, the Graphics Processing Unit (GPU), whose architecture is very different to the CPU: rather than a few hundred threads, a modern GPU such as an NVIDIA Blackwell B200 GPU can simultaneously execute a few hundred thousand threads.
Originally GPUs were designed for rendering images and video games, but now we can use them for general-purpose computations such as face detection, General Matrix Multiplication (GEMM), or Fast Fourier Transformation (FFT).
If we could run parallel tasks on a GPU instead of a CPU, with orders of magnitude more threads, we could either greatly reduce the execution time or compute more in the same execution time.
Historically one approach was to write the multithreaded computation in a language supported by the GPU, e.g., CUDA C, and embed it as a string in a Java program. You could then use JNI to run the CUDA C compiler and transfer the compiled code to the GPU for execution.
static void gpuComputation(int N, byte[] rgbImage, byte[] grayImage) {
var cudaCCode = """
__device__
char gray(char r, char g, char b) { return ...; }
__global__
void computeGrayImage(int N, char* rgbImage, char* grayImage) {
int i = blockIdx.x * blockDim.x + threadIdx.x;
if (i < N) {
char r = rgbImage[i * 3 + 0];
char g = rgbImage[i * 3 + 1];
char b = rgbImage[i * 3 + 2];
grayImage[i] = gray(r, g, b);
}
}
""";
var kernel = compileGpuCode(cudaCCode);
executeGpuCode(kernel, N, rgbImage, grayImage);
}
This approach is problematic: it presents a leaky abstraction that forces developers to be familiar with CUDA artifacts, and the code is no longer hardware independent. Expecting developers to write Java code that merely carries non-Java code is misguided since javac cannot compile and check that code; they can choose any language, not just Java, to carry CUDA C.
Exploring Better Abstractions
In the 2010s, OpenJDK Project Sumatra aimed to let Java developers take advantage of GPUs by enhancing the JVM and parallel streams. The Sumatra JVM could generate code for AMD GPUs and place parts of the JVM’s heap in GPU memory. This approach, where the Java Platform obscures the presence of a GPU from Java code, is in stark contrast to manually embedding GPU code into a Java program. Neither approach provides the right abstraction.
Obscuring the GPU is particularly challenging. First, memory is split between CPU and GPU; managing the JVM’s heap across the CPU and GPU can be a continual drag on performance. Second, the idiomatic Java code in lambda expressions is polymorphic: methods are commonly invoked on interfaces rather than classes and each invocation triggers virtual method lookup and possibly class loading, initialization, etc. It is counterproductive to bring this highly variable behavior, where each thread may run different code, to the GPU, where each thread is intended to run identical code in lock step on different data elements using a Single Instruction Multiple Thread (SIMT) execution model.
Empowering libraries
We believe the best way to support GPUs in the Java Platform is to introduce primitives that enable the creation of libraries which, in turn, introduce novel programming models and APIs that harness the unique memory and execution capabilities of GPUs.
One such primitive is the Foreign Function & Memory (FFM) API, introduced in Java 22. While the FFM API has no built-in knowledge of GPUs, it allows libraries to interact efficiently with native device drivers on the CPU and thereby control the GPU indirectly.
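As a rough illustration of the low-level control that the FFM API enables, a library might bind directly to a function in the GPU vendor's driver; the following is a minimal sketch, assuming a Linux system where the CUDA driver is available as libcuda.so (error handling omitted):
import java.lang.foreign.*;
import java.lang.invoke.MethodHandle;

static int initCudaDriver() throws Throwable {
    try (Arena arena = Arena.ofConfined()) {
        // Load the CUDA driver library and locate the cuInit function.
        SymbolLookup cuda = SymbolLookup.libraryLookup("libcuda.so", arena);
        MemorySegment cuInitAddr = cuda.find("cuInit").orElseThrow();
        // CUresult cuInit(unsigned int Flags) -- returns 0 on success.
        MethodHandle cuInit = Linker.nativeLinker().downcallHandle(
            cuInitAddr,
            FunctionDescriptor.of(ValueLayout.JAVA_INT, ValueLayout.JAVA_INT));
        return (int) cuInit.invokeExact(0);
    }
}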
If libraries are to translate the parallel parts of Java programs to GPU code, they need high-fidelity access to Java code. Fortunately, the Java Platform has a longstanding primitive – reflection -- which allows a library to inspect the structure of a Java program. Java 1.1 introduced reflection at run time, core reflection, kick-starting an ecosystem of libraries for data access, unit testing, messaging, etc. Java 5 introduced reflection at compile time, allowing annotation processors to generate code that extends the application with no maintenance overhead.
Unfortunately, as we have shown, reflection is limited: it does not provide high-fidelity access to the code in methods or lambda expressions.
A library can access the source code of methods and lambdas with internal APIs
of javac, but this is only available at compile time and is too complex since
it contains many extraneous syntactic details. At run time, a library can access
the bytecode of methods (but not lambdas) with the Class-File API, but
this is a poor substitute for source code, and class files are not always
available. As such, libraries must settle for either complex, high-fidelity
access to code with low availability, or low-fidelity access to code that is
likewise not always available.
Code Reflection: The Missing Primitive
To support libraries effectively, we propose to enhance reflection to expose not just classes, fields, and methods but also the code of methods and lambda expressions. With this enhancement, we can develop libraries that translate Java code, e.g., the lambda expression used in a parallel stream, into GPU code, eliminating the need to manually write CUDA C code. With knowledge of both Java code and GPU code, libraries can model data dependencies and optimize data transfer between CPU and GPU for better performance.
Just as the FFM API is not specific to GPUs, an API providing access to Java code is not specific to GPUs. Libraries could use it to, e.g., automatically differentiate Java code, pass translated Java code to native machine learning runtimes, or translate Java code to SQL statements.
Description
We propose to enhance core reflection with code reflection. The code reflection API supports access to a model of code in a method or lambda expression, a code model, at run time that is suited for analysis and transformation.
We shall introduce code reflection by continuing with the two examples we
presented earlier, extending analysis to the code in the Example class and
then describing how a library can use code reflection to translate Java code to
GPU code. Then we shall describe code reflection in more detail.
Let’s update our Example class so that the code of lambda expressions and
methods is accessible just like the fields and methods.
import jdk.incubator.code.*;
import jdk.incubator.code.bytecode.*;
import jdk.incubator.code.dialect.core.*;
import jdk.incubator.code.dialect.java.*;
import static jdk.incubator.code.dialect.core.CoreOp.*;
import static jdk.incubator.code.dialect.java.JavaOp.*;
static class Example {
@Reflect
static Runnable R = () -> IO.println("Example:field:R");
@Reflect
static int add(int a, int b) {
IO.println("Example:method:add");
return a + b;
}
static class Nested {
@Reflect
static Runnable R = () -> IO.println("Example.Nested:field:R");
@Reflect
void m() { IO.println("Example.Nested:method:m"); }
}
}
We declare that the lambda expressions and methods are reflectable by annotating
their declarations with @Reflect. By doing so we grant access to their code.
When javac compiles the source of the Example class it translates its internal model of the add method’s code to a standard model, called a code model, and stores the code model in a class file related to the Example class file, in which add’s code is compiled to bytecode as usual. (The same occurs for the other method and the lambda expressions.)
A code model is an immutable tree of code elements, where each element models some Java statement or expression (for further details see the Code models section).
We can use the code reflection API to access the code model of an annotated element, which loads the corresponding code model that was stored in the related class file.
static Object getStaticFieldValue(Field f) {
try { return f.get(null); }
catch (IllegalAccessException e) { throw new RuntimeException(e); }
}
static Optional<? extends CodeElement<?, ?>> getCodeModel(AnnotatedElement ae) {
return switch (ae) {
case Method m -> Op.ofMethod(m);
case Field f when isStaticRunnableField(f) ->
Op.ofLambda(getStaticFieldValue(f)).map(Quoted::op);
default -> Optional.empty();
};
}
(Note: since code reflection is an incubating API we cannot add new APIs in
packages of other modules, such as in the java.lang.reflect package of the
java.base module. For now, we must provide such methods in the incubating code
reflection module.)
The method getCodeModel returns the code model for a reflectable method or
lambda expression, a code element that is the root of the code model tree. By
default methods and lambda expressions are not reflectable, so we return an
optional value. If the annotated element is a method we retrieve the code model
from the method. If the annotated element is a static field whose type is
Runnable we access its value, an instance of Runnable whose result is
produced from a lambda expression, and from that instance we retrieve the lambda
expression’s code model. The retrieval is slightly different for lambda
expressions since they can capture values (for more details see the
Declaring reflectable code section).
We can use getCodeModel to map from Example’s annotated elements to their
code models.
elements(Example.class)
// AnnotatedElement -> CodeModel?
.flatMap(ae -> getCodeModel(ae).stream())
.forEach(IO::println);
More interestingly we can now perform some simple analysis of code, such as extracting the values of the string literal expressions that are printed.
static final MethodRef PRINTLN = MethodRef.method(IO.class, "println",
void.class, Object.class);
static Optional<String> isPrintConstantString(CodeElement<?, ?> e) {
if (e instanceof InvokeOp i &&
i.invokeDescriptor().equals(PRINTLN) &&
i.operands().get(0).declaringElement() instanceof ConstantOp cop &&
cop.value() instanceof String s) {
return Optional.of(s);
} else {
return Optional.empty();
}
}
static List<String> analyzeCodeModel(CodeElement<?, ?> codeModel) {
return codeModel.elements()
// CodeElement -> String?
.flatMap(e -> isPrintConstantString(e).stream())
.toList();
}
The method analyzeCodeModel streams over all elements of a code model and
returns the list of string literal values passed to invocations of IO.println.
The code to match such an invocation is straightforward but verbose, and
therefore can be hard to read. We hope to address this in a future JEP by using
future advancements in pattern matching, specifically the capability to
declare member patterns. Until then we will avoid making near-term improvements that we think could be better solved with improved pattern matching.
We can then use analyzeCodeModel to further refine our stream expression to print out all such string literal values.
elements(Example.class)
// AnnotatedElement -> CodeModel?
.flatMap(ae -> getCodeModel(ae).stream())
// CodeModel -> List<String>
.map(codeModel -> analyzeCodeModel(codeModel))
.forEach(IO::println);
Translating Java code to GPU code
With code reflection, a library can generate CUDA C code from Java code. Recall the lambda and stream based example.
IntConsumer rgbToGray = i -> {
byte r = rgbImage[i * 3 + 0];
byte g = rgbImage[i * 3 + 1];
byte b = rgbImage[i * 3 + 2];
grayImage[i] = gray(r, g, b);
};
IntStream.range(0, N)
.parallel()
.forEach(rgbToGray);
First, we declare that the lambda expression is reflectable and thereby grant
access to its code. We do so by casting our lambda expression to the target
interface annotated with @Reflect.
IntConsumer rgbToGray = (@Reflect IntConsumer) i -> {
byte r = rgbImage[i * 3 + 0];
byte g = rgbImage[i * 3 + 1];
byte b = rgbImage[i * 3 + 2];
grayImage[i] = gray(r, g, b);
};
We use the code reflection API to access the lambda expression’s code model.
var rgbToGrayModel = Op.ofLambda(rgbToGray).orElseThrow().op();
Once we have the Java code model we can pass it to our GPU library.
String cudaCCode = translateJavaCodeToGpuCode(rgbToGrayModel);
The GPU library uses code reflection APIs to traverse the code model and translate it to CUDA C code embedded in a string, after which the example proceeds as before to compile the CUDA C code and execute it. Ordinarily the GPU library would call the methods to translate, compile and execute on behalf of the user, so the user simply passes the reflectable lambda expression as an argument.
dispatchKernel((@Reflect IntConsumer) i -> {
byte r = rgbImage[i * 3 + 0];
byte g = rgbImage[i * 3 + 1];
byte b = rgbImage[i * 3 + 2];
grayImage[i] = gray(r, g, b);
});
As the GPU library traverses the code model it will encounter an element that
models the invocation expression to the gray method. This method also needs to
be translated to CUDA C code, otherwise we will generate an incomplete CUDA
program. However, the library has no intrinsic understanding of what this method
does. The library needs the code model of this method so that it can traverse and translate it, as was done with the lambda expression’s code model.
To achieve this we must declare that the gray method is also reflectable and
thereby grant access to its code. We do so by also annotating our method with
the @Reflect annotation.
@Reflect
static byte gray(byte r, byte g, byte b) {
return ...;
}
The library uses code reflection to traverse from the lambda expression’s code model to the gray method’s code model, accessing the code model of the method.
Foreign programming models
GPU programming is an example of a foreign programming model; in our example, specifically, the CUDA C programming model specified by NVIDIA. As presented, we can use code reflection to develop a GPU library that translates Java code to CUDA C code. The GPU library specifies the rules as to what constitutes GPU Java code. Those rules are foreign to the Java programming model as specified by the Java Language Specification, which knows nothing about GPU Java code. We can then use the CUDA runtime to compile and execute the CUDA C code. So not only do we leverage a foreign programming model, we also leverage foreign code and a foreign runtime. Thanks to code reflection and the Foreign Function & Memory API, the Java world can embrace a foreign world and orchestrate complex activity between the two.
Declaring reflectable code
We have previously shown how to declare reflectable lambda expressions and
methods, using the @Reflect annotation, and access their code models using the
code reflection API. For the purposes of incubation we can only incubate APIs,
so we must avoid any changes to language syntax and semantics. In some future
non-incubating JEP we might devise a new language feature. Until then use of the
annotation serves as a temporary declarative mechanism that is good enough for
experimentation.
Declaration serves two purposes. First, we explicitly grant that other parts of
our Java application may have run time access to the code, such as a library we
may not be directly responsible for. Not all code needs to be reflected over, and not all code should be, so we can limit access to only the code that needs to be shared. Second, it informs javac that it needs to perform additional work so that a code model is produced and made accessible at run time.
In total there are four syntactic locations where @Reflect can appear; they govern, in increasing scope, what is declared reflectable (a sketch of the locations follows the list).
- If the annotation appears in a cast expression of a lambda expression (or method reference), annotating the use of the type in the cast operator of the cast expression, then the lambda expression is declared reflectable.
- If the annotation appears as a modifier for a field declaration or a local variable declaration, annotating the field or local variable, then any lambda expressions (or method references) in the variable initializer expression (if present) are declared reflectable. This is useful when cast expressions become verbose and/or types become hard to reason about. For example, with fluent stream-like expressions where many reflectable lambda expressions are passed as arguments.
- Finally, if the annotation appears as a modifier for a non-abstract method declaration, annotating the method, then the method and any lambda expressions (or method references) it contains are declared reflectable.
The annotation is ignored if it appears in any other valid syntactic location.
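A minimal sketch of these locations, assuming the jdk.incubator.code imports shown earlier (the class, field, and variable names are illustrative):
static class ReflectSketch {
    @Reflect                                    // method: the method and any lambdas it
    static int twice(int x) { return x + x; }   // contains are reflectable

    @Reflect                                    // field: lambdas in the initializer
    static IntUnaryOperator F = x -> x + x;     // are reflectable

    static void locations() {
        @Reflect                                // local variable: lambdas in the
        IntUnaryOperator g = x -> x + x;        // initializer are reflectable

        // cast expression: this lambda expression is reflectable
        var h = (@Reflect IntUnaryOperator) x -> x + x;
    }
}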
Declaring a reflectable lambda expression or method does not implicitly broaden
the scope of what is reflectable to methods they invoke. (In the GPU example we
needed to annotate the gray method.) Furthermore, declaring a reflectable
lambda expression does not broaden the scope to the surrounding code that declares the final, or effectively final, variables used but not declared in the lambda expression.
We access the code model of a reflectable method by invoking the method
Op.ofMethod with a given Method instance, which returns an optional instance
of the code model, a root code element. The root code element models the method
declaration (see the Code models section).
We access the code model of a reflectable lambda expression by invoking the
method Op.ofLambda with a given instance of a functional interface associated
with the lambda expression, which returns an optional instance of
Quoted<JavaOp.LambdaOp>. From the Quoted instance we can obtain the root
code element that models the lambda expression. In addition, we can obtain a
mapping of run time values to items in the code model that model final, or
effectively final, variables used but not declared in the lambda expression.
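For example, a minimal sketch of accessing a lambda expression’s Quoted instance and its captured values; the capturedValues() accessor used here is our assumption about the incubating API’s accessor name:
static void printCapturedValues(int offset) {
    // offset is effectively final and captured by the lambda expression
    var addOffset = (@Reflect IntUnaryOperator) x -> x + offset;
    var quoted = Op.ofLambda(addOffset).orElseThrow();
    // The root code element modeling the lambda expression
    IO.println(quoted.op().toText());
    // Mapping from code model items modeling captured variables to their
    // run time values (accessor name assumed)
    quoted.capturedValues().forEach((item, value) -> IO.println(item + " -> " + value));
}
printCapturedValues(10);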
Code Models
A code model is an immutable data structure that can, in general, model many kinds of code, be it Java code or foreign code. It has some properties like an Abstract Syntax Tree (AST) used by a source compiler, such as modeling code as a tree of arbitrary depth, and some properties like an intermediate representation used by an optimizing compiler, such as modeling control flow and data flow as graphs. These properties ensure code models can preserve many important details of the code they model and are suited for analysis and transformation.
The primary data structure of a code model is a tree of code elements. There are three kinds of code elements, operation, body, and block. The root of a code model is an operation, and descendant operations form a tree of arbitrary depth. We shall see more in subsequent sections.
The code reflection API supports representing the data structures of a code model, code elements for modeling Java language constructs and behavior, traversing code models, building code models, and transforming code models. We shall explain with further examples.
Traversing code models
We shall continue with our Example class, reflecting over the add method,
accessing the method’s code model, and traversing it to print the model’s tree structure.
var addMethod = Example.class.getDeclaredMethod("add", int.class, int.class);
FuncOp addModel = Op.ofMethod(addMethod).orElseThrow();
assert addModel == Op.ofMethod(addMethod).orElseThrow();
We access the method’s code model as we have previously shown. The root of the
code model is an operation, an instance of FuncOp that is a function
declaration operation modeling the method. Further, we assert that if we obtain
the code model for a second time the same instance is returned. The identity of items in the code model is stable, and therefore they can be used as stable keys for associating items with other information.
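For example, a minimal sketch of a cache keyed on code model identity (the cache itself is illustrative):
Map<CodeElement<?, ?>, List<String>> analysisCache = new IdentityHashMap<>();
analysisCache.computeIfAbsent(addModel, m -> analyzeCodeModel(m));
// Re-accessing the model returns the same instance, so the cached result is found.
assert analysisCache.containsKey(Op.ofMethod(addMethod).orElseThrow());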
One way to traverse the code model is to write a recursive method that iterates over code elements and their children. That way we can get a sense of what a code model contains.
static void traverse(int depth, CodeElement<?, ?> e) {
IO.println(" ".repeat(depth) + e.getClass());
for (CodeElement<?, ?> c : e.children()) {
traverse(depth + 1, c);
}
}
traverse(0, addModel);
The traverse method prints out the class of the code element it encounters and
prefixes that with white space proportionate to the depth of the element in the
code model tree.
jshell> traverse(0, addModel);
class jdk.incubator.code.dialect.core.CoreOp$FuncOp
class jdk.incubator.code.Body
class jdk.incubator.code.Block
class jdk.incubator.code.dialect.core.CoreOp$VarOp
class jdk.incubator.code.dialect.core.CoreOp$VarOp
class jdk.incubator.code.dialect.core.CoreOp$ConstantOp
class jdk.incubator.code.dialect.java.JavaOp$InvokeOp
class jdk.incubator.code.dialect.core.CoreOp$VarAccessOp$VarLoadOp
class jdk.incubator.code.dialect.core.CoreOp$VarAccessOp$VarLoadOp
class jdk.incubator.code.dialect.java.JavaOp$AddOp
class jdk.incubator.code.dialect.core.CoreOp$ReturnOp
We can observe that the top of the tree is the FuncOp which contains one
child, a Body, which in turn contains one child, a Block, which in turn
contains a sequence of eight operations. Bodies and blocks provide additional
structure for modeling code. Each operation models some part of the method’s code; for example, variable declaration operations (instances of VarOp) model Java variable declarations, in this case the method parameters, and the add operation (an instance of AddOp) models the Java + operator.
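As a quick check, we can count the operations in the model: the root func operation plus the eight operations in the entry block.
assert 9 == addModel.elements().filter(e -> e instanceof Op).count();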
Alternatively, we can stream over elements of the code model (as we did
previously when analyzing the code for string literals) in the same
topologically sorted order using the CodeElement.elements method:
addModel.elements().forEach((CodeElement<?, ?> e) -> {
int depth = 0;
var parent = e;
while ((parent = parent.parent()) != null) depth++;
IO.println(" ".repeat(depth) + e.getClass());
});
We compute the depth for each code element by traversing back up the code model tree until the root element is reached. So, it is possible to traverse up and down the code model tree.
To get a better sense of what the code model contains we can convert it to a text string and print it out.
IO.println(addModel.toText());
The toText method will traverse the code elements in a similar manner as we
presented but print out more detail.
func @loc="22:5:string:///REPL/$JShell$8D.java" @"add" (
%0 : java.type:"int", %1 : java.type:"int")java.type:"int" -> {
%2 : Var<java.type:"int"> = var %0 @loc="22:5" @"a";
%3 : Var<java.type:"int"> = var %1 @loc="22:5" @"b";
%4 : java.type:"java.lang.String" = constant @loc="24:20" @"Example:method:add";
invoke %4 @loc="24:9" @java.ref:"java.lang.IO::println(java.lang.Object):void";
%5 : java.type:"int" = var.load %2 @loc="25:16";
%6 : java.type:"int" = var.load %3 @loc="25:20";
%7 : java.type:"int" = add %5 %6 @loc="25:16";
return %7 @loc="25:9";
};
A code model’s text is designed to be human-readable, primarily intended for debugging and testing. It is also invaluable for explaining code models. To aid debugging each operation has line number information, and the root operation also has source information from where the code model originated. Also notice how the text output mirrors the structure of the source code.
The code model text shows the code model’s root element is a function
declaration (func) operation. The lambda-like expression represents the fusion
of the function declaration operation’s single body and the body’s first and
only block, called the entry block. Then there is a sequence of operations in
the entry block. For each operation there is an instance of a corresponding Java
class, all of which extend from the abstract class jdk.incubator.code.Op and
which we have already seen when we printed out the classes. Unsurprisingly the
printed operations and printed operation classes occur in the same order since
the toText method traverses the model in the same order as we explicitly
traversed.
The entry block declares two values called block parameters, %0 and %1,
which model the method’s initial values for parameters a and b. The method
parameter declarations are modeled as embedded var operations, each
initialized with a corresponding block parameter used as the var operation’s
single operand. The var operations produce values called operation results,
variable values %2 and %3, which model the variables a and b. A variable
value can be loaded from or stored to using variable access operations,
respectively modeling an expression that denotes a variable and assignment to a
variable. The expressions denoting parameters a and b are modeled as
var.load operations that use the variable values %2 and %3 respectively as
operands. The operation results of these operations are used as operands of
subsequent operations and so on, e.g., %7 the result of the add operation
modeling the + operator is used as an operand of the return operation modeling
the return statement.
The source code of our add method might contain all sorts of syntactic details
that javac rightly needs to know about but are extraneous for modeling
purposes. This complexity is not present in the code model. For example, the
same code model would be produced if the return statement’s expression was
((a) + (b)) instead of a + b.
In addition to the code elements forming a tree, a code model contains other code items: values (block parameters or operation results), which we previously introduced, that form bidirectional dependency graphs between their declarations and their uses. A value also has a type element, another code item, modeling the set of all possible values it may take. In our example many of the type elements model Java types, and some model the type of variable values (the type element of the operation result of a var operation). In summary, a code model contains five kinds of code item: operation, body, block, value, and type element.
Astute readers may observe that code models are in Static Single-Assignment (SSA) form, and there is no explicit distinction, as there is in the source code, between statements and expressions. Block parameters and operation results are declared before they are used and cannot be reassigned (and we therefore require special operations and type elements to model variables as we previously showed).
Finally, we can execute the code model by transforming it to bytecode, wrapping it in a method handle, and invoking the handle.
import java.lang.invoke.MethodHandles;
var handle = BytecodeGenerator.generate(MethodHandles.lookup(), addModel);
assert Example.add(1, 1) == (int) handle.invokeExact(1, 1);
Building code models
The code reflection API provides functionality to build code models. We can use the API to build an equivalent model we previously accessed and traversed.
var builtAddModel = func(
"add",
CoreType.functionType(JavaType.INT, JavaType.INT, JavaType.INT))
.body((Block.Builder builder) -> {
// Check the entry block parameters
assert builder.parameters().size() == 2;
assert builder.parameters().stream().allMatch(
(Block.Parameter param) -> param.type().equals(JavaType.INT));
// int a
VarOp varOpA = var("a", builder.parameters().get(0));
Op.Result varA = builder.op(varOpA);
// int b
VarOp varOpB = var("b", builder.parameters().get(1));
Op.Result varB = builder.op(varOpB);
// IO.println("A:method:m")
builder.op(invoke(PRINTLN,
builder.op(constant(JavaType.J_L_STRING, "A:method:m"))));
// return a + b;
builder.op(return_(
builder.op(add(
builder.op(varLoad(varA)),
builder.op(varLoad(varB))))));
});
IO.println(builtAddModel.toText());
The consuming lambda expression passed to the body method operates on a block builder, an instance of Block.Builder, representing the entry block being built. We use it to append operations to the entry block. When an operation is appended it produces an operation result that can be used as an operand of a further operation, and so on. When the body method returns, the body element and the entry block element it contains will be fully built.
Notice how building, like the text output, mirrors the source code structure. Building is carefully designed so that structurally invalid models cannot be built. We can approximately test equivalence with our previously accessed model as follows.
var builtAddModelElements = builtAddModel.elements()
.map(CodeElement::getClass).toList();
var addModelElements = addModel.elements()
.map(CodeElement::getClass).toList();
assert builtAddModelElements.equals(addModelElements);
We don’t anticipate most users will commonly build complete models of Java code, since it’s a rather verbose and tedious process, although potentially less so than other approaches, e.g., building bytecode or using code combinators that must be built from the inside out. In fact, javac uses this same API to build models, and the run time uses it to produce the models that are accessed. Instead, we anticipate many users will build parts of models when they transform them.
Transforming code models
The code reflection API supports the transformation of code models by combining traversing and building. A code model transformation is represented by a function that takes an operation, encountered in the (input) model being transformed, and a code model builder for the resulting transformed (output) model, and mediates how, if at all, that operation is transformed into other code elements that are built. We were inspired by the functional transformation approach devised by the Class-File API and adapted that design to work on the nested structure of immutable code model trees.
We can write a simple code model transform that transforms our method’s code
model, replacing the operation modeling the + operator with an invocation
operation modeling an invocation expression to the method Integer.sum.
static final MethodRef SUM = MethodRef.method(Integer.class, "sum", int.class,
int.class, int.class);
CodeTransformer addToMethodTransformer = CodeTransformer.opTransformer((
Function<Op, Op.Result> builder,
Op inputOp,
List<Value> outputOperands) -> {
switch (inputOp) {
// Replace a + b; with Integer.sum(a, b);
case AddOp _ -> builder.apply(invoke(SUM, outputOperands));
// Copy operation
default -> builder.apply(inputOp);
}
});
The code transformation function, passed as a lambda expression to CodeTransformer.opTransformer, accepts as parameters a block builder function, builder, an operation encountered when traversing the input code model, inputOp, and a list of values in the output model being built that are associated with the input operation’s operands, outputOperands. We must have
previously encountered and transformed the input operations whose results are
associated with those values, since values can only be used after they have been
declared.
In the code transformation we switch over the input operation; in this case we just match on the add operation and, by default, on any other operation. In the latter
case we apply the input operation to the builder function, which creates a new
output operation that is a copy of the input operation, appends the new
operation to the block being built, and associates the new operation’s result
with the input operation’s result. When we match on an add operation we
replace it by building part of a code model, a method invoke operation to the
Integer.sum method constructed with the given output operands. The result of
the output invoke operation is automatically associated with the result of the
input add operation.
We can then transform the method’s code model by calling the FuncOp.transform
method and passing the code transformer as an argument.
FuncOp transformedAddModel = addModel.transform(addToMethodTransformer);
IO.println(transformedAddModel.toText());
The transformed code model is naturally very similar to the input code model.
func @loc="22:5:string:///REPL/$JShell$8D.java" @"add" (
%0 : java.type:"int", %1 : java.type:"int")java.type:"int" -> {
%2 : Var<java.type:"int"> = var %0 @loc="22:5" @"a";
%3 : Var<java.type:"int"> = var %1 @loc="22:5" @"b";
%4 : java.type:"java.lang.String" = constant @loc="24:20" @"Example:method:add";
invoke %4 @loc="24:9" @java.ref:"java.lang.IO::println(java.lang.Object):void";
%5 : java.type:"int" = var.load %2 @loc="25:16";
%6 : java.type:"int" = var.load %3 @loc="25:20";
%7 : java.type:"int" = invoke %5 %6 @java.ref:"java.lang.Integer::sum(int, int):int";
return %7 @loc="25:9";
};
We can observe the add operation has been replaced with the invoke
operation. Also, by default, each operation that was copied preserves line
number information. The code transformation function can also be applied
unmodified to more complex code containing many + operators in arbitrarily
nested positions. (Such application is left as an exercise for the curious
reader.)
The code transformation function is not a direct implementation of the functional interface CodeTransformer. Instead it is adapted from another functional interface, which is easier to implement for simpler transformations on
operations. Direct implementations of CodeTransformer are more complex but are
also capable of more complex transformations, such as building new blocks and
retaining more control over associating values in the input and output models.
The code reflection API provides many complex code transformers, such as those
for progressively lowering code models, converting models into pure SSA-form,
and inlining models into other models. We will continue to explore the code
model transformation design to better understand how we can improve the API
across the spectrum of simple to complex transformations.
Alternatives
Compiler Tree API
The com.sun.source package of the jdk.compiler module contains the javac
API for accessing the abstract syntax trees (ASTs) representing Java source
code. Javac uses an implementation of this API when parsing source code. This
API is not suitable for standardization as it is too intertwined with javac’s
implementation, since javac reserves the right to make breaking changes to
this API as the language evolves. More generally ASTs can be difficult to
analyze and transform. For example, a modern optimizing compiler will transform
its AST representing source code into another slightly lower form, an
intermediate representation, that is easier to analyze and transform to
executable code.
Bytecode
Bytecode is not easily accessible nor guaranteed to be available at run time, and even if we made it so it would not be ideal. The translation of Java source to bytecode by javac results in numerous Java language features being translated away, making them hard to recover; e.g., lambda expressions are translated into invokedynamic instructions and synthetic methods. Bytecode is also, by default, too low-level, which makes it difficult to analyze and transform. For example, the HotSpot C2 compiler will transform bytecode into another, higher-level form, an intermediate representation, that is easier to analyze and transform to executable code.
Testing
Testing will focus on a suite of unit tests for the compiler and runtime that give high modeling coverage and code coverage. Where possible we try to operationally reuse code reflection APIs, such as when storing and loading models in class files.
We need to ensure that Java code models produced by the compiler preserve Java program meaning. We will select an existing suite of Java tests and recompile the source they test, using a special javac internal flag, such that the bytecode is generated from code models produced by the compiler. Testing against these specially compiled sources must yield the same results as testing against the ordinarily compiled sources.
Risks and Assumptions
While incubating we will strive to keep the number of changes required to code
in the java.base and jdk.compiler modules to a minimum, thereby reducing the
burden on maintainers and reviewers. So far the changes are modest.
Introduction of a new language feature, even a modest one, is a significant effort with numerous tasks to update many areas of the platform. Code reflection will add to that list of tasks, since the language feature will need to be modeled and supported like existing modeled features. There is a risk it will require significant effort to model, especially with high fidelity. We think this risk is mitigated by the generic modeling capabilities of code models, and that we can currently model all Java statements and expressions with high fidelity.
Future work
We shall explore access to code models at compile time. The code reflection API provides very basic support for annotation processors to access code models of program elements, and while useful for advanced experimentation it needs more consideration.
As the language evolves we shall look for opportunities to enhance the code reflection API to take advantage of new language features, especially features related to pattern matching and data-oriented programming. We anticipate pattern matching will strongly influence the code reflection API and enhance the querying of code models. Furthermore, this is an opportunity to provide feedback on language features.
We need to explore the language feature for declaration of reflectable code. Use
of the @Reflect annotation is a temporary solution that is good enough for
incubation but insufficient for preview.
We need to ensure that a library using code reflection can operate on code models produced by a JDK version that is greater than the version it was compiled against. Such forward compatibility is challenging. We shall explore solutions, such as a library declaring an upper bound of JDK versions of reflective code it supports, or enabling the lowering of a modeled language feature a library does not know about into modeled features it does know about (potentially compromising high fidelity but still preserving program meaning).