JEP draft: Code reflection (Incubator)

Owner: Paul Sandoz
Type: Feature
Scope: JDK
Status: Draft
Component: core-libs
Effort: L
Duration: L
Reviewed by: Maurizio Cimadamore
Created: 2025/06/30 19:54
Updated: 2026/01/29 09:11
Issue: 8361105

Summary

Enhance core reflection with a standard API to model Java code, build and transform models of Java code, and access models of Java code in methods and lambda expressions. Libraries can use this API to analyze Java code and extend its reach, such as executing it as code on GPUs. This is an incubating API.

Goals

  1. Enable Java developers to interface with non-Java (foreign) programming models using familiar Java language constructs, such as lambda expressions and static typing.
  2. Encourage libraries to expose novel programming models to Java developers without requiring developers to embed non-Java code inside Java code, or to write tedious Java code that builds data structures to model Java code or other (foreign) code.
  3. Enable access at run time to a high fidelity model of Java code, specifically code in a method or lambda expression.
  4. Provide APIs for building models of Java code and transforming them to Java code or other (foreign) code.

Non-Goals

  1. It is not a goal to change the meaning of Java programs as specified by the Java Language Specification, to compile Java source code to anything other than the instruction set specified by the Java Virtual Machine Specification, to change the JVM’s instruction set, or to change HotSpot to support the instruction sets of specialized processing units. For example, it is not a goal to make such changes to the Java platform to execute Java methods on GPUs.
  2. It is not a goal to standardize the internal Abstract Syntax Tree of javac to serve as the model for Java code.
  3. It is not a goal to enable access at run time to bytecode and for it to serve as the model for Java code.
  4. It is not a goal to devise a general metaprogramming or macro facility for the Java language.
  5. It is not a goal to introduce language constructs, like class literals, to concisely express access to a model of code.

Enabling the incubating API

Code reflection is an incubating API, disabled by default. The code reflection API is offered in the incubator module jdk.incubator.code. To try out code reflection you must request that the incubator module jdk.incubator.code be resolved with the --add-modules command-line option.
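
For example, to compile and run a program whose main class is, hypothetically, Main:

javac --add-modules jdk.incubator.code Main.java
java --add-modules jdk.incubator.code Main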

For the benefit of readers wishing to follow along in more detail, it is possible to start jshell with

jshell -R-ea --enable-preview --add-modules jdk.incubator.code

and copy code snippets associated with the Example class, in order, into the jshell session.

Motivation

Core reflection is a powerful feature that enables inspection of Java code at run time. For example, consider the following Java code that we want to inspect: a class containing a field and a method, and a nested class also containing a field and a method.

static class Example {
    static Runnable R = () -> IO.println("Example:field:R");
    static int add(int a, int b) {
        IO.println("Example:method:add");
        return a + b;
    }

    static class Nested {
        static Runnable R = () -> IO.println("Example.Nested:field:R");
        void m() { IO.println("Example.Nested:method:m"); }
    }
}

We can write a simple stream that uses core reflection and traverses program structure, a tree of annotated elements, starting from a given class and reporting elements in a topological order.

static Stream<AnnotatedElement> elements(Class<?> c) {
    return Stream.of(c).mapMulti((e, mapper) -> traverse(e, mapper));
}
private static void traverse(AnnotatedElement e,
                             Consumer<? super AnnotatedElement> mapper) {
    mapper.accept(e);
    if (e instanceof Class<?> c) {
        for (Field df : c.getDeclaredFields()) { traverse(df, mapper); }
        for (Method dm : c.getDeclaredMethods()) { traverse(dm, mapper); }
        for (Class<?> dc : c.getDeclaredClasses()) { traverse(dc, mapper); }
    }
}

(AnnotatedElement is the common super type of Class, Field, and Method.) The traverse method recursively traverses a class’s declared fields, methods and classes. Starting from Example, using a class literal expression, we can print out the classes, fields, and methods we encounter.

elements(Example.class)
    .forEach(IO::println);

More interestingly we can perform some simple analysis, such as counting the number of static fields whose type is Runnable.

static boolean isStaticRunnableField(Field f) {
    return f.accessFlags().contains(AccessFlag.STATIC)
        && Runnable.class.isAssignableFrom(f.getType());
}
assert 2 == elements(Example.class)
    .filter(e -> e instanceof Field f && isStaticRunnableField(f))
    .count();

However, if we want to perform some analysis of the code in the lambda expressions and methods we are out of luck. Core reflection can only inspect the classes, fields, and methods – it provides no facility to go deeper and inspect code. This can severely limit what Java libraries can do, such as a library that wants to expose a novel parallel programming model and execute parallel programs on specialized hardware.

Parallel programming

Many Java programs need to process large amounts of data in parallel, and Java libraries make it easy to implement parallel computations. For example, in a face detection algorithm, we need to convert RGB pixels to grayscale; here is simplified code to do that using a lambda expression and the parallel streams built into the JDK:

IntConsumer rgbToGray = i -> {
    byte r = rgbImage[i * 3 + 0];
    byte g = rgbImage[i * 3 + 1];
    byte b = rgbImage[i * 3 + 2];
    grayImage[i] = gray(r, g, b);
};
IntStream.range(0, N)
    .parallel()
    .forEach(rgbToGray);

If the number of pixels N is sufficiently large and/or the work to compute each pixel is sufficiently demanding, then the stream will compute the result faster than a single-threaded for loop, even with the overhead of starting and coordinating multiple threads.

for (int i = 0; i < N; i++) {
    byte r = rgbImage[i * 3 + 0];
    byte g = rgbImage[i * 3 + 1];
    byte b = rgbImage[i * 3 + 2];
    grayImage[i] = gray(r, g, b);
}

Gustafson's Law states that as we increase the number of threads M, each working on a sufficiently large number of pixels, the estimated speedup of a program will approach M as the fraction of time spent on parallel tasks grows.
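
Stated as a formula (the standard formulation of Gustafson's Law, with s denoting the fraction of execution time spent on serial work and M the number of threads):

S(M) = s + (1 - s) × M

As s approaches zero, S(M) approaches M.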

Unfortunately, the number of threads that can run compute-intensive tasks is limited by the CPU, e.g., an AMD EPYC 9005 Zen 5c has 384 threads. Java 21 introduced virtual threads to run large numbers of I/O-intensive tasks in parallel, but virtual threads do not create new compute resources and cannot speed up code that is already CPU-bound.

General-purpose computing with Graphics Processing Units

There is a class of computing device, the Graphics Processing Unit (GPU), whose architecture is very different from that of the CPU: rather than a few hundred threads, a modern GPU such as an NVIDIA Blackwell B200 can simultaneously execute a few hundred thousand threads.

Originally GPUs were designed for rendering images and video games, but now we can use them for general-purpose computations such as face detection, General Matrix Multiplication (GEMM), or the Fast Fourier Transform (FFT).

If we could run parallel tasks on a GPU instead of a CPU, with orders of magnitude more threads, we could either greatly reduce the execution time or compute more in the same execution time.

Historically, one approach was to write the multithreaded computation in a language supported by the GPU, e.g., CUDA C, and embed it as a string in a Java program. You could then use JNI to run the CUDA C compiler and transfer the compiled code to the GPU for execution.

static void gpuComputation(int N, byte[] rgbImage, byte[] grayImage) {
    var cudaCCode = """
        __device__
        char gray(char r, char g, char b) { return ...; }

        __global__
        void computeGrayImage(int N, char* rgbImage, char* grayImage) {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < N) {
                char r = rgbImage[i * 3 + 0];
                char g = rgbImage[i * 3 + 1];
                char b = rgbImage[i * 3 + 2];
                grayImage[i] = gray(r, g, b);
            }
        }
        """;
    var kernel = compileGpuCode(cudaCCode);
    executeGpuCode(kernel, N, rgbImage, grayImage);
}

This approach is problematic: it presents a leaky abstraction that forces developers to be familiar with CUDA artifacts, and the code is no longer hardware independent. Expecting developers to write Java code that merely carries non-Java code is misguided, since javac can neither compile nor check that code; indeed, any language, not just Java, could serve as a carrier for CUDA C.

Exploring Better Abstractions

In the 2010s, OpenJDK Project Sumatra aimed to let Java developers take advantage of GPUs by enhancing the JVM and parallel streams. The Sumatra JVM could generate code for AMD GPUs and place parts of the JVM’s heap in GPU memory. This approach, where the Java Platform obscures the presence of a GPU from Java code, is in stark contrast to manually embedding GPU code into a Java program. Neither approach provides the right abstraction.

Obscuring the GPU is particularly challenging. First, memory is split between CPU and GPU; managing the JVM’s heap across the CPU and GPU can be a continual drag on performance. Second, the idiomatic Java code in lambda expressions is polymorphic: methods are commonly invoked on interfaces rather than classes and each invocation triggers virtual method lookup and possibly class loading, initialization, etc. It is counterproductive to bring this highly variable behavior, where each thread may run different code, to the GPU, where each thread is intended to run identical code in lock step on different data elements using a Single Instruction Multiple Thread (SIMT) execution model.

Empowering libraries

We believe the best way to support GPUs in the Java Platform is to introduce primitives that enable the creation of libraries which, in turn, introduce novel programming models and APIs that harness the unique memory and execution capabilities of GPUs.

One such primitive is the Foreign Function & Memory (FFM) API, introduced in Java 22. While the FFM API has no built-in knowledge of GPUs, it allows libraries to interact efficiently with native device drivers on the CPU and thereby control the GPU indirectly.

If libraries are to translate the parallel parts of Java programs to GPU code, they need high-fidelity access to Java code. Fortunately, the Java Platform has a longstanding primitive, reflection, which allows a library to inspect the structure of a Java program. Java 1.1 introduced reflection at run time, core reflection, kick-starting an ecosystem of libraries for data access, unit testing, messaging, etc. Java 5 introduced reflection at compile time, allowing annotation processors to generate code that extends the application with no maintenance overhead.

Unfortunately, as we have shown, reflection is limited: it does not provide high-fidelity access to the code in methods or lambda expressions.

A library can access the source code of methods and lambdas with internal APIs of javac, but this is only available at compile time and is too complex since it contains many extraneous syntactic details. At run time, a library can access the bytecode of methods (but not lambdas) with the Class-File API, but bytecode is a poor substitute for source code and class files are not always available. As such, libraries must choose between complex, high-fidelity access to code with low availability, or low-fidelity access to code that is likewise not always available.

Code Reflection: The Missing Primitive

To support libraries effectively, we propose to enhance reflection to expose not just classes, fields, and methods but also the code of methods and lambda expressions. With this enhancement, we can develop libraries that translate Java code, e.g., the lambda expression used in a parallel stream, into GPU code, eliminating the need to manually write CUDA C code. With knowledge of both Java code and GPU code, libraries can model data dependencies and optimize data transfer between CPU and GPU for better performance.

Just as the FFM API is not specific to GPUs, an API providing access to Java code is not specific to GPUs. Libraries could use it to, e.g., automatically differentiate Java code, pass translated Java code to native machine learning runtimes, or translate Java code to SQL statements.

Description

We propose to enhance core reflection with code reflection. The code reflection API provides access, at run time, to a model of the code in a method or lambda expression, called a code model, that is suited to analysis and transformation.

We shall introduce code reflection by continuing with the two examples we presented earlier, extending analysis to the code in the Example class and then describing how a library can use code reflection to translate Java code to GPU code. Then we shall describe code reflection in more detail.

Let’s update our Example class so that the code of lambda expressions and methods is accessible just like the fields and methods.

import jdk.incubator.code.*;
import jdk.incubator.code.bytecode.*;
import jdk.incubator.code.dialect.core.*;
import jdk.incubator.code.dialect.java.*;
import static jdk.incubator.code.dialect.core.CoreOp.*;
import static jdk.incubator.code.dialect.java.JavaOp.*;

static class Example {
    @Reflect
    static Runnable R = () -> IO.println("Example:field:R");
    @Reflect
    static int add(int a, int b) {
        IO.println("Example:method:add");
        return a + b;
    }

    static class Nested {
        @Reflect
        static Runnable R = () -> IO.println("Example.Nested:field:R");
        @Reflect
        void m() { IO.println("Example.Nested:method:m"); }
    }
}

We declare that the lambda expressions and methods are reflectable by annotating their declarations with @Reflect. By doing so we grant access to their code. When javac compiles the source of the Example class it translates its internal model of the add method’s code to a standard model, called a code model, and stores the code model in a class file related to the Example class file, where add’s code is compiled to bytecode. (The same occurs for the other method and the lambda expressions.)

A code model is an immutable tree of code elements, where each element models some Java statement or expression (for further details see the Code models section).

We can use the code reflection API to access the code model of an annotated element, which loads the corresponding code model that was stored in the related class file.

static Object getStaticFieldValue(Field f) {
    try { return f.get(null); }
    catch (IllegalAccessException e) { throw new RuntimeException(e); }
}
static Optional<? extends CodeElement<?, ?>> getCodeModel(AnnotatedElement ae) {
    return switch (ae) {
        case Method m -> Op.ofMethod(m);
        case Field f when isStaticRunnableField(f) ->
                Op.ofLambda(getStaticFieldValue(f)).map(Quoted::op);
        default -> Optional.empty();
    };
}

(Note: since code reflection is an incubating API we cannot add new APIs in packages of other modules, such as in the java.lang.reflect package of the java.base module. For now, we must provide such methods in the incubating code reflection module.)

The method getCodeModel returns the code model for a reflectable method or lambda expression, a code element that is the root of the code model tree. By default methods and lambda expressions are not reflectable, so we return an optional value. If the annotated element is a method we retrieve the code model from the method. If the annotated element is a static field whose type is Runnable we access its value, an instance of Runnable produced from a lambda expression, and from that instance we retrieve the lambda expression’s code model. The retrieval is slightly different for lambda expressions since they can capture values (for more details see the Declaring reflectable code section).

We can use getCodeModel to map from Example’s annotated elements to their code models.

elements(Example.class)
        // AnnotatedElement -> CodeModel?
        .flatMap(ae -> getCodeModel(ae).stream())
        .forEach(IO::println);

More interestingly we can now perform some simple analysis of code, such as extracting the values of the string literal expressions that are printed.

static final MethodRef PRINTLN = MethodRef.method(IO.class, "println",
        void.class, Object.class);
static Optional<String> isPrintConstantString(CodeElement<?, ?> e) {
    if (e instanceof InvokeOp i &&
            i.invokeDescriptor().equals(PRINTLN) &&
            i.operands().get(0).declaringElement() instanceof ConstantOp cop &&
            cop.value() instanceof String s) {
        return Optional.of(s);
    } else {
        return Optional.empty();
    }
}
static List<String> analyzeCodeModel(CodeElement<?, ?> codeModel) {
    return codeModel.elements()
            // CodeElement -> String?
            .flatMap(e -> isPrintConstantString(e).stream())
            .toList();
}

The method analyzeCodeModel streams over all elements of a code model and returns the list of string literal values passed to invocations of IO.println. The code to match such an invocation is straightforward but verbose, and therefore can be hard to read. We hope to address this in a future JEP using future advancements in pattern matching, specifically the capability to declare member patterns. Until then we will avoid making near-term improvements that we think can be better solved with better pattern matching.

We can then use analyzeCodeModel to further refine our stream expression to print out all such string literal values.

elements(Example.class)
        // AnnotatedElement -> CodeModel?
        .flatMap(ae -> getCodeModel(ae).stream())
        // CodeModel -> List<String>
        .map(codeModel -> analyzeCodeModel(codeModel))
        .forEach(IO::println);

Translating Java code to GPU code

With code reflection, a library can generate CUDA C code from Java code. Recall the lambda- and stream-based example.

IntConsumer rgbToGray = i -> {
    byte r = rgbImage[i * 3 + 0];
    byte g = rgbImage[i * 3 + 1];
    byte b = rgbImage[i * 3 + 2];
    grayImage[i] = gray(r, g, b);
};
IntStream.range(0, N)
        .parallel()
        .forEach(rgbToGray);

First, we declare that the lambda expression is reflectable and thereby grant access to its code. We do so by casting our lambda expression to its target interface type, annotating the type in the cast with @Reflect.

IntConsumer rgbToGray = (@Reflect IntConsumer) i -> {
    byte r = rgbImage[i * 3 + 0];
    byte g = rgbImage[i * 3 + 1];
    byte b = rgbImage[i * 3 + 2];
    grayImage[i] = gray(r, g, b);
};

We use the code reflection API to access the lambda expression’s code model.

var rgbToGrayModel = Op.ofLambda(rgbToGray).orElseThrow().op();

Once we have the Java code model we can pass it to our GPU library.

String cudaCCode = translateJavaCodeToGpuCode(rgbToGrayModel);

The GPU library uses code reflection APIs to traverse the code model and translate it to CUDA C code embedded in a string, after which the example proceeds as before to compile the CUDA C code and execute it. Ordinarily the GPU library would call the methods to translate, compile and execute on behalf of the user, so the user simply passes the reflectable lambda expression as an argument.

dispatchKernel((@Reflect IntConsumer) i -> {
    byte r = rgbImage[i * 3 + 0];
    byte g = rgbImage[i * 3 + 1];
    byte b = rgbImage[i * 3 + 2];
    grayImage[i] = gray(r, g, b);
});

As the GPU library traverses the code model it will encounter an element that models the invocation expression to the gray method. This method also needs to be translated to CUDA C code, otherwise we will generate an incomplete CUDA program. However, the library has no intrinsic understanding of what this method does. The library needs the code model of this method so that it can traverse and translate it just as it did the lambda expression’s code model.

To achieve this we must declare that the gray method is also reflectable and thereby grant access to its code. We do so by also annotating our method with the @Reflect annotation.

@Reflect
static byte gray(byte r, byte g, byte b) {
    return ...;
}

The library uses code reflection to traverse from the lambda expression’s code model to the gray method’s code model, accessing the code model of the method.
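
For illustration, here is a minimal sketch, using only the API shown above and not taken from any particular GPU library, of the first step of such a traversal: collecting the method descriptors of all invocation operations in a code model. Resolving each MethodRef to a java.lang.reflect.Method, and then calling Op.ofMethod on it, is left to the library and is not shown.

static List<MethodRef> invokedMethods(CodeElement<?, ?> codeModel) {
    // Stream over all elements of the code model and collect the descriptors
    // of the invocation operations encountered
    return codeModel.elements()
            .filter(e -> e instanceof InvokeOp)
            .map(e -> ((InvokeOp) e).invokeDescriptor())
            .toList();
}

Applied to the lambda expression’s code model, the resulting list would include the descriptor of the gray method, which the library can then resolve and reflect over in turn.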

Foreign programming models

GPU programming is an example of a foreign programming model; in our GPU example, specifically, it is the CUDA C programming model specified by NVIDIA. As presented, we can use code reflection to develop a GPU library that translates Java code to CUDA C code. The GPU library specifies the rules as to what constitutes GPU Java code. Those rules are foreign to the Java programming model as specified by the Java Language Specification, which knows nothing about GPU Java code. We can then use the CUDA runtime to compile and execute the CUDA C code. So not only do we leverage a foreign programming model, but we also leverage foreign code and a foreign runtime. Thanks to code reflection and the Foreign Function & Memory API the Java world can embrace a foreign world and orchestrate complex activity between the two.

Declaring reflectable code

We have previously shown how to declare reflectable lambda expressions and methods, using the @Reflect annotation, and access their code models using the code reflection API. For the purposes of incubation we can only incubate APIs, so we must avoid any changes to language syntax and semantics. In some future non-incubating JEP we might devise a new language feature. Until then use of the annotation serves as a temporary declarative mechanism that is good enough for experimentation.

Declaration serves two purposes. First, we explicitly grant that other parts of our Java application, such as a library we may not be directly responsible for, may have run time access to the code. Not all code needs to be reflected over, and not all code should be, so we can restrict access to only the code that is necessary to share. Second, it informs javac that it needs to perform additional tasks so that a code model can be produced and made accessible at run time.

In total there are four syntactic locations where @Reflect can appear; they govern, in increasing scope, what is declared reflectable.

The annotation is ignored if it appears in any other valid syntactic location.

Declaring a reflectable lambda expression or method does not implicitly broaden the scope of what is reflectable to the methods it invokes. (In the GPU example we needed to annotate the gray method.) Declaring a reflectable lambda expression does, however, broaden the scope to the final, or effectively final, variables of the surrounding code that are used but not declared in the lambda expression.

We access the code model of a reflectable method by invoking the method Op.ofMethod with a given Method instance, which returns an optional instance of the code model, a root code element. The root code element models the method declaration (see the Code models section).

We access the code model of a reflectable lambda expression by invoking the method Op.ofLambda with a given instance of a functional interface associated with the lambda expression, which returns an optional instance of Quoted<JavaOp.LambdaOp>. From the Quoted instance we can obtain the root code element that models the lambda expression. In addition, we can obtain a mapping of run time values to items in the code model that model final, or effectively final, variables used but not declared in the lambda expression.
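
For illustration, here is a minimal sketch of such access. It assumes a captured local variable named offset, and it assumes that Quoted exposes the captured mapping through an accessor named capturedValues; both names are assumptions of this sketch rather than details taken from the description above.

int offset = 42;  // effectively final local variable, captured by the lambda expression
var addOffset = (@Reflect IntUnaryOperator) x -> x + offset;

var quoted = Op.ofLambda(addOffset).orElseThrow();  // Quoted value describing the lambda expression
var lambdaModel = quoted.op();                      // root code element modeling the lambda expression
var captured = quoted.capturedValues();             // assumed accessor: relates the code model items that model
                                                    // captured variables to their run time values (here, 42)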

Code models

A code model is an immutable instance of data structures that can, in general, model many kinds of code, be it Java code or foreign code. It has some properties of an Abstract Syntax Tree (AST) used by a source compiler, such as modeling code as a tree of arbitrary depth, and some properties of an intermediate representation used by an optimizing compiler, such as modeling control flow and data flow as graphs. These properties ensure that code models can preserve many important details of the code they model and that they are suited to analysis and transformation.

The primary data structure of a code model is a tree of code elements. There are three kinds of code elements: operation, body, and block. The root of a code model is an operation, and descendant operations form a tree of arbitrary depth. We shall see more in subsequent sections.

The code reflection API supports representing the data structures of a code model, including code elements that model Java language constructs and behavior, and it supports traversing, building, and transforming code models. We shall explain with further examples.

Traversing code models

We shall continue with our Example class, reflecting over the add method, accessing the method’s code model, and traversing it to print the model’s tree structure.

var addMethod = Example.class.getDeclaredMethod("add", int.class, int.class);
FuncOp addModel = Op.ofMethod(addMethod).orElseThrow();
assert addModel == Op.ofMethod(addMethod).orElseThrow();

We access the method’s code model as we have previously shown. The root of the code model is an operation, an instance of FuncOp, which is a function declaration operation modeling the method. Further, we assert that if we obtain the code model a second time the same instance is returned. Items in a code model have stable identity, and therefore they can be used as stable keys for associating items with other information.

One way to traverse the code model is to write a recursive method that iterates over code elements and their children. That way we can get a sense of what a code model contains.

static void traverse(int depth, CodeElement<?, ?> e) {
    IO.println("  ".repeat(depth) + e.getClass());

    for (CodeElement<?, ?> c : e.children()) {
        traverse(depth + 1, c);
    }
}
traverse(0, addModel);

The traverse method prints out the class of the code element it encounters and prefixes that with white space proportionate to the depth of the element in the code model tree.

jshell> traverse(0, addModel);
class jdk.incubator.code.dialect.core.CoreOp$FuncOp
  class jdk.incubator.code.Body
    class jdk.incubator.code.Block
      class jdk.incubator.code.dialect.core.CoreOp$VarOp
      class jdk.incubator.code.dialect.core.CoreOp$VarOp
      class jdk.incubator.code.dialect.core.CoreOp$ConstantOp
      class jdk.incubator.code.dialect.java.JavaOp$InvokeOp
      class jdk.incubator.code.dialect.core.CoreOp$VarAccessOp$VarLoadOp
      class jdk.incubator.code.dialect.core.CoreOp$VarAccessOp$VarLoadOp
      class jdk.incubator.code.dialect.java.JavaOp$AddOp
      class jdk.incubator.code.dialect.core.CoreOp$ReturnOp

We can observe that the top of the tree is the FuncOp, which contains one child, a Body, which in turn contains one child, a Block, which in turn contains a sequence of eight operations. Bodies and blocks provide additional structure for modeling code. Each operation models some part of the method’s code; for example, variable declaration operations (instances of VarOp) model Java variable declarations, in this case the method parameters, and the add operation (an instance of AddOp) models the Java + operator.

Alternatively, we can stream over elements of the code model (as we did previously when analyzing the code for string literals) in the same topologically sorted order using the CodeElement.elements method:

addModel.elements().forEach((CodeElement<?, ?> e) -> {
    int depth = 0;
    var parent = e;
    while ((parent = parent.parent()) != null) depth++;
    IO.println("  ".repeat(depth) + e.getClass());
});

We compute the depth for each code element by traversing back up the code model tree until the root element is reached. So, it is possible to traverse up and down the code model tree.

To get a better sense of what the code model contains we can convert it to a text string and print it out.

IO.println(addModel.toText());

The toText method will traverse the code elements in a similar manner as we presented but print out more detail.

func @loc="22:5:string:///REPL/$JShell$8D.java" @"add" (
        %0 : java.type:"int", %1 : java.type:"int")java.type:"int" -> {
    %2 : Var<java.type:"int"> = var %0 @loc="22:5" @"a";
    %3 : Var<java.type:"int"> = var %1 @loc="22:5" @"b";
    %4 : java.type:"java.lang.String" = constant @loc="24:20" @"Example:method:add";
    invoke %4 @loc="24:9" @java.ref:"java.lang.IO::println(java.lang.Object):void";
    %5 : java.type:"int" = var.load %2 @loc="25:16";
    %6 : java.type:"int" = var.load %3 @loc="25:20";
    %7 : java.type:"int" = add %5 %6 @loc="25:16";
    return %7 @loc="25:9";
};

A code model’s text is designed to be human-readable, primarily intended for debugging and testing. It is also invaluable for explaining code models. To aid debugging each operation has line number information, and the root operation also has source information from where the code model originated. Also notice how the text output mirrors the structure of the source code.

The code model text shows that the code model’s root element is a function declaration (func) operation. The lambda-like expression represents the fusion of the function declaration operation’s single body and the body’s first and only block, called the entry block. Then there is a sequence of operations in the entry block. For each operation there is an instance of a corresponding Java class, all of which extend the abstract class jdk.incubator.code.Op and which we have already seen when we printed out the classes. Unsurprisingly, the printed operations and the printed operation classes occur in the same order, since the toText method traverses the model in the same order as we traversed it explicitly.

The entry block declares two values called block parameters, %0 and %1, which model the method’s initial values for parameters a and b. The method parameter declarations are modeled as embedded var operations, each initialized with a corresponding block parameter used as the var operation’s single operand. The var operations produce values called operation results, variable values %2 and %3, which model the variables a and b. A variable value can be loaded from or stored to using variable access operations, respectively modeling an expression that denotes a variable and assignment to a variable. The expressions denoting parameters a and b are modeled as var.load operations that use the variable values %2 and %3 respectively as operands. The operation results of these operations are used as operands of subsequent operations and so on, e.g., %7 the result of the add operation modeling the + operator is used as an operand of the return operation modeling the return statement.

The source code of our add method might contain all sorts of syntactic details that javac rightly needs to know about but are extraneous for modeling purposes. This complexity is not present in the code model. For example, the same code model would be produced if the return statement’s expression was ((a) + (b)) instead of a + b.

In addition to the code elements forming a tree, a code model contains other code items: the values (block parameters or operation results) we previously introduced form bidirectional dependency graphs between their declaration and their uses. A value also has a type element, another code item, modeling the set of all possible values. In our example many of the type elements model Java types, and some model the type of variable values (the type element of the operation result of a var operation). In summary, a code model contains five kinds of code item: operation, body, block, value, and type element.

Astute readers may observe that code models are in Static Single-Assignment (SSA) form, and there is no explicit distinction, as there is in the source code, between statements and expressions. Block parameters and operation results are declared before they are used and cannot be reassigned (and we therefore require special operations and type elements to model variables as we previously showed).

Finally, we can execute the code model by transforming it to bytecode, wrapping it in a method handle, and invoking the handle.

var handle = BytecodeGenerator.generate(MethodHandles.lookup(), addModel);
assert Example.add(1, 1) == (int) handle.invokeExact(1, 1);

Building code models

The code reflection API provides functionality to build code models. We can use the API to build a model equivalent to the one we previously accessed and traversed.

var builtAddModel = func(
    "add",
    CoreType.functionType(JavaType.INT, JavaType.INT, JavaType.INT))
    .body((Block.Builder builder) -> {
        // Check the entry block parameters
        assert builder.parameters().size() == 2;
        assert builder.parameters().stream().allMatch(
                (Block.Parameter param) -> param.type().equals(JavaType.INT));

        // int a
        VarOp varOpA = var("a", builder.parameters().get(0));
        Op.Result varA = builder.op(varOpA);

        // int b
        VarOp varOpB = var("b", builder.parameters().get(1));
        Op.Result varB = builder.op(varOpB);

        // IO.println("Example:method:add")
        builder.op(invoke(PRINTLN,
                builder.op(constant(JavaType.J_L_STRING, "Example:method:add"))));

        // return a + b;
        builder.op(return_(
                builder.op(add(
                        builder.op(varLoad(varA)),
                        builder.op(varLoad(varB))))));
    });
IO.println(builtAddModel.toText());

The consuming lambda expression passed to the body method operates on a block builder, an instance of Block.Builder, representing the entry block being built. We use it to append operations to the entry block. When an operation is appended it produces an operation result that can be used as an operand of a further operation, and so on. When the body method returns, the body element and the entry block element it contains will be fully built.

Notice how building, like the text output, mirrors the source code structure. Building is carefully designed so that structurally invalid models cannot be built. We can approximately test equivalence with our previously accessed model as follows.

var builtAddModelElements = builtAddModel.elements()
        .map(CodeElement::getClass).toList();
var addModelElements = addModel.elements()
        .map(CodeElement::getClass).toList();
assert builtAddModelElements.equals(addModelElements);

We don’t anticipate that most users will commonly build complete models of Java code, since it is a rather verbose and tedious process, although potentially less so than other approaches, e.g., building bytecode, or using code combinators that must be composed from the inside out. Javac already knows how to build models: in fact, javac uses the same API to build models, and the run time uses it to produce the models that are accessed. Instead we anticipate that many users will build parts of models when they transform them.

Transforming code models

The code reflection API supports the transformation of code models by combining traversing and building. A code model transformation is represented by a function that takes an operation, encountered in the (input) model being transformed, and a code model builder for the resulting transformed (output) model, and mediates how, if at all, that operation is transformed into other code elements that are built. We were inspired by the functional transformation approach devised by the Class-File API and adapted that design to work on the nested structure of immutable code model trees.

We can write a simple code model transform that transforms our method’s code model, replacing the operation modeling the + operator with an invocation operation modeling an invocation expression to the method Integer.sum.

static final MethodRef SUM = MethodRef.method(Integer.class, "sum", int.class,
        int.class, int.class);
CodeTransformer addToMethodTransformer = CodeTransformer.opTransformer((
        Function<Op, Op.Result> builder,
        Op inputOp,
        List<Value> outputOperands) -> {
    switch (inputOp) {
        // Replace a + b; with Integer.sum(a, b);
        case AddOp _ -> builder.apply(invoke(SUM, outputOperands));
        // Copy operation
        default -> builder.apply(inputOp);
    }
});

The code transformation function, passed as a lambda expression to CodeTransformer.opTransformer, accepts as parameters a block builder function, builder, an operation encountered when traversing the input code model, inputOp, and a list of values in the output model being built that are associated with the input operation’s operands, outputOperands. We must have previously encountered and transformed the input operations whose results are associated with those values, since values can only be used after they have been declared.

In the code transformation we switch over the input operation; in this case we just match on the add operation and, by default, on any other operation. In the latter case we apply the input operation to the builder function, which creates a new output operation that is a copy of the input operation, appends the new operation to the block being built, and associates the new operation’s result with the input operation’s result. When we match on an add operation we replace it by building part of a code model: a method invoke operation to the Integer.sum method, constructed with the given output operands. The result of the output invoke operation is automatically associated with the result of the input add operation.

We can then transform the method’s code model by calling the FuncOp.transform method and passing the code transformer as an argument.

FuncOp transformedAddModel = addModel.transform(addToMethodTransformer);
IO.println(transformedAddModel.toText());

The transformed code model is naturally very similar to the input code model.

func @loc="22:5:string:///REPL/$JShell$8D.java" @"add" (
        %0 : java.type:"int", %1 : java.type:"int")java.type:"int" -> {
    %2 : Var<java.type:"int"> = var %0 @loc="22:5" @"a";
    %3 : Var<java.type:"int"> = var %1 @loc="22:5" @"b";
    %4 : java.type:"java.lang.String" = constant @loc="24:20" @"Example:method:add";
    invoke %4 @loc="24:9" @java.ref:"java.lang.IO::println(java.lang.Object):void";
    %5 : java.type:"int" = var.load %2 @loc="25:16";
    %6 : java.type:"int" = var.load %3 @loc="25:20";
    %7 : java.type:"int" = invoke %5 %6 @java.ref:"java.lang.Integer::sum(int, int):int";
    return %7 @loc="25:9";
};

We can observe the add operation has been replaced with the invoke operation. Also, by default, each operation that was copied preserves line number information. The code transformation function can also be applied unmodified to more complex code containing many + operators in arbitrarily nested positions. (Such application is left as an exercise for the curious reader.)

The code transformation function is not a direct implementation of the functional interface CodeTransformer. Instead it is adapted from another functional interface, which is easier to implement for simpler transformations on operations. Direct implementations of CodeTransformer are more complex but are also capable of more complex transformations, such as building new blocks and retaining more control over associating values in the input and output models. The code reflection API provides many complex code transformers, such as those for progressively lowering code models, converting models into pure SSA form, and inlining models into other models. We will continue to explore the code model transformation design to better understand how we can improve the API across the spectrum of simple to complex transformations.

Alternatives

Compiler Tree API

The com.sun.source package of the jdk.compiler module contains the javac API for accessing the abstract syntax trees (ASTs) representing Java source code. Javac uses an implementation of this API when parsing source code. This API is not suitable for standardization as it is too intertwined with javac’s implementation; javac reserves the right to make breaking changes to this API as the language evolves. More generally, ASTs can be difficult to analyze and transform. For example, a modern optimizing compiler will transform its AST representing source code into another, slightly lower form, an intermediate representation, that is easier to analyze and transform to executable code.

Bytecode

Bytecode is not easily accessible, nor guaranteed to be available at run time, and even if we made it so it would not be ideal. The translation of Java source to bytecode by javac results in numerous Java language features being translated away, making them hard to recover; e.g., lambda expressions are translated into invokedynamic instructions and synthetic methods. Bytecode is also, by default, too low-level, which makes it difficult to analyze and transform. For example, the HotSpot C2 compiler transforms bytecode into another, higher form, an intermediate representation, that is easier to analyze and transform to executable code.

Testing

Testing will focus on a suite of unit tests for the compiler and runtime that give high modeling coverage and code coverage. Where possible we will reuse the code reflection APIs operationally, such as when storing and loading models in class files.

We need to ensure that Java code models produced by the compiler preserve Java program meaning. We will select an existing suite of Java tests and recompile the source they test, using a special javac internal flag, such that the bytecode is generated from code models produced by the compiler. Testing against these specially compiled sources must yield the same results as testing against the ordinarily compiled sources.

Risks and Assumptions

While incubating we will strive to keep the number of changes required to code in the java.base and jdk.compiler modules to a minimum, thereby reducing the burden on maintainers and reviewers. So far the changes are modest.

Introduction of a new language feature, even a modest one, is a significant effort with numerous tasks to update many areas of the platform. Code reflection will add to that list of tasks, since the language feature will need to be modeled and supported like existing modeled features. There is a risk it will require significant effort to model, especially with high fidelity. We think this risk is mitigated by the generic modeling capabilities of code models, and that we can currently model all Java statements and expressions with high fidelity.

Future work

We shall explore access to code models at compile time. The code reflection API provides very basic support for annotation processors to access code models of program elements, and while useful for advanced experimentation it needs more consideration.

As the language evolves we shall look for opportunities to enhance the code reflection API to take advantage of new language features, especially features related to pattern matching and data-oriented programming. We anticipate pattern matching will strongly influence the code reflection API and enhance the querying of code models. Furthermore, this is an opportunity to provide feedback on language features.

We need to explore the language feature for declaration of reflectable code. Use of the @Reflect annotation is a temporary solution that is good enough for incubation but insufficient for preview.

We need to ensure that a library using code reflection can operate on code models produced by a JDK version greater than the version it was compiled against. Such forward compatibility is challenging. We shall explore solutions, such as a library declaring an upper bound on the JDK versions of reflective code it supports, or enabling the lowering of a modeled language feature that a library does not know about into modeled features that it does (potentially compromising high fidelity but still preserving program meaning).