JEP 483: Ahead-of-Time Class Loading & Linking

Authors: Ioi Lam, Dan Heidinga, & John Rose
Owner: Ioi Lam
Type: Feature
Scope: JDK
Status: Candidate
Component: hotspot / runtime
Discussion: leyden dash dev at openjdk dot org
Reviewed by: Alex Buckley, Brian Goetz, Mark Reinhold, Vladimir Kozlov
Created: 2023/09/06 04:07
Updated: 2024/10/08 17:08
Issue: 8315737

Summary

Improve startup time by making the classes of an application instantly available, in a loaded and linked state, when the HotSpot Java Virtual Machine starts. Achieve this by monitoring the application during one run and storing the loaded and linked forms of all classes in a cache for use in subsequent runs. Lay a foundation for future improvements to both startup and warmup time.

Goals

Non-Goals

Motivation

The Java Platform is highly dynamic. This is a source of great strength.

Features such as dynamic class loading, dynamic linkage, dynamic dispatch, and dynamic reflection give vast expressive power to developers. They can create frameworks which use reflection to determine an application’s configuration by inspecting application code for annotations. They can write libraries which dynamically load and then link to plug-in components discovered at run time. They can, finally, assemble applications by composing libraries which dynamically link to other libraries, leveraging the rich Java ecosystem.

Features such as dynamic compilation, dynamic deoptimization, and dynamic storage reclamation give broad flexibility to the JVM. It can compile a method from bytecode to native code when it detects, by observing an application’s behavior, that doing so will be worthwhile. It can speculatively optimize native code, assuming a particular frequent path of execution, and revert to interpreting bytecode when it observes that the assumption no longer holds. It can reclaim storage when it observes that doing so will be profitable. By these and related techniques, the JVM can achieve higher peak performance than is possible with traditional static approaches.

All this dynamism comes at a price, however, which must be paid every time an application starts.

The JVM does a lot of work during the startup of a typical server application, interleaving several kinds of activities:

If, additionally, the application uses a framework, e.g., the Spring Framework, then the framework’s startup-time discovery of @Bean, @Configuration, and related annotations will trigger yet more work.

All this work is done on demand, lazily, just in time. It is heavily optimized, however, so many Java programs start up in milliseconds. Even so, a large server application which uses a web application framework plus libraries for XML processing, database persistence, etc., may require seconds or even minutes to start up.

Yet applications tend to repeat themselves, often doing essentially the same thing every time they start: Scanning the same JAR files, reading and parsing and loading and linking the same classes, executing the same static initializers, and using reflection to configure the same application objects. The key to improving startup time is to try to do at least some of this work eagerly, ahead of time, rather than just in time. To put it another way, in the terms of Project Leyden, we aim to shift some of this work earlier in time.

Description

We extend the HotSpot JVM to support an ahead-of-time cache which can store classes after reading, parsing, loading, and linking them. Once a cache is created for a specific application, it can be re-used in subsequent runs of that application to improve startup time.

Creating a cache takes two steps. First, run the application once, in a training run, to record its AOT configuration, in this case into the file app.aotconf:

$ java -XX:AOTMode=record -XX:AOTConfiguration=app.aotconf \
       -cp app.jar com.example.App ...

Second, use the configuration to create the cache, in the file app.aot:

$ java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf \
       -XX:AOTCache=app.aot

(This second step doesn’t run the application; it just creates the cache. We intend to streamline the process of cache creation in future work.)

Subsequently, in testing or production, run the application with the cache:

$ java -XX:AOTCache=app.aot -cp app.jar com.example.App ...

(If the cache file is unusable or does not exist then the JVM issues a warning message and continues.)

With the AOT cache, the reading, parsing, loading, and linking work that the JVM would usually do just-in-time when the program runs in the third step is shifted ahead-of-time to the second step, which creates the cache. Subsequently, the program starts up faster in the third step because its classes are available instantly from the cache.
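The three steps above could be driven from a build script. The following is a minimal sketch, assuming a hypothetical helper class named AotWorkflow that is not part of the JDK. It only assembles the command lines exactly as they are shown above; the file names and main class are the ones used in the text.

```java
import java.util.List;

// Hypothetical helper (not part of the JDK): assembles the three command
// lines described above so that a build script can launch them in order.
class AotWorkflow {

    // Step 1: training run that records the AOT configuration.
    static List<String> record(String conf, String jar, String mainClass) {
        return List.of("java", "-XX:AOTMode=record",
                       "-XX:AOTConfiguration=" + conf,
                       "-cp", jar, mainClass);
    }

    // Step 2: create the cache from the configuration (does not run the app).
    static List<String> create(String conf, String cache) {
        return List.of("java", "-XX:AOTMode=create",
                       "-XX:AOTConfiguration=" + conf,
                       "-XX:AOTCache=" + cache);
    }

    // Step 3: run the application with the cache.
    static List<String> run(String cache, String jar, String mainClass) {
        return List.of("java", "-XX:AOTCache=" + cache,
                       "-cp", jar, mainClass);
    }

    public static void main(String... args) {
        // Print the commands; a real script might instead pass each list to
        // new ProcessBuilder(cmd).inheritIO().start().waitFor().
        System.out.println(String.join(" ", record("app.aotconf", "app.jar", "com.example.App")));
        System.out.println(String.join(" ", create("app.aotconf", "app.aot")));
        System.out.println(String.join(" ", run("app.aot", "app.jar", "com.example.App")));
    }
}
```

Keeping each step as a separate list makes it easy to rerun only the first two steps when the application changes and the cache has to be regenerated.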

For example, here is a program which, though short, uses the Stream API and thus causes almost 600 JDK classes to be read, parsed, loaded, and linked:

import java.util.*;
import java.util.stream.*;

public class HelloStream {

    public static void main(String ... args) {
        var words = List.of("hello", "fuzzy", "world");
        var greeting = words.stream()
            .filter(w -> !w.contains("z"))
            .collect(Collectors.joining(", "));
        System.out.println(greeting);  // hello, world
    }

}

This program runs in 0.031 seconds on JDK 23. After doing the small amount of additional work required to create an AOT cache it runs in 0.018 seconds on JDK NN, an improvement of 42%. The AOT cache occupies 11.4 megabytes.

For a representative server application, consider Spring PetClinic, version 3.2.0. It loads and links about 21,000 classes at startup. It starts in 4.486 seconds on JDK 23 and in 2.604 seconds on JDK NN when using an AOT cache — also an improvement of 42%, by coincidence. The AOT cache occupies 130 megabytes.

How to train your JVM

A training run captures application configuration and execution history for use in subsequent testing and production runs. A good candidate for a training run is, therefore, a production run. Using a production run for training, however, is not always practical, especially for server applications which, e.g., create log files, open network connections, and access databases. For such cases we recommend creating a synthetic training run that resembles actual production runs as much as possible. It should, among other things, fully configure itself and exercise typical production code paths.

One way to achieve this is to add a second main class to your application specifically for training, e.g., com.example.AppTrainer. This class can invoke the production main class to exercise the common modes of the application using a temporary log-file directory, a local network configuration, and a mocked database if required. You might already have such a main class in the form of an integration test.
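A training main class along these lines might look like the following minimal sketch. Everything in it is an assumption made for illustration: the App class is a stand-in for the production main class, the system property app.log.dir and the argument --mode=typical are invented names, and the package declaration is omitted so that the example fits in a single file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical training entry point, in the spirit of com.example.AppTrainer.
class AppTrainer {

    // Runs the production main class against a temporary log directory and
    // returns that directory so the caller can inspect or delete it.
    static Path train() {
        try {
            Path logDir = Files.createTempDirectory("app-training-logs");
            // "app.log.dir" is an assumed property name, not a real convention.
            System.setProperty("app.log.dir", logDir.toString());
            // Exercise a common mode of the application (assumed argument).
            App.main("--mode=typical");
            return logDir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String... args) {
        System.out.println("Training run wrote logs to " + train());
    }
}

// Stand-in for the production main class (com.example.App in the text).
class App {
    public static void main(String... args) throws IOException {
        Path logDir = Path.of(System.getProperty("app.log.dir", "."));
        Files.writeString(logDir.resolve("app.log"),
                          "started with " + String.join(" ", args) + System.lineSeparator());
    }
}
```

Under these assumptions the training run would name AppTrainer as its main class, while later runs name the production main class.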

Some additional tips:

Consistency of training and subsequent runs

To enjoy the benefits of the AOT cache generated during a training run, the training run and all subsequent runs must be essentially similar.

If any of these constraints are violated then the JVM, by default, issues a warning and ignores the cache. You can insist that the JVM use the cache by adding the option -XX:AOTMode=on to the command line:

$ java -XX:AOTCache=app.aot -XX:AOTMode=on \
       -cp app.jar com.example.App ...

If this option is present then the JVM reports an error and exits if any of the above constraints are violated, or if the cache does not exist.

(If needed, you can disable the AOT cache entirely via -XX:AOTMode=off. You can also specify the default mode via -XX:AOTMode=auto, in which case the JVM tries to use the AOT cache specified via the -XX:AOTCache option; if the cache is unusable or does not exist then it issues a warning message and continues.)
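The behavior of the three -XX:AOTMode values described above can be summarized as a small decision table. The sketch below is an illustrative model only, not HotSpot code; the class, enum, and method names are invented for this example.

```java
// Illustrative model only; this is not HotSpot code.
class AotModeModel {

    enum Mode { AUTO, ON, OFF }

    enum Outcome { USE_CACHE, WARN_AND_CONTINUE, ERROR_AND_EXIT, RUN_WITHOUT_CACHE }

    // cacheUsable is false when the cache does not exist or when a
    // consistency constraint between training and this run is violated.
    static Outcome decide(Mode mode, boolean cacheUsable) {
        switch (mode) {
            case OFF:
                // The cache is disabled entirely.
                return Outcome.RUN_WITHOUT_CACHE;
            case ON:
                // The user insists on the cache: failure to use it is an error.
                return cacheUsable ? Outcome.USE_CACHE : Outcome.ERROR_AND_EXIT;
            default:
                // AUTO, the default: fall back with a warning.
                return cacheUsable ? Outcome.USE_CACHE : Outcome.WARN_AND_CONTINUE;
        }
    }

    public static void main(String... args) {
        for (Mode m : Mode.values()) {
            for (boolean usable : new boolean[] { true, false }) {
                System.out.println(m + ", usable=" + usable + " -> " + decide(m, usable));
            }
        }
    }
}
```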

A useful exception to the requirement for consistency is that training and subsequent runs may use different garbage collectors. Another useful exception is that training and subsequent runs may use different main classes; this gives flexibility in constructing training runs, as noted above.

History

The ahead-of-time cache proposed here is a natural evolution of an old feature in the HotSpot JVM, class-data sharing (CDS).

CDS was first introduced in an update to JDK 5, in 2004. It initially aimed to shrink the memory footprint of multiple Java applications running on the same machine. It achieved this by reading and parsing JDK class files, storing the resulting metadata in a read-only archive file that could later be mapped directly into memory by multiple JVM processes using the same virtual-memory pages. We later extended CDS so that it could also store metadata for application classes.

Nowadays the sharing benefit of CDS has been reduced by new security practices such as address space layout randomization (ASLR), which makes the address at which a file is mapped into memory unpredictable. CDS still, however, offers a significant startup-time improvement — so much so that builds of JDK 12 and later include a built-in CDS archive containing the metadata of over a thousand commonly-used JDK classes. CDS is, therefore, ubiquitous, even though many Java developers have never heard of it and few have used it directly.

The AOT cache builds upon CDS by not only reading and parsing class files ahead-of-time but also loading and linking them. You can see the effect of the latter two optimizations by disabling them via the -XX:-AOTClassLinking option when creating a cache:

$ java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf \
       -XX:AOTCache=app.aot -XX:-AOTClassLinking

When we use this option, we can see that most of the improvement to the startup time of the HelloStream program is due to ahead-of-time loading and linking, while most of the improvement to the startup time of the PetClinic application is due to the ahead-of-time reading and parsing already done by CDS today (all times are in seconds, and percentages are cumulative):

                                       HelloStream      PetClinic
JDK 23                                 0.031            4.486
AOT cache, no loading or linking       0.027 (+13%)     3.008 (+33%)
AOT cache, with loading and linking    0.018 (+42%)     2.604 (+42%)
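The cumulative percentages in the table can be reproduced from the raw timings, each being the reduction in startup time relative to the JDK 23 baseline. A minimal sketch of that arithmetic follows; the class and method names are invented for this example.

```java
class StartupImprovement {

    // Percentage reduction in startup time relative to the baseline,
    // rounded to the nearest whole percent.
    static long improvementPercent(double baselineSeconds, double seconds) {
        return Math.round(100.0 * (baselineSeconds - seconds) / baselineSeconds);
    }

    public static void main(String... args) {
        // HelloStream, baseline 0.031 s
        System.out.println(improvementPercent(0.031, 0.027)); // 13
        System.out.println(improvementPercent(0.031, 0.018)); // 42
        // PetClinic, baseline 4.486 s
        System.out.println(improvementPercent(4.486, 3.008)); // 33
        System.out.println(improvementPercent(4.486, 2.604)); // 42
    }
}
```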

Users of Spring Boot and, more generally, the Spring Framework, can therefore enjoy significant startup-time improvements, today, simply by using the CDS feature already available in previous JDK releases.

The new -XX:AOT* command-line options are, for the most part at this time, macros for existing CDS options such as -Xshare, -XX:DumpLoadedClassList, and -XX:SharedArchiveFile. We are introducing the -XX:AOT* options in order to provide a uniform user experience for both this and future ahead-of-time features, and to drop the potentially confusing words “share” and “shared.”

Compatibility

Ahead-of-time class loading and linking works with every existing Java application, library, and framework. It requires no changes to source code and no changes to build configurations, aside from the additional step of creating the AOT cache. It fully supports the highly dynamic nature of the Java Platform, including run-time reflection.

This is so because the timing and ordering of class reading, parsing, loading, and linking is immaterial to Java code. The Java language and virtual-machine specifications give the JVM broad freedom in scheduling these operations. When we shift these operations from just-in-time to ahead-of-time, the application observes classes being loaded and linked as if the JVM did that work at the exact moment requested — though unaccountably fast.

Future work

Testing

Risks and Assumptions