JEP draft: Ahead-of-Time Class Linking

Authors: Ioi Lam, Dan Heidinga, John Rose
Owner: Ioi Lam
Type: Feature
Scope: JDK
Status: Submitted
Component: hotspot / runtime
Reviewed by: Alex Buckley, Vladimir Kozlov
Created: 2023/09/06 04:07
Updated: 2024/07/24 18:25
Issue: 8315737

Summary

Improve startup time by making the classes of an application instantly available, in a loaded and linked state, when the HotSpot Java Virtual Machine starts. Achieve this by monitoring the application during one run and storing the loaded and linked forms of all classes in a cache for use in subsequent runs. Lay a foundation for future improvements to both startup and warmup time.

Goals

Non-Goals

Motivation

The Java Platform is attractive for developing server applications because it combines a safe language, a vast choice of libraries, and the reliable HotSpot Java Virtual Machine which silently optimizes applications for peak performance. With the help of tools such as Maven and Gradle, application developers routinely rely on dozens or hundreds of libraries, and frequently upgrade them to get new functionality, performance improvements, and security fixes. With the help of test frameworks such as JUnit, developers readily validate that their application’s behavior does not change when the libraries underneath evolve.

All these technologies streamline the creation of applications, but they do not solve an old complaint: Java applications are slow to start.

The JVM does a lot of work during the startup of a typical server application, interleaving several kinds of activities:

In addition, popular frameworks use reflection to determine an application’s configuration at startup; the Spring Framework, e.g., inspects application code for @Bean, @Configuration, and related annotations, which triggers yet more work. This further delays the moment when an application can serve requests.

As a result of all this, startup might take milliseconds for a small program but seconds or even minutes for a server application that uses a web application framework plus libraries for XML processing, persistence, and so forth.

From just-in-time to ahead-of-time

The slow startup of large Java applications is a consequence of the highly dynamic nature of the Java Platform. The work of reading, parsing, loading, and linking class files, of executing static initializers, of compiling bytecodes to native code, of reclaiming memory via garbage collection, and of many other kinds of activities, is all done on demand, lazily, just in time. This gives great flexibility to application, library, and framework developers. It comes at a price, however, which must be paid every time the application starts.

Yet applications tend to repeat themselves, often doing essentially the same thing every time they start: scanning the same JAR files, reading and parsing and loading and linking the same classes, executing the same static initializers, and using reflection to configure the same application objects. The key to improving startup time is to do at least some of this work eagerly, ahead of time, rather than just in time. To put it another way, in the terms of Project Leyden, we aim to shift some of this work earlier in time.

Description

We extend the HotSpot JVM to support an ahead-of-time cache which can store classes after reading, parsing, loading, and linking them, along with any other necessary data. Once a cache is created for a specific application, it can be re-used in subsequent runs of that application to improve startup time.

Creating a cache takes two steps. First, run the application once, in a training run, to record its AOT configuration, in this case into the file app.aotconf:

$ java -XX:AOTMode=record -XX:AOTConfiguration=app.aotconf \
       -cp app.jar com.example.App ...

Second, use the configuration to create the cache, in the file app.aot:

$ java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf \
       -XX:AOTCache=app.aot

(This second step doesn’t run the application; it just creates the cache.)

Subsequently, in production, run the application with the cache:

$ java -XX:AOTCache=app.aot -cp app.jar com.example.App ...

With the AOT cache and ahead-of-time class linking, the reading, parsing, loading, and linking work that the JVM would usually do just-in-time when the program runs in the third step is shifted ahead-of-time to the second step, which creates the cache. Subsequently, the program starts up faster in the third step because its classes are available instantly from the cache.

For example, here is a program which, though short, uses the Stream API and thus causes almost 600 JDK classes to be read, parsed, loaded, and linked:

import java.util.*;
import java.util.stream.*;

public class HelloStream {

    public static void main(String ... args) {
        var words = List.of("hello", "fuzzy", "world");
        var greeting = words.stream()
            .filter(w -> !w.contains("z"))
            .collect(Collectors.joining(", "));
        System.out.println(greeting);  // hello, world
    }

}

This program runs in 0.031 seconds on JDK 23 and in 0.018 seconds on JDK NN when using an AOT cache — an improvement of 42%. The AOT cache occupies 11.4 megabytes.

For a more realistic example, the Spring PetClinic application, version 3.2.0, starts up in 4.486 seconds on JDK 23 and in 2.604 seconds on JDK NN when using an AOT cache — also an improvement of 42%, by coincidence. It loads about 21,000 classes from an AOT cache of 130 megabytes.

Limitations

If any of these constraints are violated then the JVM reports an error and exits without creating a cache.

Consistency of training and production runs

To enjoy the benefits of the AOT cache generated during a training run, the subsequent production runs must be essentially similar to that training run.

If any of these constraints are violated then the JVM, by default, issues a warning and ignores the cache. You can insist that the JVM use the cache by adding the option -XX:AOTMode=on to the command line:

$ java -XX:AOTCache=app.aot -XX:AOTMode=on \
       -cp app.jar com.example.App ...

If this option is present then the JVM reports an error and exits if any of the above constraints are violated, or if the cache does not exist.

A useful exception to the requirement for consistency is that training and production runs may use different garbage collectors. Another useful exception is that training and production runs may use different main classes.

Synthetic training runs

A training run captures application configuration and execution history for use in subsequent production runs. The best training run is, therefore, a production run. Using a production run for training, however, is not always practical, especially for server applications which, e.g., create log files, open network connections, and access databases. For such cases we recommend creating a synthetic training run that resembles actual production runs as much as possible. It should, among other things, fully configure itself and exercise typical production code paths.

One way to achieve this is to add a second main class to your application specifically for training, e.g., com.example.AppTrainer. This class can invoke the production main class to exercise the common modes of the application using a temporary log-file directory, a local network configuration, and a mocked database if required. You might already have such a main class in the form of an integration test.
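A minimal sketch of such a training entry point follows. It is an illustration under stated assumptions, not part of this proposal: the App class here is only a stand-in for the real production main class, and the options --log-dir, --bind, and --db are hypothetical names for whatever configuration mechanism the application actually uses.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Stand-in for the production main class (com.example.App in the text).
// A real application would start its framework and serve requests here.
class App {
    public static void main(String... args) {
        System.out.println("App started with: " + String.join(" ", args));
    }
}

// Hypothetical training entry point (com.example.AppTrainer in the text).
class AppTrainer {

    // Builds arguments that keep the training run away from production
    // resources. All three option names are assumptions for illustration.
    static List<String> trainingArgs(Path logDir) {
        return List.of(
            "--log-dir=" + logDir,   // temporary log-file directory
            "--bind=127.0.0.1:0",    // local network configuration
            "--db=mock");            // mocked database
    }

    public static void main(String... args) throws IOException {
        Path logDir = Files.createTempDirectory("app-training-logs");
        // Invoke the production main class so that the same classes are
        // loaded and linked as in a typical production run.
        App.main(trainingArgs(logDir).toArray(String[]::new));
    }
}
```

The training run would then name the trainer as the main class, for example java -XX:AOTMode=record -XX:AOTConfiguration=app.aotconf -cp app.jar com.example.AppTrainer, while production runs continue to name com.example.App; training and production runs are allowed to use different main classes.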

Some additional tips:

History

The ahead-of-time cache proposed here is a natural evolution of an old feature in the HotSpot JVM, class-data sharing (CDS).

CDS was first introduced in an update to JDK 5, in 2004. It initially aimed to shrink the memory footprint of multiple Java applications running on the same machine. It achieved this by reading and parsing JDK class files, storing the resulting metadata in a read-only archive file that could later be mapped directly into memory by multiple JVM processes using the same virtual-memory pages. We later extended CDS so that it could also store metadata for application classes.

Nowadays the sharing benefit of CDS has been reduced by new security practices such as address space layout randomization (ASLR), which makes the address at which a file is mapped into memory unpredictable. CDS still, however, offers a significant startup-time improvement — so much so that builds of JDK 12 and later include a built-in CDS archive containing the metadata of over a thousand commonly-used JDK classes. CDS is therefore, in a sense, ubiquitous, even though many Java developers have never heard of it and few have used it directly.

The AOT cache builds upon CDS by not only reading and parsing class files ahead-of-time but also loading and linking them. You can see the effect of the latter two optimizations by disabling them via the -XX:-AOTClassLinking option when creating a cache:

$ java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf \
       -XX:AOTCache=app.aot -XX:-AOTClassLinking

When we use this option, we can see that most of the improvement to the startup time of the HelloStream program is due to ahead-of-time loading and linking, while most of the improvement to the startup time of the PetClinic application is due to the ahead-of-time reading and parsing already done by CDS today (all times are in seconds):

                                      HelloStream    PetClinic
JDK 23                                0.031          4.486
AOT cache, no loading or linking      0.027 (13%)    3.008 (33%)
AOT cache, with loading and linking   0.018 (42%)    2.604 (42%)

Users of Spring Boot and, more generally, the Spring Framework, can therefore enjoy significant startup-time improvements immediately simply by using CDS.
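The percentages quoted for HelloStream and PetClinic are relative reductions in startup time against the JDK 23 baseline. As a quick check of that arithmetic, the following sketch recomputes them from the measured times; the class and method names are hypothetical and chosen only for this illustration.

```java
class StartupImprovement {

    // Relative reduction in startup time, rounded to a whole percentage.
    static long improvementPercent(double baselineSeconds, double cachedSeconds) {
        return Math.round(100.0 * (baselineSeconds - cachedSeconds) / baselineSeconds);
    }

    public static void main(String... args) {
        // HelloStream: baseline 0.031 s
        System.out.println(improvementPercent(0.031, 0.027)); // 13
        System.out.println(improvementPercent(0.031, 0.018)); // 42
        // PetClinic: baseline 4.486 s
        System.out.println(improvementPercent(4.486, 3.008)); // 33
        System.out.println(improvementPercent(4.486, 2.604)); // 42
    }
}
```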

The new -XX:AOT* command-line options are, for the most part at this time, macros for existing CDS options such as -Xshare, -XX:DumpLoadedClassList, and -XX:SharedArchiveFile. We are introducing the -XX:AOT* options in order to provide a uniform user experience for both this and future ahead-of-time features, and to drop the potentially confusing words “share” and “shared”.

Specification conformance

The timing and ordering of class reading, parsing, loading, and linking are almost always immaterial to applications. When these operations are shifted from just-in-time to ahead-of-time, the application observes classes being loaded as if the JVM did the loading work at the exact moment requested, though unaccountably fast.

The JVM can change its order of execution, but it always honors the promises made by the Java language and JVM specifications. If the application could observe a change in a behavior governed by the specifications then the new behavior must also be permitted by the specifications. Conversely, if the application cannot observe a change to specified behavior then the JVM is free to adjust the invisible order of its operations.

Therefore, if loading and linking a class ahead-of-time would result in an observable behavioral change that violates the specifications then the JVM gracefully falls back to just-in-time loading and linking of that class, with some loss of startup performance.

Unlike loading and linking, initializing classes causes many side effects visible to Java code, and initialization order is specified in detail. For this reason, we do not initialize application classes ahead-of-time.
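To illustrate why initialization is visible to Java code, here is a small sketch in which two classes record when their static initializers run. The names InitOrderDemo, A, and B are hypothetical. Because a class is initialized immediately before its first use, the log reflects the order in which the program touches the classes, not the order in which they are declared; initializing them eagerly would change what the program observes.

```java
import java.util.ArrayList;
import java.util.List;

class InitOrderDemo {

    // Shared log that makes the side effects of initialization observable.
    static final List<String> LOG = new ArrayList<>();

    static class A {
        static { LOG.add("A initialized"); }
        static void touch() { }
    }

    static class B {
        static { LOG.add("B initialized"); }
        static void touch() { }
    }

    static List<String> run() {
        LOG.add("main started");
        B.touch();   // first use of B triggers B's static initializer
        A.touch();   // first use of A triggers A's static initializer
        return List.copyOf(LOG);
    }

    public static void main(String... args) {
        // Prints, on separate lines:
        //   main started
        //   B initialized
        //   A initialized
        run().forEach(System.out::println);
    }
}
```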

Future work

Testing

Risks and Assumptions