JEP draft: Ahead-of-Time Class Linking
Authors | Ioi Lam, Dan Heidinga, John Rose |
Owner | Ioi Lam |
Type | Feature |
Scope | JDK |
Status | Submitted |
Component | hotspot / runtime |
Reviewed by | Alex Buckley, Vladimir Kozlov |
Created | 2023/09/06 04:07 |
Updated | 2024/07/24 18:25 |
Issue | 8315737 |
Summary
Improve startup time by making the classes of an application instantly available, in a loaded and linked state, when the HotSpot Java Virtual Machine starts. Achieve this by monitoring the application during one run and storing the loaded and linked forms of all classes in a cache for use in subsequent runs. Lay a foundation for future improvements to both startup and warmup time.
Goals
- Improve startup time by exploiting the fact that most applications start up in roughly the same way every time they run.
- Do not require any change to the code of applications, libraries, or frameworks.
- Do not require any change to how applications are configured and started by the java launcher, beyond the command-line options related directly to this feature.
- Lay a foundation for continued improvements to startup time and also to warmup time, i.e., the time required for the HotSpot JVM to optimize an application’s code for peak performance.
Non-Goals
- It is not a goal to cache classes that are loaded by user-defined class loaders. Only classes loaded by the JDK’s built-in class loaders can be cached. We may address this limitation in future work.
Motivation
The Java Platform is attractive for developing server applications because it combines a safe language, a vast choice of libraries, and the reliable HotSpot Java Virtual Machine which silently optimizes applications for peak performance. With the help of tools such as Maven and Gradle, application developers routinely rely on dozens or hundreds of libraries, and frequently upgrade them to get new functionality, performance improvements, and security fixes. With the help of test frameworks such as JUnit, developers readily validate that their application’s behavior does not change when the libraries underneath evolve.
All these technologies streamline the creation of applications, but they do not solve an old complaint: Java applications are slow to start.
The JVM does a lot of work during the startup of a typical server application, interleaving several kinds of activities:
- It scans hundreds of JAR files on disk and reads and parses thousands of class files into memory;
- It loads the parsed class data into class objects and links them together so that classes can use each other’s APIs, which involves verifying bytecodes and resolving symbolic references, which in turn instantiates lambda objects; and
- It executes the static initializers of classes (their static field initializers and static { ... } blocks), which can create many objects and even do I/O operations such as opening log files.
In addition, popular frameworks use reflection to determine an application’s configuration at startup; the Spring Framework, e.g., inspects application code for @Bean, @Configuration, and related annotations, which triggers yet more work. This further delays the moment when an application can serve requests.
As a result of all this, startup might take milliseconds for a small program but seconds or even minutes for a server application that uses a web application framework plus libraries for XML processing, persistence, and so forth.
From just-in-time to ahead-of-time
The slow startup of large Java applications is a consequence of the highly dynamic nature of the Java Platform. The work of reading, parsing, loading, and linking class files, of executing static initializers, of compiling bytecodes to native code, of reclaiming memory via garbage collection, and of many other kinds of activities, is all done on demand, lazily, just in time. This gives great flexibility to application, library, and framework developers. It comes at a price, however, which must be paid every time the application starts.
Yet applications tend to repeat themselves, often doing essentially the same thing every time they start: Scanning the same JAR files, reading and parsing and loading and linking the same classes, executing the same static initializers, and using reflection to configure the same application objects. The key to improving startup time is to do at least some of this work eagerly, ahead of time, rather than just in time. To put it another way, in the terms of Project Leyden, we aim to shift some of this work earlier in time.
Description
We extend the HotSpot JVM to support an ahead-of-time cache which can store classes after reading, parsing, loading, and linking them, along with any other necessary data. Once a cache is created for a specific application, it can be re-used in subsequent runs of that application to improve startup time.
Creating a cache takes two steps. First, run the application once, in a training run, to record its AOT configuration, in this case into the file app.aotconf:
$ java -XX:AOTMode=record -XX:AOTConfiguration=app.aotconf \
-cp app.jar com.example.App ...
Second, use the configuration to create the cache, in the file app.aot:
$ java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf \
-XX:AOTCache=app.aot
(This second step doesn’t run the application; it just creates the cache.)
Subsequently, in production, run the application with the cache:
$ java -XX:AOTCache=app.aot -cp app.jar com.example.App ...
With the AOT cache and ahead-of-time class linking, the reading, parsing, loading, and linking work that the JVM would usually do just-in-time when the program runs in the third step is shifted ahead-of-time to the second step, which creates the cache. Subsequently, the program starts up faster in the third step because its classes are available instantly from the cache.
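As a rough illustration of the three steps above, the following sketch assembles the three command lines as lists of arguments. The class and method names (AotWorkflow, recordCommand, createCommand, productionCommand) are hypothetical helpers invented for this example; only the -XX:AOTMode, -XX:AOTConfiguration, and -XX:AOTCache options come from the text. The sketch only builds the commands and does not launch them.

```java
import java.util.List;

// Hypothetical helper: builds the three command lines of the workflow described above.
final class AotWorkflow {
    // Step 1: training run that records the AOT configuration.
    static List<String> recordCommand(String conf, String classPath, String mainClass) {
        return List.of("java", "-XX:AOTMode=record", "-XX:AOTConfiguration=" + conf,
                       "-cp", classPath, mainClass);
    }

    // Step 2: create the cache from the recorded configuration (the application is not run).
    static List<String> createCommand(String conf, String cache) {
        return List.of("java", "-XX:AOTMode=create", "-XX:AOTConfiguration=" + conf,
                       "-XX:AOTCache=" + cache);
    }

    // Step 3: production run that uses the cache.
    static List<String> productionCommand(String cache, String classPath, String mainClass) {
        return List.of("java", "-XX:AOTCache=" + cache, "-cp", classPath, mainClass);
    }

    public static void main(String... args) {
        System.out.println(String.join(" ", recordCommand("app.aotconf", "app.jar", "com.example.App")));
        System.out.println(String.join(" ", createCommand("app.aotconf", "app.aot")));
        System.out.println(String.join(" ", productionCommand("app.aot", "app.jar", "com.example.App")));
    }
}
```

A build script could pass each list to new ProcessBuilder(command).inheritIO().start() to run the steps in order, assuming a JDK that supports these options is on the path.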
For example, here is a program which, though short, uses the Stream API and thus causes almost 600 JDK classes to be read, parsed, loaded, and linked:
import java.util.*;
import java.util.stream.*;

public class HelloStream {
    public static void main(String ... args) {
        var words = List.of("hello", "fuzzy", "world");
        var greeting = words.stream()
            .filter(w -> !w.contains("z"))
            .collect(Collectors.joining(", "));
        System.out.println(greeting); // hello, world
    }
}
This program runs in 0.031 seconds on JDK 23 and in 0.018 seconds on JDK NN when using an AOT cache — an improvement of 42%. The AOT cache occupies 11.4 megabytes.
For a more realistic example, the Spring PetClinic application, version 3.2.0, starts up in 4.486 seconds on JDK 23 and in 2.604 seconds on JDK NN when using an AOT cache — also an improvement of 42%, by coincidence. It loads about 21,000 classes from an AOT cache of 130 megabytes.
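The percentages quoted above are relative reductions in startup time. As a small worked check, the sketch below recomputes them from the quoted timings; the class and method names are made up for illustration.

```java
// Hypothetical helper that recomputes the improvement figures quoted above.
final class StartupImprovement {
    // Relative reduction in startup time, as a percentage of the baseline.
    static double improvementPercent(double baselineSeconds, double cachedSeconds) {
        return 100.0 * (baselineSeconds - cachedSeconds) / baselineSeconds;
    }

    public static void main(String... args) {
        // HelloStream: 0.031 s without a cache, 0.018 s with one.
        System.out.printf("HelloStream: %.0f%%%n", improvementPercent(0.031, 0.018));
        // PetClinic: 4.486 s without a cache, 2.604 s with one.
        System.out.printf("PetClinic:   %.0f%%%n", improvementPercent(4.486, 2.604));
    }
}
```

Both values come out just under 42% before rounding (about 41.9% and 42.0%), which is why the two examples report the same figure.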
Limitations
- A training run must use only JAR-based class paths.
- A training run must not use the --limit-modules, --patch-module, or --upgrade-module-path options.
- Training and production runs must not use ZGC, which is not yet supported.
If any of these constraints are violated then the JVM reports an error and exits without creating a cache.
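The JVM performs these checks itself. Still, a build script may want to catch an unsupported setup before launching a training run. The following is a minimal sketch of such a pre-flight check. It is an assumption-laden illustration, not the JVM's logic: the class name is hypothetical, a class-path entry is treated as JAR-based simply if its name ends in ".jar", and ZGC is detected only through the -XX:+UseZGC flag.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-flight check for the limitations listed above.
final class TrainingOptionsCheck {
    private static final List<String> UNSUPPORTED_MODULE_OPTIONS =
        List.of("--limit-modules", "--patch-module", "--upgrade-module-path");

    // Returns a description of each violated constraint; an empty list means none were found.
    static List<String> violations(List<String> jvmArgs, List<String> classPathEntries) {
        List<String> problems = new ArrayList<>();
        for (String entry : classPathEntries) {
            // Assumption: a JAR-based entry is recognized by its file extension.
            if (!entry.endsWith(".jar")) {
                problems.add("non-JAR class-path entry: " + entry);
            }
        }
        for (String arg : jvmArgs) {
            for (String option : UNSUPPORTED_MODULE_OPTIONS) {
                // Options may be written as "--opt value" or "--opt=value".
                if (arg.equals(option) || arg.startsWith(option + "=")) {
                    problems.add("unsupported option: " + option);
                }
            }
            if (arg.equals("-XX:+UseZGC")) {
                problems.add("ZGC is not yet supported");
            }
        }
        return problems;
    }

    public static void main(String... args) {
        System.out.println(violations(
            List.of("-XX:AOTMode=record", "--patch-module=java.base=patches", "-XX:+UseZGC"),
            List.of("app.jar", "build/classes")));
    }
}
```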
Consistency of training and production runs
To enjoy the benefits of the AOT cache generated during a training run, the subsequent production runs must be essentially similar.
- The training and production runs must use the same JDK release and be on the same hardware architecture (e.g., x64 or aarch64) and operating system.
- The training and production runs must have consistent class paths. A production run may specify extra class-path entries, appended to the training class path; otherwise, the class paths must be identical. Directory-based class paths are not supported because the JVM cannot efficiently check them for consistency.
- The training and production runs must have consistent module options on the command line, and consistent module graphs. The arguments to the -m or --module options, if present, must be identical.
If any of these constraints are violated then the JVM, by default, issues a warning and ignores the cache. You can insist that the JVM use the cache by adding the option -XX:AOTMode=on to the command line:
$ java -XX:AOTCache=app.aot -XX:AOTMode=on \
-cp app.jar com.example.App ...
If this option is present then the JVM reports an error and exits if any of the above constraints are violated, or if the cache does not exist.
A useful exception to the requirement for consistency is that training and production runs may use different garbage collectors. Another useful exception is that training and production runs may use different main classes.
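The class-path rule above amounts to a prefix check: the production class path must start with the training class path, and may only add entries after it. The sketch below illustrates that rule under the assumption that entries are compared by name only. It says nothing about how the JVM actually validates class paths, and the class name is hypothetical.

```java
import java.util.List;

// Hypothetical illustration of the class-path consistency rule described above.
final class ClassPathConsistency {
    // True if the production class path equals the training class path,
    // or extends it by appending extra entries at the end.
    static boolean isConsistent(List<String> training, List<String> production) {
        return production.size() >= training.size()
            && production.subList(0, training.size()).equals(training);
    }

    public static void main(String... args) {
        List<String> training = List.of("app.jar", "lib.jar");
        System.out.println(isConsistent(training, List.of("app.jar", "lib.jar")));              // identical
        System.out.println(isConsistent(training, List.of("app.jar", "lib.jar", "extra.jar"))); // appended
        System.out.println(isConsistent(training, List.of("extra.jar", "app.jar", "lib.jar"))); // prepended
    }
}
```

Under this rule the first two class paths are consistent and the third is not, because the extra entry is prepended rather than appended.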
Synthetic training runs
A training run captures application configuration and execution history for use in subsequent production runs. The best training run is, therefore, a production run. Using a production run for training, however, is not always practical, especially for server applications which, e.g., create log files, open network connections, and access databases. For such cases we recommend creating a synthetic training run that resembles actual production runs as much as possible. It should, among other things, fully configure itself and exercise typical production code paths.
One way to achieve this is to add a second main class to your application specifically for training, e.g., com.example.AppTrainer. This class can invoke the production main class to exercise the common modes of the application using a temporary log-file directory, a local network configuration, and a mocked database if required. You might already have such a main class in the form of an integration test.
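A training main class of this kind might look roughly like the sketch below. Everything in it is hypothetical: the nested App class is only a stand-in for the production main class so that the example is self-contained, and the two arguments (a log directory and a database name) are invented for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical training entry point, in the spirit of com.example.AppTrainer.
final class AppTrainer {
    // Stand-in for the production main class; a real trainer would call it directly.
    static final class App {
        static String lastLogDir;
        static String lastDatabase;

        public static void main(String... args) {
            lastLogDir = args[0];
            lastDatabase = args[1];
            // A real application would start up and handle a few typical requests here.
        }
    }

    public static void main(String... args) {
        try {
            // Keep training output away from production log locations.
            Path logDir = Files.createTempDirectory("aot-training-logs");
            // Exercise the production entry point against a mocked database.
            App.main(logDir.toString(), "mock-database");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because training and production runs may use different main classes, the trainer would be named as the main class in the -XX:AOTMode=record step, while production continues to start the regular main class.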
Some additional tips:
- To optimize for startup time, structure the training run so that it loads the same classes that a production run loads when it starts. You can check which classes are loaded via the -verbose:class command-line option or the jdk.ClassLoad event of the JDK Flight Recorder.
- To minimize the size of the AOT cache, avoid loading classes in the training run that are not used in production runs. Do not, e.g., use large test suites written with rich test frameworks. We may provide a way to filter such classes from the cache in future work.
- If, in production, your application interacts with other hosts on the network or accesses a database then, in training, you may need to mock those interactions to ensure that the necessary classes are loaded. Such mocking, if done in Java code, will cause additional classes to be cached which are not needed in production. Again, we may provide a way to filter such classes from the cache in future work. If, for some reason, you cannot mock these kinds of interactions then the classes required in production to handle them will be loaded from the class path or from modules, just-in-time, as usual.
- Focus on running a broad set of short verification scenarios, sometimes called “smoke tests” or “sanity tests.” This is often enough to load all the classes you will need in production. Avoid large test suites that cover rare corner cases and seldom-used functionality. Also avoid stress and regression tests, which generally do not characterize typical startup activities.
- Keep in mind that an AOT cache only helps insofar as the training run does the same things as production runs. If the training run stops short of that then the cache will be less useful.
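One way to act on the first tip is to compare the classes loaded by a training run with those loaded by a production run. The sketch below does this for captured -verbose:class output. It assumes each relevant log line contains the text "[class,load] " followed by the class name. That line shape is an assumption about the log format and may differ between JDK releases. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper that compares class-loading logs from two runs.
final class LoadedClassDiff {
    // Assumed marker that precedes the class name in each class-load log line.
    private static final String TAG = "[class,load] ";

    // Extracts the class names from lines of the assumed shape "... [class,load] <name> ...".
    static Set<String> loadedClasses(List<String> logLines) {
        Set<String> names = new TreeSet<>();
        for (String line : logLines) {
            int tag = line.indexOf(TAG);
            if (tag < 0) {
                continue; // not a class-load line
            }
            String rest = line.substring(tag + TAG.length()).trim();
            if (!rest.isEmpty()) {
                names.add(rest.split("\\s+")[0]);
            }
        }
        return names;
    }

    // Classes loaded in production but never seen in training.
    static Set<String> missingFromTraining(List<String> trainingLog, List<String> productionLog) {
        Set<String> missing = loadedClasses(productionLog);
        missing.removeAll(loadedClasses(trainingLog));
        return missing;
    }

    public static void main(String... args) {
        List<String> training = List.of(
            "[0.010s][info][class,load] java.lang.Object source: shared objects file");
        List<String> production = List.of(
            "[0.010s][info][class,load] java.lang.Object source: shared objects file",
            "[0.200s][info][class,load] com.example.Rare source: file:/app.jar");
        System.out.println(missingFromTraining(training, production));
    }
}
```

A non-empty result suggests that the training run stops short of what production does at startup, so those classes would still be loaded just-in-time.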
History
The ahead-of-time cache proposed here is a natural evolution of an old feature in the HotSpot JVM, class-data sharing (CDS).
CDS was first introduced in an update to JDK 5, in 2004. It initially aimed to shrink the memory footprint of multiple Java applications running on the same machine. It achieved this by reading and parsing JDK class files, storing the resulting metadata in a read-only archive file that could later be mapped directly into memory by multiple JVM processes using the same virtual-memory pages. We later extended CDS so that it could also store metadata for application classes.
Nowadays the sharing benefit of CDS has been reduced by new security practices such as address space layout randomization (ASLR), which makes the address at which a file is mapped into memory unpredictable. CDS still, however, offers a significant startup-time improvement — so much so that builds of JDK 12 and later include a built-in CDS archive containing the metadata of over a thousand commonly-used JDK classes. CDS is therefore, in a sense, ubiquitous, even though many Java developers have never heard of it and few have used it directly.
The AOT cache builds upon CDS by not only reading and parsing class files ahead-of-time but also loading and linking them. You can see the effect of the latter two optimizations by disabling them via the -XX:-AOTClassLinking option when creating a cache:
$ java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf \
-XX:AOTCache=app.aot -XX:-AOTClassLinking
When we use this option, we can see that most of the improvement to the startup time of the HelloStream program is due to ahead-of-time loading and linking, while most of the improvement to the startup time of the PetClinic application is due to the ahead-of-time reading and parsing already done by CDS today (all times are in seconds):
| | HelloStream | PetClinic |
| JDK 23 | 0.031 | 4.486 |
| AOT cache, no loading or linking | 0.027 (13%) | 3.008 (33%) |
| AOT cache, with loading and linking | 0.018 (42%) | 2.604 (42%) |
Users of Spring Boot and, more generally, the Spring Framework can therefore enjoy significant startup-time improvements immediately, simply by using CDS.
The new -XX:AOT* command-line options are, for the most part at this time, macros for existing CDS options such as -Xshare, -XX:DumpLoadedClassList, and -XX:SharedArchiveFile. We are introducing the -XX:AOT* options in order to provide a uniform user experience for both this and future ahead-of-time features, and to drop the potentially confusing words “share” and “shared”.
Specification conformance
The timing and ordering of class reading, parsing, loading, and linking is almost always immaterial to applications. When these operations are shifted from just-in-time to ahead-of-time, the application observes classes being loaded as if the JVM did the loading work at the exact moment requested, though unaccountably fast.
The JVM can change its order of execution, but it always honors the promises made by the Java language and JVM specifications. If the application could observe a change in a behavior governed by the specifications then the new behavior must also be permitted by the specifications. Conversely, if the application cannot observe a change to specified behavior then the JVM is free to adjust the invisible order of its operations.
Therefore, if loading and linking a class ahead-of-time would result in an observable behavioral change that violates the specifications then the JVM gracefully falls back to just-in-time loading and linking that class, with some loss of startup performance.
Unlike loading and linking, initializing classes causes many side effects visible to Java code, and initialization order is specified in detail. For this reason, we do not initialize application classes ahead-of-time.
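To illustrate why initialization is treated differently, the following sketch shows a static initializer whose side effect is visible to Java code and happens at a well-defined moment, namely the first use of the class. The class names are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: the moment of class initialization is observable from Java code.
final class InitOrderDemo {
    static final List<String> EVENTS = new ArrayList<>();

    static final class Config {
        // Not a compile-time constant, so reading it triggers initialization of Config.
        static final String NAME;

        static {
            EVENTS.add("Config initialized"); // side effect visible to the application
            NAME = "demo";
        }
    }

    // A class is initialized at most once, so only the first call records events.
    static synchronized List<String> run() {
        if (EVENTS.isEmpty()) {
            EVENTS.add("before first use");
            String name = Config.NAME; // first use: Config is initialized here
            EVENTS.add("after first use: " + name);
        }
        return List.copyOf(EVENTS);
    }

    public static void main(String... args) {
        System.out.println(run());
    }
}
```

The recorded order is "before first use", then "Config initialized", then "after first use: demo". Running the initializer ahead of time would move or drop the middle event, which the application could observe.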
Future work
- The two-step workflow proposed here is cumbersome. In the near future we expect to reduce this to one step which both performs the training run and creates the AOT cache.
- As noted earlier, ZGC is not yet supported. We intend to address this limitation in future work.
- In some cases the JVM cannot load classes ahead of time, much less link them. These include classes loaded by user-defined class loaders, old classes which require an old version of the bytecode verifier, and signed classes. If a class cannot be AOT-loaded then other, AOT-loadable classes cannot be AOT-linked to it. In all such cases the JVM falls back to loading and linking just-in-time. We may address these limitations in future work, if and when they prove significant.
- Classes defined by user-defined class loaders are loaded and linked just-in-time, as usual. We cannot load and link them ahead-of-time because there is not, at present, any way to track their identities across training and production runs. We may address this limitation in future work.
- At present, the only way to do a training run is to have the application run a representative workload, at least through startup, and then exit. In future work we may create new tools to help developers more flexibly define and evaluate such training runs and workloads, and perhaps also allow them to manually adjust what is stored in AOT caches. We may also enable training data to be gathered unobtrusively during production runs.
- Loading and linking classes ahead-of-time enables future improvements to warmup time. During training runs we can record statistics about which code runs most frequently and cache any optimized code that is generated. This will enable applications to start immediately in an optimized state.
Testing
- We will create new unit-test cases to cover the new command-line options.
- Ahead-of-time loading and linking is independent of existing CDS features. Most CDS tests should pass when run with the -XX:+AOTClassLinking option. A few tests are sensitive to the order in which classes are loaded; we will revise them as appropriate.
Risks and Assumptions
- We assume that the consistency required across training and production runs is tolerable to developers who want to use this feature. They must, especially, ensure that the class paths and the module configurations are consistent in all runs.
- We assume that the limited support for user-defined class loaders is tolerable. Conversations with some potential users suggest that they are willing to accept fixed class paths and module configurations, and thus a fixed set of built-in class loaders, and to use specialized class loaders only when that flexibility is absolutely required.
- We assume that the low-level side effects of class loading are immaterial in practice. These include the timing of filesystem accesses, log messages, JDK-internal bookkeeping activities, and changes in CPU and memory usage. Applications that observe and depend on such subtle effects may become unstable if classes are loaded and linked ahead-of-time. We assume that such applications are rare, and that they can be adjusted to compensate.