JEP draft: Ahead-of-Time Code Compilation
Owner | John Rose
Type | Feature
Scope | Implementation
Status | Draft
Component | hotspot / compiler
Created | 2024/06/30 04:47
Updated | 2025/10/09 16:21
Issue | 8335368
Summary
Improve startup and warmup time by making native code from a previous run of an application instantly available when the HotSpot Java Virtual Machine starts. This will greatly reduce the initial load on the JIT compiler, and hence its interference with the application during startup, particularly in configurations with fewer cores. The JIT is then free to delay the generation of native code unless and until the previously generated code proves insufficiently performant.
Goals
- Help applications start up and warm up more quickly by shifting dynamic (JIT) method compilation from production runs to training runs, conveying the necessary native code via the AOT cache.
- Do not require any change to the code of applications, libraries, or frameworks.
- Do not introduce any new constraints on application execution.
- Do not introduce new AOT workflows; rather, reuse the existing AOT cache creation commands (see the example commands after this list).
- Provide full interoperability between all optimization levels and execution modes in the HotSpot Java Virtual Machine, including new AOT code, existing JIT code, and the bytecode interpreter.
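For reference, the cache creation commands are those introduced by JEP 483 and reused unchanged by this JEP; app.jar, com.example.App, and the file names below are placeholders:

    # Training run: record an AOT configuration while the application runs.
    java -XX:AOTMode=record -XX:AOTConfiguration=app.aotconf \
         -cp app.jar com.example.App ...

    # Assembly phase: create the AOT cache from the recorded configuration.
    java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf \
         -XX:AOTCache=app.aot -cp app.jar

    # Production run: use the cache, which under this JEP also carries AOT code.
    java -XX:AOTCache=app.aot -cp app.jar com.example.App ...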
Motivation
To prepare the best possible native code for an application, we must first run the application.
This means that, initially, an application must execute by means of less-than-optimal techniques. During this initial period, called warmup, the actual application behavior must be observed (or profiled) in order to track which code paths and object types need to be prioritized for optimization. As profiles accumulate during warmup, the system is able to compile and install highly optimized native code, which is organized to provide the best possible performance. When application execution is fully transferred to this optimized code, it stays at peak performance, as long as the profiled code paths and object types continue to dominate performance.
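To make the idea of profiling concrete, here is a purely illustrative sketch (all class names are hypothetical): a hot call site that profiling observes to be dominated by a single receiver type, which the optimizing JIT can then devirtualize and inline:

    // Hypothetical workload: during warmup, profiling observes that s.area()
    // is always invoked on a Circle, so the optimizing JIT can devirtualize
    // and inline that call in the compiled code for totalArea.
    interface Shape { double area(); }

    final class Circle implements Shape {
        private final double r;
        Circle(double r) { this.r = r; }
        public double area() { return Math.PI * r * r; }
    }

    public class Warmup {
        // A hot method: called many times, so it is profiled and then
        // compiled at increasing optimization levels.
        static double totalArea(Shape[] shapes) {
            double sum = 0.0;
            for (Shape s : shapes) sum += s.area();  // profiled call site
            return sum;
        }

        public static void main(String[] args) {
            Shape[] shapes = new Shape[1_000];
            for (int i = 0; i < shapes.length; i++) shapes[i] = new Circle(i);
            double sum = 0.0;
            for (int i = 0; i < 10_000; i++) sum += totalArea(shapes);  // warmup
            System.out.println(sum);
        }
    }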
It may seem that there is no shortcut: that peak application performance can be attained only after a CPU-intensive warmup period of application execution, profiling, and optimizing JIT compilation.
Recent work has partly reduced these warmup costs. JEP 483 shifts application linking and loading to a training run by means of the AOT cache. JEP 515 shifts profiling work in the same way, so that a production run starts with ready-made profile data and the JIT compiler can begin work immediately. But warmup is still delayed, by seconds or even minutes, because the JIT compilation of optimized code consumes many computing resources. On some platforms, the latency of JIT compilation can be hidden by running many JIT threads in parallel, but this trick requires the allocation of processors beyond those immediately useful to the application. Surely it would be helpful if the heavy work of JIT compilation could be shifted to a training run as well.
Description
We extend the AOT cache, introduced by JEP 483 and previously extended by JEP 515, to store natively compiled method code assets, also known as AOT code. During a production run, a request for native method code, normally fulfilled by the JIT compiler, can be immediately fulfilled if a matching method is found in the AOT cache. There does not need to be any delay for profiling or JIT compilation, if appropriate AOT code is available. This means that warmup happens quickly, and with less consumption of computing resources.
From the user’s point of view, all JIT compilation activity is transparent, except for effects on application performance. Likewise, all uses of AOT code are equally transparent. There are no new requirements on application configuration or VM invocation. Applications which use AOT code assets will usually start up and warm up more quickly. Even when peak performance requires additional JIT activity (to generate newly optimized code), there is likely to be less overall consumption of machine resources by JIT activity, and such activity will tend to spread more evenly across the lifetime of the application.
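In particular, a production run is invoked exactly as it would be with an AOT cache created under JEP 483 (app.aot and app.jar are placeholders):

    # Same command line as before; AOT code assets are used automatically if present.
    java -XX:AOTCache=app.aot -cp app.jar com.example.App ...

    # The same application also runs without the cache, only warming up more slowly.
    java -cp app.jar com.example.App ...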
The presence of AOT code has two low-level effects: It makes the AOT cache larger, usually by a modest amount. And it makes good native code appear quickly, almost as if the JIT were suddenly able to perform its compilation tasks instantly. The almost-instant loading of AOT code will cause even the earliest phases of application startup to run faster, since it is much faster to load precompiled code than to generate it from scratch. Application warmup will also be accelerated, since much profiling and JIT activity will be skipped in favor of immediate use of AOT code assets.
Of course, if the application’s behavior in the production run is significantly different from the training run, some AOT code might not be usable, or it might be deoptimized and replaced. This is nothing new: JIT code also gets generated only conditionally (on proof of importance) and is then subject to deoptimization and replacement. (The JIT may also be necessary if the VM is running on a processor version that is unable to execute the AOT methods.) When generating new JIT code, AOT profiles are very useful, since they enable the optimizing JIT compiler to produce code that supports the appropriate hot code paths and hot object types, as observed during the training run.
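As a hypothetical sketch of such divergence (the types here are illustrative, not a statement about what HotSpot speculates on in any given build): a call site that sees only one implementation during training may be compiled speculatively for that type, and a production run that introduces a second implementation can invalidate the speculation:

    // Hypothetical divergence: if the training run only ever passes an
    // ArrayList, compiled code for sum may speculate on that type; a
    // production run that also passes a LinkedList can invalidate the
    // speculation, triggering deoptimization and recompilation (for JIT
    // code and AOT code alike).
    import java.util.ArrayList;
    import java.util.LinkedList;
    import java.util.List;

    public class Divergence {
        static int sum(List<Integer> list) {
            int s = 0;
            for (int x : list) s += x;  // speculatively specialized call sites
            return s;
        }

        public static void main(String[] args) {
            List<Integer> a = new ArrayList<>(List.of(1, 2, 3));
            for (int i = 0; i < 1_000_000; i++) sum(a);        // training-like use
            List<Integer> b = new LinkedList<>(List.of(4, 5)); // new type appears
            System.out.println(sum(a) + sum(b));               // may deoptimize sum
        }
    }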
Testing
- We will create new unit tests for this feature.
- We will run existing AOT cache tests with this feature enabled, and ensure that they pass.
Alternatives
As has been demonstrated many times, Java can be supported by a pure static compiler. Static compilation is always accompanied by compromises to performance, agility, or compatibility. At present, best performance requires a balanced mix of AOT and JIT execution modes (plus the interpreter), as provided by this JEP.
Since AOT code can be loaded immediately on startup, it might seem that profiles in the AOT cache (added by JEP 515) are now useless. In fact, they are used to sequence the loading of optimized AOT code, and they help the VM's JIT compiler regenerate optimized code when needed.
Therefore, it is not presently a goal to rely completely on AOT code, as if a Java application were the same as a C++ application. When appropriate, applications can still make use of the interpreter, the JIT, and AOT profiles. Future work may investigate further minimization of JIT usage, and/or interpreter usage. However, initial experiments suggest that totally excluding the JIT often leads to lower peak performance. Likewise, excluding the interpreter results in bloated AOT cache files, which can be more expensive to load than running the interpreter.
Unlike a C++ application, a Java application is always compiled to use the highest and best instruction set architecture available at production time, including any available optional instructions. Vector ISAs change and develop, affecting the details of vectorized code generated by the HotSpot virtual machine. When running with an AOT cache that contains AOT code assets, the VM checks that the present processor can correctly execute the AOT code assets. This check can fail if the AOT cache was created on a newer machine but the production run is performed on an older model. The resulting execution is still correct, but it may exhibit lower performance, as some or all AOT code assets may be inappropriate for the current run.
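As a hypothetical illustration, the machine code for a simple loop like the following depends on the SIMD features of the CPU that performs the compilation (for example, which AVX level is available on x86), which is why AOT code compiled on a newer machine may be unusable on an older one:

    // Hypothetical example of a loop that HotSpot's optimizing compiler can
    // auto-vectorize. The vector instructions chosen for the loop body
    // depend on the CPU features visible at compilation time.
    public class Axpy {
        static void axpy(float a, float[] x, float[] y) {
            for (int i = 0; i < x.length; i++) {
                y[i] += a * x[i];  // candidate for auto-vectorization
            }
        }

        public static void main(String[] args) {
            float[] x = new float[1 << 20], y = new float[1 << 20];
            java.util.Arrays.fill(x, 1.0f);
            for (int i = 0; i < 1_000; i++) axpy(2.0f, x, y);  // make it hot
            System.out.println(y[0]);
        }
    }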
Future work may investigate alternatives for finer control over optimization levels of AOT code, possibly allowing users to trade off speed for processor compatibility. Such work could potentially install several versions of a given AOT method, usable by differing processor levels. However, such fine control is not an initial goal.
Risks and Assumptions
There are no new risks beyond those already noted in JEP 483.
The base assumption of the AOT cache remains operative: A training run is assumed to be a good source of observations that, when passed through an AOT cache to a production run, will benefit the performance of that production run. This assumption applies fully to AOT code, which benefits similar production runs, without doing harm to divergent production runs.