JEP 515: Ahead-of-Time Method Profiling
Author | Igor Veresov & John Rose |
Owner | John Rose |
Type | Feature |
Scope | Implementation |
Status | Candidate |
Component | hotspot / compiler |
Discussion | leyden dash dev at openjdk dot org |
Effort | M |
Duration | M |
Relates to | JEP 483: Ahead-of-Time Class Loading & Linking |
Reviewed by | Alex Buckley, Dan Heidinga, Vladimir Kozlov |
Created | 2024/02/01 20:40 |
Updated | 2025/05/02 19:28 |
Issue | 8325147 |
Summary
Improve warmup time by making method-execution profiles from a previous run of an application instantly available when the HotSpot Java Virtual Machine starts. This will enable the JIT compiler to generate native code immediately upon application startup, rather than having to wait for profiles to be collected.
Goals
- Help applications warm up more quickly by shifting the collection of initial method execution profiles from production runs to training runs, conveying the profiles via the AOT cache.
- Do not require any change to the code of applications, libraries, or frameworks.
- Do not introduce any new constraints on application execution.
- Do not introduce new AOT workflows, but, rather, use the existing AOT cache creation commands.
Motivation
To truly know what an application does, we must run it.
We can draw simple conclusions about an application's behavior by inspecting its source code or its class files, but we will be uncertain about how it interacts with the highly dynamic Java Platform. One reason for this uncertainty is that, in the absence of final or sealed modifiers, any class can be subclassed at any time, so a method can be called many times and then overridden and never called again. Another reason is that new classes can be loaded in response to external input, extending the application's behavior in ways that even its author could not predict. Static analysis can always be defeated by program complexity.
When running an application, the JVM can identify which methods do the important work, and how they do it. For an application to reach peak performance, the JVM's just-in-time compiler (JIT) must find the unpredictable set of hot methods, i.e., those which consume the most CPU time, and compile their bytecode to native code. (Hence the name "HotSpot JVM".) Since previous application behavior is an excellent predictor of future behavior, a summary of previous behavior can focus the JVM's compilation efforts upon the code that really matters.
Since JDK 1.2, the HotSpot JVM has automatically collected this summary in the form of profiles. For any given method, a profile tallies many useful events, e.g., how many times its bytecode instructions are executed and which object types are encountered. With enough profile data, the JVM has a statistical basis to predict the method's future behavior, and thus to generate optimized code for that method. Profiles allow the JVM to both optimize hot methods and avoid optimizing cold methods; both conditions are necessary for peak performance.
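To make the idea of a profile concrete, here is a minimal sketch of the kind of tally described above: an execution count plus the receiver types seen at one call site. It is a toy model for illustration only. The class `CallSiteProfile` and its methods are hypothetical names, and HotSpot's real profile data structures are considerably richer.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a per-call-site profile; not HotSpot's actual data structure. */
class CallSiteProfile {
    private long executions;                                  // how often the call site ran
    private final Map<Class<?>, Long> receiverTypes = new HashMap<>();

    /** Record one execution of the call site with the given receiver object. */
    void record(Object receiver) {
        executions++;
        receiverTypes.merge(receiver.getClass(), 1L, Long::sum);
    }

    long executions() { return executions; }

    /** True if exactly one receiver class has been observed so far. */
    boolean isMonomorphic() { return receiverTypes.size() == 1; }

    /** Most frequently observed receiver class, or null if nothing was recorded. */
    Class<?> dominantType() {
        return receiverTypes.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(null);
    }

    public static void main(String[] args) {
        CallSiteProfile site = new CallSiteProfile();
        for (int i = 0; i < 1_000; i++) site.record("text");  // only String receivers
        System.out.println(site.executions() + " executions, monomorphic=" + site.isMonomorphic());
        site.record(Integer.valueOf(42));                     // a second receiver type appears
        System.out.println("monomorphic=" + site.isMonomorphic()
                + ", dominant=" + site.dominantType().getSimpleName());
    }
}
```

A compiler with access to such a tally could, for instance, specialize the call site for the dominant type while it is monomorphic, and fall back to a general dispatch once other types appear.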
Unfortunately, there is a chicken-and-egg problem: An application cannot achieve peak performance until its method behaviors are predicted, and method behaviors cannot be predicted until the application has run for a significant period of time.
The JVM currently solves this problem by devoting some resources to collecting profiles in the early part of an application's run. During this warmup period the application runs more slowly, until the JIT can compile the hot methods to native code. After warmup, no more methods need to be compiled unless the application changes its pattern of behavior, triggering a new warmup period.
We can improve warmup time by collecting profiles even earlier, in a training run of the application. This shifts the work of profiling and predicting behavior out of the application's production lifetime. As a result, the application's warmup time in production will be determined only by the costs of JIT compilation, and the application can achieve peak performance more rapidly.
Description
We extend the AOT cache, introduced by JEP 483, to collect method profiles during training runs. Just as the AOT cache currently stores classes that the JVM would otherwise need to load and link at startup, the AOT cache now also stores method profiles that the JVM would otherwise need to collect in the early part of an application's run. Accordingly, production runs of the application are both faster to start and faster to achieve peak performance.
Profiles cached during training runs do not prevent additional profiling during production runs. This is critical, since an application's behavior in production can diverge from what was observed in training. Even with cached profiles, the HotSpot JVM continues to profile and optimize the application as it runs, fusing the benefits of AOT profiles, on-line profiling, and JIT compilation. The net effect of cached profiles is that the JIT runs earlier and with more accuracy, using the profiles to optimize the hot methods so that the application experiences a shorter warmup period. JIT tasks are inherently parallel, so the wall-clock time for warmup can be short when enough hardware resources are available.
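The interplay between cached and on-line profiling can be sketched with a toy model in which invocation counters are optionally pre-seeded from a training run and then keep counting in production. Everything here is an assumption made for illustration: the class `WarmupModel`, the method names, and the single `COMPILE_THRESHOLD` value are hypothetical, and HotSpot's actual tiered compilation policy is more elaborate than one counter per method.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model: invocation counters that may be pre-seeded from a cached profile. */
class WarmupModel {
    static final int COMPILE_THRESHOLD = 10_000;   // hypothetical value, for illustration only

    private final Map<String, Integer> counts = new HashMap<>();

    /** Start with no profile data, as in a run without cached profiles. */
    WarmupModel() { }

    /** Start with the counts observed in a training run. */
    WarmupModel(Map<String, Integer> cachedProfile) {
        counts.putAll(cachedProfile);
    }

    /** On-line profiling continues either way: every call still updates the tally. */
    void invoke(String method) {
        counts.merge(method, 1, Integer::sum);
    }

    boolean isHot(String method) {
        return counts.getOrDefault(method, 0) >= COMPILE_THRESHOLD;
    }

    /** Number of further calls needed before the method counts as hot. */
    int callsUntilHot(String method) {
        return Math.max(0, COMPILE_THRESHOLD - counts.getOrDefault(method, 0));
    }

    public static void main(String[] args) {
        WarmupModel cold = new WarmupModel();
        WarmupModel warm = new WarmupModel(Map.of("greeting", 100_000));
        System.out.println("cold: calls until hot = " + cold.callsUntilHot("greeting")); // 10000
        System.out.println("warm: calls until hot = " + warm.callsUntilHot("greeting")); // 0

        // A method never seen in training is still profiled and becomes hot in production.
        for (int i = 0; i < COMPILE_THRESHOLD; i++) warm.invoke("newPath");
        System.out.println("warm: newPath hot = " + warm.isHot("newPath"));              // true
    }
}
```

In this model the seeded method is eligible for optimization from the first call, while behavior that diverges from training is handled by the same counters that a run without cached profiles would use.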
For example, here is a program which, though short, uses the Stream API and thus causes almost 900 JDK classes to be loaded. About 30 hot methods are compiled at the highest optimization level:
import java.util.*;
import java.util.stream.*;

public class HelloStreamWarmup {

    static String greeting(int n) {
        var words = List.of("Hello", "" + n, "world!");
        return words.stream()
            .filter(w -> !w.contains("0"))
            .collect(Collectors.joining(", "));
    }

    public static void main(String... args) {
        for (int i = 0; i < 100_000; i++)
            greeting(i);
        System.out.println(greeting(0));  // "Hello, world!"
    }

}
This program runs in 90 milliseconds with an AOT cache that contains no profiles. After collecting profiles into the AOT cache, it runs in 73 milliseconds — an improvement of 19%. The AOT cache with profiles occupies an additional 250 kilobytes, about 2.5% more than the AOT cache without profiles.
A short program such as this has only a short warmup period, but with cached profiles that warmup goes even faster as a result of timely and accurate JIT activity. More complex and longer-running programs are also likely to warm up more quickly, for the same reason.
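Since this feature introduces no new workflow, the program above would be trained and run with the commands that JEP 483 already defines. The sketch below assembles those three command lines in Java, e.g., for use in a build script or test harness. The options `-XX:AOTMode`, `-XX:AOTConfiguration`, and `-XX:AOTCache` come from JEP 483; the helper class `AotWorkflow`, the file names `app.aotconf` and `app.aot`, and the class path `.` are placeholders assumed for this example.

```java
import java.util.List;

/** Builds the JEP 483 command lines for a given class path and main class. */
class AotWorkflow {
    // File names are placeholders chosen for this sketch.
    static final String CONFIG = "app.aotconf";
    static final String CACHE = "app.aot";

    /** Training run: record the application's behavior into an AOT configuration. */
    static List<String> record(String classPath, String mainClass) {
        return List.of("java", "-XX:AOTMode=record", "-XX:AOTConfiguration=" + CONFIG,
                       "-cp", classPath, mainClass);
    }

    /** Create the AOT cache from the recorded configuration; the application is not run. */
    static List<String> create(String classPath) {
        return List.of("java", "-XX:AOTMode=create", "-XX:AOTConfiguration=" + CONFIG,
                       "-XX:AOTCache=" + CACHE, "-cp", classPath);
    }

    /** Production run: start the application with the cache. */
    static List<String> run(String classPath, String mainClass) {
        return List.of("java", "-XX:AOTCache=" + CACHE, "-cp", classPath, mainClass);
    }

    public static void main(String[] args) {
        String cp = ".";                     // assumes HelloStreamWarmup.class is in the current directory
        String main = "HelloStreamWarmup";
        System.out.println(String.join(" ", record(cp, main)));
        System.out.println(String.join(" ", create(cp)));
        System.out.println(String.join(" ", run(cp, main)));
    }
}
```

Each list can be passed to `ProcessBuilder` as is, or printed and pasted into a shell.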
Alternatives
If an application is so predictable that we can compile its hot methods to native code ahead of time, and doing so enables it to reach peak performance without further JIT activity, then such AOT code is preferable to caching profiles. We intend to implement AOT compilation in future work.
Many applications, however, benefit from a mix of AOT compilation and JIT compilation, since their behavior cannot be accurately predicted by an AOT compiler. Cached profiles and cached AOT code are thus complementary rather than competing, and will work together to provide the best performance for a range of applications. A partial AOT solution, where reasonable AOT code is gradually replaced by better-optimized JIT code, seems likely to be the best solution in the end. The JIT can initially stay out of the application's way, taking its time to get the final code just right, based on the latest profiling information.
Testing
- We will create new unit tests for this feature.
- We will run existing AOT cache tests with this feature enabled, and ensure that they pass.
Risks and Assumptions
There are no new risks beyond those already noted in JEP 483.
The base assumption of the AOT cache remains operative: A training run is assumed to be a good source of observations that, when passed through an AOT cache to a production run, will benefit the performance of that production run.