JEP 516: Ahead-of-Time Object Caching with Any GC

Owner: Erik Österlund
Type: Feature
Scope: JDK
Status: Candidate
Component: hotspot / gc
Discussion: hotspot dash dev at openjdk dot org
Effort: M
Duration: M
Reviewed by: Alex Buckley, Ioi Lam, Stefan Karlsson, Vladimir Kozlov
Created: 2024/02/16 09:49
Updated: 2025/05/09 15:42
Issue: 8326035

Summary

Enhance the ahead-of-time cache, which enables the HotSpot Java Virtual Machine to improve startup and warmup time, so that it can be used with any garbage collector, including the low-latency Z Garbage Collector (ZGC). Achieve this by making it possible to load cached Java objects sequentially into memory from a neutral, GC-agnostic format, rather than map them directly into memory in a GC-specific format.

Goals

Motivation

Most of the HotSpot JVM's garbage collectors pause application threads in order to reclaim memory. This causes applications to take significantly longer than usual to handle some requests, increasing their tail latency. For example, 99% of all requests may be handled within 10ms, but 1% of the requests may take 100ms or more. You can minimize the tail latency caused by garbage collection by using the Z Garbage Collector (ZGC). ZGC reclaims memory concurrently, never pausing application threads for more than a millisecond.

Garbage collection is, however, not the only cause of high tail latency.

Java applications are often scaled by starting new JVM instances to handle more requests, but requests sent to a new instance take significantly longer than requests sent to a warmed-up instance. To address this source of tail latency, you can enable ahead-of-time class loading and linking, introduced in JDK 24. This improves application startup by caching your application's classes during a training run so that they are available immediately in production. For example, the Spring PetClinic demo application starts 41% more quickly in production because the cache enables some 21,000 classes to appear already loaded and linked when the application starts. Forthcoming features, such as ahead-of-time method profiling and code compilation, will leverage the ahead-of-time cache to extend these gains further.

Unfortunately, the way that classes and other Java objects are cached is incompatible with ZGC. This forces you to choose between suffering GC-induced tail latency or suffering startup-induced tail latency. If you use ZGC to reduce the former then you cannot enable ahead-of-time class loading and linking to reduce the latter, and vice versa.

We could avoid this painful choice if AOT caches could be used with any of the HotSpot JVM's garbage collectors, including ZGC.

Description

An AOT cache contains, among other things, representations of the Java Class objects for classes that were loaded and linked during the training run. It also contains Java objects referenced by those Class objects, such as strings and byte arrays.

Today, cached Java objects are stored in a GC-specific format which is bitwise-compatible with the format of heap objects as understood by the GC. This enables the JVM to map them directly into the heap memory managed by the GC. (The other data in AOT cache files is not GC-specific.)

We propose to, optionally, cache Java objects in a neutral, GC-agnostic format that works with all garbage collectors, regardless of which GC is used in training or in production. As an additional benefit, this will allow the JDK to include a baseline AOT cache that works in all environments.

Obstacles to GC-agnostic object caching

The main challenge of caching objects in a GC-agnostic manner lies in how to handle object references. From the perspective of Java code, the value of a field that holds a reference to an object is opaque. From the perspective of the JVM, however, each GC has its own policies for laying out objects in memory and representing references from one object to another. The Serial, Parallel, and G1 collectors represent a reference either as an absolute 64-bit address or, when compressed object pointers are enabled, as a 32-bit compressed pointer derived from the object's offset within the heap; ZGC represents a reference as a 64-bit colored pointer, in which some bits carry GC metadata rather than address information.

The multitude of reference formats makes it challenging to take objects managed by one GC, cache them, and reify them later for a different GC.
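The following is a simplified, illustrative sketch of these encodings, written in Java for readability (HotSpot itself implements them in C++); the heap base, shift amount, and metadata bits are assumptions chosen for illustration, not the actual values used by any collector.

enum RefEncoding { RAW_ADDRESS, COMPRESSED_OOP, COLORED_POINTER }

final class ReferenceEncodingSketch {
    static final long HEAP_BASE  = 0x4000000000L;  // hypothetical heap base address
    static final int  OOP_SHIFT  = 3;              // 8-byte alignment permits a 3-bit shift
    static final long COLOR_BITS = 0xFL << 44;     // hypothetical metadata ("color") bits

    // Encode a reference to the object at objectAddress in a GC-dependent format.
    static long encode(long objectAddress, RefEncoding encoding) {
        switch (encoding) {
            case RAW_ADDRESS:     return objectAddress;                              // absolute 64-bit address
            case COMPRESSED_OOP:  return (objectAddress - HEAP_BASE) >>> OOP_SHIFT;  // 32-bit compressed pointer
            case COLORED_POINTER: return objectAddress | COLOR_BITS;                 // colored pointer with metadata bits
            default: throw new AssertionError(encoding);
        }
    }
}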

Object caching today

The representation of a Java object in an AOT cache mirrors its representation in memory. For example, consider a String object, which has these fields:

public class String {
    private final byte[] value;
    private final byte coder;
    private int hash;
    private boolean hashIsZero;
}

In the cached form of a String object, the value field contains the 64-bit memory address of a byte array:

header: ...  |  value: 0x4002045278  |  coder: ...  |  hash: ...  |  hashIsZero: ...

The address is in a lowest-common-denominator format that is valid across the Serial, Parallel, and G1 collectors. Objects are stored in AOT caches such that none crosses the boundaries of heap regions, using a predetermined region size. This allows you to run in production with G1 even if you trained with Serial or Parallel.

ZGC, however, does not represent object references as plain 64-bit addresses, and it does not use a single predetermined region size. Hence cached objects in this format cannot be used with ZGC.

GC-agnostic object caching

We cache Java objects in a GC-agnostic manner by storing object references in a GC-agnostic format, namely as logical indices. In a String object cached in this format, the value field contains the logical index of the byte array:

header: ...  |  value: 5  |  coder: ...  |  hash: ...  |  hashIsZero: ...
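
As a minimal sketch of the writer side of this format, the snippet below numbers objects in the order in which they are written to the cache and records a reference field as the index of the referenced object rather than as its address; the CachedRef type and the traversal order are assumptions made for illustration.

import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical writer-side sketch: references become logical indices in cache order.
final class IndexAssignmentSketch {

    // A reference field in the cached data, identified only by a logical index.
    record CachedRef(int logicalIndex) {}

    // Assign index 0, 1, 2, ... to objects in the order they are written to the cache.
    static Map<Object, Integer> assignIndices(List<Object> objectsInCacheOrder) {
        Map<Object, Integer> indexOf = new IdentityHashMap<>();
        for (Object o : objectsInCacheOrder) {
            indexOf.put(o, indexOf.size());
        }
        return indexOf;
    }

    // Record a reference field as a logical index instead of a memory address.
    static CachedRef encodeReference(Object referee, Map<Object, Integer> indexOf) {
        return new CachedRef(indexOf.get(referee));
    }
}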

Using objects cached in this format requires converting the logical indices back into memory addresses. The JVM therefore reads objects from the cache sequentially, i.e., streams them, into memory. When the cache is opened, a background thread eagerly starts materializing objects, one by one. Materializing an object involves allocating memory in the heap, initializing the object's fields according to the data in the cache, and building object references to other materialized objects via lookups in a side table. When the application uses a class for the first time, it synchronizes with the background thread to ensure that the Class object for the class, and any related objects, are materialized. (The other data in the cache continues to be mapped into memory, as it is today.)
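
The reader side can be sketched as follows, again in Java and with hypothetical CachedObject data rather than the real cache format; for simplicity the sketch assumes that every reference points to an object that appears earlier in the cache, whereas the real mechanism must also cope with forward references.

import java.util.ArrayList;
import java.util.List;

// Hypothetical reader-side sketch: materialize cached objects sequentially and
// resolve logical indices through a side table of already-materialized objects.
final class MaterializationSketch {

    // A cached object in the (assumed) GC-agnostic format: reference-typed fields
    // hold the logical indices of other cached objects.
    record CachedObject(String className, int[] referenceFieldIndices) {}

    static List<Object> materializeAll(List<CachedObject> cache) throws Exception {
        List<Object> sideTable = new ArrayList<>(cache.size());  // logical index -> materialized object
        for (CachedObject cached : cache) {
            // 1. Allocate the object in the heap (simplified to a reflective allocation).
            Object obj = Class.forName(cached.className()).getDeclaredConstructor().newInstance();
            // 2. Initialize primitive fields from the cached data (omitted here).
            // 3. Resolve reference fields via the side table; in this sketch the referenced
            //    objects were materialized earlier, so their entries already exist.
            for (int index : cached.referenceFieldIndices()) {
                Object referee = sideTable.get(index);
                // ... store 'referee' into the corresponding field of 'obj' ...
            }
            sideTable.add(obj);
        }
        return sideTable;
    }
}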

Choosing GC-specific vs. GC-agnostic object caching

GC-specific cached objects are mapped directly into memory, while GC-agnostic cached objects are streamed into memory. Both create the appearance of instantly-loaded objects, but mapping performs better than streaming in some scenarios, and streaming performs better in others.

A cold start of an application is the first start of that application on a particular machine in a while. Cold starts can happen frequently when deploying applications in a cloud. The AOT cache is unlikely to be in the filesystem cache, and the larger the cache, the larger the cost of loading it from disk. Streaming GC-agnostic cached objects can hide some of the latency of reading data from the disk, at the cost of requiring an additional CPU core.

Conversely, a warm start is when an application starts close in time to a previous start, such as when running over and over on the same machine. Because the AOT cache stays in the filesystem cache between runs, mapping GC-specific cached objects can be done instantly.

The least advantageous situation for streamable, GC-agnostic object caching is a warm start in a constrained environment that does not have a spare CPU core. The JVM tries to avoid this situation in production by applying a heuristic when creating an AOT cache after a training run: if the training run uses compressed object pointers (-XX:+UseCompressedOops), which ZGC does not support, then objects are cached in the GC-specific format; otherwise they are cached in the streamable, GC-agnostic format.

You can explicitly create a cache whose objects are in the streamable, GC-agnostic format by specifying -XX:+AOTStreamableObjects, even if you also specify -XX:+UseCompressedOops.
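
For illustration, assuming the training-run and cache-creation workflow of JDK 24's ahead-of-time class loading and linking (the AOTMode, AOTConfiguration, and AOTCache options below come from that feature, and the class path and main class are placeholders), a streamable cache could be created and then used together with ZGC roughly as follows:

# Training run: record an AOT configuration (application details are placeholders)
java -XX:AOTMode=record -XX:AOTConfiguration=app.aotconf -cp app.jar com.example.App

# Assembly phase: create a cache whose objects use the streamable, GC-agnostic format
java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf -XX:AOTCache=app.aot \
     -XX:+AOTStreamableObjects -cp app.jar

# Production run: use the cache together with ZGC
java -XX:+UseZGC -XX:AOTCache=app.aot -cp app.jar com.example.App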

The JDK includes two baseline AOT caches, one with GC-agnostic cached objects and one with GC-specific cached objects, which the JVM uses when the application does not provide a cache. This ensures that the JVM can use streaming or mapping, as appropriate, to achieve the best startup performance.

Alternatives

Testing

Many object-caching tests already exist. We will adapt them to test with ZGC and the new streaming, GC-agnostic approach.