JEP draft: Ahead-of-Time Class Loading & Linking with Any GC

Owner: Erik Österlund
Type: Feature
Scope: JDK
Status: Submitted
Component: hotspot / gc
Discussion: hotspot dash dev at openjdk dot org
Effort: M
Duration: M
Reviewed by: Alex Buckley, Ioi Lam, Stefan Karlsson, Vladimir Kozlov
Created: 2024/02/16 09:49
Updated: 2025/05/06 19:14
Issue: 8326035

Summary

Enhance the ahead-of-time cache, which enables the HotSpot Java Virtual Machine to improve startup and warmup time, so that it can be used with any garbage collector, including the low-latency Z Garbage Collector (ZGC).

Goals

Enable the ahead-of-time cache to be used with any garbage collector supported by the HotSpot JVM, including ZGC.

Motivation

Most of the HotSpot JVM's garbage collectors pause application threads in order to reclaim memory. This causes the application to take significantly longer than usual to handle some requests, increasing its tail latency. For example, 99% of all requests may be handled within 10ms, but 1% of the requests may take 100ms or more. You can minimize the tail latency caused by garbage collection by using the Z Garbage Collector (ZGC). ZGC reclaims memory concurrently, never pausing application threads for more than a millisecond.

Garbage collection is, however, not the only cause of high tail latency.

Java applications are often scaled by starting new JVM instances to handle more requests, but requests sent to a new instance take significantly longer than requests sent to a warmed-up instance. To address this source of tail latency, you can enable ahead-of-time class loading and linking, introduced in JDK 24. This improves application startup by caching your application’s classes in a training run so that they are available immediately in production. For example, the Spring PetClinic demo application starts 41% more quickly in production because the cache enables some 21,000 classes to appear immediately loaded and linked at startup. Forthcoming features, such as ahead-of-time method profiling and code compilation, will build on the ahead-of-time cache to extend these gains further.

Unfortunately, the way that classes are cached is incompatible with ZGC. This forces you to choose between suffering GC-induced tail latency or suffering startup-induced tail latency. If you use ZGC to reduce the former then you cannot enable ahead-of-time class loading and linking to reduce the latter, and vice versa.

We could avoid this painful choice if AOT caches could be used with any of the HotSpot JVM's garbage collectors, including ZGC.

Description

An AOT cache contains, among other things, representations of the Class objects for classes that were loaded and linked during a training run. It also contains objects referenced by those Class objects, such as strings and byte arrays.

Today, AOT cache files are GC-specific: Their format is bitwise-compatible with the format of heap objects understood by the GC, so that the JVM can map them directly into the heap memory managed by the GC.

We propose to make AOT caches optionally GC-agnostic so that they work with all garbage collectors, regardless of which GC is used in training or in production. As an additional benefit, this will allow the JDK to include a baseline AOT cache that works in all environments.

Obstacles to a GC-agnostic AOT cache

The main challenge of caching objects in a GC-agnostic manner is how to handle object references. From the perspective of Java code, the value of a field that holds a reference to an object is opaque. From the perspective of the JVM, however, each GC has its own policies for laying out objects in memory and for representing references from one object to another.
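
As a rough illustration, the sketch below shows three ways a collector might encode a reference to the same object. It is not HotSpot code; the heap base, the alignment shift, and the color-bit position are invented for this example.

class ReferenceEncodings {
    static void example() {
        long heapBase = 0x4000000000L;   // assumed heap base address
        long address  = 0x4002045278L;   // the referenced object's address

        // Serial, Parallel, or G1 with -XX:+UseCompressedOops: a 32-bit offset
        // from the heap base, shifted because objects are 8-byte aligned.
        int compressedOop = (int) ((address - heapBase) >>> 3);

        // Serial, Parallel, or G1 without compressed oops: the raw 64-bit address.
        long uncompressedOop = address;

        // ZGC: a 64-bit pointer that also carries GC metadata ("color") bits;
        // the bit position used here is purely illustrative.
        long coloredPointer = (1L << 62) | address;
    }
}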

The multitude of reference formats makes it challenging to take objects managed by one GC, cache them, and reify them later for a different GC.

Object caching today

The representation of an object in an AOT cache mirrors its representation in memory. For example, consider a String object with these fields:

public class String {
    private final byte[] value;
    private final byte coder;
    private int hash;
    private boolean hashIsZero;
}

In the cached form of a String object, the value field contains the 64-bit memory address of a byte array:

header: ...  |  value: 0x4002045278  |  coder: ...  |  hash: ...  |  hashIsZero: ...

The address is in a lowest-common-denominator format that is valid across the Serial, Parallel, and G1 collectors. Objects are laid out in the cache, assuming a predetermined region size, so that no object crosses a heap-region boundary. This allows you to run in production with G1 even if you trained with Serial or Parallel.
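
The boundary constraint is simple to check. The sketch below illustrates the idea with a hypothetical 1 MiB region size; it is not the cache writer's actual code.

class RegionLayout {
    // True if an object of the given size, placed at the given offset,
    // would straddle a boundary between fixed-size regions.
    static boolean crossesRegionBoundary(long offset, long objectSize, long regionSize) {
        return offset / regionSize != (offset + objectSize - 1) / regionSize;
    }

    static void example() {
        long regionSize = 1 << 20;            // hypothetical 1 MiB regions
        long offset     = regionSize - 4096;  // 4 KiB before a boundary
        long objectSize = 8192;               // an 8 KiB object
        // This placement would cross a boundary, so the cache writer would
        // have to place the object at the start of the next region instead.
        boolean crosses = crossesRegionBoundary(offset, objectSize, regionSize);
    }
}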

ZGC, however, does not represent object references as plain 64-bit memory addresses, and it does not use a single global region size. Hence ZGC cannot use AOT caches in their current form.

GC-agnostic object caching

We can make a GC-agnostic AOT cache by storing object references as logical indices rather than memory addresses. The cached form of a String object would then look like this, with the value field containing the logical index of the byte array:

header: ...  |  value: 5  |  coder: ...  |  hash: ...  |  hashIsZero: ...

Using a GC-agnostic cache requires converting the logical indices back into memory addresses. The JVM therefore reads objects from the cache sequentially, i.e., streams them, into memory. When the cache is opened, a background thread eagerly starts materializing objects, one by one. Materializing an object involves allocating memory in the heap, initializing the object's fields according to the data in the cache, and building object references to other materialized objects via lookups in a side table. When the application uses a class for the first time, it synchronizes with the background thread to ensure that the Class object for the class, and any related objects, are materialized.
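
The sketch below outlines one possible materialization scheme. The types are hypothetical, and the assumption that a cached object only references objects that appear earlier in the stream is an invention for this example, not necessarily the actual HotSpot design.

import java.util.ArrayList;
import java.util.List;

// Conceptual sketch only, not HotSpot code: materializing a streamed,
// GC-agnostic cache by resolving logical indices through a side table.
class StreamingMaterializer {
    // A cached object: its primitive field data plus the logical indices
    // of the objects it references. Hypothetical representation.
    record CachedObject(byte[] primitiveFields, int[] referenceIndices) {}

    // Side table: logical index -> already-materialized object.
    private final List<Object> sideTable = new ArrayList<>();

    void materializeAll(List<CachedObject> stream) {
        // Assumes referenced objects always appear earlier in the stream.
        for (CachedObject cached : stream) {
            Object obj = allocateAndInitialize(cached);
            for (int index : cached.referenceIndices()) {
                // Convert the logical index back into an object reference.
                storeReference(obj, sideTable.get(index));
            }
            sideTable.add(obj);  // later entries may now reference this object
        }
    }

    private Object allocateAndInitialize(CachedObject cached) {
        // Real code would allocate the object in the heap and copy its
        // primitive fields from the cache; a placeholder stands in here.
        return new Object();
    }

    private void storeReference(Object from, Object to) {
        // Real code would write the reference into the appropriate field.
    }
}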

Choosing an AOT cache format

A GC-specific AOT cache is mapped directly into memory, while a GC-agnostic cache is streamed into memory. Both create the appearance of instantly loaded objects, but mapping performs better in some scenarios while streaming performs better in others.

A cold start is when an application starts on a machine that has not run it recently, which happens frequently when deploying applications in a cloud. The AOT cache is then unlikely to be in the filesystem cache, and the larger the cache, the higher the cost of loading it from disk. Streaming can, however, hide the latency of reading data from disk, at the cost of requiring an additional CPU core.

Conversely, a warm start is when an application starts close in time to a previous start, such as when running over and over on the same machine. Because the AOT cache stays in the filesystem cache between runs, it can be mapped into the JVM's heap instantly.

The least advantageous situation for streaming is a warm start in a constrained environment that does not have a spare CPU core. The JVM tries to avoid this situation in production by applying a heuristic when creating an AOT cache after a training run: by default, if the training run uses compressed object pointers (-XX:+UseCompressedOops), the JVM creates a GC-specific cache that can be mapped; otherwise it creates a GC-agnostic cache that is streamed.

You can explicitly create a streaming, GC-agnostic cache by specifying -XX:+AOTStreamableObjects, even if you also specify -XX:+UseCompressedOops.
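
For example, assuming the AOT cache workflow introduced in JDK 24, and with placeholder file and class names, a streamable cache could be created and then used with ZGC in production:

java -XX:AOTMode=record -XX:AOTConfiguration=app.aotconf -cp app.jar com.example.App

java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf -XX:AOTCache=app.aot -XX:+AOTStreamableObjects -cp app.jar

java -XX:AOTCache=app.aot -XX:+UseZGC -cp app.jar com.example.App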

The JDK includes two baseline AOT caches, one GC-agnostic and one GC-specific, which the JVM uses when the application does not provide a cache. This ensures that the JVM can use streaming or mapping, as appropriate, to achieve the best startup performance.

Alternatives

Testing

Many object-archiving tests already exist. We will adapt them to test with ZGC and the new streaming, GC-agnostic approach.