JEP draft: Loaded Classes in CDS Archives

Authors: Ioi Lam, John Rose
Owner: Ioi Lam
Type: Feature
Scope: Implementation
Status: Draft
Release: tbd
Component: hotspot / runtime
Reviewed by: Vladimir Kozlov
Created: 2023/09/06 04:07
Updated: 2024/06/06 17:04
Issue: 8315737

Summary

Enhance CDS to store classes in the loaded state, rather than merely pre-parsed.

Goals

Both of these improvements to CDS (a HotSpot optimization mechanism) indirectly serve the larger goals of Project Leyden, which include better startup, warmup, and footprint for Java applications.

Non-Goals

Success Metrics

Motivation

In the startup of almost any Java application, the Java Virtual Machine (or "VM") spends a significant amount of time loading classes and linking them together. The HotSpot VM includes a feature, Cached Data Storage (CDS), that can shift some of the work involved in loading classes from run time to build time. This means that the VM has a head start on loading classes at run time, and the application will start faster.

CDS requires that, at build time, there is a training run of the application. This involves running the application with a representative workload so that the VM can monitor which classes are loaded, which objects are created, and which optimizations are performed on the application code. The VM then executes a dump command, storing into a CDS archive whichever information is likely to be useful when running the application again.

The payoff comes when the application is deployed into production. This later run, called a production run, is given the CDS archive as an extra input. The archive contains a record of the VM’s state from the training run, which is mapped directly into the VM’s memory during the production run, rather than being recreated from scratch. As loading and linking work is bypassed, startup gets faster.

To show the details of CDS in action, here is an example of a very small program that builds a stream, executes it, and finally prints a familiar message:

$ cat Test.java
import java.util.*;
import java.util.stream.*;
interface Test {
  static void main(String... av) {
    var words = List.of("hello", "fuzzy", "world");
    var greeting = words.stream()
      .filter(w->!w.contains("z"))
      .collect(Collectors.joining(", "));
    System.out.println(greeting); // => hello, world
  }
}

$ javac Test.java
$ jar cf Test.jar Test.class
$ perf stat -r 100 java -Xshare:off \
    -cp Test.jar Test > /dev/null
 …
 0.0640295 +- 0.0000934 seconds time elapsed  ( +-  0.15% )

The last benchmarking step shows that this toy program starts up (then exits) in about 64 milliseconds, in each of 100 runs. The -Xshare:off option turns off all CDS sharing, even the default sharing provided to all Java programs in JDK builds. We do this to establish a baseline for the comparison we are about to make.

This is a complicated workflow, but it is beneficial enough that standard builds of the JDK include a CDS archive that contains many JDK classes. By default, Java applications use this prebuilt archive, although it can be disabled by specifying -Xshare:off; disabling it affects startup time but not program semantics.

Now let's build a CDS archive for the program, so we can demonstrate the benefits of CDS in a current release (even before the present JEP improves CDS):

# Perform a CDS training run, which reports some information.
$ java -Xshare:off -XX:DumpLoadedClassList=Test.classlist \
    -cp Test.jar Test

# Use the reported information to dump a CDS archive.
$ java -Xshare:dump -XX:SharedClassListFile=Test.classlist \
    -XX:SharedArchiveFile=Test.0.cds \
    -cp Test.jar  # not a training run, so no "Test" argument

# Measure speed using the archive (Test.0.cds).
$ perf stat -r 100 java -XX:SharedArchiveFile=Test.0.cds \
    -cp Test.jar Test > /dev/null
 …
 0.0242744 +- 0.0000297 seconds time elapsed  ( +-  0.12% )

Thus, a current release of CDS can reduce startup, for one small program, on one Linux platform, from 64 milliseconds to 24 milliseconds.

Notes on this specific example: There are two runs of the VM with a -Xshare:… option set. The first run is the training run, and the second assembles a CDS archive. These two commands must be used together in current releases of CDS. The reader may wonder about the extra step that builds a JAR file; this is needed because CDS does not support directory components in class paths. Future improvements of CDS may provide simplified workflows. The example concludes with 100 production runs, made via perf stat. The toy we use here as an example is very small, although it sports a lambda. Larger programs obtain more benefits from CDS, since they do more work loading and linking classes, but even this toy program shows a measurable benefit.

The training run that starts a CDS workflow can be any application whatsoever. This run executes a representative workload of that application, which at least loads the classes required by that application. During this run, the VM performs its usual internal bookkeeping: It loads and initializes classes, links symbolic references, and performs various other computations to configure and optimize the program. In addition, as a special function of its training run logic, the VM runtime monitors these computations and collects their results, if they are likely to be useful in future runs of the same application.

A subsequent production run, using the same CDS archive, starts up faster if it uses the same classes in similar ways. This is because the archive file contains memory-mappable assets that represent the computation results previously collected from the training run. These assets, directly mapped into VM memory, are readily adopted into the production run. Adopted assets work in the VM just as if they had been created dynamically, but require less effort to load. CDS assets can be adopted into many kinds of VM memory, including metaspace (where class and method metadata reside) and the Java heap (where Java objects associated with classes reside).
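The general idea of adopting a memory-mapped asset, instead of rebuilding it, can be illustrated in plain Java. The following is a minimal sketch and not the CDS archive format: the class name MappedAssetDemo, the method names dump and adopt, and the file layout (a flat sequence of ints) are assumptions made only for illustration. Precomputed values are written to a file once, then a later step maps the file read-only and reads the values back without recomputing them.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical illustration only; HotSpot does not lay out a CDS archive this way.
final class MappedAssetDemo {

    // "Build time": store precomputed values in a temporary file as big-endian ints.
    static Path dump(int[] values) {
        try {
            ByteBuffer buf = ByteBuffer.allocate(values.length * Integer.BYTES);
            for (int v : values) {
                buf.putInt(v);
            }
            Path file = Files.createTempFile("asset", ".bin");
            file.toFile().deleteOnExit();
            Files.write(file, buf.array());
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // "Run time": map the file read-only and read the stored values directly.
    static int[] adopt(Path file) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            int[] out = new int[mapped.remaining() / Integer.BYTES];
            for (int i = 0; i < out.length; i++) {
                out[i] = mapped.getInt();
            }
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path archive = dump(new int[] {1, 1, 2, 3, 5, 8});
        System.out.println(Arrays.toString(adopt(archive)));
    }
}
```

The real archive holds far richer structures (class metadata and heap objects), but the cost model is similar: the expensive work is done once, and later runs pay mostly for mapping.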

Currently, a CDS asset cannot refer to a class directly (by its pointer) until the class has been loaded due to a request by the VM or Java application. This is because CDS class data is provided to the VM in a provisional (or “pickled”) pre-parsed state, which needs additional processing and registration to become a “live” class. When loading is requested, the VM associates a class name to its metadata, and this decision is permanent. Before the decision is made, a provisional asset in CDS must be treated as unstable, not reliable. After the decision, its address (as adopted into the VM) is treated as a “live”, stable, reliable part of the VM's online class metadata.

Since each actual loading decision is delayed (in current releases of CDS), there are no stable pointers available at startup. This means assets cannot be linked together directly in the CDS archive, and complex bookkeeping is required to link them together later, when the application asks for them. The root problem is delayed adoption of classes from CDS. The solution will be reorganizing CDS to adopt classes into a loaded state, immediately at startup.

Such a change is needed to accelerate startup, and also to lay a foundation for future improvements to CDS, again accelerating both startup and warmup.

Description

Our descriptions in this section assume familiarity with CDS as described above, and as provided by most releases of the HotSpot Java Virtual Machine. We also appeal to low-level activities such as class loading and symbolic resolution, as defined in the Java Virtual Machine Specification. It is true that the activities of a main Java method and the APIs it uses are naturally the foreground in any Java application. But when reasoning about CDS, we must pay more attention to the background, where the VM performs the bookkeeping that supports all APIs.

A CDS archive will be enhanced to store (selected) classes in their loaded state. This in turn will direct the VM (when given the archive as an extra input) to start up with those classes immediately loaded. This is a change from the previous behavior, which is to load classes on demand, from a pre-parsed state in CDS.

A new option -XX:+CDSLoadedClasses may be selected by the user, when making a training run and dumping a CDS archive. (Therefore this is “opt-in” functionality.) The effect of this flag is twofold. First, it puts a new attribute into the CDS file that instructs the VM to adopt classes in the enhanced loaded state, as computed by the training run. This attribute ensures that (selected) classes used by the application are given “live” addresses immediately on startup. The usual request to load a class, which is normally an early side effect of the application's main routine, is “short-circuited” and replaced by an even earlier side effect (a shifted computation, in Project Leyden terms).

No additional flag will be required when using such an archive, only when generating it. Initialization of application classes (see JVMS §5.5) will not be short-circuited in this JEP. The new flag affects all classes stored in CDS, as controlled by the pre-existing -XX:SharedClassListFile option.

As a second effect, -XX:+CDSLoadedClasses instructs CDS to preset some linkages between the class assets, so that when they are adopted, they have already resolved some of their symbolic references to each other. This eliminates more kinds of bookkeeping from startup.

Here is the previous example, now adjusted to run under a prototype for this JEP:

# Perform a CDS training run, which reports some information.
$ java -Xshare:off -XX:DumpLoadedClassList=Test.classlist \
    -cp Test.jar Test
# Use the reported information to dump a CDS archive.
$ java -Xshare:dump -XX:SharedClassListFile=Test.classlist \
    -XX:SharedArchiveFile=Test.1.cds -XX:+CDSLoadedClasses \
    -cp Test.jar
# Measure speed using the archive (Test.1.cds).
$ perf stat -r 100 java -XX:SharedArchiveFile=Test.1.cds \
    -cp Test.jar Test > /dev/null
 …
 0.023368 +- 0.0000366 seconds time elapsed  ( +-  0.16% )

Thus, for this particular toy program, the startup time previously reduced to 24 milliseconds is now further reduced to 23 milliseconds. This reflects the bookkeeping time saved by adopting stored classes at startup in a loaded and partially resolved state.
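The three perf stat measurements quoted in this document (0.0640295 s without CDS, 0.0242744 s with a current CDS archive, and 0.023368 s with the prototype) can be converted into relative savings with a short calculation. The class and method names below are illustrative only. The sketch shows that a current CDS archive saves roughly 62% of the baseline startup time for the toy program, and that the prototype saves a little under 4% beyond that.

```java
// Illustrative arithmetic on the elapsed times reported by perf stat in this document.
final class StartupSavings {

    // Fraction of the "before" time that is saved by moving to the "after" time.
    static double relativeSaving(double beforeSeconds, double afterSeconds) {
        return (beforeSeconds - afterSeconds) / beforeSeconds;
    }

    public static void main(String[] args) {
        double noCds = 0.0640295;      // -Xshare:off baseline
        double currentCds = 0.0242744; // archive Test.0.cds
        double loadedCds = 0.023368;   // archive Test.1.cds, built with -XX:+CDSLoadedClasses

        System.out.printf("current CDS vs. no CDS:     %.1f%%%n",
                100 * relativeSaving(noCds, currentCds));
        System.out.printf("loaded classes vs. current: %.1f%%%n",
                100 * relativeSaving(currentCds, loadedCds));
    }
}
```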

The modest 4% improvement shown by the toy example increases for realistic applications that use many more classes. For example, the well-known PetClinic startup benchmark improves by 20% to 30%, beyond previous uses of CDS, simply by adding the new optimizations.

Interestingly, that benchmark also includes a complicated application-specific “AOT mode”, which also accelerates class loading, but -XX:+CDSLoadedClasses confers similar relative benefits both with and without that AOT mode. For similar frameworks considering their own “AOT mode”, simply flipping the new CDS switch may be an attractive alternative. Remember also that another benefit of this JEP, independent of startup improvements by 4% or 30%, is that it lays a foundation for further automatic optimizations in CDS, which in their systemic effects may outperform any application-specific “AOT mode”.

From a programmer’s point of view, the overall effect of -XX:+CDSLoadedClasses is as if class loading is initiated by the platform, not the application, in a very early period before the application main routine starts to run. We call this early period the premain phase of execution. Loading and other linking activity appears to happen during this premain phase. It also appears to happen very quickly, since CDS assets are readily adopted into VM memory. Thus, loading and linking activities have a smaller impact on the execution of main itself. Roughly speaking, much of the work of loading and resolution happened long ago, in a training run which created a CDS archive.

The order of loading (as opposed to initialization) is insignificant for most Java classes. The JVMS allows the VM much freedom in choosing when to load, and programmers almost never care when it happens, as long as APIs are available when the main routine needs them. Programmers are much more aware of class initialization order, since that can perturb program logic through side effects. Because CDS does not shift initialization of user-written classes, programmers perceive CDS, with or without the present new option, as a simple improvement to startup.
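The difference between loading and initialization can be observed from ordinary Java code. In the following minimal sketch, the names LoadVsInit and Helper are hypothetical. The class Helper is first loaded without initialization, using the three-argument form of Class.forName, and its static initializer runs only later, at the first call to one of its static methods. Only the second event has a side effect that program logic can see.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: loading a class is silent, initializing it is observable.
final class LoadVsInit {

    // Events are recorded in the outer class so that reading them does not touch Helper.
    static final List<String> events = new ArrayList<>();

    static class Helper {
        static {
            events.add("Helper initialized"); // side effect of initialization
        }

        static int answer() {
            return 42;
        }
    }

    static List<String> run() {
        events.clear();
        try {
            // initialize=false: the class is loaded, but its static initializer is not run.
            Class.forName(LoadVsInit.class.getName() + "$Helper",
                          false, LoadVsInit.class.getClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        events.add("Helper loaded");

        // First active use: initialization happens here, if it has not happened already.
        int value = Helper.answer();
        events.add("answer=" + value);
        return List.copyOf(events);
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

On the first call in a fresh VM, the "Helper initialized" event appears after "Helper loaded", which is why moving the loading step earlier does not change what the program observes.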

From the VM point of view, the training run (and associated dump command) creates directly usable memory images representing loaded classes, and stores them in the CDS archive for quick adoption, before main runs. A whole suite of performance-sensitive classes can be adopted all at once, with their interdependencies (their symbolic references) already resolved.

This JEP is also designed to enable future optimizations envisioned for Project Leyden. More and more preset linkages between CDS assets will be possible, thus avoiding more and more startup computations currently done dynamically. More constant pool entries can be preset in class assets stored by CDS. The likely result is a series of significant improvements to startup time.

Implementation Notes

In more detail, the specific shiftable computations associated with running some class C are as follows:

In most JDK releases, CDS already supports the shifting of (a) and (b): The CDS archive stores pre-parsed InstanceKlass structures for Java classes encountered at build time. As the Java program executes, each InstanceKlass can be loaded quickly from the CDS archive and added into each relevant ClassLoader. This JEP shifts the later steps as well.

Some resolution computations, such as resolved references from subclasses to superclasses, or symbolic resolution of some constant pool entries, will also be stored as assets in the CDS file. Resolution results which cannot be adequately predicted or speculated (transparently to user observation) will not be recorded and will require the usual dynamic resolution (perhaps with errors) at runtime.

As is the case today, a few selected classes are initialized in the CDS archive, although this will not be visible to users. Java objects required for such initializations (such as the Integer box cache) will be adopted from CDS assets. However, most classes in the CDS archive are not initialized in the premain phase.

The new flag -XX:+CDSLoadedClasses instructs the CDS archive to include results from steps (c) and (d) as well. These steps involve not only parsing but also loading (which means permanent definition to the VM), linking, and preparation. No additional flag is required at application run time. The -Xshare:dump and -XX:SharedClassListFile=… options are unchanged.

A CDS archive created with the new flag will have the following behavior:

These classes should come from only “known” locations, such as the JDK's modules file, the module path, or the class path. Note that CDS doesn't store classes that are dynamically generated or loaded from other locations. Later enhancements may support more such classes, such as hidden classes required by lambdas, generated dynamically and attached to resolved constant pool entries.

Note that under the new flag, all classes on the CDS class list are stored in the loaded state. A mixed processing mode, where some classes are loaded and others merely pre-parsed, is not supported in this JEP. It could possibly be added in the future, although it does not seem to be required.

Consequences

Although this is a simple change, the benefits are deep and go far beyond faster class loading. Internal pointers to eagerly loaded class data, adopted from the CDS archive, will be immediately and unconditionally useful as “live” VM data, without waiting for resolution logic or other checks.

Once data is “live from the start” in this way, the VM can use it immediately when the application is deployed to production, accompanied by the CDS archive. This is true of all kinds of data assets used by the VM, such as metadata (classes and methods) and Java objects (class mirrors).

In addition, CDS assets (such as other classes, Java Class mirrors or eventually AOT code) can refer directly to class data by using pointers, instead of via complex provisional symbolic references or relocation records. When they are adopted into the VM, such mutually referential CDS assets are correctly configured with respect to each other immediately and without extra checks or resolution steps.

At most, a low-level relocation pass may be required, as is typical when mapping dynamically linked data that contains pointers. For CDS, this pass is extremely simple, being driven by a bitmap which shows where pointers occur in the mapped CDS assets.
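A bitmap-driven relocation pass of this kind can be modeled in a few lines. The sketch below is a simplified illustration and not HotSpot code: it assumes the mapped image is an array of 64-bit words, that bit i of the bitmap is set exactly when word i holds a pointer, and that delta is the difference between the address where the image was actually mapped and the address it was dumped for. All names are hypothetical.

```java
import java.util.Arrays;
import java.util.BitSet;

// Simplified model of a relocation pass driven by a pointer bitmap.
final class BitmapRelocation {

    // Adds delta to every word that the bitmap marks as a pointer; other words are untouched.
    static void relocate(long[] image, BitSet pointerMap, long delta) {
        for (int i = pointerMap.nextSetBit(0); i >= 0; i = pointerMap.nextSetBit(i + 1)) {
            image[i] += delta;
        }
    }

    public static void main(String[] args) {
        // Words 0 and 2 are pointers into the image; words 1 and 3 are plain data.
        long[] image = {0x1000L, 7L, 0x1010L, 42L};
        BitSet pointerMap = new BitSet(image.length);
        pointerMap.set(0);
        pointerMap.set(2);

        // Pretend the image was mapped 0x500 bytes above its requested base address.
        relocate(image, pointerMap, 0x500L);
        System.out.println(Arrays.toString(image));
    }
}
```

Because the bitmap says exactly where the pointers are, the pass needs no symbolic lookups or type information, only a linear scan over the set bits.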

In particular, the stability of class data pointers implies that constant pool entries can (in the future) be put into a resolved state when a CDS archive is created. (The VM does not need to re-resolve a constant pool entry once resolved.) This will allow the VM to avoid many dynamic operations normally required when a Java application configures itself at startup.

Stabilized, less speculative, immediately usable pointers are likely to greatly simplify the management of various kinds of data and metadata to be added in future JEPs. The modest 4% startup improvement cited above, in the toy example, is likely to improve significantly. Prototyping (outside the bounds of this JEP) indicates simply adding more constant pool resolutions can raise the improvement to at least 20%. As its second goal, this JEP removes technical debt blocking such substantial future improvements.

Some important activities associated with loading, such as verification or v-table generation, are deferred to startup time in the production run, for many classes, in CDS versions up to and including this JEP. Later, additional work on CDS may shift more computation results that such activities require and produce (such as verification flags or v-table adapter code) into CDS for faster startup.

Limitations

As a result of loading classes in the premain phase, the application will no longer be able to do the following in the production run:

In addition, the training and production runs must have consistent VM and platform configurations, or else the VM will not use the CDS archive offered to the production run. There may be lost performance, but no incorrect behavior. In such cases the production run will issue a warning, if the CDS "auto" mode is selected (via the flag -Xshare:auto). The user may also demand that CDS must be used (via the flag -Xshare:on), in which case the VM will report configuration mismatches and exit.

The dump command (which uses the -Xshare:dump option) must use the same configuration as the training run that immediately precedes it. (Note: The existing two-step CDS workflow is likely, in the near future, to be accompanied by a new streamlined workflow, a one-step run-and-dump command. Since this JEP does not deliver this workflow, the user must continue to deal with a multi-step workflow for creating CDS archives.)

Some configuration options are inherently inconsistent, or otherwise not supported by CDS, and will be rejected by the training run, so that a CDS archive cannot be created at all. Such options include --limit-modules, --patch-module, and --upgrade-module-path.

Across the training and production runs, the -m or --module options must be the same (if present). The UseCompressedOops option, if given, must agree across both runs.

The class path configurations given to the training run and the production run must not conflict. The production run may specify extra class path entries, appended to the end. Otherwise, the class paths must be identical. Similar rules may apply to the module path in the future.
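The class path rule amounts to a prefix check: the production class path must begin with every entry of the training class path, in the same order, and may then continue with extra entries. The following minimal sketch models only that rule. The class and method names are hypothetical, entries are compared as plain strings, and the VM's actual validation (which also has to consider the contents of the JAR files) is not shown.

```java
import java.util.List;

// Hypothetical model of the "same class path, optionally with appended entries" rule.
final class ClassPathCheck {

    // True if production starts with all training entries, in the same order.
    static boolean isCompatible(List<String> training, List<String> production) {
        return production.size() >= training.size()
                && production.subList(0, training.size()).equals(training);
    }

    public static void main(String[] args) {
        List<String> training = List.of("Test.jar", "lib.jar");

        // Identical class path: compatible.
        System.out.println(isCompatible(training, List.of("Test.jar", "lib.jar")));
        // Extra entry appended at the end: compatible.
        System.out.println(isCompatible(training, List.of("Test.jar", "lib.jar", "extra.jar")));
        // Extra entry placed in front: not compatible.
        System.out.println(isCompatible(training, List.of("extra.jar", "Test.jar", "lib.jar")));
    }
}
```

Appending is harmless because class path lookup proceeds from the front, so entries added at the end cannot shadow classes that the training run found earlier in the path.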

Note that directory-based class paths cannot be checked for consistency, since directory contents may change concurrently with VM execution. Thus, a training run must use only JAR-based class paths. (This is a long-standing CDS restriction, noted here for clarity.)

Significant platform differences are likely to cause conflicts, leading to the CDS archive being ignored. These include uses of VMs from different JDK releases, and different hardware platforms. CDS is not a cross-compilation tool.

These limitations are emphasized here for clarity, not as documentation of new restrictions on CDS. They are either inherent to all versions of CDS, or else are inherent to the VM itself, as effects of a changed order of loading.

As a general principle, if a training run (and subsequent dump command) generates a CDS archive, that CDS archive will produce a correct execution of the production run, or else it will be ignored, followed by a differently ordered (but still correct) execution of the production run. A complete description of consistency requirements is beyond the scope of this document.

Testing

Risks and Assumptions

We assume, for most applications and frameworks that want to take advantage of the shifting afforded by -XX:+CDSLoadedClasses, that the corresponding constraint of not being able to request conflicting VM configurations when deploying to production is an acceptable tradeoff. For example, incompatible class paths or module system settings can prevent use of the CDS archive. Such restrictions may be softened by future work.

Note that CDS interoperates with user-defined class loaders, allowing users to dynamically configure part of their class loading activity, even while loading other classes ahead of time. Early conversations suggest that users are willing to accept fixed class paths, and to use additional class loaders when more flexibility is required.

Dependencies

This JEP is an evolution of the existing CDS implementation. Future work in Project Leyden, especially involving the premain phase, is likely to depend on it.