JEP draft: (DRAFT) Loaded Classes in CDS Archives

Author: iklam
Owner: Ioi Lam
Type: Feature
Scope: Implementation
Status: Draft
Component: hotspot / runtime
Created: 2023/09/06 04:07
Updated: 2024/02/27 00:44
Issue: 8315737

Summary

Enhance CDS to store classes in the loaded state, rather than merely pre-parsed.

Goals

Non-Goals

Success Metrics

Motivation

A significant portion of startup time in almost any Java application is spent in loading and organizing blocks of class-related data, and then linking those blocks together into a fully configured application. The HotSpot CDS feature shifts some startup computations to application build time, by performing classfile parsing during a training run, then storing the resulting pre-parsed class data as assets in a CDS archive.

CDS operates by saving selected computation results from a training run as memory-mappable "assets" in an archive file. The assets in the file, directly mapped into JVM memory, are readily "adopted" in a subsequent deployment run. Adopted assets function just as if they had been created dynamically, but require less effort to load. CDS assets can represent computation results that are present in JVM memory, including metaspace (where class and method metadata reside) and the Java heap (where Java objects associated with classes reside).

But currently, a CDS asset cannot refer to a class directly (by its pointer) until the VM or Java application requests that the class be loaded. This is because CDS class data is provided to the VM in a provisional (or "pickled") pre-parsed state, which needs additional processing and registration to become a "live" class. When loading is requested, the JVM associates a class name with its metadata, and this decision is permanent. Before the decision is made, a provisional asset in CDS must be treated as unstable and unreliable. After the decision, its address can be treated as a "live", stable, reliable part of the VM's online class metadata.
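
The permanence of the loading decision can be observed at the Java level with standard APIs. The sketch below is only an illustration and does not model HotSpot internals. It asks the same class loader for the same name twice, without initializing the class, and checks that both requests return the identical Class object. The class name LoadingDecisionDemo and its helper sameClassTwice are hypothetical.

```java
// Java-level illustration only; it does not model HotSpot internals.
// Once a class loader has decided which class a name denotes, that binding
// is permanent, so every later lookup yields the identical Class object.
public class LoadingDecisionDemo {

    /** A small nested class, so the demo does not depend only on JDK classes. */
    static class Payload { }

    /** Loads {@code name} twice through {@code loader} (without initializing it)
     *  and reports whether both requests returned the same Class object. */
    static boolean sameClassTwice(String name, ClassLoader loader) {
        try {
            Class<?> first = Class.forName(name, false, loader);
            Class<?> second = Class.forName(name, false, loader);
            return first == second;
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = LoadingDecisionDemo.class.getClassLoader();
        System.out.println(sameClassTwice(Payload.class.getName(), loader)); // true
        System.out.println(sameClassTwice("java.util.ArrayList", loader));   // true
    }
}
```

Because the name-to-class binding cannot change, a stable pointer to the loaded class can safely be stored and reused. This is the property that lets archived assets refer to "live" classes directly.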

If the user can request an early loading decision for selected classes in CDS, those classes will appear loaded immediately at startup, rather than waiting for main to request their loading. When more classes are "live" at startup, more of the associated computation results can move into the CDS archive. These results can include not only class loading but also symbolic resolution of constant pool entries in "live" classes that refer to other "live" classes. Skipping the work of dynamic class loading and symbolic resolution will provide some improvement in startup time. The enhanced class assets will also provide a foundation for future Project Leyden optimizations.

Description

To enable CDS to store classes in their loaded state, the user must select a new option, -XX:+CDSLoadedClasses, when dumping a CDS archive. (This functionality is therefore opt-in.)

In this work, additional loading and resolution results from the training run will be stored as assets in the CDS file. Like existing CDS assets, they will be adopted in subsequent deployment runs.

A new attribute in the CDS file will instruct the VM to adopt classes in the enhanced loaded state computed by the training run, so no additional flag is required when using such an archive. This attribute ensures that selected classes from the classpath are given "live" addresses immediately at startup. The request to load (and perhaps initialize) a class is normally an early side effect of the application's main routine. That request is "short-circuited" and replaced by an even earlier side effect (a shifted computation, in Leyden terms). Note that initialization of application classes will not be short-circuited in this JEP.

The resulting effect is as if the class were loaded by platform code, not application code, and that platform code ran before the application's main routine was entered. We call this very early period the premain phase of execution. From the application's point of view, loading happened during the premain phase, well before main was invoked, so no further loading activity delays the execution of main itself. From the user's point of view, the loading happens in a training run, which emits a CDS archive.
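
The short-circuiting of load requests can be pictured with a toy model. The sketch below uses hypothetical names (EagerAdoptionModel, ArchivedClass) and does not reflect HotSpot's actual data structures. In the model, "loading" means binding a name to archived metadata exactly once. Without premain adoption, each first use of a class pays for that binding. When adoptAllAtStartup() runs before main, later requests find the class already loaded.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model (hypothetical names, not HotSpot code) contrasting lazy adoption of
// archived classes with eager adoption during a "premain" phase.
public class EagerAdoptionModel {

    /** Stand-in for pre-parsed class data stored in a CDS archive. */
    record ArchivedClass(String name, String superName) { }

    private final Map<String, ArchivedClass> archive = new LinkedHashMap<>();
    /** Name -> "live" class; an entry, once added, is never replaced. */
    private final Map<String, ArchivedClass> loaded = new LinkedHashMap<>();
    private int lazyLoads = 0;

    EagerAdoptionModel(List<ArchivedClass> archivedClasses) {
        for (ArchivedClass c : archivedClasses) {
            archive.put(c.name(), c);
        }
    }

    /** Premain phase: adopt every archived class before main runs. */
    void adoptAllAtStartup() {
        archive.values().forEach(c -> loaded.putIfAbsent(c.name(), c));
    }

    /** A load request from application code, e.g. on first use of a class. */
    ArchivedClass load(String name) {
        ArchivedClass live = loaded.get(name);
        if (live != null) {
            return live;                         // short-circuited: already loaded
        }
        ArchivedClass provisional = archive.get(name);
        if (provisional == null) {
            throw new IllegalStateException("not archived; would load from classfile: " + name);
        }
        lazyLoads++;                             // work that eager adoption moves earlier
        loaded.put(name, provisional);           // permanent loading decision
        return provisional;
    }

    int lazyLoads() {
        return lazyLoads;
    }

    public static void main(String[] args) {
        List<ArchivedClass> classes = List.of(
                new ArchivedClass("app/Main", "java/lang/Object"),
                new ArchivedClass("app/Service", "java/lang/Object"));

        EagerAdoptionModel lazy = new EagerAdoptionModel(classes);
        lazy.load("app/Main");
        lazy.load("app/Service");
        System.out.println("lazy:  " + lazy.lazyLoads());   // lazy:  2

        EagerAdoptionModel eager = new EagerAdoptionModel(classes);
        eager.adoptAllAtStartup();
        eager.load("app/Main");
        eager.load("app/Service");
        System.out.println("eager: " + eager.lazyLoads());  // eager: 0
    }
}
```

The model leaves out everything that makes real loading expensive, such as class loader delegation, loader constraints, and linking. It only shows where the binding happens.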

From the VM point of view, the training run creates directly usable memory images representing loaded classes, and stores them in the CDS archive for quick adoption, before main runs. And within the context of the Java VM specification, it is as if the class initializer for Object instigates the loading of classes listed for early loading in the CDS archive. Traditionally, the VM has always loaded and initialized a few selected JDK classes on startup, before main is executed. Because we do not shift initialization of user-written classes, the only user-visible differences are the time at which the loading decision is finalized, and the speed of class loading and other startup activities.
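
This JEP relies on the separation between loading and initialization, and that separation is visible through standard Java APIs. In the sketch below (class and member names are illustrative), Class.forName(name, false, loader) loads a nested class without running its static initializer. The initializer runs only on the class's first active use, here a static method call.

```java
import java.util.ArrayList;
import java.util.List;

// Standard-API illustration: loading a class and initializing it are separate
// events. Loading can happen early without running any user-written static
// initializer, which mirrors how this JEP shifts loading but not initialization.
public class LoadVersusInitDemo {

    /** Observable events, in the order they happen. */
    static final List<String> EVENTS = new ArrayList<>();

    static class Lazy {
        static { EVENTS.add("Lazy initialized"); }
        static int value() { return 42; }
    }

    static List<String> run() {
        try {
            // initialize = false: the class is loaded but its <clinit> does not run.
            Class<?> c = Class.forName(LoadVersusInitDemo.class.getName() + "$Lazy",
                    false, LoadVersusInitDemo.class.getClassLoader());
            EVENTS.add("loaded " + c.getSimpleName());
            // First active use: the static initializer runs before value() returns.
            EVENTS.add("value=" + Lazy.value());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        return EVENTS;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [loaded Lazy, Lazy initialized, value=42]
    }
}
```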

The specific shiftable computations associated with bootstrapping some class C are as follows:

As of JDK 21, CDS already supports the shifting of (a) and (b): the CDS archive stores pre-parsed InstanceKlass structures for Java classes encountered at build time. As the Java program executes, each InstanceKlass can be loaded quickly from the CDS archive and added to the relevant ClassLoader. This JEP shifts the later steps as well.

Some resolution computations, such as resolved references from subclasses to superclasses, or symbolic resolution of some constant pool entries, will also be stored as assets in the CDS file. Resolution results that cannot be adequately predicted or speculated (transparently to user observation) will not be recorded, and will require the usual dynamic resolution (perhaps with errors) at runtime.
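
Selective pre-resolution can be illustrated with a toy constant pool. The sketch uses hypothetical names (ConstantPoolModel, preResolve, resolve), uses strings in place of real class pointers, and is not HotSpot's ConstantPool. Entries whose targets are archived in the loaded state are recorded as resolved at dump time. All other entries stay symbolic and are resolved on first use at run time. In both cases an entry is resolved at most once.

```java
import java.util.List;
import java.util.Set;

// Toy model (hypothetical names, not HotSpot's ConstantPool) of recording only
// the class-reference resolutions that can be predicted at dump time.
public class ConstantPoolModel {

    private final String[] symbolic;   // class names, one per entry
    private final Object[] resolved;   // null until the entry is resolved
    private int runtimeResolutions = 0;

    ConstantPoolModel(List<String> classRefs) {
        symbolic = classRefs.toArray(new String[0]);
        resolved = new Object[symbolic.length];
    }

    /** Dump time: pre-resolve entries whose targets are archived as loaded. */
    void preResolve(Set<String> loadedInArchive) {
        for (int i = 0; i < symbolic.length; i++) {
            if (loadedInArchive.contains(symbolic[i])) {
                resolved[i] = "archived:" + symbolic[i];   // stands in for a direct pointer
            }
        }
    }

    /** Run time: use the recorded result, or resolve dynamically exactly once. */
    Object resolve(int index) {
        if (resolved[index] == null) {
            runtimeResolutions++;
            resolved[index] = "runtime:" + symbolic[index]; // a real VM could fail here
        }
        return resolved[index];
    }

    int runtimeResolutions() {
        return runtimeResolutions;
    }

    public static void main(String[] args) {
        ConstantPoolModel cp = new ConstantPoolModel(
                List.of("app/Service", "com/example/LoadedByCustomLoader"));
        cp.preResolve(Set.of("app/Service"));
        System.out.println(cp.resolve(0));            // archived:app/Service
        System.out.println(cp.resolve(1));            // runtime:com/example/LoadedByCustomLoader
        System.out.println(cp.resolve(1));            // same entry, not re-resolved
        System.out.println(cp.runtimeResolutions());  // 1
    }
}
```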

As is the case today, a few selected classes are initialized in the CDS archive, although this is not visible to users. Java objects required for such initializations (such as the Integer box cache) will be adopted from CDS assets. However, most classes in the CDS archive are not yet initialized.

Implementation

This JEP adds a new VM command-line option to be used with the -Xshare:dump option. The flag -XX:+CDSLoadedClasses instructs the VM to include the results of step (c) in the CDS archive as well. This involves not only parsing but also loading (that is, permanent definition to the VM), linking, and preparation. The pre-existing option -XX:SharedClassListFile=... provides fine control over which classes are loaded in this way. No additional flag is required at application run time.

(All classes on the CDS class list are stored in the loaded state. A mixed processing mode, where some classes are loaded and others merely pre-parsed, is not supported in this JEP. It could possibly be added in the future, if required.)
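
The dump-time and run-time invocations might look like the following, written as argument lists for java.lang.ProcessBuilder. This is a sketch under stated assumptions. -Xshare:dump, -XX:SharedClassListFile and -XX:SharedArchiveFile are existing HotSpot options. -XX:+CDSLoadedClasses is the option proposed in this draft and is not available in released JDKs. The file names, the main class, and the CdsCommandLines helper are placeholders. Both phases use the same class path, because CDS validates the application class path when the archive is used.

```java
import java.util.List;

// Sketch of the two-phase workflow as ProcessBuilder argument lists.
// ASSUMPTION: -XX:+CDSLoadedClasses is the flag proposed in this draft; it is
// not accepted by released JDKs. File names and the main class are placeholders.
public class CdsCommandLines {

    /** Dump phase: create an archive whose listed classes are stored loaded. */
    static List<String> dumpCommand(String classpath, String classList, String archive) {
        return List.of("java",
                "-Xshare:dump",
                "-XX:+CDSLoadedClasses",                 // proposed, opt-in
                "-XX:SharedClassListFile=" + classList,  // which classes to store
                "-XX:SharedArchiveFile=" + archive,
                "-cp", classpath);
    }

    /** Run phase: no extra flag; the archive itself records the loaded state. */
    static List<String> runCommand(String classpath, String archive, String mainClass) {
        return List.of("java",
                "-XX:SharedArchiveFile=" + archive,
                "-cp", classpath,                        // same class path as at dump time
                mainClass);
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", dumpCommand("app.jar", "app.classlist", "app.jsa")));
        System.out.println(String.join(" ", runCommand("app.jar", "app.jsa", "com.example.Main")));
        // To actually launch a phase (placeholders must exist):
        // new ProcessBuilder(dumpCommand(...)).inheritIO().start().waitFor();
    }
}
```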

A CDS archive created with -XX:+CDSLoadedClasses will have the following behavior:

These classes should come only from "known" locations, such as the JDK's modules file, the module path, or the classpath. Note that CDS does not store classes that are dynamically generated or loaded from other locations. It can, in principle, store hidden classes (required by lambdas) when they are generated at build time and attached to constant pool states.
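
The origin of a class can be inspected with standard reflection. The sketch below sorts classes into rough categories that echo the "known locations" rule. The categories and the classify() helper are illustrative assumptions, not the criteria CDS actually applies. Class.isHidden() requires JDK 15 or later, where lambda proxy classes are hidden classes.

```java
import java.security.ProtectionDomain;

// Illustrative only: a rough classification of where a class came from, using
// standard reflection. These categories are assumptions, not CDS's own checks.
public class ClassOriginDemo {

    enum Origin { HIDDEN, JDK_RUNTIME_IMAGE, CODE_SOURCE, UNKNOWN }

    static Origin classify(Class<?> c) {
        if (c.isHidden()) {
            return Origin.HIDDEN;               // e.g. lambda proxy classes (JDK 15+)
        }
        ClassLoader l = c.getClassLoader();
        if (l == null || l == ClassLoader.getPlatformClassLoader()) {
            return Origin.JDK_RUNTIME_IMAGE;    // boot or platform loader: JDK modules
        }
        ProtectionDomain pd = c.getProtectionDomain();
        if (pd.getCodeSource() != null && pd.getCodeSource().getLocation() != null) {
            return Origin.CODE_SOURCE;          // a JAR, directory, or jrt: module location
        }
        return Origin.UNKNOWN;                  // e.g. defined from in-memory bytes
    }

    public static void main(String[] args) {
        Runnable lambda = () -> { };
        System.out.println(classify(String.class));          // JDK_RUNTIME_IMAGE
        System.out.println(classify(lambda.getClass()));     // HIDDEN
        System.out.println(classify(ClassOriginDemo.class)); // depends on how it was launched
    }
}
```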

As a result, the application will no longer be able to do the following at runtime:

Consequences

Although this is a simple change, the benefits are deep and go far beyond faster class loading. Internal pointers to eagerly loaded class data, adopted from the CDS archive, will be immediately and unconditionally useful as "live" VM data, without waiting for resolution logic or other checks.

Once data is "live from the start" in this way, the VM can use it immediately. This is true of all kinds of data assets used by the JVM, such as metadata (classes and methods) and Java objects (class mirrors).

In addition, CDS assets (such as other classes, Java Class mirrors or eventually AOT code) can refer directly to class data by using pointers, instead of via complex provisional symbolic references or relocation records. When they are adopted into the VM, such mutually referential CDS assets are correctly configured with respect to each other immediately and without extra checks or resolution steps.

At most, a low-level relocation pass may be required, as is typical when mapping dynamically linked data that contains pointers. For CDS, this pass is extremely simple: it is driven by a bitmap that shows where pointers occur in the mapped CDS assets.
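
A bitmap-driven relocation pass needs little more than a loop over set bits. In this sketch the mapped region is modeled as an array of 64-bit words, and bit i of the bitmap is set when word i holds a pointer. The word size, layout, and names are assumptions for illustration. If the archive is mapped at a different base address from the one it was dumped for, each marked word is adjusted by the same delta, and all other words are left untouched.

```java
import java.util.Arrays;
import java.util.BitSet;

// Minimal sketch of a bitmap-driven relocation pass over a mapped region.
// ASSUMPTION: the region is an array of 64-bit words and the bitmap has one bit
// per word; names and layout are illustrative, not the CDS file format.
public class BitmapRelocation {

    static void relocate(long[] region, BitSet pointerMap, long requestedBase, long actualBase) {
        long delta = actualBase - requestedBase;
        if (delta == 0) {
            return;                                  // mapped where expected: nothing to do
        }
        for (int i = pointerMap.nextSetBit(0); i >= 0; i = pointerMap.nextSetBit(i + 1)) {
            region[i] += delta;                      // only words marked as pointers change
        }
    }

    public static void main(String[] args) {
        long requestedBase = 0x8_0000_0000L;
        long actualBase    = 0x9_0000_0000L;
        // word 0: pointer, word 1: plain integer, word 2: pointer
        long[] region = { requestedBase + 0x10, 7, requestedBase + 0x40 };
        BitSet pointers = new BitSet();
        pointers.set(0);
        pointers.set(2);

        relocate(region, pointers, requestedBase, actualBase);
        System.out.println(Arrays.toString(region)); // word 1 is unchanged
    }
}
```

All pointers move by one shared delta, and no per-pointer symbolic lookup is needed. That is why the pass stays cheap compared with resolving references.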

In particular, the stability of class data pointers implies that constant pool entries can (in the future) be put into a resolved state when a CDS archive is created. (The VM does not need to re-resolve a constant pool entry once resolved.) This will allow the VM to avoid many dynamic operations normally required when a Java application configures itself at startup.

Stabilized, less speculative, immediately usable pointers are likely to greatly simplify the management of various kinds of data and metadata to be added in future JEPs.

For many classes, some important activities associated with loading, such as verification or v-table generation, are still deferred to startup time in the deployment run, in CDS versions up to and including this JEP. Later work on CDS may shift into the archive the computation results (or "states") that such activities require and produce, such as verification flags or v-table adapter code, for faster startup.

Compatibility Issues

Advanced features like user-defined class loaders, reflective class definition, and bytecode rewriting will not benefit from this JEP. If such a feature is to be applied to some class C, then C must not be loaded directly from CDS.

We believe very few programs need to use MethodHandles.Lookup::defineClass() to redefine classes from the JDK's modules file, the module path, or the classpath. Such applications should not use the -XX:+CDSLoadedClasses option. Hidden classes defined at build time, like those from the lambda metafactory, remain supported.

If the VM is started with a Java Instrumentation Agent that can transform the bytecodes of Java classes, the VM will refuse to use any CDS archive that was created with the -XX:+CDSLoadedClasses option. All Java classes will then be loaded incrementally from classfiles. The application will be able to interoperate with the Java Instrumentation Agent, but it will start up more slowly than it would with the CDS archive.

In the future, we may allow Java Instrumentation Agents to be used when the CDS archive is created. That would allow the bytecode transformation to be shifted to application build time.

Testing

Risks and Assumptions

We assume that, for most applications and frameworks that want the shifting afforded by -XX:+CDSLoadedClasses, the corresponding constraint (the class path cannot be incompatibly reconfigured at deployment time) is an acceptable tradeoff. Note that CDS interoperates with customer-defined class loaders, which lets customers configure part of their class loading dynamically while loading other classes ahead of time. Early customer conversations suggest that customers are willing to accept fixed class paths, and to use additional class loaders when more flexibility is required.

Dependencies

This JEP is an evolution of the existing CDS implementation. Future JEPs are likely to depend on it, as it enables whole suites of classes to be loaded all at once, with their interdependencies already resolved.