JEP draft: CDS Implementation Notes

Authors: John Rose, Ioi Lam, Dan Heidinga
Owner: John Rose
Type: Informational
Scope: Implementation
Status: Draft
Component: hotspot / runtime
Created: 2024/07/11 06:15
Updated: 2024/07/18 00:06
Issue: 8336232

Description

This JEP presents technical information about the Java virtual machine’s Cache Data Store (CDS), its concepts, its internal operations, and its current limitations. This material is not intended as a tutorial, but rather as a detailed reference, and as such it assumes familiarity with the basic concepts of CDS.

Tutorials on CDS may be found elsewhere, such as in these places:

Implementation notes table of contents

How Java’s dynamism is supported by CDS

Java applications can be reliably and easily composed from a huge menu of libraries and frameworks, and can be configured for testing and deployment with little ceremony. Programmers enjoy fast development cycles, easy observability, and a powerful set of tools. The foundation for all this is a pair of Java’s “super powers”: separate compilation and dynamic linking. Classes can be managed and inspected in isolation by inspecting their classfiles. When they are composed by dynamic linking, their integrity is protected by the VM, and yet the VM also gives them high performance access to each other’s API points. (Such API points include fields and methods, accessed both reflectively and directly via bytecode.) Crucially, the configuration of an application arises naturally from the classes presented at run time, as they connect to each other; there is no “linking ceremony” required, at build time, to exhaustively define the application configuration. Most of the mechanical steps of Java application configuration happen on the fly, invisibly to the programmer.

This works, in part, because Java, despite being statically typed, is a highly dynamic language: Loading, linking, machine code generation, and storage reclamation are some of the dynamic behaviors. All of this dynamism, while it provides great flexibility to the programmer, comes at a low-level cost. Each execution of the application must repeat the same work over and over, each time finding the right classfile bytes for a given class name, or the right addresses of methods or fields, or the right runtime support data, or the right machine code to optimize the application. This repetition is necessary in today’s Java VMs, as long as they perform most of their operations lazily, just in time. Dynamism allows computed decisions to be deferred until the last moment; dynamism allows loading and linking and optimization to be organized as just-in-time operations, maximizing flexibility.

When deploying an application, many of these dynamically computed decisions have stabilized and can be expected to have the same result as previous runs. Such stability does not cancel dynamism. If an application in production decides to perform a new behavior not previously expected, the VM can respond dynamically to the change, perhaps loading some new classes, perhaps discarding some previously optimized code and data, perhaps reoptimizing. Only the smallest and simplest Java applications are immune to such unpredicted behavior, but just-in-time processing, allowed by dynamism, covers all the possibilities in every application.

The overall set of configuration and optimization decisions made by an application (with the VM that runs it) is thus predictable, in many cases. The specification of the Java VM allows much freedom to schedule decisions, however dynamically they are requested. An unpredicted decision must always be handled as a just-in-time service, but a predictable one can also be handled ahead of time. In many cases, it is straightforward to provide AOT resources, serving them up without delay to the application, whenever it needs them. The information required to make this shift from JIT processing to AOT processing is prediction, foreknowledge of the decisions made to configure or optimize the application. The predictions do not need to be 100% accurate, as long as there is a way to recover from misprediction. Often, the most direct way to make these predictions is to perform a training run of the application and observe the decisions made during that run. Assuming similar future runs will make similar decisions, the VM can prepare, ahead of time, to execute them for the next run. This is the basis for the CDS technology.

Optimizations which optimistically assume some prediction, but have a fallback in case of misprediction, are sometimes called speculative optimizations. They are very common in the Java VM, since many conditions in Java programs are dynamically determined but also amenable to prediction ahead of time. The VM acts as though some fact is true, while also having fallback paths to compensate for speculation failure, that is, if the supposedly true fact turns out to be false after all. Outside of CDS, the VM might speculate that some method is never overridden (at least until a class is loaded that defines an override), or that some branch of code is never taken (at least until it is taken), or that some local variable has exactly one dynamic type (at least until an object not of that type shows up), or that some method deserves extra compilation effort because it is used often (and if the application stops using it, the method code can be removed).

When creating a CDS archive, the VM can speculate that previous decisions, recorded during a training run, will be made the same way again later. If application code in production makes different decisions, the VM can easily detect the new requirements. For example, if the production run turns out to need a different set of classes, the VM can simply process the new classes just in time, in the traditional fully dynamic way, as if CDS had never been involved. The same is true if a class in the production run asks to link to some API (another class, method, or field) not touched on in the training run; the unpredicted linking decision can be satisfied just in time. All of this is true no matter how the application initiates loading and linking of APIs, whether via bytecodes or via reflection. In all cases, Java’s flexible dynamism coexists fully with stable predictions stored in the CDS archive.

What’s in a CDS archive file

The foundational ability of CDS is to speculate class loading decisions, based on an AOT training run. In some workflows, the list of classes observed in the training run is exported as a class-list file, which is then assembled into an archive by a separate dump command. CDS can also operate from a hand-written textual list of selected classes, although this is highly error-prone.
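As a sketch, the two-step class-list workflow looks roughly like this; the file and class names (app.jar, com.example.App, app.classlist, app.jsa) are placeholders for application-specific values.

```shell
# Training run: record the classes the application actually loads.
java -XX:DumpLoadedClassList=app.classlist -cp app.jar com.example.App

# Dump command: assemble the recorded class list into a CDS archive.
java -Xshare:dump -XX:SharedClassListFile=app.classlist \
     -XX:SharedArchiveFile=app.jsa -cp app.jar

# Production run: map the archive instead of re-parsing classfiles.
java -XX:SharedArchiveFile=app.jsa -cp app.jar com.example.App
```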

For each classfile it selects, it can save away a pre-parsed (or “pickled”) internal form, as an independently loadable asset within the CDS archive file. The internal form is substantially the same as that of the VM’s internal class metadata. It is accompanied by “pointer maps” that tell how to relocate pointers which are embedded in the metadata, so that the CDS archive can be loaded at unpredictable base addresses in the virtual memory of the production run.

When the VM starts, all CDS assets are immediately available in VM memory, but as long as they are only in the pre-parsed state they are not yet usable as classes, nor can they be linked together.

When the Java application eventually gets around to requesting a CDS class for the first time, the VM permanently makes the pre-parsed form “live” and associates the class name with the live metadata. Only at that point can it be linked to other loaded classes. This can be viewed as a partially AOT-order, partially JIT-order implementation of class loading.

On the other hand, if the archive is built with -XX:+AOTClassLinking, the VM itself initiates AOT loading, placing the metadata images into the VM’s system dictionary. This happens in a very early period before the application’s main method starts to run, and is thus called the premain phase of execution. At this time, both loading and linking happen quickly, from CDS assets already present in VM memory, and pre-formatted for easy adoption as live metadata.

Because of the way assets are brought into VM memory from the CDS archive, they have stable and predictable memory locations. This stability in turn allows them to be pre-formatted in an already-linked state, with direct references to each other. Very specifically, the enhanced pre-formatting affects the constant pool entries in each class asset; they can be populated with resolved locations and sizes of fields, methods, and other classes, as long as those entities are also present in AOT loaded classes.

Thus, AOT loading and linking happen more quickly than for classes which are processed piecemeal by just-in-time loading and linking. But by an appeal to an “as-if” optimization, the loading and linking may also be viewed as happening just in time, on demand by the application. The only evidence of the shift from JIT order to AOT order is indirect, perhaps from a change in file system activity, or from log messages emitted by the VM.

When an “as-if” optimization is working, the application cannot distinguish “ahead of time” linking from “just in time” linking, except for speed. Such as-if rules are routine in VM technology. As another example, code compiled by the VM runs “as if” the VM’s interpreter were running it, only it runs faster. Also, the GC allows the application unlimited allocations “as if” memory were infinite.

A benefit of the behavioral similarity of loading and linking, between JIT and AOT orders, is that CDS can still arrange to load or link some application classes the old way, to handle corner cases that would be awkward to load in the new ahead-of-time order. Thus, although the bulk of classes are likely to be pre-formatted in the CDS archive for AOT loading, some may not be in the new form. This allows CDS to be flexible when dealing with more open-ended features of the VM, such as user-defined class loaders. Likewise, CDS may choose not to preset some individual linking decision, even in an AOT-loaded class, if CDS has some reason to believe that decision could vary in the production run, or if CDS believes it would be wasted effort. All these choices are transparent to the application.

The presence in VM memory of many application classes, at predictable (“stabilized”) addresses, is likely to be a springboard for further enhancements to CDS. Additional kinds of VM data, such as method profiles and compiled code, can be stored as new assets in the CDS archive, pre-formatted so as to directly link to whatever classes, methods, and fields that they need.

Kinds of AOT processing

Different versions of CDS perform different levels of ahead-of-time processing. The earliest versions of CDS simply pre-parse the class files, but do not attempt to install the classes until the usual “just in time” request is made for class loading, by application logic. Later versions perform increasing amounts of AOT processing.

The various kinds of AOT processing are enabled by command line options given when the CDS archive file is created. They are stored within the CDS archive file. When the VM makes a production run and is instructed to use a particular archive file, it performs the AOT processing requested by that archive file. No other command line option or configuration setting is required in the production run; it all comes from the CDS archive.

Some kinds of processing can be disabled, which may be useful for diagnosing problems. For example, -XX:-AOTClassLinking (note the - minus sign) disables AOT class loading and linking. It would also disable subsequent AOT optimizations, if any, such as AOT compilation. If the production run is told to disable AOT loading, the VM attempts to fall back to treating the CDS assets as pre-parsed classes, to be loaded in the traditional “just in time” order.
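For example, a diagnostic production run might look like the following sketch; the archive and application names (app.jsa, app.jar, com.example.App) are placeholders.

```shell
# Diagnostic run: ignore the archive's request for AOT loading and fall
# back to just-in-time loading of the pre-parsed classes.
java -XX:-AOTClassLinking -XX:SharedArchiveFile=app.jsa \
     -cp app.jar com.example.App
```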

The -XX:+AOTClassLinking option puts an attribute into the Cache Data Store that instructs the VM to bring cached classes into a loaded state, immediately on startup. This ensures that classes used by the application (as discovered by the AOT training run) are immediately available. However, cached classes which cannot be AOT-loaded (such as those with user-defined class loaders) are loaded only on demand (that is, just in time), from a pre-parsed state in CDS.
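Because the attribute lives in the archive, the flag is given at dump time, not at production time. A minimal sketch, with placeholder file names, assuming the archive is built with the static dump command:

```shell
# Create an archive whose classes are AOT-loaded and linked at startup.
java -XX:+AOTClassLinking -Xshare:dump \
     -XX:SharedClassListFile=app.classlist \
     -XX:SharedArchiveFile=app.jsa -cp app.jar
```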

The -XX:+AOTClassLinking option also enables subsequent AOT processing, specifically AOT linking of classes which are AOT-loaded. Only constants which refer to other AOT-loaded classes are linked.

Class constants which configure the building of lambdas and string concatenation logic are linked ahead of time. This is done by running the relevant invokedynamic bootstrap methods and dumping CDS assets which encode the resulting chains of method handles and hidden classes. In this way, -XX:+AOTClassLinking supports AOT loading and linking of classes which are dynamically generated, not just those which are on the class path or module graph. This AOT processing of bootstrap methods is limited to methods in java.base which are known to be free of side effects; it cannot (at present) be extended to arbitrary methods from other language runtimes.

Another kind of AOT processing (in the future) is the collection of profiles, under the (future) flag -XX:+AOTMethodProfiling. This would capture selected method profile information from the training run and assemble it into the CDS archive, for use during the production run. The production run contributes its own profiling information as well, and the VM compiler will use the “freshest” profile information available.

Another kind of AOT processing (in the future) is the saving of compiled code profiles, under the (future) flag -XX:+AOTMethodCompilation. This would compile methods observed to be hot during the training run, and assemble them into the CDS archive. The VM loads them as needed to accelerate startup or warmup. The production run contributes its own JIT-compiled methods as well, and the VM will execute the “freshest” methods available.

Consistency between training and production

As a general principle, if a training run (and any subsequent dump command) generates a CDS archive, and if the VM chooses to use it in a production run, the production run will produce substantially the same results as if the VM had ignored the CDS archive.

Of course, the two runs might have differences in timing, footprint, and order of access to system resources like the file system. And some aspects of Java execution are intrinsically non-reproducible, if they use the entropy generated by physical processor concurrency or a true random number generator. But with or without the archive, the VM will run the application in a way that complies with the Java VM specification, which means that, either way, results will comply with programmer expectations.

In order to ensure that CDS archive contents are relevant, CDS enforces rules ensuring consistency between training runs and production runs. In short, CDS ensures that, in a real sense, both runs are processing the same application. Indeed, these rules embody what it means for two application runs to be “the same”.

Here are the consistency rules CDS enforces:

In some cases, a training run will refuse to generate a CDS archive if there is no possibility of running “the same application” in production. Here are the cases:

Unsupported features may gain support, and consistency requirements may be relaxed, in future releases.

Each CDS archive records enough information to make necessary consistency checks. Tools to inspect and manipulate such information may be created in the future.

If the VM determines it cannot use a CDS archive, it will run without it (if -Xshare:auto is set) or emit an error diagnostic (if -Xshare:on is set).
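The two behaviors can be selected as sketched below; archive and application names are placeholders.

```shell
# Tolerant (default): if the archive cannot be used, run without it.
java -Xshare:auto -XX:SharedArchiveFile=app.jsa -cp app.jar com.example.App

# Strict: if the archive cannot be used, exit with an error diagnostic.
java -Xshare:on -XX:SharedArchiveFile=app.jsa -cp app.jar com.example.App
```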

CDS accepts many differences between training and production runs:

Some CDS optimizations, such as the provisioning of interned strings or the linking of invokedynamic bytecodes, are implemented using archived Java heap objects. Therefore, these optimizations will not be available for garbage collectors that do not support archived Java heap objects (e.g., ZGC). However, most CDS optimizations, such as the AOT class loading, and AOT linking of references to classes, fields, and methods, are available regardless of choice of collector.
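As a sketch of the interaction with the collector choice (placeholder file names), the VM's unified logging can show which parts of an archive are actually used:

```shell
# With a collector lacking archived-heap support (e.g. ZGC), heap-based
# CDS optimizations are skipped, but AOT class metadata is still used;
# -Xlog:cds reports what the VM maps from the archive.
java -XX:+UseZGC -XX:SharedArchiveFile=app.jsa -Xlog:cds \
     -cp app.jar com.example.App
```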

Additional limitations

Here are some additional practical caveats and limitations on the use of CDS, beyond the basic requirement of consistency between training and production runs:

Choosing a training run

A training run captures application configuration decisions and execution history, in the expectation that this information will be relevant to the production runs that come later. Therefore, to be as useful as possible, a training run should resemble the intended production runs, to the extent that it fully configures itself and exercises all code paths that will be required in the production runs.

Here are some specific tips:

Measuring startup and warmup

Although startup and warmup are similar concepts, to measure them properly, one must understand their distinction. For practical purposes, they are defined in terms of some particular application performing a repeatable workload, such as a request server. Startup time is how long the VM takes to load and execute enough code in the JDK, in libraries on the class path or module graph, and in the application, so that the application can start to serve requests. Warmup time is how long the VM takes to optimize a running application so that it serves requests with peak performance. Warmup usually consumes more resources (time and memory) than startup.

In more detail, startup is a series of one-time setup tasks, while warmup is a continuing optimization. During startup, the VM and application load, link, and initialize classes, and configure other resources such as Java objects. An application warms up over time, first as the VM selectively compiles bytecode from class files to machine code, and then as the VM tracks “hot spots” in application code and reoptimizes their machine code. Besides code generation, the VM tunes certain ergonomic settings during warmup.

Warmup and startup overlap during the milliseconds after the application launches. And both activities can trail off into an indefinite future: An application can run for seconds or minutes and suddenly perform new startup activities because it accepts a new kind of request. The VM can also work for a long time optimizing the application, eventually (after seconds or minutes) reaching a steady state with peak performance. Even then, if a new kind of request suddenly arrives, the VM may have to re-enter warmup activities to accommodate new code paths. Both startup and warmup tasks can be addressed by AOT or JIT techniques, whether speculative or not, and usually all of the above. Thus, startup and warmup are distinct sets of activities, and each deserves its own attention when assessing and improving VM technology.

In the big picture, startup and warmup are not the only important measures of quality. In carrying out its duties, an application should consume moderate amounts of time and space, delivering good throughput (time per workload unit) and footprint (working memory size). Of course, it should also be correct (producing the right answers) and stable (predictable execution, without crashes or any other misbehavior). Throughput, correctness, and stability have always been core values within the Java ecosystem. Project Leyden is making a fresh focus on improving startup, warmup, and footprint, by shifting selected computations to new points in time, either earlier (ahead of time, AOT) or later (just in time, JIT). Within that big picture, this work is about AOT optimizations to improve startup, and eventually warmup.

Each deployed application will need its own specific definition of what constitutes one repetition of its repeatable workload; this could be a service request, or an integration test, or a benchmark, or a stress test, or some other “omnibus test” of many parts of the application. The first repetition loads and initializes all relevant classes and application data structures, while subsequent repetitions spur the VM to optimize the application, eventually reaching peak performance. In the setting of such an application and its repeatable workload, warmup can be measured as the time to reach a given fraction (such as 95%) of the eventual peak throughput, while startup can be measured as the time to bring the first workload repetition up to some application-specific “ready point”, or else to the end of the first repetition of the workload.
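One way to make the 95%-of-peak definition concrete is sketched below. The helper warmup_point is hypothetical, not part of any JDK tool: it takes per-repetition throughput samples (in workload units per second, as measured by an application-specific driver) and reports the first repetition that reaches 95% of the peak throughput observed in the run.

```shell
# Hypothetical helper: given per-repetition throughput samples, print the
# index (1-based) of the first repetition at or above 95% of the peak.
warmup_point() {
  peak=0
  for t in "$@"; do
    [ "$t" -gt "$peak" ] && peak=$t   # track the peak throughput seen
  done
  threshold=$(( peak * 95 / 100 ))    # 95% of peak, integer arithmetic
  i=1
  for t in "$@"; do
    if [ "$t" -ge "$threshold" ]; then
      echo "$i"
      return
    fi
    i=$((i + 1))
  done
}

# Example: peak is 1010, threshold 959; repetition 5 (990) first reaches it.
warmup_point 100 400 800 950 990 1000 1010 1005   # prints 5
```

Startup, by contrast, would be measured separately, as the wall-clock time from launch to the application's "ready point" or to the end of the first repetition.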

A brief history of CDS

In one form or another, CDS has been built into the HotSpot VM since JDK 5 in 2004. At first its expanded name was “Class Data Sharing”, which is reflected in options like -Xshare:…. Over the years it has expanded its capabilities beyond the storage of shareable class data, so the preferred alternative expansion for CDS is now “Cache Data Store”. Either phrase refers to the same technology.

Since 2017, every Java runtime has included an AOT cache of over 1000 definitions for core JDK classes, created when the JDK is built from source code. In this sense, CDS is ubiquitous, even for Java programmers who have never heard of it. Even so, CDS has been very much a “power user” feature over most of its existence.

FIXME (More here about major CDS feature introductions, such as the dynamic archive, or cached objects.)

CDS and sharing

CDS uses memory mapping to quickly populate VM memory with the whole content of the CDS archive file, allowing the VM to pick and choose assets within the file to adopt into its live metadata. This mapping is relocatable, but is organized to prefer a certain base address, if that is available. If the preference is met, the mapped file does not need to have its pages edited to relocate their embedded pointers (and thus “dirtied” by copy on write). Clean pages allow sharing of mappings between VM processes, reducing footprint. This behavior is the motivation for the (now obsolete) acronym expansion “Class Data Sharing”.

It should be noted, however, that modern CDS deployments often lose much of their page sharing due to dynamic relocations, because mapping addresses are made unpredictable by current practices such as address space layout randomization (ASLR).

With any AOT technology like CDS, there is always a tension between either under-provisioning, which may force VM startup to consume more CPU as it repeats work, or else over-provisioning, which may cause unused resources to consume memory.

Future work is likely to improve footprint by some combination of “clawing back” sharing lost to ASLR, further tuning the tradeoff between over- and under-provisioning of assets, and compressing seldom-used assets offline (trading time for space).

Glossary

Here is a list of terms which are useful when discussing CDS.