JEP draft: (DRAFT) Method Profiles in CDS Archives

Authors: Igor Veresov, John Rose
Owner: John Rose
Type: Feature
Scope: Implementation
Status: Draft
Component: hotspot / compiler
Effort: M
Duration: M
Created: 2024/02/01 20:40
Updated: 2024/02/08 08:44
Issue: 8325147

Summary

Store method profiles from training runs in the CDS archive, thereby enabling the JIT to begin compiling earlier during warmup. This enhancement to CDS is part of Project Leyden.

Goals

Non-Goals

Success Metrics

Motivation

This JEP is a part of Project Leyden, which aims to improve the startup and warmup time of Java applications. This JEP stores, in a CDS archive, the results of profiling during a training run for subsequent use at the time of application deployment.

Shifting the work of profiling to a training run means that, in a subsequent deployment run, a JIT can produce optimized code much sooner, which can improve warmup. Application startup may also improve, because there is less time spent collecting profile information.

Background

Recall that shifting computational work is a fundamental Leyden technique; in this case the expensive work of gathering profiles is shifted out of application deployment and into an earlier training run. (Another foundational Leyden technique is constraining computation, which is not done by this JEP.) A CDS archive contains assets that record the results of shifted computations, for later adoption during application deployment. This JEP makes CDS more useful by adding profile data as a new kind of asset.

Why We Profile

Profiles record observations about application dynamics for use by the JIT. Profile data includes method invocation counts, branch frequencies, and object classes encountered, throughout all warm parts of the program. Seldom-executed ("cold") methods are not allocated profile data. It is generally a bad idea to collect method profiles for all methods; the space costs are prohibitive, and the JIT doesn't need data for cold methods. Also, already-optimized methods do not accumulate additional profile data, unless they deoptimize, since collecting "more of the same" would slow the optimized code.
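As a rough illustration of the kinds of data listed above, the following Java sketch models a per-method profile holding an invocation count, branch frequencies, and observed receiver classes. The class and method names (MethodProfileSketch, recordBranch, and so on) are hypothetical; HotSpot's real profiles are native VM structures with a different layout.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a per-method profile; not HotSpot's actual layout.
final class MethodProfileSketch {
    private long invocations;
    // Branch bytecode index -> {taken count, not-taken count}.
    private final Map<Integer, long[]> branches = new HashMap<>();
    // Call-site bytecode index -> receiver class name -> count.
    private final Map<Integer, Map<String, Long>> receivers = new HashMap<>();

    void recordInvocation() {
        invocations++;
    }

    void recordBranch(int bci, boolean taken) {
        long[] counts = branches.computeIfAbsent(bci, k -> new long[2]);
        counts[taken ? 0 : 1]++;
    }

    void recordReceiver(int bci, Class<?> receiver) {
        receivers.computeIfAbsent(bci, k -> new HashMap<>())
                 .merge(receiver.getName(), 1L, Long::sum);
    }

    long invocations() {
        return invocations;
    }

    // Fraction of executions in which the branch was taken; 0.0 if never seen.
    double takenRatio(int bci) {
        long[] c = branches.get(bci);
        if (c == null || c[0] + c[1] == 0) {
            return 0.0;
        }
        return (double) c[0] / (c[0] + c[1]);
    }

    // True if exactly one receiver class has been observed at the call site.
    boolean isMonomorphic(int bci) {
        Map<String, Long> seen = receivers.get(bci);
        return seen != null && seen.size() == 1;
    }
}
```

In this model a cold method would simply never have a MethodProfileSketch instance allocated for it, which corresponds to the space-saving behavior described above.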

The JIT uses profiles to produce code that is highly optimized for the specific application dynamics recorded in the profiles. The benefit depends on whether the dynamics are stable: if future application execution is similar to past execution, the JIT code continues to perform well. If a new branch or a new class appears, the JIT may recompile in response, causing a temporary slowdown until the better code is installed. In this way the JIT responds to differences, either small or large, in future execution dynamics.
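The speculate-and-recover behavior described above can be modeled in a few lines of Java. This is only a conceptual sketch: SpeculativeCallSite and its members are hypothetical names, and a real JIT emits machine code with guards and deoptimization points rather than calling lambdas.

```java
import java.util.function.Function;

// Hypothetical model of profile-guided speculation with a fallback path.
final class SpeculativeCallSite {
    private final Class<?> expectedClass;               // class seen in the profile
    private final Function<Object, String> fastPath;    // code specialized for expectedClass
    private final Function<Object, String> genericPath; // code that handles any receiver
    private boolean speculating = true;
    private int deoptCount = 0;

    SpeculativeCallSite(Class<?> expectedClass,
                        Function<Object, String> fastPath,
                        Function<Object, String> genericPath) {
        this.expectedClass = expectedClass;
        this.fastPath = fastPath;
        this.genericPath = genericPath;
    }

    String invoke(Object receiver) {
        if (speculating) {
            // Guard: the specialized code is valid only for the profiled class.
            if (receiver.getClass() == expectedClass) {
                return fastPath.apply(receiver);
            }
            // A new class appeared: abandon the speculation (a "deoptimization").
            speculating = false;
            deoptCount++;
        }
        return genericPath.apply(receiver);
    }

    int deoptCount() {
        return deoptCount;
    }
}
```

In this simplified model the call site stays on the generic path after a failed guard; a real VM would typically gather fresh profile data and recompile, which is the temporary slowdown mentioned above.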

The previous description applies fully to classic HotSpot execution. Even in one application run, the "hot spots" can move around over time, and the JIT reoptimizes as they do. The description also applies fully to the present JEP, where profiles from past VM behavior are collected in training runs and passed to the VM through the CDS archive. It is always the case that the VM and its JIT are open to new behavior that invalidates past optimizations. With profiles stored in CDS, it is to be expected that the old and the new profiles will resemble each other. If they do, then the early runs of the JIT will produce properly optimized code, even with respect to the newer profile data.

There is a difference here between adaptive, JIT-based code shapes and non-adaptive, pure AOT code shapes. The JIT can make optimistic speculations and recover from their failure, while pure AOT code must make conservative assumptions about future execution, and cannot optimize as aggressively. Even the best static analysis techniques cannot predict a full set of optimizations for application behavior, since such behavior emerges from a Turing-complete evolution of computation states. That behavior is undecidable, even by the most powerful software, without actually running the program itself, on a particular input. But partially executed programs, in practice, provide useful information about likely future execution, information which can be gathered in no other way. This is why profiles, including the stored profiles of this JEP, are important to modern computing systems.

Although the JVM performs some static analysis, and Leyden will allow more expensive and comprehensive static analyses in the future, the special strength of the JVM is its ability to respond flexibly and dynamically to unpredictable application behaviors. These behaviors are often statically unpredictable, and may even include dynamic loading of code not present during any static analysis. But the VM optimizes it all, because it can use the evidence of actual prior execution, rather than rely solely on a weakly predictive static model. The resulting flexibility, with full optimization, benefits many Java workloads, and is one of the reasons for Java's success. The cost for this is time spent gathering and using the dynamic profile information, which is the issue addressed by Project Leyden.

Description

This is an enhancement to existing CDS workflows and archive file formats. A user of CDS executes a training run with a special switch (of the form -Xshare:dump) telling the VM to emit a CDS archive when the training run exits. This archive contains various assets (loaded class data, and now profiles) extracted from the training run. With this JEP, the collected profile assets give an overall view of application dynamics, which can be used by a JIT to produce better code.

The specific command-line options for enabling and controlling the collection of profile assets are TBD. They will be aligned with other CDS work for Project Leyden, including Loaded Classes in CDS Archives (JDK-8315737).

The CDS archive is then used to start a deployment run of the VM, using the same classpath but perhaps a different main class and/or command arguments and/or inputs. When possible, the VM adopts assets from CDS in preference to recomputing them, thus reducing startup and warmup time.

New markings in the CDS file will instruct the VM to adopt those profiles as relevant to the application deployment run, even before the application begins to run. The application will also collect its own profiles, as usual. The VM's compilation policy will be tuned to read from both profiles, and to allow the JIT to run earlier if profile information is available from a training run.
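One way to picture a policy that reads from both profiles is sketched below in Java. Everything here is an assumption made for illustration: the class name, the single invocation threshold, and the choice to simply add the archived count to the live count. The actual tuning of HotSpot's compilation policy is not specified by this text.

```java
// Hypothetical sketch: a compile decision that consults both the archived
// (training-run) profile and the live (deployment-run) profile.
final class CompilationPolicySketch {
    private final long compileThreshold;

    CompilationPolicySketch(long compileThreshold) {
        if (compileThreshold <= 0) {
            throw new IllegalArgumentException("threshold must be positive");
        }
        this.compileThreshold = compileThreshold;
    }

    // Assumption: archived invocations count toward the threshold just as
    // live ones do, so a method that was hot in training qualifies early.
    boolean shouldCompile(long archivedInvocations, long liveInvocations) {
        return archivedInvocations + liveInvocations >= compileThreshold;
    }
}
```

With a threshold of 1000, a method that has run only 10 times in deployment would not yet qualify on its own, but would qualify immediately if the training run recorded 5000 invocations for it.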

This is a win because the gathering of profiles normally requires many milliseconds of CPU time, sometimes even minutes for large applications. This work is required before the JIT can do its own work to create code optimized for those profiles. As previously noted, shifting this work out of the application deployment can make the application warm up faster.

Note that method profiles are only collected for methods whose invocation counts indicate that further scrutiny is needed. It is generally a bad idea to collect method profiles for all methods; the space costs are prohibitive.

Method profiles will be stored in the CDS archive if they are created normally during execution of the training run. In addition, only methods which were compiled during the training run will have their method profiles ("method data objects") stored in the CDS archive. The intention is to avoid bloating the CDS archive with useless assets.
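The selection rule in the preceding paragraph amounts to a simple filter, sketched here in Java. The types TrainedMethod and ProfileArchiveFilter are hypothetical stand-ins for VM-internal state, not real HotSpot classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical summary of what the training run knows about one method.
final class TrainedMethod {
    final String name;
    final boolean hasProfile;   // a method data object was created during training
    final boolean wasCompiled;  // the method was JIT-compiled during training

    TrainedMethod(String name, boolean hasProfile, boolean wasCompiled) {
        this.name = name;
        this.hasProfile = hasProfile;
        this.wasCompiled = wasCompiled;
    }
}

final class ProfileArchiveFilter {
    // Keeps only methods that both acquired a profile and were compiled.
    static List<String> methodsToArchive(List<TrainedMethod> methods) {
        List<String> selected = new ArrayList<>();
        for (TrainedMethod m : methods) {
            if (m.hasProfile && m.wasCompiled) {
                selected.add(m.name);
            }
        }
        return selected;
    }
}
```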

Format of Stored Profiles

CDS provides an excellent storage medium for data which is easy for the VM to adopt directly into its runtime data structures. CDS data is organized for efficient sharing, from a training run to any number of deployment runs. In particular, the CDS data is mapped directly into the VM's memory address space, and edited lightly to relocate pointers to be consistent with the base address of the mapped segments. This organization of data closely resembles that of the shared libraries found on all platforms on which HotSpot runs today.
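The light editing of pointers mentioned above can be illustrated with a small Java sketch. It models the mapped image as an array of 64-bit words and assumes, purely for illustration, that the positions of pointer-valued words are known from a side table (pointerSlots); the names and the representation are hypothetical and do not describe the actual CDS file format.

```java
// Hypothetical sketch of pointer relocation after mapping an archive image.
final class RelocationSketch {
    // Shifts every non-null pointer word by the difference between the
    // address where the image was mapped and the address it was dumped for.
    static void relocate(long[] image, int[] pointerSlots, long dumpBase, long mappedBase) {
        long delta = mappedBase - dumpBase;
        if (delta == 0) {
            return; // mapped at the requested address; nothing to patch
        }
        for (int slot : pointerSlots) {
            if (image[slot] != 0) {     // a null pointer stays null
                image[slot] += delta;
            }
        }
    }
}
```

Words that are not listed as pointer slots, such as counters in a profile, are left untouched, which is why adoption of the mapped data is cheap.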

The stability of class pointers provided by the JEP for Loaded Classes in CDS Archives (JDK-8315737) will make execution profiles easier to adopt from CDS directly into the VM. If this JEP is made to apply to CDS classes in the unloaded states, additional barrier logic will be added to prevent the JIT from acting on class profile records which are not relevant to the application, because a class encountered during a training run is not yet loaded during the deployment run. Such barrier logic adds its own costs, but more crucially it reduces the amount of data the JIT can use during warmup. Therefore, these two JEPs work best in concert.
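The barrier logic described above might be pictured as a filter over archived receiver-class records, as in the following Java sketch. The names (ReceiverProfileBarrier, usableRecords) and the use of class-name strings are assumptions for illustration only; the real mechanism would operate on VM-internal class pointers.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: hide archived class-profile records whose class is
// not (yet) loaded in the deployment run, so the JIT cannot act on them.
final class ReceiverProfileBarrier {
    static Map<String, Long> usableRecords(Map<String, Long> archivedCounts,
                                           Set<String> loadedClasses) {
        Map<String, Long> usable = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : archivedCounts.entrySet()) {
            if (loadedClasses.contains(e.getKey())) {
                usable.put(e.getKey(), e.getValue());
            }
        }
        return usable;
    }
}
```

The sketch also makes the stated drawback visible: every record that is filtered out is profile data the JIT cannot use during warmup.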

Compatibility Issues

This JEP introduces no new compatibility issues. If a profile adopted from CDS turns out to mispredict the actual application behavior, then there will be some wasted effort by the JIT, slowing down the application.

This phenomenon, of a possible but unlikely slowdown, is akin to the well known problem that a data compression algorithm can sometimes increase the size of its input, when that input fails to be predictable.

Alternatives

Doing nothing preserves the current profiling delays before the JIT can get to its useful work.

Making the JIT itself optional, by generating AOT code, is often a useful tactic. A pure AOT solution where there is no JIT cannot adequately optimize for actual application dynamics, as explained above ("Why We Profile").

A partial AOT solution, where reasonable AOT code is replaced by a delayed JIT, seems to be the best solution on balance. (The delayed JIT can stay out of the application's way, and take its time to get the final code just right, based on the latest profiling information.) That requires additional work to build out the AOT infrastructure, so it is a follow-on JEP. Even before AOT, the present JEP provides important speedups to warmup.

We could put all of our resources into a partial AOT solution, backed up by a delayed JIT phase. However, prototyping indicates that stored profiles improve the performance of that design as well, since they allow the JIT, though mostly delayed, to attack performance problems earlier during warmup, when there are small changes in application behavior that require recompilation. So this JEP has its own place, not subsumed by any other JEP.

Testing

// What kinds of test development and execution will be required in order
// to validate this enhancement, beyond the usual mandatory unit tests?
// Be sure to list any special platform or hardware requirements.

Risks and Assumptions

// Describe any risks or assumptions that must be considered along with
// this proposal. Could any plausible events derail this work, or even
// render it unnecessary? If you have mitigation plans for the known
// risks then please describe them.

Dependencies

This JEP is independent of other JEPs, although it works best with the JEP for Loaded Classes in CDS Archives (JDK-8315737).

// Describe all dependencies that this JEP has on other JEPs, JBS issues,
// components, products, or anything else. Dependencies upon JEPs or JBS
// issues should also be recorded as links in the JEP issue itself.
//
// Describe any JEPs that depend upon this JEP, and likewise make sure
// they are linked to this issue in JBS.