JEP draft: JFR Cooperative Sampling
Owner | Markus Grönlund |
Type | Feature |
Scope | Implementation |
Status | Submitted |
Component | hotspot / jfr |
Discussion | hotspot dash jfr dash dev at openjdk dot org |
Effort | M |
Duration | M |
Reviewed by | Erik Gahlin, Vladimir Kozlov |
Created | 2025/02/19 14:12 |
Updated | 2025/05/08 22:06 |
Issue | 8350338 |
Summary
Improve the stability of the JDK Flight Recorder (JFR) when it asynchronously samples Java thread stacks. Achieve this by walking call stacks only at safepoints, while minimizing safepoint bias.
Motivation
A running program consumes computational resources such as memory, CPU cycles, and elapsed time. To profile a program is to measure the consumption of such resources by specific elements of the program. A profile might indicate that, e.g., one method consumes 20% of a resource, while another consumes only 0.1%.
Profiling can help make a program more efficient, and developers more productive, by identifying which program elements to optimize. Without profiling, we might optimize a method that was consuming few resources to begin with, wasting effort while having little impact on the program's overall performance. For example, making a method that accounts for 0.1% of the program's total execution time run ten times faster shrinks that method's share to 0.01%, reducing the program's execution time by only 0.09%.
JFR, the JDK Flight Recorder, is the JDK's profiling and monitoring facility. The core of JFR is a low-overhead mechanism for recording events emitted by the HotSpot JVM or by program code. Some events, such as loading a class, are recorded whenever an action occurs. Others, such as those used for profiling, are recorded by statistically sampling the program's activity as it consumes a resource. The various JFR events can be turned on or off, allowing a more detailed, higher-overhead collection of information during development and a less detailed, lower-overhead collection of information in production.
JFR can create an execution-time profile that shows which program elements consume significant elapsed real time, i.e., wall-clock time. It does this by sampling the execution stacks of program threads at fixed intervals of, say, 20 milliseconds. Each sample produces a JFR event containing a stack trace. Tools such as jfr and JDK Mission Control can summarize a stream of such events into a textual or graphical profile.
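As a minimal sketch of how such a profile can be collected programmatically, the following uses the `jdk.jfr` API to enable the `jdk.ExecutionSample` event with a 20-millisecond period, records a short CPU-bound workload, and counts the resulting sample events. The class name `ExecutionSampleDemo`, its helper methods, and the synthetic workload are illustrative assumptions, not part of this proposal. The number of samples actually recorded depends on the platform and on how long the workload runs.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordingFile;

// Hypothetical demo class; only the jdk.jfr calls are real JDK API.
class ExecutionSampleDemo {

    /** Records a busy loop with execution sampling enabled and returns the .jfr file. */
    static Path recordSamples(Duration period, Duration busyFor) {
        try {
            Path file = Files.createTempFile("execution-samples", ".jfr");
            try (Recording recording = new Recording()) {
                // Ask JFR to sample Java thread stacks at the given period.
                recording.enable("jdk.ExecutionSample").withPeriod(period);
                recording.start();
                burnCpu(busyFor);
                recording.stop();
                recording.dump(file);
            }
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Counts the execution-sample events stored in a recording file. */
    static long countSamples(Path file) {
        try {
            return RecordingFile.readAllEvents(file).stream()
                    .filter(e -> e.getEventType().getName().equals("jdk.ExecutionSample"))
                    .count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Synthetic CPU-bound workload so that the sampler has something to observe. */
    static double burnCpu(Duration duration) {
        long deadline = System.nanoTime() + duration.toNanos();
        double acc = 0;
        while (System.nanoTime() < deadline) {
            acc += Math.sqrt(acc + 1);
        }
        return acc;
    }

    public static void main(String[] args) {
        Path file = recordSamples(Duration.ofMillis(20), Duration.ofSeconds(2));
        System.out.println(countSamples(file) + " execution samples in " + file);
    }
}
```

The recording file written by this sketch can also be inspected with the jfr tool or opened in JDK Mission Control.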
In order to produce a stack trace for a program thread, JFR's sampler thread must suspend the target thread and parse the call frames on the stack. The HotSpot JVM maintains metadata for this purpose, but that metadata is valid only when a thread is suspended at well-defined code locations known as safepoints. If we sample stacks only at safepoints, however, then we will likely suffer from the safepoint bias problem: We risk losing accuracy, since a frequently-executed span of code might not be anywhere near a safepoint. The safepoint bias problem is well known and thoroughly researched.
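The effect of safepoint bias can be illustrated with a small, deterministic simulation. It is a toy model, not HotSpot behavior: a loop consists of named spans of code, only some of which end at a safepoint, and a sample taken "at safepoints only" is attributed to the next span that has one. The class `SafepointBiasSketch`, the span names, and the timings are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of safepoint bias; all names and numbers are illustrative.
class SafepointBiasSketch {

    /** A span of code in a loop body; isSafepoint marks where a stack walk would be valid. */
    record Span(String name, int millis, boolean isSafepoint) {}

    /** Counts samples per span, attributing each either exactly or to the next safepoint. */
    static Map<String, Integer> profile(Span[] loop, int totalMillis, int periodMillis,
                                        boolean safepointOnly) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        int loopMillis = 0;
        boolean anySafepoint = false;
        for (Span s : loop) {
            counts.put(s.name(), 0);
            loopMillis += s.millis();
            anySafepoint |= s.isSafepoint();
        }
        if (safepointOnly && !anySafepoint) {
            throw new IllegalArgumentException("loop has no safepoint");
        }
        for (int t = periodMillis; t <= totalMillis; t += periodMillis) {
            int idx = indexAt(loop, t % loopMillis);
            if (safepointOnly) {
                // The sample is delayed until the thread reaches a safepoint.
                while (!loop[idx].isSafepoint()) {
                    idx = (idx + 1) % loop.length;
                }
            }
            counts.merge(loop[idx].name(), 1, Integer::sum);
        }
        return counts;
    }

    /** Finds the span that is executing at the given offset into one loop iteration. */
    static int indexAt(Span[] loop, int offset) {
        int end = 0;
        for (int i = 0; i < loop.length; i++) {
            end += loop[i].millis();
            if (offset < end) {
                return i;
            }
        }
        throw new IllegalStateException("offset outside loop body");
    }

    public static void main(String[] args) {
        Span[] loop = { new Span("hotMath", 9, false), new Span("callSite", 1, true) };
        System.out.println("exact attribution: " + profile(loop, 700, 7, false));
        System.out.println("safepoint only:    " + profile(loop, 700, 7, true));
    }
}
```

In this model the span `hotMath` takes 90% of each iteration. With exact attribution it receives 90 of the 100 samples. With safepoint-only attribution it receives none, and all 100 samples land on `callSite`.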
In order to avoid the safepoint bias problem, JFR samples the stacks of program threads asynchronously, suspending threads and parsing their stacks at code locations that are not necessarily safepoints. Since the metadata needed to parse stack frames is not guaranteed to be valid at non-safepoints, JFR's sampler thread must use heuristics to generate a stack trace.
Unfortunately, these stack-parsing heuristics are inefficient and, worse, they can crash the JVM when they are incorrect. JFR attempts to prevent such crashes via platform-specific crash-protection mechanisms, but those mechanisms can fail in the presence of concurrent activity such as class unloading.
Description
We redesign JFR's sampling mechanism to avoid relying on risky stack-parsing heuristics. Instead, we parse thread stacks only at safepoints.
To avoid the safepoint bias problem, we take samples cooperatively. When it is time to take a sample, JFR's sampler thread still suspends the target thread. Rather than attempting to parse the stack, however, it just records the target's program counter and stack pointer in a sample request, which it appends to an internal thread-local queue. It then arranges for the target thread to stop at its next safepoint, and resumes the thread.
The target runs normally until its next safepoint. At that time, the safepoint handling code inspects the queue. If it finds any sample requests, then, for each one, it reconstructs a stack trace, adjusting for safepoint bias, and emits a JFR execution-time sampling event.
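The division of labor between the sampler thread and the target thread can be sketched as a single-threaded simulation. This is a model of the idea only, under stated assumptions: the types `SampleRequest`, `SampleEvent`, and `Target`, the `pollArmed` flag, and the choice to attribute each event to the program counter captured at request time are all hypothetical and do not describe HotSpot's actual data structures or its bias-adjustment logic.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Simulation of cooperative sampling; every name here is illustrative.
class CooperativeSamplingSketch {

    /** What the sampler captures while the target is briefly suspended. */
    record SampleRequest(long pc, long sp) {}

    /** Stand-in for a JFR execution-time sampling event. */
    record SampleEvent(long sampledPc, List<String> frames) {}

    /** Simulated target thread that owns its queue of pending sample requests. */
    static final class Target {
        final ConcurrentLinkedQueue<SampleRequest> requests = new ConcurrentLinkedQueue<>();
        final Deque<String> callStack = new ArrayDeque<>(); // innermost frame first
        final List<SampleEvent> events = new ArrayList<>();
        volatile boolean pollArmed; // set by the sampler, checked at safepoints
        long pc;
        long sp;

        /** Runs on the target thread at a safepoint, where the stack can be walked safely. */
        void safepointPoll() {
            if (!pollArmed) {
                return;
            }
            pollArmed = false;
            SampleRequest request;
            while ((request = requests.poll()) != null) {
                // Walk the stack now, but attribute the sample to the program
                // counter recorded at request time (an assumed bias adjustment).
                events.add(new SampleEvent(request.pc(), List.copyOf(callStack)));
            }
        }
    }

    /** Runs on the sampler thread: no stack walking, only capture and enqueue. */
    static void requestSample(Target target) {
        target.requests.add(new SampleRequest(target.pc, target.sp));
        target.pollArmed = true;
    }

    public static void main(String[] args) {
        Target target = new Target();
        target.callStack.push("main");
        target.callStack.push("compute");
        target.pc = 0x10;
        target.sp = 0x7000;

        requestSample(target);  // sampler: record pc and sp, arm the poll
        target.pc = 0x40;       // target keeps running until its next safepoint
        target.safepointPoll(); // target: drain the queue and emit events

        System.out.println(target.events);
    }
}
```

Note that the request is cheap to create and involves no stack parsing, while the expensive and metadata-dependent work happens later on the target thread itself.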
Aside from being safe, this approach has several other advantages:
- Creating a sample request requires hardly any work, and could be done in response to a hardware event or inside a signal handler.
- The code to create stack traces and emit events is simpler. For example, it can dynamically allocate memory when it runs on the target thread, which it could not do when running on the sampler thread.
- The sampler thread has less work to do, since it need not run heuristics, improving scalability.
This approach works well when the target thread is running Java code, whether interpreted or compiled, but not when the target thread is running native code. In that case, we continue to use the existing approach.
Future Work
Our new approach does not entirely avoid safepoint bias. In some situations, such as when sampling inside a method for which the HotSpot JVM has an intrinsic implementation, it may be impossible to parse the stack. In these cases, the recorded stack trace will reflect the last Java stack frame, thereby introducing some bias. We intend to address this in future work.
Alternatives
The HotSpot JVM does have an existing internal but unsupported mechanism, AsyncGetCallTrace, a facility used by some popular third-party Java tools. Unfortunately, this mechanism relies on the same kind of risky stack-parsing heuristics that JFR uses today, but without any crash protection, making it even more unsafe to use. Another drawback is that it is based on the POSIX SIGPROF signal, an equivalent of which does not exist on Windows.
Testing
This is strictly an implementation change. Existing unit, integration, and stress tests will suffice.
Dependencies
The current implementation of JEP 509 (JFR CPU-Time Profiling) is derived from the AsyncGetCallTrace mechanism, and suffers from all of the same problems. A safer implementation could be built on top of the work described here.