JEP draft: Post-mortem crash analysis with jcmd
Owner | Kevin Walls |
Type | Feature |
Scope | JDK |
Status | Submitted |
Component | core-svc / tools |
Discussion | serviceability dash dev at openjdk dot org |
Effort | M |
Duration | M |
Reviewed by | Alex Buckley |
Created | 2024/03/18 12:00 |
Updated | 2025/09/11 08:04 |
Issue | 8328351 |
Summary
The jcmd
tool supports the monitoring and troubleshooting of a running HotSpot JVM. Extend jcmd so that it can also be used to diagnose a JVM that has crashed. This will establish a consistent troubleshooting experience in both live and post-mortem environments.
Goals
-
Make the troubleshooting of crashed JVMs as familiar and productive as troubleshooting live JVMs.
-
Enable post-mortem diagnostics on Linux and Windows.
-
Reduce the future cost of JDK maintenance by focusing serviceability work on
jcmd
rather than other tools and components such as jhsdb
and the underlying Serviceability Agent.
Non-Goals
-
It is not a goal to support all
jcmd
diagnostics in post-mortem environments.
-
It is not a goal to run and debug Java code in post-mortem environments.
-
It is not a goal to enable post-mortem diagnostics on all supported operating systems.
-
It is not a goal to remove legacy serviceability tools and components, such as
jhsdb
and the Serviceability Agent, at this time.
Motivation
Serviceability is the ability to monitor, observe, debug, and troubleshoot an application. Monitoring and observability tools allow you to connect to a live JVM and examine the application. This includes the application’s code, such as loaded classes and just-in-time compiled methods, as well as its state, such as the Java heap and the stacks of Java threads and native threads. JDK tools such as jstack
and jmap
produce thread dumps and heap dumps, respectively, from a live JVM, while tools such as JDK Mission Control enable you to browse memory usage and threads visually. If a tool connects to the JVM via the JMX protocol then you can also troubleshoot the application by, e.g., activating more verbose logging for specific subsystems.
In extreme scenarios, the JVM may terminate unexpectedly in a way that cannot be monitored by such tools. This can occur because of buggy native code in the application or libraries, or due to bugs in the JVM itself. At termination, the HotSpot JVM emits a crash report file (hs_err_pidXXX.log
) that contains information about the fault and the state of the application, such as the stack trace of the failing thread and a list of loaded libraries. The operating system also saves the memory of the JVM process to a file known as a core dump. You can use crash reports and core dumps post-mortem to gain a deeper understanding of what went wrong and identify steps toward resolution.
Unfortunately, the tools available for the post-mortem analysis of JVM core dumps are problematic:
-
Using native debuggers such as
gdb
is frustrating because they cannot interpret JVM data structures in core dumps to display a Java-level view of application state. For example, if you determine that a Java object starts at a particular address in memory, then finding something as basic as the class of the object requires manually decoding words in the object's header. Debugger scripts can help automate the decoding of JVM data structures in core dumps, but the work remains error-prone and the scripts require ongoing maintenance since the layout of object headers changes over time.
-
The
jhsdb
tool, introduced in JDK 9, can open a core dump and interpret JVM data structures. It uses a HotSpot-internal mechanism known as the Serviceability Agent (SA). Other launchers that invoke the SA have been available in previous releases. Unfortunately, the SA codebase is brittle and dated; it requires continuous maintenance as the JVM evolves, and major work to expose new JVM features. (Despite its name, the Serviceability Agent is not a Java agent, i.e., a component that can alter the code of a running application.)
jcmd
, introduced in JDK 7, is a lightweight tool for JVM diagnostics. It can connect to a live JVM via the Attach API and present Java-level information about an application. It offers over 50 commands for listing Java threads, detailing memory use, examining the state of the garbage collector, and so forth. However, jcmd
can attach only to live processes. Given its flexibility and popularity, it would be useful if jcmd
could also be used for the post-mortem analysis of core dumps. This would give a consistent experience in both live and post-mortem troubleshooting.
Description
We extend jcmd
to support post-mortem analysis by using the data in a core dump to recreate the JVM’s memory image at the time of the crash, and by executing code in the JVM binary to interpret the data structures in that image. This revival technique enables jcmd
’s diagnostic commands to work exactly as they do in a live JVM, with no changes to the commands or their implementations.
For example, if a JVM crash results in the core dump core.1234
, then running:
$ jcmd core.1234 Thread.print
will produce the same kind of output as when jcmd
is connected to a live JVM:
Opening dump file 'core.1234'...
2025-04-01 14:17:18
Full thread dump Java HotSpot(TM) 64-Bit Server VM (25-internal-LTS-2025-03-30-1738352.name... mixed mode, sharing):
...
"Thread-0" #34 [1183517] prio=5 os_prio=0 cpu=0.99ms elapsed=0.07s tid=0x00007ff8fc208cc0 ...
java.lang.Thread.State: RUNNABLE
Thread: 0x00007ff8fc208cc0 [0x120f1d] State: _at_safepoint _at_poll_safepoint 0
JavaThread state: _thread_blocked
at ThreadsMem$1.run(ThreadsMem.java:25)
- locked <0x00000000fe300c98> (a java.lang.Object)
at java.lang.Thread.runWith(java.base@25-internal/Thread.java:1460)
at java.lang.Thread.run(java.base@25-internal/Thread.java:1447)
...
...
jcmd
in post-mortem environments
jcmd
currently supports 57 commands in a live JVM, of which 26 are relevant and available in the post-mortem environment:
Compiler.CodeHeap_Analytics Compiler.codecache Compiler.codelist Compiler.directives_print
Compiler.memory Compiler.perfmap Compiler.queue
GC.class_histogram GC.heap_dump GC.heap_info
JVMTI.data_dump
Thread.print
VM.class_hierarchy VM.classes VM.classloader_stats VM.classloaders
VM.command_line VM.events VM.flags VM.metaspace
VM.native_memory VM.stringtable VM.symboltable VM.systemdictionary
VM.version
help
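A script that drives jcmd against core dumps may want to reject unsupported commands before launching the tool. The following is a minimal Java sketch of such a check. The class name PostMortemCommands and its method are hypothetical and not part of the JDK; the set simply copies the 26 commands listed above.

```java
import java.util.Set;

// Hypothetical helper, not part of the JDK: the set copies the list above.
final class PostMortemCommands {
    static final Set<String> SUPPORTED = Set.of(
        "Compiler.CodeHeap_Analytics", "Compiler.codecache", "Compiler.codelist",
        "Compiler.directives_print", "Compiler.memory", "Compiler.perfmap",
        "Compiler.queue",
        "GC.class_histogram", "GC.heap_dump", "GC.heap_info",
        "JVMTI.data_dump",
        "Thread.print",
        "VM.class_hierarchy", "VM.classes", "VM.classloader_stats", "VM.classloaders",
        "VM.command_line", "VM.events", "VM.flags", "VM.metaspace",
        "VM.native_memory", "VM.stringtable", "VM.symboltable", "VM.systemdictionary",
        "VM.version",
        "help");

    // Returns true if the named diagnostic command appears in the post-mortem list.
    static boolean isSupported(String command) {
        return SUPPORTED.contains(command);
    }

    public static void main(String[] args) {
        System.out.println(isSupported("Thread.print")); // prints "true"
        System.out.println(isSupported("JFR.start"));    // prints "false"
    }
}
```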
The post-mortem environment must have the same operating system and CPU architecture as the environment in which the JVM crashed.
It is often difficult to access production servers where the JVM has crashed, so it is common to transport core dumps to developer workstations for analysis. Such workstations typically run newer JDKs than production servers, so to facilitate analysis, it is not necessary for the jcmd
tool to come from the same JDK as the JVM that crashed. The jcmd
tool in one JDK release can revive core dumps from another JDK release as long as the JVM binary from the other release is available. The other release may be older or newer than the release where jcmd
is running, as long as both releases are at least JDK NN. When running jcmd
, the path to the JVM binary is specified via the new -L
option:
$ jcmd -L /transported_files/libjvm.so core.1234 Thread.print
In JDK NN, jcmd
can take either the name of a Java class or the filename of a core dump as an argument. Since the filename of a core dump might resemble a class name, the new -c
option indicates that the argument is, in fact, a core dump:
$ jcmd -c MyApp GC.heap_dump
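The two invocations above can be assembled programmatically, e.g., by a triage script. The following Java sketch builds the argument list. The class CoreDumpJcmd and its parameters are hypothetical. Whether -L and -c may be combined, and in what order, is not stated above, so placing both before the core dump argument is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: assembles a jcmd command line for a core dump.
final class CoreDumpJcmd {
    // libjvmPath may be null when the JVM binary is not named explicitly.
    static List<String> commandLine(String libjvmPath, String coreFile,
                                    boolean forceCore, String diagnosticCommand) {
        List<String> argv = new ArrayList<>();
        argv.add("jcmd");
        if (libjvmPath != null) {
            argv.add("-L");        // path to the JVM binary from the crashed JDK
            argv.add(libjvmPath);
        }
        if (forceCore) {
            argv.add("-c");        // the next argument is a core dump, not a class name
        }
        argv.add(coreFile);
        argv.add(diagnosticCommand);
        return argv;
    }

    public static void main(String[] args) {
        // Reproduces the two example command lines shown above.
        System.out.println(String.join(" ", commandLine(
            "/transported_files/libjvm.so", "core.1234", false, "Thread.print")));
        System.out.println(String.join(" ", commandLine(
            null, "MyApp", true, "GC.heap_dump")));
    }
}
```

The returned list can be passed to java.lang.ProcessBuilder to launch the tool.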
Reviving a core dump
To revive a JVM instance from a core dump, jcmd
creates a subprocess so that the revived instance has its own address space, distinct from the address space of the JVM running jcmd
. It populates that address space by memory-mapping the core dump to recreate the internal data structures of the JVM, the Java heap, and the stacks of native threads, all at their original memory addresses so that pointers remain valid. jcmd
also loads the JVM binary (libjvm.so
) at its original memory address.
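To illustrate what "at their original memory addresses" involves, the following Java sketch lists the loadable segments recorded in a core dump, pairing each file offset with the virtual address it occupied in the crashed process. It is not the jcmd implementation, which this document does not describe at that level. It assumes a 64-bit ELF core dump as produced on Linux, assumes the program headers lie within the first 16 MiB of the file, and does not handle dumps with more than 65,534 segments. The class CoreSegments is hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: lists the PT_LOAD segments of a 64-bit ELF core dump.
final class CoreSegments {
    static final int ET_CORE = 4;  // ELF file type for core dumps
    static final int PT_LOAD = 1;  // program header type for a loadable segment

    // Where a segment's bytes sit in the file and where they lived in the process.
    static final class Segment {
        final long fileOffset, virtualAddress, fileSize;
        Segment(long fileOffset, long virtualAddress, long fileSize) {
            this.fileOffset = fileOffset;
            this.virtualAddress = virtualAddress;
            this.fileSize = fileSize;
        }
    }

    static List<Segment> loadSegments(ByteBuffer elf) {
        ByteBuffer b = elf.duplicate();
        if (b.get(0) != 0x7f || b.get(1) != 'E' || b.get(2) != 'L' || b.get(3) != 'F')
            throw new IllegalArgumentException("not an ELF file");
        if (b.get(4) != 2)                               // EI_CLASS: 2 = 64-bit
            throw new IllegalArgumentException("not a 64-bit ELF file");
        b.order(b.get(5) == 1 ? ByteOrder.LITTLE_ENDIAN  // EI_DATA: 1 = little-endian
                              : ByteOrder.BIG_ENDIAN);
        if (b.getShort(16) != ET_CORE)                   // e_type
            throw new IllegalArgumentException("not a core dump");
        long phoff = b.getLong(32);                           // e_phoff
        int phentsize = Short.toUnsignedInt(b.getShort(54));  // e_phentsize
        int phnum = Short.toUnsignedInt(b.getShort(56));      // e_phnum
        List<Segment> segments = new ArrayList<>();
        for (int i = 0; i < phnum; i++) {
            int p = Math.toIntExact(phoff + (long) i * phentsize);
            if (b.getInt(p) == PT_LOAD) {                // p_type
                segments.add(new Segment(b.getLong(p + 8),     // p_offset
                                         b.getLong(p + 16),    // p_vaddr
                                         b.getLong(p + 32)));  // p_filesz
            }
        }
        return segments;
    }

    public static void main(String[] args) throws IOException {
        try (FileChannel ch = FileChannel.open(Path.of(args[0]), StandardOpenOption.READ)) {
            // Map only a prefix of the file; core dumps can be far larger than 2 GiB.
            long len = Math.min(ch.size(), 16L << 20);
            for (Segment s : loadSegments(ch.map(FileChannel.MapMode.READ_ONLY, 0, len))) {
                System.out.printf("file+0x%x -> 0x%x (%d bytes)%n",
                                  s.fileOffset, s.virtualAddress, s.fileSize);
            }
        }
    }
}
```

A revival process would map each such file range at the listed virtual address, which Java itself cannot do; the sketch only shows that the core dump carries the information needed to do so.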
The revived JVM instance is not live in the same way it was at run time. No Java code can be executed and no garbage collection occurs. The instance is, however, sufficiently complete that jcmd
can interpret data structures in the revived instance by calling native JVM functions in the revived instance — the exact same functions it invokes on a live instance when using the Attach API. This approach makes the jcmd
diagnostic code independent of whether the observed JVM is alive or dead; either way, the diagnostics call the same native functions. Thus no new code is needed to, e.g., inspect an object, extract a heap dump, or obtain a thread's stack frames.
Aside from the JVM binary, it is not necessary to load any native library from the crashed process. This enables troubleshooting a core dump after transporting it to a different machine, where the same native libraries might not be available. For post-mortem analysis, jcmd
needs only the core dump and the JVM binary that crashed.
Future Work
-
We plan eventually to support post-mortem troubleshooting with
jcmd
on macOS, in addition to Linux and Windows.
-
We expect to make further enhancements to
jcmd
to aid troubleshooting in both live and post-mortem environments. Two examples are new commands for inspecting arbitrary Java objects and for extracting Java class definitions ("class dumping"). We also expect to enhance some existing commands, e.g., VM.uptime, to work in both environments.
-
Some existing diagnostic commands are implemented in Java rather than in native code. This means they are not compatible with process revival and cannot be used in post-mortem environments. Of these Java commands, some are only of value live, e.g., ManagementAgent.start and JFR.start. We may investigate other commands written in Java to see if they could work in post-mortem environments. For example, Thread.dump_to_file, which outputs a list of virtual threads in JSON format, could be useful post-mortem.
-
In the future, developers of new commands will need to consider the possibility of post-mortem execution when choosing the implementation language.
Alternatives
-
Invest in improvements to the Serviceability Agent (SA).
SA is written in Java code. It relies on a native library (
libsaproc
) to return the contents of raw memory from either a running process or a core dump. This makes the SA code independent of whether the observed JVM is alive or dead, but it also means that SA must turn byte arrays returned by libsaproc into instances of Java classes that model threads, stack frames, locks, and so forth. This interpretation is tedious and intricate; it requires the maintainers of SA to, e.g., know how each garbage collector lays out Java objects in the heap. It requires the SA code to be updated continuously as the JVM evolves. Finally, it duplicates the functionality of vast swathes of native code in the JVM which manage run-time data structures.
By contrast, the new
jcmd
technique of reviving the memory and code of a crashed JVM instance reuses the native code that managed the JVM's data structures when the JVM was alive. Duplicating the memory of a formerly-live process is much more efficient than duplicating the code required to understand it.
SA embraced high implementation complexity in order to support a rich Java API, but the rapid pace of JVM development has made that complexity costly to maintain and so the functionality of the API has suffered. The rich functionality of SA's Java API is, further, more than is necessary for JVM troubleshooting. Instead of SA's high-cost/rich-feature approach, jcmd with process revival takes a low-cost/core-feature approach. Since it is low cost, it can be supported over the long term.
-
Continue to rely on native debuggers such as gdb.
Native debuggers can provide low-level JVM diagnostics, and will remain an essential part of JDK troubleshooting. However, the technical effort needed to display Java objects in human-readable form makes them a poor alternative to an enhanced jcmd.
Risks and Assumptions
-
The set of
jcmd
commands usable with revived JVM instances is fixed when the JVM is built. We assume this is acceptable because the existing jcmd commands are widely applicable and well known, with additional planned commands mentioned above. Additional commands can be created for specific requirements in future and incorporated into updated versions of the JVM via the JDK build process.
-
A risk of allowing
jcmd
to be used post-mortem is that core dumps containing sensitive information may be transferred to insecure environments for analysis. However, this is no different from the situation today with jhsdb and SA, so there is no new security risk.
-
We assume that the JVM binary in use at the time of the crash is available in the post-mortem environment. This is reasonable as access to the correct binary is required by existing crash analysis methods.