JEP 528: Post-Mortem Crash Analysis with jcmd

Owner	Kevin Walls
Type	Feature
Scope	JDK
Status	Candidate
Component	core-svc / tools
Discussion	serviceability dash dev at openjdk dot org
Effort	M
Duration	M
Reviewed by	Alan Bateman, Alex Buckley
Created	2024/03/18 12:00
Updated	2025/10/16 19:11
Issue	8328351

Summary

The jcmd tool supports the monitoring and troubleshooting of a running HotSpot JVM. Extend jcmd so that it can also be used to diagnose a JVM that has crashed. This will establish a consistent experience in both live and post-mortem environments.

Goals

Make the troubleshooting of crashed JVMs as familiar and productive as troubleshooting live JVMs.
Enable post-mortem diagnostics on Linux and Windows.
Reduce the future cost of JDK maintenance by focusing serviceability work on jcmd rather than other tools and components such as jhsdb and the underlying Serviceability Agent.

Non-Goals

It is not a goal to support all jcmd diagnostics in post-mortem environments.
It is not a goal to run and debug Java code in post-mortem environments.
It is not a goal to enable post-mortem diagnostics on all supported operating systems.
It is not a goal to remove legacy serviceability tools and components, such as jhsdb and the Serviceability Agent, at this time.

Motivation

Serviceability is the ability to monitor, observe, debug, and troubleshoot an application. Monitoring and observability tools allow you to connect to a live JVM and examine the application. This includes the application’s code, such as loaded classes and just-in-time compiled methods, as well as its state, such as the Java heap and the stacks of Java threads and native threads. JDK tools such as jstack and jmap produce thread dumps and heap dumps, respectively, from a live JVM, while tools such as JDK Mission Control enable you to browse memory usage and threads visually. If a tool connects to the JVM via the JMX protocol, then you can also troubleshoot the application by, e.g., enabling more verbose logging in specific subsystems.

In extreme scenarios, the JVM may terminate unexpectedly in a way that cannot be monitored by such tools. This can occur because of buggy native code in the application or libraries, or due to bugs in the JVM itself. (This is distinct from termination due to an uncaught exception, such as NullPointerException or OutOfMemoryError.)

At termination, the HotSpot JVM emits a crash report file (hs_err_pidXXX.log) that contains information about the fault and the state of the application, such as the stack trace of the failing thread and the set of loaded libraries. The operating system also saves the memory of the JVM process to a file known as a core dump. You can use crash reports and core dumps post-mortem to gain a deeper understanding of what went wrong and identify steps toward resolution.

Unfortunately, the tools available for the post-mortem analysis of JVM core dumps are problematic:

Using native debuggers such as gdb is frustrating because they cannot interpret JVM data structures in core dumps to display a Java-level view of application state. For example, if you determine that a Java object starts at a particular address in memory, then finding something as basic as the class of the object requires manually decoding words in the object's header. Debugger scripts can help automate the decoding of JVM data structures in core dumps, but the work remains error-prone and the scripts require ongoing maintenance since the layout of object headers changes over time.
The jhsdb tool, introduced in JDK 9, can open a core dump and interpret JVM data structures. It uses a HotSpot-internal mechanism known as the Serviceability Agent (SA). Unfortunately, the SA codebase is brittle and dated; it requires continuous maintenance as the JVM evolves, and major work to expose new JVM features. (Despite its name, the Serviceability Agent is not a Java agent, i.e., a component that can alter the code of a running application.)
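
To illustrate the manual decoding described for native debuggers, the following Java sketch computes the address of an object's class metadata from raw header bytes copied out of a core dump. It assumes one specific layout: a 64-bit, little-endian HotSpot JVM without compact object headers and with compressed class pointers, where an 8-byte mark word is followed by a 4-byte narrow class pointer that decodes as base + (narrow << shift). HeaderDecoder and its encoding base and shift are hypothetical inputs that would have to be recovered from the crashed JVM; other layouts, such as compact object headers, place the class pointer elsewhere, which is exactly why such scripts need ongoing maintenance.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/**
 * Hypothetical helper: decodes an object's class pointer from raw header bytes.
 * Assumes a 64-bit little-endian JVM using the legacy (non-compact) header
 * layout with compressed class pointers enabled.
 */
final class HeaderDecoder {
    // Assumed layout: an 8-byte mark word, then a 4-byte narrow class pointer.
    static final int MARK_WORD_SIZE = 8;

    private final long encodingBase;  // assumption: recovered from the crashed JVM
    private final int encodingShift;  // assumption: recovered from the crashed JVM

    HeaderDecoder(long encodingBase, int encodingShift) {
        this.encodingBase = encodingBase;
        this.encodingShift = encodingShift;
    }

    /** Returns the class-metadata address encoded in the given object header bytes. */
    long klassAddress(byte[] header) {
        ByteBuffer buf = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        // The narrow class pointer is an unsigned 32-bit value.
        long narrow = Integer.toUnsignedLong(buf.getInt(MARK_WORD_SIZE));
        return encodingBase + (narrow << encodingShift);
    }

    public static void main(String[] args) {
        // Illustrative header: mark word 0x1 (unlocked), narrow class pointer 0x00100000.
        byte[] header = ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN)
                .putLong(0x1L).putInt(0x00100000).array();
        HeaderDecoder decoder = new HeaderDecoder(0x800000000L, 0);
        System.out.printf("klass at 0x%x%n", decoder.klassAddress(header));
    }
}
```

Even this small step only yields an address; a script must then decode the class metadata at that address to find the class name, repeating the same kind of layout-dependent work.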

jcmd, introduced in JDK 7, is a lightweight tool for JVM diagnostics. It can connect to a live JVM via the Attach API and display Java-level information about an application. It offers over 50 commands for listing Java threads, detailing memory use, examining the state of the garbage collector, and so forth. However, jcmd can attach only to live processes. Given its flexibility and popularity, it would be useful if jcmd could also be used for the post-mortem analysis of core dumps. This would give a consistent experience in both live and post-mortem troubleshooting.

Description

We extend jcmd to support post-mortem analysis by using the data in a core dump to recreate the JVM’s memory image at the time of the crash, and by executing code in the JVM binary to interpret the data structures in that image. This revival technique enables jcmd’s diagnostic commands to work exactly as they do in a live JVM, with no changes to the commands or their implementations.

For example, if a JVM crash results in the core dump core.1234, then running:

$ jcmd core.1234 Thread.print

will produce the same kind of output as when jcmd is connected to a live JVM:

Opening dump file 'core.1234'...
2025-04-01 14:17:18
Full thread dump Java HotSpot(TM) 64-Bit Server VM (25-internal-LTS-2025-03-30-1738352.name... mixed mode, sharing):
...
"Thread-0" #34 [1183517] prio=5 os_prio=0 cpu=0.99ms elapsed=0.07s tid=0x00007ff8fc208cc0 ...
   java.lang.Thread.State: RUNNABLE
Thread: 0x00007ff8fc208cc0  [0x120f1d] State: _at_safepoint _at_poll_safepoint 0
   JavaThread state: _thread_blocked
        at ThreadsMem$1.run(ThreadsMem.java:25)
        - locked <0x00000000fe300c98> (a java.lang.Object)
        at java.lang.Thread.runWith(java.base@25-internal/Thread.java:1460)
        at java.lang.Thread.run(java.base@25-internal/Thread.java:1447)
        ...
...

jcmd in post-mortem environments

jcmd currently supports 57 commands for a live JVM, of which 26 are relevant and available in the post-mortem environment:

Compiler.CodeHeap_Analytics    Compiler.codecache    Compiler.codelist       Compiler.directives_print
Compiler.memory                Compiler.perfmap      Compiler.queue

GC.class_histogram             GC.heap_dump          GC.heap_info

JVMTI.data_dump

Thread.print

VM.class_hierarchy             VM.classes            VM.classloader_stats    VM.classloaders
VM.command_line                VM.events             VM.flags                VM.metaspace
VM.native_memory               VM.stringtable        VM.symboltable          VM.systemdictionary
VM.version

help

The post-mortem environment must have the same operating system and CPU architecture as the environment in which the JVM crashed.

It is often difficult to access production servers where the JVM has crashed, so it is common to transport core dumps to developer workstations for analysis. Such workstations typically run newer JDKs than production servers. To facilitate analysis, it is not necessary for the jcmd tool to come from the same JDK as the JVM that crashed. The jcmd tool in one JDK release can revive core dumps from another JDK release as long as the JVM binary from the other release is available. The other release may be older or newer than the release where jcmd is running, as long as both releases are at least JDK NN. When running jcmd, the path to the JVM binary is specified via the new -L option:

$ jcmd -L $TRANSPORTED_FILES/libjvm.so core.1234 Thread.print

In JDK NN, jcmd can take as an argument either the name of an application's main class, which identifies a live JVM, or the filename of a core dump. Since the filename of a core dump might resemble a class name, the new -c option indicates that the argument is, in fact, a core dump:

$ jcmd -c MyApp GC.heap_dump
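
When the analysis of transported core dumps is scripted, the two options can be assembled by a small helper. The following Java sketch is hypothetical (PostMortemJcmd and build are not JDK APIs): it builds a jcmd argument list from a core dump, an optional JVM binary passed with -L, and a command, rejecting any command outside the post-mortem list given earlier. It assumes, beyond what the examples above show, that -L and -c may be combined in one invocation in the order used here.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

/** Hypothetical helper (not part of the JDK) that assembles a post-mortem jcmd command line. */
final class PostMortemJcmd {
    // The 26 commands this JEP lists as available in post-mortem environments.
    static final Set<String> POST_MORTEM_COMMANDS = Set.of(
            "Compiler.CodeHeap_Analytics", "Compiler.codecache", "Compiler.codelist",
            "Compiler.directives_print", "Compiler.memory", "Compiler.perfmap",
            "Compiler.queue",
            "GC.class_histogram", "GC.heap_dump", "GC.heap_info",
            "JVMTI.data_dump",
            "Thread.print",
            "VM.class_hierarchy", "VM.classes", "VM.classloader_stats",
            "VM.classloaders", "VM.command_line", "VM.events", "VM.flags",
            "VM.metaspace", "VM.native_memory", "VM.stringtable",
            "VM.symboltable", "VM.systemdictionary", "VM.version",
            "help");

    /**
     * Builds the jcmd argument list. Pass null for libjvm to omit -L.
     * -c is always passed so the core file name is never taken for a class name
     * (assumption: -L and -c can be combined in this order).
     */
    static List<String> build(Path libjvm, Path coreFile, String command, String... commandArgs) {
        if (!POST_MORTEM_COMMANDS.contains(command)) {
            throw new IllegalArgumentException(command + " is not available post-mortem");
        }
        List<String> args = new ArrayList<>();
        args.add("jcmd");
        if (libjvm != null) {
            args.add("-L");
            args.add(libjvm.toString());
        }
        args.add("-c");
        args.add(coreFile.toString());
        args.add(command);
        args.addAll(Arrays.asList(commandArgs));
        return args;
    }

    public static void main(String[] args) {
        List<String> cmd = build(Path.of("transported", "libjvm.so"), Path.of("core.1234"), "Thread.print");
        // Print rather than run, so the sketch works without a core dump at hand.
        // To run it instead: new ProcessBuilder(cmd).inheritIO().start().waitFor();
        System.out.println(String.join(" ", cmd));
    }
}
```

Passing -c unconditionally is a defensive choice: a core file named like a class, such as MyApp, is then never treated as the name of a live application.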

Reviving a core dump

To revive a JVM instance from a core dump, jcmd creates a subprocess so that the revived instance has its own address space, distinct from the address space of the JVM running jcmd. It populates that address space by mapping the core dump into memory, recreating the internal data structures of the JVM, the Java heap, and the stacks of native threads, all at their original memory addresses so that pointers remain valid. jcmd also loads the JVM binary (libjvm.so on Linux, jvm.dll on Windows) at its original memory address.
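
The address information that revival depends on is recorded in the core dump itself. On Linux, a core dump is an ELF file whose PT_LOAD program headers each give a segment's original virtual address (p_vaddr), its size in memory (p_memsz), and the offset and size of its saved bytes in the file (p_offset, p_filesz). The following Java sketch only lists those segments; it assumes a 64-bit little-endian core file and illustrates the file format, not jcmd's implementation, which the text does not describe at this level. Mapping the segments back at their original addresses requires native code running in a separate process, as described above.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

/** Sketch: lists the loadable segments of a 64-bit little-endian ELF core file. */
final class CoreSegments {
    static final int PT_LOAD = 1;
    static final int ET_CORE = 4;

    /** Where a segment lived in the crashed process, and where its bytes are in the file. */
    record Segment(long vaddr, long memSize, long fileOffset, long fileSize) {}

    static List<Segment> loadSegments(ByteBuffer core) {
        ByteBuffer buf = core.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        if (buf.get(0) != 0x7f || buf.get(1) != 'E' || buf.get(2) != 'L' || buf.get(3) != 'F') {
            throw new IllegalArgumentException("not an ELF file");
        }
        // e_ident[EI_CLASS] == ELFCLASS64 (2), e_ident[EI_DATA] == ELFDATA2LSB (1)
        if (buf.get(4) != 2 || buf.get(5) != 1) {
            throw new IllegalArgumentException("sketch handles only 64-bit little-endian ELF");
        }
        if (buf.getShort(16) != ET_CORE) { // e_type
            throw new IllegalArgumentException("not a core file");
        }
        long phoff = buf.getLong(32);                           // e_phoff
        int phentsize = Short.toUnsignedInt(buf.getShort(54));  // e_phentsize
        int phnum = Short.toUnsignedInt(buf.getShort(56));      // e_phnum
        List<Segment> segments = new ArrayList<>();
        for (int i = 0; i < phnum; i++) {
            int base = Math.toIntExact(phoff + (long) i * phentsize);
            if (buf.getInt(base) == PT_LOAD) {                  // p_type
                segments.add(new Segment(
                        buf.getLong(base + 16),    // p_vaddr: original address
                        buf.getLong(base + 40),    // p_memsz: size in memory
                        buf.getLong(base + 8),     // p_offset: bytes start here in the file
                        buf.getLong(base + 32)));  // p_filesz: may be smaller than p_memsz
            }
        }
        return segments;
    }

    public static void main(String[] args) throws IOException {
        try (FileChannel ch = FileChannel.open(Path.of(args[0]), StandardOpenOption.READ)) {
            // Program headers typically follow the ELF header directly, so mapping
            // at most the first 2 GiB is enough even for very large core files.
            long len = Math.min(ch.size(), Integer.MAX_VALUE);
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, len);
            for (Segment s : loadSegments(map)) {
                System.out.printf("vaddr=0x%x memsz=0x%x offset=0x%x filesz=0x%x%n",
                        s.vaddr(), s.memSize(), s.fileOffset(), s.fileSize());
            }
        }
    }
}
```

For example, java CoreSegments.java core.1234 prints one line per segment; a segment whose filesz is smaller than its memsz was not fully saved in the dump.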

The revived JVM instance is not live in the way it was at run time: no Java code can be executed and no garbage collection occurs. The instance is, however, complete enough that jcmd can interpret its data structures by calling native JVM functions within it, the very functions it invokes on a live instance through the Attach API. This approach makes the jcmd diagnostic code independent of whether the observed JVM is alive or dead; either way, the diagnostics call the same native functions. Thus no new code is needed to, e.g., inspect an object, extract a heap dump, or obtain a thread's stack frames.

Aside from the JVM binary, it is not necessary to load any native library from the crashed process. This enables troubleshooting a core dump after transporting it to a different machine, where the same native libraries might not be available. For post-mortem analysis, jcmd needs only the core dump and the binary of the JVM that crashed.

Future Work

We plan eventually to support post-mortem troubleshooting with jcmd on macOS, in addition to Linux and Windows.
We expect to make further enhancements to jcmd to aid troubleshooting in both live and post-mortem environments. Two examples are new commands for inspecting arbitrary Java objects and for extracting Java class definitions (class dumping). We also expect to enhance some existing commands, e.g., VM.uptime, to work in both environments.
Some existing diagnostic commands are implemented in Java rather than native code. This means they are not compatible with process revival and cannot be used in post-mortem environments. Of these commands, some are of value only in live environments, e.g., ManagementAgent.start and JFR.start. Others, however, could be useful post-mortem; e.g., Thread.dump_to_file generates a list of virtual threads as JSON text. We may investigate such commands to see if we can make them work in post-mortem environments.
In the future, developers of new commands will need to consider the possibility of post-mortem execution when choosing the implementation language.

Alternatives

Invest in improvements to the Serviceability Agent (SA).

SA is written in Java. It relies on a native library (libsaproc) to return the contents of raw memory from either a running process or a core dump. This makes the SA code independent of whether the observed JVM is alive or dead, but it also means that SA must turn byte arrays returned by libsaproc into instances of Java classes that model threads, stack frames, locks, and so forth. This interpretation is tedious and intricate; it requires the maintainers of SA to, e.g., know how each garbage collector lays out Java objects in the heap. It requires the SA code to be updated continuously as the JVM evolves. Finally, it duplicates the functionality of vast swaths of native code in the JVM which manage run-time data structures.

By contrast, the new jcmd technique of reviving the memory and code of a crashed JVM instance reuses the native code that managed the JVM's data structures when the JVM was alive. Duplicating the memory of a formerly-live process is much more efficient than duplicating the code required to understand it.

SA embraced high implementation complexity in order to provide a rich Java API. However, the rapid pace of JVM development has made that complexity costly to maintain, and so the functionality of the API has suffered. Moreover, the rich functionality of SA's Java API is more than JVM troubleshooting requires. Instead of SA's high-cost/rich-feature approach, jcmd with process revival takes a low-cost/core-feature approach. Because it is low cost, it can be supported over the long term.

Continue to rely on native debuggers such as gdb.

Native debuggers can provide low-level JVM diagnostics, and will remain an essential part of JDK troubleshooting. However, the technical effort needed to display Java objects in human-readable form makes them a poor alternative to an enhanced jcmd.

Risks and Assumptions

A risk of allowing jcmd to be used post-mortem is that core dumps containing sensitive information may be transferred to insecure environments for analysis. However, this is no different from the situation today with jhsdb and SA, so there is no new security risk.
We assume that the JVM binary in use at the time of the crash is available in the post-mortem environment. This is reasonable since access to the correct binary is required by existing crash analysis methods.