JEP draft: Post-mortem crash analysis with jcmd

Owner: Kevin Walls
Type: Feature
Scope: JDK
Status: Submitted
Component: core-svc / tools
Discussion: serviceability dash dev at openjdk dot org
Effort: M
Duration: M
Reviewed by: Alex Buckley
Created: 2024/03/18 12:00
Updated: 2025/09/11 08:04
Issue: 8328351

Summary

The jcmd tool supports the monitoring and troubleshooting of a running HotSpot JVM. Extend jcmd so that it can also be used to diagnose a JVM that has crashed. This will establish a consistent troubleshooting experience in both live and post-mortem environments.

Goals

Non-Goals

Motivation

Serviceability is the ability to monitor, observe, debug, and troubleshoot an application. Monitoring and observability tools allow you to connect to a live JVM and examine the application. This includes the application’s code, such as loaded classes and just-in-time compiled methods, as well as its state, such as the Java heap and the stacks of Java threads and native threads. JDK tools such as jstack and jmap produce thread dumps and heap dumps, respectively, from a live JVM, while tools such as JDK Mission Control enable you to browse memory usage and threads visually. If a tool connects to the JVM via the JMX protocol then you can also troubleshoot the application, for example by enabling more verbose logging in specific subsystems.

In extreme scenarios, the JVM may terminate unexpectedly in a way that cannot be monitored by such tools. This can occur because of buggy native code in the application or its libraries, or because of bugs in the JVM itself. At termination, the HotSpot JVM emits a crash report file (hs_err_pidXXX.log) that contains information about the fault and the state of the application, such as the stack trace of the failing thread and a list of loaded libraries. If core dumps are enabled, the operating system also saves the memory of the JVM process to a file known as a core dump. You can use crash reports and core dumps post-mortem to gain a deeper understanding of what went wrong and to identify steps toward resolution.

Unfortunately, the tools available for the post-mortem analysis of JVM core dumps are problematic:

jcmd, introduced in JDK 7, is a lightweight tool for JVM diagnostics. It can connect to a live JVM via the Attach API and present Java-level information about an application. It offers over 50 commands for listing Java threads, detailing memory use, examining the state of the garbage collector, and so forth. However, jcmd can attach only to live processes. Given its flexibility and popularity, it would be useful if jcmd could also be used for the post-mortem analysis of core dumps. This would give a consistent experience in both live and post-mortem troubleshooting.

Description

We extend jcmd to support post-mortem analysis by using the data in a core dump to recreate the JVM’s memory image at the time of the crash, and by executing code in the JVM binary to interpret the data structures in that image. This revival technique enables jcmd’s diagnostic commands to work exactly as they do in a live JVM, with no changes to the commands or their implementations.

For example, if a JVM crash results in the core dump core.1234, then running:

$ jcmd core.1234 Thread.print

will produce the same kind of output as when jcmd is connected to a live JVM:

Opening dump file 'core.1234'...
2025-04-01 14:17:18
Full thread dump Java HotSpot(TM) 64-Bit Server VM (25-internal-LTS-2025-03-30-1738352.name... mixed mode, sharing):
...
"Thread-0" #34 [1183517] prio=5 os_prio=0 cpu=0.99ms elapsed=0.07s tid=0x00007ff8fc208cc0 ...
   java.lang.Thread.State: RUNNABLE
Thread: 0x00007ff8fc208cc0  [0x120f1d] State: _at_safepoint _at_poll_safepoint 0
   JavaThread state: _thread_blocked
        at ThreadsMem$1.run(ThreadsMem.java:25)
        - locked <0x00000000fe300c98> (a java.lang.Object)
        at java.lang.Thread.runWith(java.base@25-internal/Thread.java:1460)
        at java.lang.Thread.run(java.base@25-internal/Thread.java:1447)
        ...
...

jcmd in post-mortem environments

jcmd currently supports 57 commands in a live JVM, of which 26 are relevant and available in the post-mortem environment:

Compiler.CodeHeap_Analytics    Compiler.codecache    Compiler.codelist    Compiler.directives_print
Compiler.memory                Compiler.perfmap      Compiler.queue

GC.class_histogram             GC.heap_dump          GC.heap_info

JVMTI.data_dump
Thread.print

VM.class_hierarchy             VM.classes            VM.classloader_stats VM.classloaders
VM.command_line                VM.events             VM.flags             VM.metaspace
VM.native_memory               VM.stringtable        VM.symboltable       VM.systemdictionary
VM.version
help

The post-mortem environment must have the same operating system and CPU architecture as the environment in which the JVM crashed.
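As an illustration of the architecture requirement, the following minimal sketch reads the CPU architecture recorded in a core dump so it can be compared with the analysis machine before attempting a revival. It assumes a Linux-style ELF core file. The function names and the small table of machine codes are hypothetical helpers for this example and are not part of jcmd.

```python
import platform
import struct

ET_CORE = 4  # ELF e_type value that marks a core file

# Subset of ELF e_machine codes, mapped to the names that
# platform.machine() reports on Linux (an assumption of this sketch).
ELF_MACHINES = {62: "x86_64", 183: "aarch64"}


def core_machine(path):
    """Return the CPU architecture name recorded in an ELF core file."""
    with open(path, "rb") as f:
        header = f.read(20)
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError(f"{path} is not an ELF file")
    # EI_DATA (byte 5) gives the byte order: 1 = little-endian, 2 = big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    # e_type and e_machine are 16-bit fields at offsets 16 and 18.
    e_type, e_machine = struct.unpack_from(byte_order + "HH", header, 16)
    if e_type != ET_CORE:
        raise ValueError(f"{path} is an ELF file but not a core dump")
    return ELF_MACHINES.get(e_machine, f"unknown({e_machine})")


def matches_host(path):
    """Return True if the core dump's architecture matches this machine's."""
    return core_machine(path) == platform.machine()
```

A check of this kind covers only the CPU architecture. It does not verify that the operating system matches.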

It is often difficult to access production servers where the JVM has crashed, so it is common to transport core dumps to developer workstations for analysis. Such workstations typically run newer JDKs than production servers. To facilitate analysis, the jcmd tool therefore need not come from the same JDK as the JVM that crashed. The jcmd tool in one JDK release can revive core dumps from another JDK release as long as the JVM binary from the other release is available. The other release may be older or newer than the release where jcmd is running, as long as both releases are at least JDK NN. When running jcmd, the path to the JVM binary is specified via the new -L option:

$ jcmd -L /transported_files/libjvm.so core.1234 Thread.print

In JDK NN, jcmd can take either the name of a Java class or the filename of a core dump as an argument. Since the filename of a core dump might resemble a class name, the new -c option indicates that the argument is, in fact, a core dump:

$ jcmd -c MyApp GC.heap_dump
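For scripted analysis, the invocations above could be assembled by a small wrapper. The sketch below is only illustrative: the -L and -c options are the ones proposed in this draft and do not exist in JDK releases that lack this feature. The function names are hypothetical, and placing both options before the core dump filename when they are combined is an assumption.

```python
import subprocess


def jcmd_core_command(core_path, command, *command_args,
                      libjvm=None, force_core=False):
    """Build the argument list for running one jcmd command on a core dump."""
    argv = ["jcmd"]
    if libjvm is not None:
        # -L: path to the JVM binary of the JVM that crashed (proposed option).
        argv += ["-L", libjvm]
    if force_core:
        # -c: treat the next argument as a core dump, not a class name
        # (proposed option).
        argv.append("-c")
    argv += [core_path, command, *command_args]
    return argv


def run_on_core(core_path, command, *command_args, **options):
    """Run the command and return its output; needs a jcmd with this feature."""
    argv = jcmd_core_command(core_path, command, *command_args, **options)
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `jcmd_core_command("core.1234", "Thread.print", libjvm="/transported_files/libjvm.so")` yields the argument list for the -L invocation shown earlier.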

Reviving a core dump

To revive a JVM instance from a core dump, jcmd creates a subprocess so that the revived instance has its own address space, distinct from the address space of the JVM running jcmd. It populates that address space by memory-mapping the core dump to recreate the internal data structures of the JVM, the Java heap, and the stacks of native threads, all at their original memory addresses so that pointers remain valid. jcmd also loads the JVM binary (libjvm.so) at its original memory address.
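To make the mapping step concrete: on Linux, a core dump is an ELF file whose PT_LOAD program headers record, for each saved memory region, its original virtual address and its offset within the file. The sketch below only enumerates those regions, which are what a reviving tool would map back at their original addresses. It is not the implementation proposed here; it assumes a 64-bit little-endian ELF core file, and the function name is hypothetical.

```python
import struct

PT_LOAD = 1  # program header type for a loadable (saved memory) segment


def load_segments(path):
    """Return (virtual address, file offset, size in file) per PT_LOAD segment.

    Assumes a 64-bit little-endian ELF file. Very large core files that use
    the extended segment count (e_phnum == 0xffff) are not handled.
    """
    with open(path, "rb") as f:
        ehdr = f.read(64)
        if (len(ehdr) < 64 or ehdr[:4] != b"\x7fELF"
                or ehdr[4] != 2 or ehdr[5] != 1):
            raise ValueError("expected a 64-bit little-endian ELF file")
        # e_phoff: file offset of the program header table.
        (e_phoff,) = struct.unpack_from("<Q", ehdr, 32)
        # e_phentsize and e_phnum: size and count of program headers.
        e_phentsize, e_phnum = struct.unpack_from("<HH", ehdr, 54)
        segments = []
        for i in range(e_phnum):
            f.seek(e_phoff + i * e_phentsize)
            (p_type, _flags, p_offset, p_vaddr,
             _paddr, p_filesz, _memsz, _align) = struct.unpack(
                "<IIQQQQQQ", f.read(56))
            # Keep only regions whose contents are present in the file.
            if p_type == PT_LOAD and p_filesz > 0:
                segments.append((p_vaddr, p_offset, p_filesz))
    return segments
```

Each returned tuple identifies a range of the file that would be memory-mapped at the given virtual address, so that pointers stored in the dump remain valid.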

The revived JVM instance is not live in the same way it was at run time. No Java code can be executed and no garbage collection occurs. The instance is, however, sufficiently complete that jcmd can interpret data structures in the revived instance by calling native JVM functions in the revived instance — the exact same functions it invokes on a live instance when using the Attach API. This approach makes the jcmd diagnostic code independent of whether the observed JVM is alive or dead; either way, the diagnostics call the same native functions. Thus no new code is needed to, e.g., inspect an object, extract a heap dump, or obtain a thread's stack frames.

Aside from the JVM binary, it is not necessary to load any native library from the crashed process. This enables troubleshooting a core dump after transporting it to a different machine, where the same native libraries might not be available. For post-mortem analysis, jcmd needs only the core dump and the JVM binary of the JVM that crashed.

Future Work

Alternatives

Risks and Assumptions