JEP 435: Asynchronous Stack Trace VM API

Author: Johannes Bechberger
Owner: Christoph Langer
Type: Feature
Scope: JDK
Status: Candidate
Component: hotspot / svc
Discussion: serviceability dash dev at openjdk dot org
Effort: S
Duration: S
Reviewed by: Andrei Pangin, Christoph Langer, Jaroslav Bachorík
Created: 2022/04/04 11:02
Updated: 2022/11/04 07:25
Issue: 8284289

Summary

Define an efficient and reliable API to collect stack traces asynchronously and include information on both Java and native stack frames.

Goals

Non-Goals

Motivation

The AsyncGetCallTrace API is used by almost all available profilers, both open-source and commercial, including, e.g., async-profiler. Yet it has two major disadvantages:

These issues make implementing profilers and related tooling more difficult. Some additional information can be extracted from the HotSpot VM via complex code, but other useful information is hidden and impossible to obtain:

Such data can be helpful when profiling and tuning a VM for a given application, and for profiling code that uses JNI heavily.

Description

We propose a new AsyncGetStackTrace API, modeled on the AsyncGetCallTrace API:

void AsyncGetStackTrace(CallTrace *trace, jint depth, void* ucontext,
                        uint32_t options);

Profilers can call this API to obtain the stack trace of the current thread. Calling it from a signal handler is safe, and the new implementation will be at least as stable as AsyncGetCallTrace or the JFR stack walking code. The VM fills in the frames and the number of frames. The caller should allocate the frames array of the CallTrace structure with enough memory for the requested stack depth.

Parameters:

Currently only the lowest bit of the options is considered: It enables (1) or disables (0) the inclusion of C/C++ frames. All other bits are considered to be 0.
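As a minimal sketch of the options word described above: the constant and helper names below are hypothetical (the text only states that bit 0 controls C/C++ frames), and the helpers merely illustrate that all higher bits are currently ignored.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical name; the proposal only says that the lowest bit of
// `options` enables (1) or disables (0) C/C++ frames.
constexpr uint32_t kIncludeCFrames = 1u << 0;

// Builds an options word to pass to the proposed AsyncGetStackTrace.
constexpr uint32_t make_options(bool include_c_frames) {
  return include_c_frames ? kIncludeCFrames : 0u;
}

// Reads the options word the way the text describes: only bit 0 matters,
// every other bit is treated as if it were 0.
constexpr bool wants_c_frames(uint32_t options) {
  return (options & kIncludeCFrames) != 0;
}
```

A profiler that wants mixed Java and native stacks would pass `make_options(true)` as the last argument.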

The trace struct

typedef struct {
  jint num_frames;                // number of frames in this trace
  CallFrame *frames;              // frames
  void* frame_info;               // more information on frames
} CallTrace;

is filled in by the VM. Its num_frames field contains the actual number of frames in the frames array or an error code. The frame_info field in that structure can later be used to store more information, but is currently NULL.

The error codes are a subset of the error codes for AsyncGetCallTrace, with the addition of THREAD_NOT_JAVA related to calling this procedure for non-Java threads:

enum Error {
  NO_JAVA_FRAME         =   0,
  NO_CLASS_LOAD         =  -1, 
  GC_ACTIVE             =  -2,    
  UNKNOWN_NOT_JAVA      =  -3,
  NOT_WALKABLE_NOT_JAVA =  -4,
  UNKNOWN_JAVA          =  -5,
  UNKNOWN_STATE         =  -7,
  THREAD_EXIT           =  -8,
  DEOPT                 =  -9,
  THREAD_NOT_JAVA       = -10
};
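Because num_frames doubles as a frame count and an error code, a caller has to tell the two apart before touching the frames array. The following sketch shows one way to do so; `has_frames` and `error_name` are hypothetical helpers, and `jint` is a stand-in for the JNI type so that the snippet compiles without jni.h.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

typedef int32_t jint;  // stand-in for jint from jni.h

enum Error {
  NO_JAVA_FRAME         =   0,
  NO_CLASS_LOAD         =  -1,
  GC_ACTIVE             =  -2,
  UNKNOWN_NOT_JAVA      =  -3,
  NOT_WALKABLE_NOT_JAVA =  -4,
  UNKNOWN_JAVA          =  -5,
  UNKNOWN_STATE         =  -7,
  THREAD_EXIT           =  -8,
  DEOPT                 =  -9,
  THREAD_NOT_JAVA       = -10
};

// True when num_frames is a frame count rather than one of the codes above.
bool has_frames(jint num_frames) { return num_frames > 0; }

// Maps a num_frames value to a printable name (hypothetical helper).
const char *error_name(jint num_frames) {
  switch (num_frames) {
    case NO_JAVA_FRAME:         return "NO_JAVA_FRAME";
    case NO_CLASS_LOAD:         return "NO_CLASS_LOAD";
    case GC_ACTIVE:             return "GC_ACTIVE";
    case UNKNOWN_NOT_JAVA:      return "UNKNOWN_NOT_JAVA";
    case NOT_WALKABLE_NOT_JAVA: return "NOT_WALKABLE_NOT_JAVA";
    case UNKNOWN_JAVA:          return "UNKNOWN_JAVA";
    case UNKNOWN_STATE:         return "UNKNOWN_STATE";
    case THREAD_EXIT:           return "THREAD_EXIT";
    case DEOPT:                 return "DEOPT";
    case THREAD_NOT_JAVA:       return "THREAD_NOT_JAVA";
    default:                    return num_frames > 0 ? "OK" : "UNKNOWN_ERROR";
  }
}
```

Note that -6 is not part of this enum, so the sketch reports it as unknown.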

Each CallFrame is a union, since the information stored for Java and non-Java frames differs:

typedef union {
  FrameTypeId type;     // to distinguish between JavaFrame and NonJavaFrame 
  JavaFrame java_frame;
  NonJavaFrame non_java_frame;
} CallFrame;

There are several distinguishable frame types:

enum FrameTypeId : uint8_t {
  FRAME_JAVA         = 1, // JIT compiled and interpreted
  FRAME_JAVA_INLINED = 2, // inlined JIT compiled
  FRAME_NATIVE       = 3, // native wrapper to call C methods from Java
  FRAME_STUB         = 4, // VM generated stubs
  FRAME_CPP          = 5  // C/C++/... frames
};

The first two types are for Java frames. For these, and for native wrapper frames (FRAME_NATIVE), we store the following information in a struct of type JavaFrame:

typedef struct {     
  FrameTypeId type;       // frame type
  int8_t comp_level;      // compilation level, 0 is interpreted
  uint16_t bci;           // 0 <= bci < 65536
  jmethodID method_id;
} JavaFrame;              // used for FRAME_JAVA, FRAME_JAVA_INLINED and FRAME_NATIVE

The comp_level indicates the compilation level of the method related to the frame, with higher numbers representing higher levels of compilation. It is modeled after the CompLevel enum in HotSpot but is dependent on the compiler infrastructure used. A value of zero indicates no compilation, i.e., bytecode interpretation.

Information on all other frames is stored in NonJavaFrame structs:

typedef struct {
  FrameTypeId type;  // frame type
  void *pc;          // current program counter inside this frame
} NonJavaFrame;
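To illustrate how a consumer might dispatch on the frame type, here is a small sketch that formats a single frame. The declarations are copied from the text; `describe_frame` is a hypothetical helper, `jmethodID` is a stand-in for the JNI type, and reading the shared leading `type` field through the union relies on the layout the declarations imply.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

struct _jmethodID;              // stand-in for the opaque JNI type
typedef _jmethodID *jmethodID;  // stand-in for jmethodID from jni.h

enum FrameTypeId : uint8_t {
  FRAME_JAVA = 1, FRAME_JAVA_INLINED = 2, FRAME_NATIVE = 3,
  FRAME_STUB = 4, FRAME_CPP = 5
};

typedef struct {
  FrameTypeId type;
  int8_t comp_level;
  uint16_t bci;
  jmethodID method_id;
} JavaFrame;

typedef struct {
  FrameTypeId type;
  void *pc;
} NonJavaFrame;

typedef union {
  FrameTypeId type;
  JavaFrame java_frame;
  NonJavaFrame non_java_frame;
} CallFrame;

// Hypothetical helper: picks the union member that matches the frame type.
std::string describe_frame(const CallFrame &frame) {
  char buf[96];
  switch (frame.type) {
    case FRAME_JAVA:
    case FRAME_JAVA_INLINED:
    case FRAME_NATIVE:  // these three types use the JavaFrame layout
      std::snprintf(buf, sizeof buf, "java type=%d comp_level=%d bci=%d",
                    static_cast<int>(frame.type),
                    static_cast<int>(frame.java_frame.comp_level),
                    static_cast<int>(frame.java_frame.bci));
      break;
    case FRAME_STUB:
    case FRAME_CPP:     // all other types use the NonJavaFrame layout
      std::snprintf(buf, sizeof buf, "non-java type=%d pc=%p",
                    static_cast<int>(frame.type), frame.non_java_frame.pc);
      break;
    default:
      std::snprintf(buf, sizeof buf, "unknown type=%d",
                    static_cast<int>(frame.type));
      break;
  }
  return buf;
}
```

A real profiler would resolve `method_id` via JVMTI and symbolize `pc` instead of printing raw values; that part is outside the scope of this sketch.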

Although the API provides more information, the amount of space required per frame (e.g., 16 bytes on x86-64) is the same as for the existing AsyncGetCallTrace API.
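The size claim can be checked at compile time. The sketch below compares the proposed CallFrame with the frame struct of the existing API as profilers commonly declare it (`ASGCT_CallFrame` with a `lineno` and a `method_id`); the JNI typedefs are stand-ins so that the snippet compiles without jni.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

typedef int32_t jint;           // stand-in for jint from jni.h
struct _jmethodID;              // stand-in for the opaque JNI type
typedef _jmethodID *jmethodID;  // stand-in for jmethodID from jni.h

// Frame layout of the existing AsyncGetCallTrace API, as commonly declared.
typedef struct {
  jint lineno;
  jmethodID method_id;
} ASGCT_CallFrame;

enum FrameTypeId : uint8_t {
  FRAME_JAVA = 1, FRAME_JAVA_INLINED = 2, FRAME_NATIVE = 3,
  FRAME_STUB = 4, FRAME_CPP = 5
};

typedef struct {
  FrameTypeId type;
  int8_t comp_level;
  uint16_t bci;
  jmethodID method_id;
} JavaFrame;

typedef struct {
  FrameTypeId type;
  void *pc;
} NonJavaFrame;

typedef union {
  FrameTypeId type;
  JavaFrame java_frame;
  NonJavaFrame non_java_frame;
} CallFrame;

// type, comp_level and bci fit into the space that lineno and its padding
// occupy in the old layout, so the per-frame footprint does not grow.
static_assert(sizeof(CallFrame) == sizeof(ASGCT_CallFrame),
              "proposed frame must not be larger than the existing one");

// The type tag sits at offset 0 in both variants, which is what lets a
// consumer read it before knowing which variant is stored.
static_assert(offsetof(JavaFrame, type) == 0 &&
              offsetof(NonJavaFrame, type) == 0,
              "type tag must lead both frame variants");
```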

We propose to place the above declarations in a new header file, profile.h, which will be placed in the include directory of the JDK image. The header’s license should include the Classpath Exception so that it is consumable by third-party profiling tools.
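Putting the declarations together, the caller side might look like the following sketch. Since the proposed function is not available in any JDK, `fake_AsyncGetStackTrace` is a stand-in that fabricates a two-frame stack; it is not the VM implementation. `sample` is likewise a hypothetical helper, and the JNI typedefs are stand-ins for jni.h.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

typedef int32_t jint;           // stand-in for jint from jni.h
struct _jmethodID;              // stand-in for the opaque JNI type
typedef _jmethodID *jmethodID;  // stand-in for jmethodID from jni.h

enum FrameTypeId : uint8_t {
  FRAME_JAVA = 1, FRAME_JAVA_INLINED = 2, FRAME_NATIVE = 3,
  FRAME_STUB = 4, FRAME_CPP = 5
};
typedef struct {
  FrameTypeId type; int8_t comp_level; uint16_t bci; jmethodID method_id;
} JavaFrame;
typedef struct { FrameTypeId type; void *pc; } NonJavaFrame;
typedef union {
  FrameTypeId type; JavaFrame java_frame; NonJavaFrame non_java_frame;
} CallFrame;
typedef struct {
  jint num_frames; CallFrame *frames; void *frame_info;
} CallTrace;

// Stand-in for the VM side, NOT the real implementation. It pretends the
// stack holds one C++ frame (reported only if bit 0 of options is set)
// on top of one interpreted Java frame, and never writes past `depth`.
void fake_AsyncGetStackTrace(CallTrace *trace, jint depth, void *ucontext,
                             uint32_t options) {
  (void)ucontext;  // a real implementation would walk from this context
  jint n = 0;
  if ((options & 1u) != 0 && n < depth) {
    trace->frames[n].non_java_frame = NonJavaFrame{FRAME_CPP, nullptr};
    n++;
  }
  if (n < depth) {
    trace->frames[n].java_frame = JavaFrame{FRAME_JAVA, 0, 7, nullptr};
    n++;
  }
  trace->num_frames = n;
  trace->frame_info = nullptr;  // currently unused, as stated in the text
}

// Caller side: `storage` is allocated ahead of time (a signal handler must
// not allocate), and its size is passed as the requested depth.
// Returns the frame count, or a non-positive error code.
jint sample(std::vector<CallFrame> &storage, uint32_t options) {
  CallTrace trace;
  trace.num_frames = 0;
  trace.frames = storage.data();
  trace.frame_info = nullptr;
  fake_AsyncGetStackTrace(&trace, static_cast<jint>(storage.size()),
                          nullptr, options);
  return trace.num_frames;
}
```

In an actual profiler the storage would typically be a per-thread or ring buffer reserved before the sampling signal is armed, and the third argument would be the ucontext passed to the signal handler.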

A prototype implementation can be found here, and a demo combining it with a modified async-profiler can be found here.

Risks and Assumptions

Returning information on C/C++ frames leaks implementation details, but this is also true for the Java frames of AsyncGetCallTrace since they leak details of the implementation of standard library files and include native wrapper frames.

Testing

We will add new stress tests to identify stability problems on all supported platforms. We plan to profile a set of example programs (e.g., the DaCapo and Renaissance benchmark suites) repeatedly with small profiling intervals (≤ 0.1 ms). We will also add substantial unit tests that cover all options and test the basic usage of the API.