JEP draft: Improved way of obtaining call traces asynchronously for profiling

Author: Johannes Bechberger
Type: Feature
Scope: JDK
Status: Submitted
Component: hotspot / svc
Effort: S
Duration: S
Reviewed by: Andrei Pangin, Christoph Langer, Jaroslav Bachorík
Created: 2022/04/04 11:02
Updated: 2022/05/25 21:07
Issue: 8284289

Summary

Define an efficient, secure, and supported API for asynchronous stack traces with information on Java and native frames.

Goals

Non-Goals

Motivation

The AsyncGetCallTrace routine has seen increasing use in recent years: almost all available profilers, open-source and commercial, rely on it, async-profiler among them. But it is only an internal API, as it is not exported in any header, and the information it returns on frames is limited: only the method and bytecode index of Java frames are captured. Both limitations make implementing profilers and related tooling harder. Tools like async-profiler have to resort to complicated code to obtain, at least partially, information that the JVM already has. Information that is currently hidden and impossible to get is

Such data can be helpful when profiling and tuning a VM for a given application and also for profiling code that uses JNI heavily.

There are two stack walking implementations for profiling (in JFR and in AsyncGetCallTrace) that would benefit from being unified. Unification improves maintainability and stability by removing redundant code and increasing test coverage.

Description

This JEP proposes an AsyncGetCallTrace2 API which is modeled after AsyncGetCallTrace:

void AsyncGetCallTrace2(CallTrace *trace, jint depth, void* ucontext,
                        uint32_t options);

This API can be called by profilers to obtain the call trace for the current thread. Calling it from a signal handler is safe, and the new implementation will be at least as stable as AsyncGetCallTrace or the JFR stack walking code. The VM fills in the frames and the number of frames. The caller should allocate the CallTrace structure with enough memory for the requested stack depth.

Arguments:

The trace struct

typedef struct {
  jint num_frames;                // number of frames in this trace
  CallFrame *frames;              // frames
  void* frame_info;               // more information on frames
} CallTrace;

is filled by the VM. Its num_frames field contains either the actual number of frames in the frames array or an error code. The frame_info field is reserved for storing more information in the future and is currently expected to be NULL.
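The calling convention described above can be sketched as follows. This is only an illustration under stated assumptions: the jni.h types are replaced by stand-ins so the snippet compiles on its own, FrameTypeId is written as a plain uint8_t typedef, and collect_trace, AsyncGetCallTrace2Fn, and fake_asgct2_two_frames are hypothetical names that are not part of the proposal. The entry point is passed in as a function pointer so that the sketch can be exercised without a JVM.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the jni.h types (assumption, so the sketch is self-contained). */
typedef int32_t jint;
typedef struct _jmethodID *jmethodID;
typedef uint8_t FrameTypeId;

/* Structures as given in this draft. */
typedef struct {
  FrameTypeId type;
  uint8_t comp_level;
  uint16_t bci;
  jmethodID method_id;
} JavaFrame;

typedef struct {
  FrameTypeId type;
  void *pc;
} NonJavaFrame;

typedef union {
  FrameTypeId type;
  JavaFrame java_frame;
  NonJavaFrame non_java_frame;
} CallFrame;

typedef struct {
  jint num_frames;
  CallFrame *frames;
  void *frame_info;
} CallTrace;

/* Function pointer type matching the proposed signature (hypothetical name). */
typedef void (*AsyncGetCallTrace2Fn)(CallTrace *trace, jint depth,
                                     void *ucontext, uint32_t options);

/* Hypothetical helper: points the trace at caller-owned storage for at most
 * `depth` frames, invokes the entry point, and returns num_frames, which is
 * either a positive frame count or an error code (<= 0). */
jint collect_trace(AsyncGetCallTrace2Fn asgct2, CallFrame *buffer, jint depth,
                   void *ucontext, uint32_t options, CallTrace *trace) {
  trace->num_frames = 0;
  trace->frames = buffer;
  trace->frame_info = NULL; /* reserved, expected to be NULL for now */
  asgct2(trace, depth, ucontext, options);
  return trace->num_frames;
}

/* Hypothetical stand-in for the VM: reports up to two C/C++ frames. */
void fake_asgct2_two_frames(CallTrace *trace, jint depth,
                            void *ucontext, uint32_t options) {
  (void)ucontext;
  (void)options;
  jint n = depth < 2 ? depth : 2;
  for (jint i = 0; i < n; i++) {
    trace->frames[i].non_java_frame.type = 5; /* FRAME_CPP */
    trace->frames[i].non_java_frame.pc = NULL;
  }
  trace->num_frames = n;
}
```

In a real profiler the frame buffer would be allocated ahead of time, outside the signal handler, since allocating memory inside a signal handler is not async-signal-safe.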

The error codes are a subset of the error codes for AsyncGetCallTrace, with the addition of THREAD_NOT_JAVA related to calling this procedure for non-Java threads:

enum Error {
  NO_JAVA_FRAME         =   0,
  NO_CLASS_LOAD         =  -1,
  GC_ACTIVE             =  -2,
  UNKNOWN_NOT_JAVA      =  -3,
  NOT_WALKABLE_NOT_JAVA =  -4,
  UNKNOWN_JAVA          =  -5,
  UNKNOWN_STATE         =  -7,
  THREAD_EXIT           =  -8,
  DEOPT                 =  -9,
  THREAD_NOT_JAVA       = -10
};
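Since num_frames doubles as a status code, a caller has to separate the two cases before touching the frames array. A minimal sketch of that check, assuming a stand-in for jint and a hypothetical helper named trace_error_name that is not part of the proposal:

```c
#include <stddef.h>
#include <stdint.h>

typedef int32_t jint; /* stand-in for the jni.h type (assumption) */

/* Error codes as listed in this draft. */
enum Error {
  NO_JAVA_FRAME         =   0,
  NO_CLASS_LOAD         =  -1,
  GC_ACTIVE             =  -2,
  UNKNOWN_NOT_JAVA      =  -3,
  NOT_WALKABLE_NOT_JAVA =  -4,
  UNKNOWN_JAVA          =  -5,
  UNKNOWN_STATE         =  -7,
  THREAD_EXIT           =  -8,
  DEOPT                 =  -9,
  THREAD_NOT_JAVA       = -10
};

/* Hypothetical helper: returns NULL when num_frames is a positive frame
 * count, otherwise the name of the error code. Values that are not in the
 * enum above (for example -6) are reported as "UNKNOWN_ERROR". */
const char *trace_error_name(jint num_frames) {
  if (num_frames > 0) {
    return NULL;
  }
  switch (num_frames) {
    case NO_JAVA_FRAME:         return "NO_JAVA_FRAME";
    case NO_CLASS_LOAD:         return "NO_CLASS_LOAD";
    case GC_ACTIVE:             return "GC_ACTIVE";
    case UNKNOWN_NOT_JAVA:      return "UNKNOWN_NOT_JAVA";
    case NOT_WALKABLE_NOT_JAVA: return "NOT_WALKABLE_NOT_JAVA";
    case UNKNOWN_JAVA:          return "UNKNOWN_JAVA";
    case UNKNOWN_STATE:         return "UNKNOWN_STATE";
    case THREAD_EXIT:           return "THREAD_EXIT";
    case DEOPT:                 return "DEOPT";
    case THREAD_NOT_JAVA:       return "THREAD_NOT_JAVA";
    default:                    return "UNKNOWN_ERROR";
  }
}
```

A profiler would typically count failed samples per error name rather than discard them silently, so that skew in the collected profile stays visible.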

Every CallFrame is a union, as the information stored for Java and non-Java frames differs:

typedef union {
  FrameTypeId type;     // to distinguish between JavaFrame and NonJavaFrame 
  JavaFrame java_frame;
  NonJavaFrame non_java_frame;
} CallFrame;

There are several distinguishable frame types:

enum FrameTypeId : uint8_t {
  FRAME_JAVA         = 1, // JIT compiled and interpreted
  FRAME_JAVA_INLINED = 2, // inlined JIT compiled
  FRAME_NATIVE       = 3, // native wrapper to call C methods from Java
  FRAME_STUB         = 4, // VM generated stubs
  FRAME_CPP          = 5  // C/C++/... frames
};

The first two types are for Java frames for which we store the following information in a struct of type JavaFrame:

typedef struct {
  FrameTypeId type;        // frame type
  uint8_t comp_level;      // compilation level, 0 is interpreted
  uint16_t bci;            // 0 <= bci < 65536
  jmethodID method_id;
} JavaFrame;               // used for FRAME_JAVA and FRAME_JAVA_INLINED

The comp_level field gives the compilation level of the method related to the frame, with higher numbers representing "more" compilation; 0 is defined as interpreted. It is modeled after the CompLevel enum in compiler/compilerDefinitions, but its values depend on the compiler infrastructure in use.

Information on all other frames is stored in the NonJavaFrame struct:

typedef struct {
  FrameTypeId type;  // frame type
  void *pc;          // current program counter inside this frame
} NonJavaFrame;
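Because every member of the CallFrame union starts with the type field, a consumer can read type first and only then decide which struct to access. The following sketch shows one way to do that. The jmethodID stand-in, the uint8_t typedef for FrameTypeId (the draft uses an enum with a fixed underlying type), and the helpers is_java_frame, non_java_kind, and describe_frame are assumptions made for illustration.

```c
#include <stdint.h>
#include <stdio.h>

typedef struct _jmethodID *jmethodID; /* stand-in for the jni.h type */
typedef uint8_t FrameTypeId;          /* plain C spelling of the draft's enum */

enum {
  FRAME_JAVA         = 1,
  FRAME_JAVA_INLINED = 2,
  FRAME_NATIVE       = 3,
  FRAME_STUB         = 4,
  FRAME_CPP          = 5
};

typedef struct {
  FrameTypeId type;
  uint8_t comp_level;
  uint16_t bci;
  jmethodID method_id;
} JavaFrame;

typedef struct {
  FrameTypeId type;
  void *pc;
} NonJavaFrame;

typedef union {
  FrameTypeId type;
  JavaFrame java_frame;
  NonJavaFrame non_java_frame;
} CallFrame;

/* Hypothetical helper: true for the two frame types stored as JavaFrame. */
int is_java_frame(const CallFrame *frame) {
  return frame->type == FRAME_JAVA || frame->type == FRAME_JAVA_INLINED;
}

/* Hypothetical helper: short label for the non-Java frame types. */
const char *non_java_kind(FrameTypeId type) {
  switch (type) {
    case FRAME_NATIVE: return "native";
    case FRAME_STUB:   return "stub";
    case FRAME_CPP:    return "cpp";
    default:           return "unknown";
  }
}

/* Hypothetical helper: writes a one-line description of the frame into buf
 * and returns the value of snprintf. */
int describe_frame(const CallFrame *frame, char *buf, size_t len) {
  if (is_java_frame(frame)) {
    const JavaFrame *java = &frame->java_frame;
    return snprintf(buf, len, "java%s bci=%u level=%u",
                    java->type == FRAME_JAVA_INLINED ? " (inlined)" : "",
                    (unsigned)java->bci, (unsigned)java->comp_level);
  }
  return snprintf(buf, len, "%s pc=%p", non_java_kind(frame->type),
                  frame->non_java_frame.pc);
}
```

Formatting with snprintf is meant for post-processing the collected trace; it is not something to do inside the signal handler itself.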

Although the API provides more information on the frames, the amount of space required per frame (e.g. 16 bytes on x86-64) is the same as for the original AsyncGetCallTrace API.
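The size claim can be checked at compile time. The sketch below assumes stand-ins for the jni.h types and uses the ASGCT_CallFrame layout (a jint line number followed by a jmethodID) under which profilers commonly declare the frame of the original API; that declaration does not come from this draft.

```c
#include <stdint.h>

/* Stand-ins for the jni.h types (assumption). */
typedef int32_t jint;
typedef struct _jmethodID *jmethodID;
typedef uint8_t FrameTypeId;

/* Frame of the original AsyncGetCallTrace API, as commonly declared. */
typedef struct {
  jint lineno;
  jmethodID method_id;
} ASGCT_CallFrame;

/* Frame structures as given in this draft. */
typedef struct {
  FrameTypeId type;
  uint8_t comp_level;
  uint16_t bci;
  jmethodID method_id;
} JavaFrame;

typedef struct {
  FrameTypeId type;
  void *pc;
} NonJavaFrame;

typedef union {
  FrameTypeId type;
  JavaFrame java_frame;
  NonJavaFrame non_java_frame;
} CallFrame;

/* The four bytes of type, comp_level, and bci take the place of the jint
 * in the old frame, so both layouts end up the same size. */
_Static_assert(sizeof(CallFrame) == sizeof(ASGCT_CallFrame),
               "new frame should be as large as the old one");
_Static_assert(sizeof(JavaFrame) >= sizeof(NonJavaFrame),
               "JavaFrame determines the size of the union");
```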

The underlying stack walking code can be unified such that AsyncGetCallTrace, AsyncGetCallTrace2, and the JFR call stack collection become thin wrappers for a single implementation.

A prototype implementation can be found at https://github.com/parttimenerd/jdk/tree/parttimenerd_asgct2 with a demo at https://github.com/parttimenerd/asgct2-demo/.

Alternatives

Keep AsyncGetCallTrace as is, which means continued lack of maintenance and stability for a widely used de-facto API.

Risks and Assumptions

Returning information on C/C++ frames leaks implementation details, but this is also true for the Java frames returned by AsyncGetCallTrace, as they leak details of the implementation of the standard library and include native wrapper frames.

Testing

Unifying the existing profiling-related stack walking code allows for testing it more efficiently by combining the existing tests. The implementation of this JEP will also add new stress tests to find rare stability problems on all supported platforms. The idea is to run the profiler repeatedly on a set of example programs (for example the DaCapo and Renaissance benchmark suites) with small profiling intervals (<= 0.1 ms). A prototype implementation can be found at https://github.com/parttimenerd/jdk-profiling-tester.