JEP draft: Hot Code Heap

Owner: Dmitry Chuyko
Type: Feature
Scope: JDK
Status: Draft
Component: hotspot / compiler
Effort: S
Duration: S
Created: 2024/03/14 16:30
Updated: 2024/03/29 10:20
Issue: 8328186

Summary

Extend the segmented code cache with a new optional "hot" code heap that compactly accommodates a subset of non-profiled methods. Extend the compiler control mechanism so that certain methods can be marked as hot and compiled into the hot code heap.

Goals

Non-Goals

Success Metrics

Motivation

Sparse hot code is slower

Some applications may lose performance when the JVM code cache becomes very large. This happens if several conditions are met:

  1. A lot of code has been JIT-compiled (hundreds of megabytes, gigabytes).
  2. There is a vast amount of hot code.
  3. The code that is really important (hot) is scattered throughout the code cache.
  4. The CPU incurs penalties when executing large amounts of scattered code.

On systems where this problem is significant, it cannot be solved by other means such as large pages. The slowdown depends on the amount of hot code, its sparseness, and the type of processor. In benchmarks that simulate this situation, the slowdown can reach tens of percent.

Not all compiled code is called frequently all the time

In HotSpot, methods are JIT-compiled once they have been used intensively enough, in the order in which their active use is detected. Many Tier 4 methods may not be called very frequently afterwards, although they may remain important for latency. As a result, there are compiled methods that are:

Hot code co-location

Co-locating hot code using profile information can improve performance.

The segmented code cache (https://openjdk.java.net/jeps/197) helps achieve this goal, but code is not separated any further once instrumentation has placed it in the non-profiled space. Later work in this direction resulted in code heap reordering (https://bugs.openjdk.org/browse/JDK-8280872).

A prototype demonstrated that the slowdown can be mitigated (the improvements are comparable to the measured regressions and are limited by the quality of profiling): https://github.com/bell-sw/hotcode-agent/blob/master/results/performance.adoc

Hot code marking

Marking hot code also makes it possible to enhance the compilation policy for that code for better performance, and to sweep colder code more aggressively to reduce code cache usage.

Description

New code heap

If the segmented code cache is enabled (-XX:+SegmentedCodeCache), the hot code heap can be allocated when the JVM starts. The following command line switches are introduced:

The hot code heap is placed between the non-nmethods and non-profiled code heaps to keep hot code close to both the stubs and the cooler optimized code:

|      Tier 2,3 nmethods   |              |    Tier 4     |      Tier 1,4 nmethods       |
|         profiled         | non-nmethods |     *hot*     |         non-profiled         |
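
For illustration only, a launch that enables this layout might look as follows; the sizing switch below is a placeholder name that may differ from the switches actually introduced here, while -XX:+SegmentedCodeCache and -XX:CompilerDirectivesFile are existing flags.

  # Sketch: "HotCodeHeapSize" is a placeholder name for the new sizing switch.
  java -XX:+SegmentedCodeCache \
       -XX:HotCodeHeapSize=8m \
       -XX:CompilerDirectivesFile=directives.json \
       MyApplication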

Existing code cache diagnostics such as -XX:+PrintCodeCache are naturally extended with information about the hot part. Additional data is provided through extended logging:
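
For illustration, the existing diagnostics can already be exercised as follows; these flags and the codecache logging tag exist today, and the output (including any hot code heap details) naturally varies by build and configuration.

  # Print the code cache layout and occupancy at VM exit (existing flag)
  java -XX:+SegmentedCodeCache -XX:+PrintCodeCache -version

  # Unified logging for the code cache (existing tag)
  java -XX:+SegmentedCodeCache -Xlog:codecache=info -version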

New compiler directive option

The following C2-only directive option is introduced:

A C2-compiled method is thus placed in the hot code heap if there is a matching compiler directive such as:

[ 
  {
    match: [
        "scala/runtime/ScalaRunTime$ _hashCode (Lscala/Product;)I",
        "dotty/tools/dotc/parsing/Scanners$Scanner nextToken ()V",
    ],
    c2: {Hot: true},
  }
]

A hot method may be placed in another code heap if the hot code heap has no free space. As usual, directives can be specified in the -XX:CompilerDirectivesFile JVM option or added via the Compiler.directives_add diagnostic command. Directives are applied to compiled methods that match the given patterns, so if a frequently called method is inlined, it makes sense to identify the compiled caller method and mark it as "hot" using an appropriate pattern.
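
For example, once hot methods have been identified in a running application, a directives file like the one above can be applied with the jcmd tool; the process id and file name below are illustrative.

  # Add directives from a file to a running JVM
  jcmd 12345 Compiler.directives_add hot-methods.json

  # Inspect the directives that are currently in effect
  jcmd 12345 Compiler.directives_print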

New method flag

As part of the implementation, the new status is declared in MethodFlags (src/hotspot/share/oops/methodFlags.hpp):

The flag indicates hotness during code cache allocation and serves diagnostic and debugging purposes.

Alternatives

Quite similar code placement can be achieved without explicitly allocating a separate code heap: regular allocations in the non-profiled heap can be made from its top boundary, while hot allocations are made from its bottom boundary.
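
Schematically, the non-profiled heap would then be filled from both ends; this sketch only illustrates the alternative and is not part of the proposal:

|                     non-profiled code heap                      |
| hot allocations -->     ...free...     <-- regular allocations  |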

Pros:

Cons:

Even after the proposed implementation it will be possible, if necessary, to switch to a different allocation scheme.

Testing

Risks and Assumptions

A fixed code heap size leads either to wasted memory or to allocations spilling into another code heap. This is the current segmented code cache approach; it could later be replaced by dynamic resizing.

If the hot code heap is enabled but no methods are marked as hot, its memory is wasted.

It is not easy to determine the right size for the hot code heap; it depends on the CPU and the application. The default of 8 MB was judged good enough to improve performance on the affected platforms. It is also relatively small compared to the default reserved code cache size (240 MB with tiered compilation).

Dependencies

This JEP is based on JEP 197: Segmented Code Cache and JEP 165: Compiler Control.

A broader scope and various possible improvements are described in the Instruction Issue Cache Hardware Accommodation draft.

Preliminary work related to adding new code heaps has been extracted as JDK-8311248: Refactor CodeCache::initialize_heaps.

The refresh extension for the compiler directive diagnostic commands (JDK-8309271) is orthogonal to the code heap work, but it helps move hot methods to the hot code heap once they are identified in a running application.

For benchmarking purposes, code cache fragmentation can be simulated using a patched JVM.