JEP draft: Hot Code Heap
| Field | Value |
|---|---|
| Owner | Dmitry Chuyko |
| Type | Feature |
| Scope | JDK |
| Status | Draft |
| Component | hotspot / compiler |
| Effort | S |
| Duration | S |
| Created | 2024/03/14 16:30 |
| Updated | 2024/03/29 10:20 |
| Issue | 8328186 |
Summary
Extend the segmented code cache with a new optional "hot" code heap that compactly holds a subset of the non-profiled methods. Extend the compiler control mechanism so that certain methods can be marked as hot and compiled into the hot code heap.
Goals
- Separate the code that is known to be hot.
- Decrease fragmentation of highly-optimized code.
- Reduce the negative impact of scattered hot compiled code on the performance of Java applications.
- Provide method-context-dependent control over placing C2-compiled blobs in the hot code heap.
- Provide a basis for profiling and compacting code by the virtual machine itself.
Non-Goals
- Determine which code is hot or cold. Existing profilers can already do that.
- Optimize code placement within the hot code heap. The current allocation mechanism is reused.
Success Metrics
- Reduced application execution time.
- Reduced fragmentation of selected methods in the code cache.
Motivation
Sparse hot code is slower
Some applications may lose performance because of a huge JVM code cache. This happens when several conditions are met:
- A lot of code has been JIT-compiled (hundreds of megabytes or even gigabytes).
- There is a vast amount of hot code.
- The code that is really important (hot) is scattered throughout the code cache.
- The CPU has penalties for executing large amounts of scattered code.
On systems where this problem is significant, it cannot be solved by other means, such as large pages. The slowdown depends on the amount of hot code, its sparseness, and the processor type. In benchmarks that simulate these conditions, the slowdown can reach tens of percent.
Not all compiled code is called frequently all the time
In HotSpot, methods are JIT-compiled once they have been used intensively enough, in the order in which their active use is detected. Many Tier 4 methods may later be used infrequently, although they may remain important for latency. As a result, there are compiled methods that:
- make up only a fraction of the number and size of all compiled methods;
- are responsible for a significant portion of the execution time;
- can be detected during a selected period of program activity.
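The observation that a small fraction of compiled methods can account for most of the execution time can be made concrete. The Java sketch below takes hypothetical profiler samples and picks the smallest set of methods that covers a given share of all samples. The `Sample` record, the method names and the 80% threshold are illustrative assumptions. Detecting hot code is left to existing profilers and is not part of this proposal.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class HotMethodSelector {
    // Hypothetical profiler output: samples attributed to one compiled method.
    record Sample(String method, long count) {}

    // Returns the fewest methods whose samples together reach `share` (0 to 1)
    // of all samples, taking the most frequently sampled methods first.
    static List<String> selectHot(List<Sample> samples, double share) {
        long total = samples.stream().mapToLong(Sample::count).sum();
        List<Sample> sorted = new ArrayList<>(samples);
        sorted.sort(Comparator.comparingLong(Sample::count).reversed());
        List<String> hot = new ArrayList<>();
        long covered = 0;
        for (Sample s : sorted) {
            if (total == 0 || covered >= share * total) {
                break; // the requested share is already covered
            }
            hot.add(s.method());
            covered += s.count();
        }
        return hot;
    }

    public static void main(String[] args) {
        List<Sample> samples = List.of(
                new Sample("A.run", 700),
                new Sample("B.parse", 150),
                new Sample("C.init", 100),
                new Sample("D.log", 50));
        // Two of the four methods cover at least 80% of the samples.
        System.out.println(selectHot(samples, 0.8)); // prints [A.run, B.parse]
    }
}
```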
Hot code co-location
Co-locating hot code using profile information can improve performance.
The segmented code cache (https://openjdk.java.net/jeps/197) helps achieve this goal. However, once instrumentation has helped place code in the non-profiled space, that code is not separated any further. Later work in this direction resulted in reordering of the code heaps (https://bugs.openjdk.org/browse/JDK-8280872).
The prototype mitigated the slowdown. Its results were comparable to the measured regressions and were limited by the quality of profiling: https://github.com/bell-sw/hotcode-agent/blob/master/results/performance.adoc
Hot code marking
Explicitly marking hot code also makes it possible to tune the compilation policy for that code to improve performance. It also allows colder code to be swept more aggressively to reduce code cache usage.
Description
New code heap
If the segmented code cache is enabled (-XX:+SegmentedCodeCache), the hot code heap can be allocated when the JVM starts. The following command-line options are introduced:
- -XX:+HotCodeCache: enables the hot code heap; disabled by default.
- -XX:HotCodeHeapSize: sets the size in bytes of the code heap containing hot non-profiled methods. The default is 0 if the hot code heap is disabled, or 8M if it is enabled (enough for 1000-2000 methods).
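The default size and the "enough for 1000-2000 methods" estimate imply an average budget of roughly 4 to 8 KB per hot method. The Java sketch below models two things:

- the documented default rule: 0 when the heap is disabled, and 8 MB when it is enabled and no explicit size is given;
- that per-method arithmetic.

`HotHeapSizing` and its methods are hypothetical names for illustration only. They do not reflect HotSpot's actual flag-processing code.

```java
// Hypothetical model of the proposed -XX:HotCodeHeapSize default rule.
class HotHeapSizing {
    static final long MB = 1024L * 1024L;
    static final long DEFAULT_HOT_HEAP_SIZE = 8 * MB;

    // explicitSize < 0 means -XX:HotCodeHeapSize was not set.
    // Assumption: an explicit size is ignored when the hot heap is disabled.
    static long effectiveSize(boolean hotCodeCacheEnabled, long explicitSize) {
        if (!hotCodeCacheEnabled) {
            return 0;
        }
        return explicitSize >= 0 ? explicitSize : DEFAULT_HOT_HEAP_SIZE;
    }

    // Average code budget per method if `methods` methods share the heap.
    static long bytesPerMethod(long heapSize, int methods) {
        return heapSize / methods;
    }

    public static void main(String[] args) {
        long size = effectiveSize(true, -1);
        System.out.println(size);                        // 8388608
        System.out.println(bytesPerMethod(size, 1000));  // 8388 (about 8 KB)
        System.out.println(bytesPerMethod(size, 2000));  // 4194 (about 4 KB)
    }
}
```

With the options proposed here, the heap would be requested at startup with, for example, `-XX:+SegmentedCodeCache -XX:+HotCodeCache -XX:HotCodeHeapSize=8m`.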
The hot code is placed between non-nmethods and non-profiled code heaps to maintain joint locality of the hot code with stubs and cooler optimized code:
| profiled | non-nmethods | *hot* | non-profiled |
|---|---|---|---|
| Tier 2,3 nmethods | | Tier 4 nmethods | Tier 1,4 nmethods |
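The ordering above can be expressed as consecutive address ranges. The Java sketch below lays out the four heaps in the proposed order and computes their boundaries. It shows that the hot heap begins exactly where the non-nmethods heap ends, and that the non-profiled heap follows it directly. Several things here are illustrative assumptions rather than HotSpot defaults or its actual reservation logic:

- the `Kind` enum;
- the reading of the diagram as running from lower to higher addresses;
- the sizes in `main`.

```java
import java.util.ArrayList;
import java.util.List;

class CodeCacheLayout {
    // The four heaps in the proposed order, assumed to run from lower to higher addresses.
    enum Kind { PROFILED, NON_NMETHODS, HOT, NON_PROFILED }

    // A heap occupying the half-open address range [start, end).
    record Range(Kind kind, long start, long end) {}

    // Places heaps back to back starting at `base`; sizes follow enum order.
    static List<Range> layout(long base, long[] sizes) {
        Kind[] kinds = Kind.values();
        if (sizes.length != kinds.length) {
            throw new IllegalArgumentException("one size per heap expected");
        }
        List<Range> ranges = new ArrayList<>();
        long cursor = base;
        for (int i = 0; i < kinds.length; i++) {
            ranges.add(new Range(kinds[i], cursor, cursor + sizes[i]));
            cursor += sizes[i];
        }
        return ranges;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Illustrative sizes only: profiled, non-nmethods, hot, non-profiled.
        for (Range r : layout(0, new long[] {100 * mb, 8 * mb, 8 * mb, 100 * mb})) {
            System.out.printf("%-13s [%d MB, %d MB)%n", r.kind(), r.start() / mb, r.end() / mb);
        }
    }
}
```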
Existing code cache diagnostics, such as -XX:+PrintCodeCache, are extended with information about the hot code heap. Additional data is available through extended logging:
- -Xlog:codecache+hot=debug
New compiler directive option
The following C2-only directive option is introduced:
- bool Hot, false by default.
A C2-compiled method is placed in the hot code heap if there is a matching compiler directive such as:
[
  {
    match: [
      "scala/runtime/ScalaRunTime$ _hashCode (Lscala/Product;)I",
      "dotty/tools/dotc/parsing/Scanners$Scanner nextToken ()V",
    ],
    c2: {Hot: true},
  }
]
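When the hot methods come from an external profiler, a directives file like the one above can be generated mechanically. The Java sketch below builds the directive text for a list of method patterns and sets the proposed C2 `Hot` option. Note the following assumptions:

- `HotDirectivesWriter` and its methods are hypothetical.
- The patterns are assumed to already use the `class method signature` form shown in the example.
- Only backslashes and double quotes are escaped.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper that renders a compiler directives file marking
// the given method patterns as hot for C2 (the proposed Hot option).
class HotDirectivesWriter {
    // Escapes characters that would break a quoted JSON string.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    static String render(List<String> patterns) {
        if (patterns.isEmpty()) {
            // This sketch requires at least one pattern to build a match list.
            throw new IllegalArgumentException("at least one method pattern is required");
        }
        String matches = patterns.stream()
                .map(p -> "      \"" + escape(p) + "\"")
                .collect(Collectors.joining(",\n"));
        return "[\n"
                + "  {\n"
                + "    match: [\n"
                + matches + "\n"
                + "    ],\n"
                + "    c2: {Hot: true},\n"
                + "  }\n"
                + "]\n";
    }

    public static void main(String[] args) {
        System.out.print(render(List.of(
                "scala/runtime/ScalaRunTime$ _hashCode (Lscala/Product;)I",
                "dotty/tools/dotc/parsing/Scanners$Scanner nextToken ()V")));
    }
}
```

The generated text could be saved to a file and passed with -XX:CompilerDirectivesFile. That option is diagnostic, so it also needs -XX:+UnlockDiagnosticVMOptions.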
If the hot code heap has no free space, a hot method can be placed in another code heap. As usual, directives can be specified with the -XX:CompilerDirectivesFile JVM option or with the Compiler.directives_add diagnostic command. Directives apply to the compiled methods that match the given patterns. So if a frequently called method is inlined, it makes sense to identify its compiled caller and mark that caller as hot with an appropriate pattern.
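The fallback rule, where a hot method goes to another code heap once the hot heap is full, can be modeled with a toy bump-pointer allocator. A method marked hot is allocated in the hot heap while space remains, and otherwise lands in the non-profiled heap. This is a simplified illustration under assumed names (`ToyCodeCache`, `Heap`) and toy sizes. Choosing the non-profiled heap as the overflow target is an assumption, and HotSpot's real allocation logic is more involved.

```java
class ToyCodeCache {
    // Toy heap that only hands out bytes from a moving top pointer.
    static class Heap {
        final String name;
        final long capacity;
        long used;

        Heap(String name, long capacity) {
            this.name = name;
            this.capacity = capacity;
        }

        boolean tryAllocate(long size) {
            if (used + size > capacity) {
                return false;
            }
            used += size;
            return true;
        }
    }

    final Heap hot = new Heap("hot", 16 * 1024);
    final Heap nonProfiled = new Heap("non-profiled", 1024 * 1024);

    // Returns the name of the heap that received the method, or null if all are full.
    String allocate(long size, boolean markedHot) {
        if (markedHot && hot.tryAllocate(size)) {
            return hot.name;
        }
        // Assumed fallback target: the non-profiled heap.
        return nonProfiled.tryAllocate(size) ? nonProfiled.name : null;
    }

    public static void main(String[] args) {
        ToyCodeCache cache = new ToyCodeCache();
        System.out.println(cache.allocate(10 * 1024, true));  // hot
        System.out.println(cache.allocate(10 * 1024, true));  // non-profiled (hot heap too full)
        System.out.println(cache.allocate(4 * 1024, false));  // non-profiled
    }
}
```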
New method flag
As part of the implementation, the new status is declared in MethodFlags (src/hotspot/share/oops/methodFlags.hpp):
- status(is_hot, 1 << 17)
The flag provides an indication of hotness during allocation and for diagnostic and debug purposes.
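The `status(is_hot, 1 << 17)` declaration reserves bit 17 of a method's status word. The Java sketch below shows the same set-and-test pattern on a plain `int`. It is only an analogy for the C++ declaration in `methodFlags.hpp`, and the `MethodStatus` class is hypothetical. The sketch is single-threaded; concurrent updates to a shared status word would need atomic operations.

```java
// Analogy for a bit-packed status word such as HotSpot's MethodFlags.
class MethodStatus {
    static final int IS_HOT = 1 << 17; // same bit position as status(is_hot, 1 << 17)

    private int status;

    boolean isHot() {
        return (status & IS_HOT) != 0;
    }

    void setIsHot(boolean hot) {
        status = hot ? (status | IS_HOT) : (status & ~IS_HOT);
    }

    int raw() {
        return status;
    }

    public static void main(String[] args) {
        MethodStatus s = new MethodStatus();
        s.setIsHot(true);
        System.out.println(s.isHot());                     // true
        System.out.println(Integer.toHexString(s.raw()));  // 20000
        s.setIsHot(false);
        System.out.println(s.isHot());                     // false
    }
}
```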
Alternatives
A very similar code placement can be achieved without allocating a separate code heap. Regular allocations in the non-profiled heap could be made from its top boundary, and hot allocations from its bottom boundary.
Pros:
- No need to limit hot code size.
Cons:
- Risky change to the current allocation algorithm.
- The segmented code cache is already designed to host any special type of code in a separate code heap.
- Worse joint locality of hot and non-profiled code.
Even with the proposed implementation in place, it will still be possible to switch to another allocation scheme if necessary.
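The single-heap alternative fills one non-profiled heap from both ends. It can be sketched as a two-ended bump allocator: hot code grows up from the bottom boundary, regular code grows down from the top, and an allocation fails only when the two regions would overlap. This is an illustrative Java model with a hypothetical class name. It is not HotSpot's code heap allocator.

```java
// Toy two-ended heap over the address range [0, capacity).
class TwoEndedHeap {
    private long hotTop = 0;     // hot code occupies [0, hotTop)
    private long regularBottom;  // regular code occupies [regularBottom, capacity)

    TwoEndedHeap(long capacity) {
        this.regularBottom = capacity;
    }

    // Returns the start offset of the new block, or -1 if it does not fit.
    long allocate(long size, boolean hot) {
        if (size <= 0 || size > regularBottom - hotTop) {
            return -1;
        }
        if (hot) {
            long start = hotTop;
            hotTop += size;
            return start;
        }
        regularBottom -= size;
        return regularBottom;
    }

    long free() {
        return regularBottom - hotTop;
    }

    public static void main(String[] args) {
        TwoEndedHeap heap = new TwoEndedHeap(100);
        System.out.println(heap.allocate(30, true));   // 0
        System.out.println(heap.allocate(50, false));  // 50
        System.out.println(heap.allocate(30, true));   // -1 (only 20 bytes left)
        System.out.println(heap.allocate(20, true));   // 30
    }
}
```

The model shows the "no need to limit hot code size" advantage: neither side has a fixed quota, so hot code can use any space that regular code has not taken.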
Testing
- Performance will be tested on platforms where scattered hot code causes severe degradation. To meet all the conditions of the problem, execution-time benchmarks can be combined with artificial fragmentation of the code cache.
- During performance evaluation, code placement will be examined using the new logging output.
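One simple way to quantify how scattered hot code is during placement examination is to count the distinct memory pages its methods touch. The Java sketch below does this for hypothetical (start, size) pairs, such as might be extracted from code cache logs. The `Blob` record, the 4 KB page size and the metric itself are assumptions for illustration. Neither the proposal nor HotSpot defines them.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class HotCodeSpread {
    // Hypothetical placement record: where a compiled method starts and its size in bytes.
    record Blob(long start, long size) {}

    // Number of distinct pages of `pageSize` bytes covered by the blobs.
    static int pagesTouched(List<Blob> blobs, long pageSize) {
        Set<Long> pages = new HashSet<>();
        for (Blob b : blobs) {
            if (b.size() <= 0) {
                continue;
            }
            long first = b.start() / pageSize;
            long last = (b.start() + b.size() - 1) / pageSize;
            for (long p = first; p <= last; p++) {
                pages.add(p);
            }
        }
        return pages.size();
    }

    public static void main(String[] args) {
        long page = 4096;
        // Three 1 KB methods spread across the cache vs. packed together.
        List<Blob> scattered = List.of(
                new Blob(0, 1024), new Blob(1 << 20, 1024), new Blob(8 << 20, 1024));
        List<Blob> packed = List.of(
                new Blob(0, 1024), new Blob(1024, 1024), new Blob(2048, 1024));
        System.out.println(pagesTouched(scattered, page)); // 3
        System.out.println(pagesTouched(packed, page));    // 1
    }
}
```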
Risks and Assumptions
A fixed code heap size either wastes memory or forces allocations into another code heap. This is the current approach of the segmented code cache, and it could be replaced by dynamic resizing.
If the hot code heap is enabled but no methods are marked as hot, its memory is wasted.
It is not easy to determine the right size for the hot code heap, because it depends on the CPU and the application. The default of 8 MB was found to be good enough to improve performance on the affected platforms. It is also relatively small compared to the default code cache size of 256 MB.
Dependencies
This JEP is based on JEP 197: Segmented Code Cache and JEP 165: Compiler Control.
A broader scope and various possible improvements are described in the Instruction Issue Cache Hardware Accommodation draft.
Preliminary work related to adding new code heaps has been extracted as JDK-8311248: Refactor CodeCache::initialize_heaps.
The refresh extension for the compiler directive diagnostic commands (JDK-8309271) is orthogonal to the code heap work. However, it helps move methods into the hot code heap once they have been identified as hot in a running application.
For benchmarking purposes, code cache fragmentation can be simulated using a patched JVM.