JEP draft: Automatic Heap Sizing for ZGC
Owner | Erik Österlund |
Type | Feature |
Scope | Implementation |
Status | Submitted |
Component | hotspot / gc |
Effort | M |
Duration | M |
Reviewed by | Axel Boldt-Christmas, Ron Pressler, Vladimir Kozlov |
Created | 2024/04/05 09:47 |
Updated | 2025/03/10 15:14 |
Issue | 8329758 |
Summary
Automatically size the heap appropriately by default, adapting dynamically to the workload and environment, when using the Z Garbage Collector (ZGC).
Goals
When using ZGC:
- Make the default heap sizing policy good out of the box.
- Dynamically adapt the heap size to changes in the workload and environment.
- Do not noticeably change performance compared to a correctly manually tuned heap size.
Non-Goals
It is not a goal of this JEP to:
- Find the optimal heap size.
- Remove the existing configurability of static heap bounds using existing heap sizing JVM options.
Motivation
The JVM stores most Java objects in the heap, where they are garbage-collected when no longer needed. The size of the heap significantly impacts the performance and memory usage of an application, so the JVM allows the size to be constrained manually: -Xms sets the minimum and initial heap size, and -Xmx sets the maximum heap size. All garbage collectors respect these settings.
Within these boundaries there is a delicate performance trade-off between memory use and the program's throughput. In a highly simplified view of tracing garbage collectors — the kind offered by the JVM — the amount of work required to collect garbage is determined not by the amount of dead objects in the heap but by the amount of live data, and so it is largely independent of the heap size. However, with a larger heap the GC needs to run collection cycles less frequently than with a smaller heap, so the overall CPU overhead of garbage collection decreases as the heap grows, and vice versa. Since this overhead may come at the expense of program throughput, a larger heap yields a lower GC overhead and possibly higher program throughput.
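To make the trade-off concrete, here is a deliberately simplified model of it; the constants and formulas are illustrative assumptions, not ZGC's actual cost model:

    // Toy model: each cycle costs CPU proportional to live data, and a cycle is
    // needed whenever the program has allocated enough to fill the free space.
    public class GcTradeoffModel {
        public static void main(String[] args) {
            double liveBytes      = 2e9;    // live data in the heap (bytes)
            double allocRate      = 1e9;    // allocation rate (bytes/second)
            double cpuPerLiveByte = 1e-10;  // assumed GC CPU cost per live byte (seconds)

            for (double heapBytes : new double[] {4e9, 8e9, 16e9}) {
                double freeAfterGc  = heapBytes - liveBytes;   // space reclaimed per cycle
                double cyclesPerSec = allocRate / freeAfterGc; // bigger heap => fewer cycles
                double gcCpuPerSec  = cyclesPerSec * liveBytes * cpuPerLiveByte;
                System.out.printf("heap = %2.0f GB -> GC CPU overhead ~ %.1f%%%n",
                                  heapBytes / 1e9, gcCpuPerSec * 100);
            }
        }
    }

With these made-up numbers, doubling the heap from 4 GB to 8 GB cuts the modeled GC CPU overhead from about 10% to about 3%, illustrating why a larger heap generally means less GC work per unit of time.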
When users choose the low-latency Z Garbage Collector (ZGC), the maximum heap size is the only memory-related option they need to set. Unfortunately, setting a good maximum heap size is notoriously difficult. If set too low, the application can run out of memory; if set too high, the machine can run out of memory. Finding a good maximum heap size involves measuring memory and throughput for different heap sizes in an experimental setup that provides a representative workload to an application. Such a setup is challenging for most developers.
With a concurrent garbage collector such as ZGC, the application's actual memory consumption depends not only on the application's code but also on the maximum heap size setting. This is because a higher maximum heap size is taken as an expression of the user's preference for consuming more memory in exchange for a lower CPU overhead, as described above. The GC is encouraged to use memory up to close to the maximum heap size to reduce its CPU overhead, and the application's heap is likely to grow close to that maximum even if the program could have run with a smaller heap.
To stay within the maximum heap size while attempting not to use more CPU time than needed, ZGC monitors the application's allocation rate and predicts the right time to start a collection cycle. If ZGC starts the collection too early — when the heap usage is well below the maximum heap size — it might do too many collections, wasting CPU time and harming application throughput. But if the allocation rate suddenly rises, ZGC might start the collection too late — the heap will be exhausted before ZGC can free memory, and application threads that try to allocate memory will be paused until sufficient memory is freed. Such allocation stalls are disastrous for the latency-sensitive applications that ZGC is intended to support. Accordingly, ZGC uses a "soft" maximum heap size (lower than the maximum heap size) as long as the allocation rate remains predictable, ensuring there is some "safety buffer" of memory available in case the allocation rate jumps unpredictably. Users can configure the "soft" maximum heap size with the -XX:SoftMaxHeapSize option, but it is challenging to quantify the level of unpredictability of an application's workload.
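The timing dilemma can be pictured as a simple headroom check: a collection must start early enough that the allocations predicted to occur during the collection still fit in the remaining free heap. The following sketch illustrates the general idea only; the names and the safety factor are assumptions, not ZGC internals:

    // Start a collection when the predicted time to heap exhaustion gets close
    // to the time a concurrent collection is expected to take.
    class GcTriggerSketch {
        double freeBytes;           // free heap right now
        double predictedAllocRate;  // bytes/second, estimated from recent behavior
        double expectedGcSeconds;   // expected duration of a concurrent collection
        double safetyFactor = 1.2;  // headroom for unpredictable allocation spikes

        boolean shouldStartCollection() {
            double secondsUntilExhaustion = freeBytes / predictedAllocRate;
            return secondsUntilExhaustion <= expectedGcSeconds * safetyFactor;
        }
    }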
We believe that the JVM can automatically control the heap size and balance CPU and memory consumption better than the user can manually with explicit configurations of -Xmx and -XX:SoftMaxHeapSize. The JVM should monitor the program's allocation rate, the GC's CPU consumption, and the overall memory usage of the machine, and use that current information to automatically determine a good heap size. Users should not need to set -Xmx when using ZGC. Furthermore, ZGC should be able to use all the memory on the machine if needed, just as a C program does, yet respond to a reduction in free memory on the machine. Finally, users should be able to directly express a preference between less GC overhead (but more memory use) and more GC overhead (but less memory use).
Description
ZGC will automatically select a heap size that dynamically adapts to changing circumstances in the program and the machine.
The selected heap size will lie between the minimum (-Xms) and maximum (-Xmx) heap configuration. When one or both are not configured by the user, the default minimum and maximum heap sizes will be changed when using ZGC to give the automatic heap sizing as much flexibility as possible, as follows:
- The default minimum and initial heap size (-Xms) is changed to 16MB.
- The default maximum heap size (-Xmx) is changed to 100% of the available RAM of the computer minus a small reserve (see Responding to machine memory pressure). The entire available machine memory serves as the safety buffer.
Automatic heap resizing does the following, which will be expounded on below:
- A dynamic maximum heap size adapts to changes in the availability of free memory on the machine.
- A heuristic target heap size is maintained internally. It is similar to -XX:SoftMaxHeapSize, except that it is determined automatically and dynamically, with no user configuration.
- The heap quickly grows to accommodate a sudden increase in allocation pressure.
- The heap proactively grows to accommodate the allocation rate of the program — as measured by the JVM. This growth is done concurrently with the program; the added memory is proactively and concurrently committed and paged to avoid any slowdowns due to OS operations.
- The heap automatically grows if the CPU consumed by the GC is higher than some overhead target.
- The heap automatically shrinks if the CPU consumed by the GC is lower than the overhead target.
- The heap automatically shrinks in response to memory on the machine being used up by other programs.
- The relative memory space allotted to the young and old generations is dynamically adjusted to minimize the GC's CPU consumption.
With these changes, the need for configuring the heap size when using ZGC should drop significantly.
All you need to do to benefit from automatic heap resizing is to start your Java program with the option -XX:+UseZGC.
Rapid expansion
Avoiding allocation stalls is the most important goal for a low-latency GC. When the JVM boots with an initial heap size of 16 MB on a large computer with many cores, it will quickly find itself in a situation where 16 MB of heap is not enough. The application might require a heap size of, for example, 160 GB, in which case the GC will need to expand the heap and do so very quickly.
With the heap starting out small, garbage collection will likely trigger early on, and the program is likely to need more memory faster than the GC can free it. To accommodate the program's allocation, the heap will expand during garbage collection instead of stalling the allocation until heap memory can be freed.
The "safety buffer" of free memory no longer needs to be accounted for inside the heap. Instead, it becomes the unused memory on the machine, outside of the JVM. If more memory is suddenly needed, the GC will take more memory from the system instead of stalling the allocating thread.
In addition to expanding the heap in the event of sudden allocations, the GC will estimate the normal allocation rate for the program and use that to grow the heap proactively. Growing the heap allows the GC to reduce the frequency of collections, which in turn reduces the GC's CPU overhead and so may improve the program's throughput.
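As a sketch of this proactive growth, the heap can be sized so that, at the estimated allocation rate, collections happen no more often than some desired interval. The names and the target interval below are assumptions for illustration, not ZGC's heuristics:

    // Grow the heap so that the free space above the live data can absorb the
    // estimated allocation rate for a desired interval between collections.
    class ProactiveGrowthSketch {
        long   liveBytes;           // estimated live data after a collection
        double estimatedAllocRate;  // bytes/second, measured by the JVM
        double targetCycleSeconds;  // desired minimum time between collections

        long desiredHeapBytes(long currentHeapBytes) {
            long desired = liveBytes + (long) (estimatedAllocRate * targetCycleSeconds);
            return Math.max(currentHeapBytes, desired);  // this path only grows the heap
        }
    }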
Concurrent heap committing and heating
When the GC grows the heap, it needs to commit memory pages from the OS; this operation takes some time. When a memory page is used for the first time, the OS needs to page it in, which also takes time and can harm the program's latency as it waits for the OS.
At present, users can choose to avoid these potential slowdowns to the program by performing this work upfront. Setting -Xms and -Xmx to the same value (instead of a much lower -Xms) will cause ZGC to commit all memory upfront; the -XX:+AlwaysPreTouch option further tells the GC to page in, or "preheat", the pages. These options, however, may significantly increase the JVM's startup time.
With this proposal of automatic heap resizing, when the GC proactively increases the heap size based on its estimate of the program's allocation rate, both committing and paging the memory are done concurrently with the program's operation. There is neither an increase in startup time nor a slowdown to the program as it runs.
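The effect of "heating" memory can be illustrated in ordinary Java code: touching one byte per OS page of a newly obtained region forces the OS to page it in ahead of use. This is a conceptual sketch only; ZGC does the equivalent work on its own heap pages inside the JVM, and the page size and use of a direct ByteBuffer here are assumptions:

    import java.nio.ByteBuffer;

    // Pre-touch a region concurrently so later use does not stall on page faults.
    public class PreTouchSketch {
        static final int PAGE_SIZE = 4096;  // assumed small-page size

        static void preTouch(ByteBuffer region) {
            for (int offset = 0; offset < region.capacity(); offset += PAGE_SIZE) {
                region.put(offset, (byte) 0);  // the write faults the page in
            }
        }

        public static void main(String[] args) {
            ByteBuffer newlyCommitted = ByteBuffer.allocateDirect(64 * 1024 * 1024);
            Thread heater = new Thread(() -> preTouch(newlyCommitted));
            heater.start();  // runs concurrently with the application's own work
        }
    }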
(On Linux, this concurrent pre-touching can also ensure that large pages are used by default, even in systems with hostile transparent huge page configurations. This configuration mismatch between common Linux distributions and ZGC's operation is the most common performance problem encountered with ZGC in the field.)
Automatic dynamic tuning
After finding an initial lower-bound heap size, the JVM continuously monitors the behavior of the GC, the program, and the machine, and applies incremental tuning of the heap size at the end of every GC cycle.
GC CPU overhead
With no preexisting knowledge about the application, it is difficult to guess a reasonable heap size; an initial guess can easily be off by several orders of magnitude. For example, a heartbeat application that only occasionally pings a service to check whether it is running could start out using a large fraction of the computer's memory even though it requires only a small amount.
Automatic heap resizing will adjust the heap size as the program runs, aiming at a selected target GC CPU overhead, following the idea in the work of Tavakolisomeh et al. The GC will monitor its own CPU usage: if it rises above the selected target, the heap will grow to allow for less intensive GC activity; if it falls below the target, the heap will shrink, and the GC may increase its activity to keep memory utilization within a smaller heap.
By default, the GC will select a target CPU overhead that strikes a reasonable balance between memory and CPU usage while keeping the heap within the boundaries of the minimum and maximum heap sizes.
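A minimal sketch of this feedback loop is shown below; the names, bounds, and the naive proportional adjustment are assumptions for illustration (the actual heuristics are more elaborate and include the smoothing described under Heap-size smoothing):

    // Grow the heap when the measured GC CPU overhead is above the target,
    // shrink it when below, and keep the result within the configured bounds.
    class CpuOverheadControllerSketch {
        double targetGcCpuPercent;  // the selected target CPU overhead
        long   minHeapBytes;        // -Xms
        long   maxHeapBytes;        // dynamic maximum (machine memory minus reserve)

        long adjustHeap(long currentHeapBytes, double measuredGcCpuPercent) {
            double error  = measuredGcCpuPercent / targetGcCpuPercent;  // above 1.0 means too much GC CPU
            double factor = Math.min(1.5, Math.max(0.5, error));        // clamp extreme signals
            long resized  = (long) (currentHeapBytes * factor);
            return Math.max(minHeapBytes, Math.min(maxHeapBytes, resized));
        }
    }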
ZGCPressure
How much the user wishes to trade CPU overhead for memory footprint is unknown.
If the user is unhappy with the default, the intensity of the GC can be controlled by a new JVM flag, -XX:ZGCPressure, corresponding to a desired target CPU overhead. It takes an integer value with no fixed upper bound, although a reasonable value would be between 0 and 10; the default is 5.
We intentionally do not define an exact relationship between -XX:ZGCPressure and the CPU overhead of the GC, whether as a percentage of the machine's CPU or of the program's overall CPU utilization, so that we may evolve and improve the automatic tuning policies over time. Instead, the guiding principle is that higher values of -XX:ZGCPressure make the GC more intensive, resulting in more frequent collections, higher CPU usage, and lower memory usage; lower values make the GC less intensive, triggering less frequent collections but requiring a larger heap.
-XX:ZGCPressure is a single knob through which to control the behavior of ZGC. It makes explicit the trade-off between memory and CPU that is only implied by -Xmx (as described in the Motivation), it is more intuitive and easier to determine, and when unspecified, its default value corresponds to a reasonable balance of CPU and memory usage. Raise it for a larger CPU overhead and a smaller heap; lower it for less CPU overhead and a larger heap.
The -XX:ZGCPressure flag is manageable, meaning that it may be updated at runtime, if desired.
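Because the flag is manageable, it could be adjusted from within a running program through the standard HotSpot diagnostic MXBean (or externally with jinfo). The snippet below assumes the -XX:ZGCPressure flag proposed by this JEP is present in the running JVM:

    import com.sun.management.HotSpotDiagnosticMXBean;
    import java.lang.management.ManagementFactory;

    // Raise ZGCPressure at runtime: more GC CPU, smaller heap.
    public class AdjustZgcPressure {
        public static void main(String[] args) {
            HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            System.out.println("Before: " + bean.getVMOption("ZGCPressure"));
            bean.setVMOption("ZGCPressure", "7");
            System.out.println("After:  " + bean.getVMOption("ZGCPressure"));
        }
    }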
Measuring CPU overhead
ZGC is a generational GC (by default). Young objects are placed in a young generation, collected more frequently in minor collections, while old objects are promoted to an old generation, collected less frequently in major collections (which collect both generations).
To decide whether to grow or shrink the heap based on the GC's CPU overhead as described in the previous section, the GC's CPU usage must be measured. Minor and major collections differ in their CPU usage, and there are usually many minor collections between major collections. Therefore, the overall CPU usage by the GC is examined at every major collection and takes into account the preceding minor collections. However, if a sequence of minor collections alone consumes more CPU than the CPU target, then the heap is enlarged without waiting for a subsequent major collection.
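The accounting could be pictured roughly as follows; the names and decision structure are assumptions for illustration, not ZGC internals:

    // Accumulate GC CPU time across minor collections, evaluate the full
    // overhead at each major collection, and expand early if the minor
    // collections alone already exceed the CPU target.
    class GcCpuAccountingSketch {
        double targetGcCpuSeconds;    // CPU budget for the interval between major collections
        double minorCpuSecondsSoFar;  // CPU spent by minor collections since the last major

        boolean shouldExpandEarly(double minorCpuSeconds) {  // called after each minor collection
            minorCpuSecondsSoFar += minorCpuSeconds;
            return minorCpuSecondsSoFar > targetGcCpuSeconds;
        }

        double overheadRatioAtMajor(double majorCpuSeconds) {  // called after each major collection
            double total = minorCpuSecondsSoFar + majorCpuSeconds;
            minorCpuSecondsSoFar = 0;
            return total / targetGcCpuSeconds;  // above 1.0 suggests growing the heap
        }
    }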
Garbage collection cycles are not the only CPU overhead imposed by the GC. Frequent collection cycles also impose other CPU penalties on the program, such as increased GC barrier activity and processor-cache invalidation. The automatic heap resizing heuristics take this into account and expand the heap to avoid such impacts. These impacts can be larger the more CPU the program itself consumes, and this, too, is taken into consideration.
Generation sizing
When updating the heap size, the distribution of memory between the young and old generations is also reconsidered. If a too-small young generation would force more frequent minor collections whose estimated cost exceeds that of a major collection, then a major collection is triggered and old-generation space is automatically redistributed to the young generation.
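In essence, the decision compares two estimated costs; the sketch below only illustrates that comparison, with the estimates themselves treated as assumed inputs:

    // Trigger a major collection and rebalance the generations when doing so
    // is estimated to be cheaper than the extra minor collections it avoids.
    class GenerationSizingSketch {
        double estimatedMajorCost;       // CPU cost of a major collection plus rebalancing
        double estimatedExtraMinorCost;  // CPU cost of the extra minor collections otherwise needed

        boolean shouldTriggerMajorAndRebalance() {
            return estimatedMajorCost < estimatedExtraMinorCost;
        }
    }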
Since ZGC does all this work concurrently with the application, there should be no noticeable latency impact to tuning the memory distribution between the generations to get the lowest overall CPU cost.
Responding to machine memory pressure
With the tactics described thus far, a JVM process may automatically find an appropriate heap size for a target GC CPU overhead. However, if we let the JVM use as much memory as it wants, the machine may not have enough memory available to run other processes.
In addition to monitoring the behavior of the Java program, ZGC will also continuously monitor the overall available memory on the machine. In response to less free memory on the machine, the GC will attempt to shrink the heap.
A small portion of the computer's memory is treated as a reserve that the GC avoids using. As other programs consume that reserve, the GC pressure increases, making the GC more aggressive so that it uses less memory and shrinks the heap, at the cost of spending more CPU. As the memory reserve gets consumed, the GC pressure increases first linearly and then exponentially. (Multiple JVMs using ZGC and running on the same machine will therefore reach an equilibrium of GC pressure rather than fight over memory with each other.)
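The shape of that response could look roughly like the function below; the formula and constants are assumptions chosen only to illustrate the described linear-then-exponential behavior:

    // Map the consumed fraction of the memory reserve to a GC-pressure
    // multiplier: gentle at first, sharply increasing as the reserve runs out.
    class MemoryPressureSketch {
        static double pressureMultiplier(double usedFraction) {  // 0.0 .. 1.0
            double linear      = 1.0 + usedFraction;                    // early, gentle response
            double exponential = Math.exp(4.0 * (usedFraction - 0.5));  // dominates near exhaustion
            return Math.max(linear, exponential);
        }
    }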
The maximum heap size is dynamically adapted to be all the memory available on the computer minus a small reserve.
Even in single-application deployments, having a reserve of system memory unused by the JVM is generally a good idea, particularly with a concurrent GC. For example, it allows file caches to be populated, which typically improves the performance of the system. At the same time, as explained under Rapid expansion, this unused memory acts as a safety buffer that can be used to avoid allocation stalls if the program's allocation rate rises suddenly.
On macOS or Windows with memory compression enabled, the ratio of compressed to uncompressed memory is continuously monitored. The perceived size of the memory reserve is scaled according to that compression ratio. When the OS starts compressing more memory, the GC will work harder to reclaim garbage and give memory back to the OS, relieving its compression pressure.
Note: This means that the memory usage of a Java program using ZGC may decrease if there are more memory-hungry programs running on the same machine and increase when there are fewer.
Heap-size smoothing
Finally, all the factors described above constitute a control system's "error signals" — how far away the system is from the desired state — used to decide how much to expand or shrink the heap. Sometimes, such signals can have extreme values. Extreme changes in the heap size should be avoided, as the reason for the extreme signals may be transient and misleading, and we want the heap size to remain relatively stable. The signals are, therefore, smoothed by a sigmoid function that yields values between 0.5 and 1.5 and is almost linear in the middle.
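A sigmoid with the described shape could, for instance, look like the following; the steepness constant is an assumption for the sketch:

    // Map a raw resize signal to a factor in (0.5, 1.5) that is roughly
    // linear around the neutral signal of 1.0 (no change).
    class HeapResizeSmoothingSketch {
        static double smooth(double errorSignal) {
            double steepness = 2.0;                      // assumed; sets the width of the linear region
            double x = steepness * (errorSignal - 1.0);  // center the curve at "no change"
            return 0.5 + 1.0 / (1.0 + Math.exp(-x));     // sigmoid bounded by 0.5 and 1.5
        }
    }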
Alternatives
Some GCs on other platforms let users select the memory vs. CPU tradeoff by setting a target residency — the ratio between the amount of live data and total heap size. However, for the same residency, the CPU overhead of garbage collecting can vary drastically, and measuring the amount of live data is challenging for generational GCs. Measuring the total CPU overhead of the GC is not only more straightforward, but it more directly represents what users care about (and monitor).
Testing
This enhancement primarily affects performance metrics. Therefore, it will be thoroughly performance-tested with a wide variety of workloads. The defined success metrics will be tested on said workloads.
Risks and Assumptions
By changing the default maximum heap size from 25% of the available memory to all available memory, there is a risk that the new heuristics use more memory than the current implementation would, causing other processes to run out of memory. However, even with the current 25% default maximum heap policy there is already a risk of that happening when several JVMs using that default run on the same machine. Moreover, because the maximum heap size is updated dynamically, the JVM is very likely to be able to throw an OutOfMemoryError before exceeding the computer's memory limits.