JEP 308: Improve Dynamic Number of Thread Sizing for G1
Owner | Thomas Schatzl |
Type | Feature |
Scope | Implementation |
Status | Closed / Withdrawn |
Component | hotspot / gc |
Discussion | hotspot dash gc dash dev at openjdk dot java dot net |
Effort | L |
Duration | M |
Reviewed by | Mikael Vidstedt |
Endorsed by | Mikael Vidstedt |
Created | 2017/01/13 14:37 |
Updated | 2022/03/04 13:53 |
Issue | 8172792 |
Summary
Improve the automatic dynamic thread sizing enabled by -XX:+UseDynamicNumberOfGCThreads in G1 so that thread resources are used more efficiently, particularly in situations where using all available threads is wasteful.
Goals
Improve the default thread sizing policies of G1 in the areas of:
- sizing the number of threads used during garbage collection pauses, refinement, and marking;
- for reference processing in particular, automatically determining whether parallelism should be enabled, and determining an optimal number of threads to use.
A user should not need to do much more than set the maximum heap size and pause time goal in order to get better thread resource usage than before.
Non-Goals
While these new default policies should significantly improve the ease of use of the G1 collector, the resulting policies may still not be ideal for every situation.
The changes are limited to changing how resources are used by existing algorithms; the actual garbage collection algorithms themselves will not change.
We specifically target thread usage in situations where G1 currently uses too many threads. Overall performance should not regress, but improvements are incidental, not required.
Success Metrics
A user of G1 should automatically benefit from improved heuristics in the areas covered by this JEP. Depending on the applicability of the change, we expect G1 to exhibit better throughput, lower resource usage, better start-up behavior, or any combination of these.
In cases where the VM already utilizes all available resources, there should be no long-term difference.
Motivation
We intend to address two of the most common tuning-related issues users have with G1 by automatically applying the necessary tuning of the number of threads used:
- worker threads for GC operations are always allocated upfront, which decreases startup performance and increases memory usage, particularly for small, short-running batch operations;
- current heuristics for thread sizing can only be specified by the user at start-up (e.g. enabling parallelism during reference processing), or are determined once at start-up based on the environment or other user-supplied parameter values. None of them are based on metrics gathered from the actual application (e.g. live-set size, amount of survivors); they are global, static decisions made at start-up, and so are often sub-optimal for some or even all phases of the application.
Improving self-tuning of these resources in these situations should increase the out-of-the-box usability of G1.
Description
The main change in the area of thread management is that the existing thread-count options (-XX:ParallelGCThreads, -XX:ConcGCThreads) will change their semantics slightly: instead of specifying the exact number of threads to be used during GC, marking, and refinement, they will always, if they do not already, denote the maximum number of threads G1 is allowed to use. Also, threads will no longer be pre-allocated in full at startup by default, but will be created lazily as they are required.
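As a hypothetical example (MyApp and the chosen values are placeholders; the exact log output depends on the JDK version), these flags then act as upper bounds rather than fixed counts, and a log selector such as -Xlog:gc* can be used to observe how many workers G1 actually used for each phase:

    # Dynamic sizing (proposed default): at most 8 parallel and 2 concurrent
    # GC threads; G1 picks the actual count per phase from the amount of work.
    java -XX:+UseDynamicNumberOfGCThreads \
         -XX:ParallelGCThreads=8 -XX:ConcGCThreads=2 \
         -Xlog:gc* MyApp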
The number of threads will be determined by the number of work items for a particular phase of work. The exact definition of a work item will depend on the phase, and the resulting number of threads will be chosen by heuristics (e.g., for the evacuation phase of a GC, the number of threads to use will be derived from the expected amount of live data to be evacuated during that phase). Decisions need to be communicated via appropriate channels (e.g., log messages).
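As a rough illustration of such a heuristic (a minimal sketch only, not HotSpot code; the per-worker capacity constant and the names used are assumptions made for this example), the evacuation worker count could be derived from the predicted amount of live data and capped at the value of -XX:ParallelGCThreads:

    // Minimal sketch of a work-item-based sizing heuristic; not HotSpot source.
    // BYTES_PER_WORKER is an assumed per-thread capacity chosen for illustration.
    final class WorkerSizingSketch {
        static final long BYTES_PER_WORKER = 64L * 1024 * 1024;

        // Number of evacuation workers for one pause, bounded below by 1 and
        // above by the value of -XX:ParallelGCThreads.
        static int workersForEvacuation(long expectedLiveBytes, int parallelGCThreads) {
            long wanted = (expectedLiveBytes + BYTES_PER_WORKER - 1) / BYTES_PER_WORKER;
            return (int) Math.min(Math.max(wanted, 1), parallelGCThreads);
        }
    }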
For phases such as reference processing, where the user currently needs to enable parallelism manually (via -XX:+ParallelRefProcEnabled), G1 will automatically decide whether to enable parallelism and, if so, determine the number of threads to use for each phase from the number of work items to process.
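One way such an automatic decision could look (illustrative only; the threshold and the names used are assumptions, not necessarily the heuristic this JEP will adopt) is to stay single-threaded for small numbers of discovered references and otherwise scale the thread count with the number of work items, again bounded by the configured maximum:

    // Illustrative sketch of an automatic reference-processing parallelism
    // decision; the REFS_PER_THREAD threshold is an assumption for this example.
    final class RefProcSizingSketch {
        static final int REFS_PER_THREAD = 1000;

        // Returns 1 (serial processing) when there is too little work,
        // otherwise one thread per REFS_PER_THREAD references, up to maxThreads.
        static int refProcThreads(int discoveredRefs, int maxThreads) {
            if (discoveredRefs < REFS_PER_THREAD) {
                return 1;
            }
            return Math.min(discoveredRefs / REFS_PER_THREAD, maxThreads);
        }
    }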
No decision has been made on whether to de-allocate thread resources after long-term non-use.
There will be a way to revert to the old behavior, i.e., a static distribution of resources, via the existing option -XX:-UseDynamicNumberOfGCThreads.
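For example (a hypothetical command line; MyApp and the value 8 are placeholders):

    # Old behavior: exactly 8 parallel GC worker threads, allocated up front.
    java -XX:-UseDynamicNumberOfGCThreads -XX:ParallelGCThreads=8 MyApp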
Alternatives
Provide better documentation to help users tune their applications for their environments. This has the disadvantage that users still need to perform this kind of tuning for every application and every deployment. For some of the suggested enhancements, such as thread sizing based on the current amount of work, there is no alternative, since that level of control is not available at the moment.
Testing
Default performance measurements for any collector should not regress in general. There are no particular platform requirements.
Risks and Assumptions
The heuristics we intend to implement originate from discussions with many users and tuning efforts. These reports may have been only from a non-representative subset of use cases, or use cases that are not representative any more. This may ultimately make this work unnecessary. Some of the changes may cause unintended performance regressions due to changes to heuristics. Some of this work requires experimenting with heuristics that may not be successful in the end. We do have ideas for all these that seem plausible, but unforeseen interactions within the garbage collector and between the garbage collector and applications might make them perform very badly. In these cases we intend to simplify the heuristics until eventually the user will need to give more detail.