JEP 144: Reduce GC Latency for Large Heaps

Author: Jesper Wilhelmsson
Owner: Tony Printezis
Type: Feature
Scope: Implementation
Status: Candidate
Component: hotspot / gc
Discussion: hotspot dash gc dash dev at openjdk dot java dot net
Effort: XL
Duration: XL
Reviewed by: Bengt Rutisson
Endorsed by: Mikael Vidstedt
Created: 2011/11/01 20:00
Updated: 2014/10/06 20:59
Issue: 8046134

Summary

Improve the performance of applications that require large heaps, of up to 32GB, by reducing garbage-collector latency.

Goals

Enable the use of 32GB heaps with 60% live data and fairly consistent pause times of 250--500ms, as specified by the pause target, for a set of relevant benchmarks. Average pause times can fluctuate ±10%.
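As a rough illustration of how this goal might be checked, the sketch below reads the standard GarbageCollectorMXBean counters and tests whether the average collection time lies within ±10% of a pause target. It rests on assumptions that are not part of this JEP: the ±10% band is read as being relative to the pause target, the 500ms target and the class and method names are hypothetical, and the accumulated collection time reported by the beans is only a proxy for pause time (it may include concurrent work).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of the JEP. One possible invocation:
//   java -XX:+UseG1GC -Xmx32g -XX:MaxGCPauseMillis=500 PauseCheck
public class PauseCheck {

    /** True if averageMs lies within +/- tolerance (e.g. 0.10) of targetMs. */
    static boolean withinTolerance(double averageMs, double targetMs, double tolerance) {
        return Math.abs(averageMs - targetMs) <= tolerance * targetMs;
    }

    /** Mean collection time in ms across all collectors, or 0 if none has run. */
    static double averageCollectionMillis() {
        long count = 0;
        long timeMs = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long c = gc.getCollectionCount(); // -1 if undefined for this collector
            long t = gc.getCollectionTime();  // -1 if undefined for this collector
            if (c > 0 && t >= 0) {
                count += c;
                timeMs += t;
            }
        }
        return count == 0 ? 0.0 : (double) timeMs / count;
    }

    public static void main(String[] args) {
        double targetMs = 500.0; // assumed pause target, matching MaxGCPauseMillis above
        double avg = averageCollectionMillis();
        System.out.printf("average collection time: %.1f ms, within 10%% of target: %b%n",
                avg, withinTolerance(avg, targetMs, 0.10));
    }
}
```

In a real measurement the check would run at the end of a benchmark, and GC logs would give more precise per-pause figures than these aggregate counters.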

Non-Goals

Pause-time guarantees are not required. It is not a goal or a requirement to improve GC characteristics for smaller heap sizes that are currently served well by existing GC technology.

Motivation

We see continued increases in random-access memory (RAM) sizes. For workloads that require low and consistent GC pause times, the JVM using the CMS collector can manage heap sizes of up to 4--8GB. A normal low-cost server now has much more memory than can be effectively used with a single JVM instance for these workloads, so Java applications are often scaled out across several JVM instances. While this may be a viable solution for many applications, vertical scaling solutions are effectively blocked by JVM GC limitations. Non-garbage-collected environments do not suffer from these limitations. The intent of this proposal is to enable low and consistent pause times with large RAM sizes so that Java is viable for existing use cases where GC limitations have forced complicated re-architecting or migration to other platforms.

Description

This will be done by improving the G1 collector. There are a number of CRs that we aim to work on to improve performance. The list below is our starting point; it is not the final list of CRs. Extensive performance measurements are required to find the actual bottlenecks and choose the CRs (and write new ones) for those areas where more work is needed.

Apart from these bugs and RFEs, to achieve constant pause times in the 250--500ms range we need to avoid full GCs completely. This will most likely require a new scheme where, rather than start a full GC, we instead increase the amount of work we do gradually if we notice that we can't keep up with the allocation rate or if fragmentation becomes an issue.
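The idea of ramping up work gradually instead of falling back to a full GC can be pictured with a toy controller such as the one below. This is purely illustrative and does not describe G1's actual or planned mechanism: the class, the effort levels, the fragmentation threshold, and the sample figures are all hypothetical.

```java
// Toy model only: raises a collection "effort level" one step at a time
// when reclamation falls behind allocation or the heap looks fragmented.
public class GradualEffort {

    static final int MAX_LEVEL = 4;                      // hypothetical number of effort steps
    static final double FRAGMENTATION_THRESHOLD = 0.25;  // hypothetical threshold

    /** Returns the next effort level given the current level and two observed signals. */
    static int nextLevel(int level, double allocMbPerSec, double reclaimMbPerSec,
                         double fragmentation) {
        boolean fallingBehind = allocMbPerSec > reclaimMbPerSec;
        boolean fragmented = fragmentation > FRAGMENTATION_THRESHOLD;
        if (fallingBehind || fragmented) {
            return Math.min(level + 1, MAX_LEVEL); // step up, never jump to the maximum
        }
        return Math.max(level - 1, 0);             // relax once the pressure is gone
    }

    public static void main(String[] args) {
        // Each row: allocation rate (MB/s), reclamation rate (MB/s), fragmentation (0..1).
        double[][] samples = {
            {800, 1000, 0.05},
            {1200, 1000, 0.05},
            {1300, 1000, 0.10},
            {900, 1000, 0.30},
            {700, 1000, 0.10},
        };
        int level = 0;
        for (double[] s : samples) {
            level = nextLevel(level, s[0], s[1], s[2]);
            System.out.println("effort level = " + level);
        }
    }
}
```

The point of the single-step adjustment is that extra work is spread across several collections rather than concentrated in one long stop-the-world pause.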