JEP 450: Compact Object Headers (Experimental)

Owner: Roman Kennke
Type: Feature
Scope: Implementation
Status: Candidate
Component: hotspot / runtime
Discussion: hotspot dash dev at openjdk dot org
Effort: L
Duration: L
Reviewed by: Aleksey Shipilev, Erik Österlund, John Rose, Stefan Karlsson, Thomas Stuefe
Endorsed by: Vladimir Kozlov
Created: 2022/10/07 19:27
Updated: 2024/06/18 21:15
Issue: 8294992

Summary

Reduce the size of object headers in the HotSpot JVM from between 96 and 128 bits down to 64 bits on 64-bit architectures. This will reduce heap size, improve deployment density, and increase data locality.

Goals

When enabled, this feature

When disabled, this feature

This experimental feature will have a broad impact on real-world applications. The code might have inefficiencies, bugs, and unanticipated non-bug behaviors. This feature must therefore be disabled by default, and enabled only by explicit user request. We intend to enable it by default in later releases and eventually remove the code for legacy object headers altogether.

Non-Goals

It is not a goal to

Motivation

A Java object stored in the heap has metadata, which the HotSpot JVM stores in the object's header. The size of the header is constant; it is independent of object type, array shape, and content. In the 64-bit HotSpot JVM object headers are between 96 bits (12 bytes) and 128 bits (16 bytes), depending on how the JVM is configured.

Objects in Java programs tend to be small. Experiments conducted as part of Project Lilliput show that many workloads have average object sizes of 256 to 512 bits (32 to 64 bytes). This implies that more than 20% of live data can be taken by object headers alone. Thus even a small improvement in object header size could bring a large improvement in footprint. Cutting down the header of each object from 128 to 64 bits means improving overall heap usage by more than 10%, since the header is a fixed cost for every object. A smaller average object size leads to improvement in memory usage, GC pressure, and data locality.
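The percentages above follow from simple division. The sketch below reproduces that arithmetic for the object sizes mentioned in the text. It is illustrative only: the class and method names are ours, and it ignores alignment padding and field layout, which affect real savings.

```java
// Illustrative arithmetic only: ignores alignment padding and field layout.
final class HeaderOverhead {
    /** Fraction of an object's bytes occupied by its header. */
    static double headerFraction(int headerBytes, int objectBytes) {
        return (double) headerBytes / objectBytes;
    }

    /** Fraction of an object's bytes saved by shrinking the header. */
    static double savedFraction(int oldHeaderBytes, int newHeaderBytes, int objectBytes) {
        return (double) (oldHeaderBytes - newHeaderBytes) / objectBytes;
    }

    public static void main(String[] args) {
        // Average object sizes quoted in the text: 32 to 64 bytes.
        for (int objectBytes : new int[] {32, 64}) {
            System.out.printf(
                "object=%dB: 12B header=%.1f%%, 16B header=%.1f%%, saving 16B->8B=%.1f%%%n",
                objectBytes,
                100 * headerFraction(12, objectBytes),
                100 * headerFraction(16, objectBytes),
                100 * savedFraction(16, 8, objectBytes));
        }
    }
}
```

For a 64-byte object, a 16-byte header is a quarter of the object, and removing 8 bytes of header saves one eighth of it, consistent with the "more than 10%" figure.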

Early adopters of Project Lilliput who have tried it with real-world applications confirm that live data is typically reduced by 10%–20%.

Description

In the HotSpot JVM, object headers support many different features:

Current object headers

The current object header layout is split into a mark word and a class word.

The mark word comes first, has the size of a machine address, and contains:

Mark Word (normal):
 64                     39                              8    3  0
  [.......................HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH.AAAA.TT]
         (Unused)                      (Hash Code)     (GC Age)(Tag)

In some situations the mark word is overwritten with a tagged pointer to a separate data structure:

Mark Word (overwritten):
 64                                                           2 0
  [ppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppTT]
                            (Native Pointer)                   (Tag)

When this is done, the tag bits describe the type of pointer stored in the header. If necessary, the original mark word is preserved (displaced) in the data structure to which this pointer refers, and the fields of the original header (e.g., the age bits and hash code) are accessed by dereferencing the pointer to get to the displaced header.

The class word comes after the mark word. It takes one of two shapes, depending on whether compressed class pointers are enabled:

Class Word (uncompressed):
64                                                               0
 [cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc]
                          (Class Pointer)

Class Word (compressed):
32                               0
 [CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC]
     (Compressed Class Pointer)

The class word is never overwritten, which means that an object's type information is always available, so no additional steps are required to check a type or invoke a method. Most importantly, the parts of the runtime that need that type information do not need to cooperate with the locking, hashing, and GC subsystems that can change the mark word.

Compact object headers

For compact object headers we remove the division between the mark and class words by reducing the size of the hash code and subsuming the class pointer into the mark word:

Header (compact):
64                    42                             11   7   3  0
 [CCCCCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHVVVVAAAASTT]
 (Compressed Class Pointer)       (Hash Code)         /(GC Age)^(Tag)
                              (Valhalla-reserved bits) (Self Forwarded Tag)
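
To make the layout above concrete, the following sketch packs and unpacks the fields of a compact header with shifts and masks. The field widths and positions are read off the diagram. The class and method names are hypothetical and do not correspond to HotSpot's actual code.

```java
// A minimal sketch of the compact header layout; names are hypothetical.
final class CompactHeader {
    // Field widths, from least to most significant, as shown in the diagram.
    static final int TAG_BITS = 2;
    static final int SELF_FWD_BITS = 1;
    static final int AGE_BITS = 4;
    static final int VALHALLA_BITS = 4;
    static final int HASH_BITS = 31;
    static final int CLASS_BITS = 22;

    // Bit position at which each field starts (bit 0 is least significant).
    static final int SELF_FWD_SHIFT = TAG_BITS;                    // 2
    static final int AGE_SHIFT = SELF_FWD_SHIFT + SELF_FWD_BITS;   // 3
    static final int VALHALLA_SHIFT = AGE_SHIFT + AGE_BITS;        // 7
    static final int HASH_SHIFT = VALHALLA_SHIFT + VALHALLA_BITS;  // 11
    static final int CLASS_SHIFT = HASH_SHIFT + HASH_BITS;         // 42

    static long mask(int bits) { return (1L << bits) - 1; }

    /** Builds a header word; the self-forwarded and Valhalla bits are left at zero. */
    static long pack(int narrowClass, int hash, int age, int tag) {
        return ((narrowClass & mask(CLASS_BITS)) << CLASS_SHIFT)
             | ((hash & mask(HASH_BITS)) << HASH_SHIFT)
             | ((age & mask(AGE_BITS)) << AGE_SHIFT)
             | (tag & mask(TAG_BITS));
    }

    static int tag(long h) { return (int) (h & mask(TAG_BITS)); }
    static boolean selfForwarded(long h) { return ((h >>> SELF_FWD_SHIFT) & 1L) != 0; }
    static int age(long h) { return (int) ((h >>> AGE_SHIFT) & mask(AGE_BITS)); }
    static int hash(long h) { return (int) ((h >>> HASH_SHIFT) & mask(HASH_BITS)); }
    static int narrowClass(long h) { return (int) (h >>> CLASS_SHIFT); }

    public static void main(String[] args) {
        long h = pack(0x2ABCDE, 0x12345678, 5, 1);
        System.out.printf("class=0x%X hash=0x%X age=%d tag=%d%n",
            narrowClass(h), hash(h), age(h), tag(h));
    }
}
```

The six field widths sum to 64, so the class pointer, hash code, and GC state share a single word.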

Overwriting the mark word with a tagged pointer makes certain operations more complex since we lose direct access to the compressed class pointer, as discussed below.

Compact object headers are guarded by a new experimental runtime option, -XX:(+/-)UseCompactObjectHeaders. This option is disabled by default.
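
A program can check at run time whether the option is in effect. The sketch below queries it through the `com.sun.management.HotSpotDiagnosticMXBean` interface and reports "unavailable" on JVMs that do not know the flag. The class and method names are ours. Note that HotSpot experimental options generally also require `-XX:+UnlockExperimentalVMOptions` on the command line, so a launch would look like `java -XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders ...`.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// A minimal sketch; assumes a HotSpot-based JDK with the jdk.management module.
final class CompactHeadersFlag {
    /** Returns the flag's current value, or "unavailable" if this JVM does not expose it. */
    static String query(String flagName) {
        try {
            HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return "unavailable";
            }
            return bean.getVMOption(flagName).getValue();
        } catch (IllegalArgumentException e) {
            // Thrown when the VM has no option with this name.
            return "unavailable";
        }
    }

    public static void main(String[] args) {
        System.out.println("UseCompactObjectHeaders = " + query("UseCompactObjectHeaders"));
    }
}
```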

Compressed class pointers

Today's compressed class pointers encode a 64-bit pointer into 32 bits. JDK 8 introduced compressed class pointers, which first depended on compressed object pointers; JDK 15 removed this dependency. Today, there are only two scenarios in which one would disable compressed class pointers:

The theoretical limit of three million classes is imposed by the way that class space is implemented today. An application that hits this limit would use approximately six to 30 GB of metaspace alone; we have yet to see an application that does this.

Compact object headers reduce the bits available to identify a class to 22. They therefore require compressed class pointers to be enabled unconditionally. The encoding will be modified so that the JVM can address a comparable number of classes with fewer bits. The two scenarios mentioned above must be addressed before eventually removing support for legacy object headers; in the meantime, applications can always disable compact headers.
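
One common way to fit a class address into fewer bits is to store an offset from a fixed base, divided by the alignment of class metadata. The sketch below illustrates that general base-plus-shift idea with 22 bits. The base address and the shift of 10 used in `main` are hypothetical values chosen for illustration, not a statement of the encoding HotSpot will adopt.

```java
// A minimal sketch of base-plus-shift pointer compression; base and shift are hypothetical.
final class NarrowClassPointer {
    static final int NARROW_BITS = 22; // bits available for the class, per the text

    final long base;  // hypothetical start of the class encoding range
    final int shift;  // hypothetical log2 of the class metadata alignment

    NarrowClassPointer(long base, int shift) {
        this.base = base;
        this.shift = shift;
    }

    /** Number of distinct classes that NARROW_BITS can identify. */
    static long maxClasses() { return 1L << NARROW_BITS; }

    /** Size in bytes of the address range reachable from the base. */
    long addressableBytes() { return 1L << (NARROW_BITS + shift); }

    int encode(long address) {
        long offset = address - base;
        boolean aligned = (offset & ((1L << shift) - 1)) == 0;
        if (offset < 0 || offset >= addressableBytes() || !aligned) {
            throw new IllegalArgumentException("address not encodable: " + address);
        }
        return (int) (offset >>> shift);
    }

    long decode(int narrow) { return base + ((long) narrow << shift); }

    public static void main(String[] args) {
        NarrowClassPointer enc = new NarrowClassPointer(0x8_0000_0000L, 10);
        long address = enc.base + 3 * 1024;
        int narrow = enc.encode(address);
        System.out.println(narrow + " -> 0x" + Long.toHexString(enc.decode(narrow)));
    }
}
```

With 22 bits there are 2^22 (about 4.2 million) distinct values, which is the same order of magnitude as the class-count limit discussed above. A larger shift trades alignment padding for a larger addressable range.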

Stack locking

Stack locking is the first light-weight step in HotSpot's object-locking subsystem. It works in cases where the object's monitor is uncontended, no thread control methods (wait(), notify(), etc.) are called, and no JNI locking is used. In such cases HotSpot displaces the original mark word to the stack of the locking thread and overwrites the word with a pointer to that stack location. This associates the locking thread with the object, since we know the thread stack's address range, while retaining the original header.

With compact object headers, stack locking is inherently racy during unlocking from the perspective of non-owner threads. Suppose that thread A tries to access the compressed class pointer in an object's displaced header, and thread B stack-unlocks the object and removes the displaced header from the stack. Thread A is then exposed to a dangling stack reference, which likely contains garbage.

The same problem exists in the current stack locking code when, e.g., accessing the identity hash code stored in the displaced header. In that situation the race condition is avoided by inflating the lock into a full-blown monitor lock (see next section). This approach would not work well with compact object headers because accesses to class pointers are orders of magnitude more frequent than accesses to identity hash codes. Inflating the majority of stack locks would result in unacceptable performance overheads.

To address the race condition for compact object headers, a prerequisite improvement for this work is an alternative light-weight locking protocol (8291555) that stores locking data in a separate thread-local area rather than in the object header, preserving the original header and class information.
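
The idea of keeping lock ownership in a thread-local area can be sketched as follows. A lock operation flips only the tag bits of the header with a compare-and-swap and records the object in a per-thread stack, so the upper header bits (where the class pointer lives) are never displaced. This is a simplified model, not the protocol of 8291555 itself: the tag encoding (01 for unlocked, 00 for locked) is an assumption, the names are hypothetical, and recursion, contention, and inflation are left out.

```java
import java.util.ArrayDeque;
import java.util.concurrent.atomic.AtomicLong;

// A simplified model of thread-local lock records; tag values are assumed.
final class LightweightLockSketch {
    static final long TAG_MASK = 0b11;
    static final long UNLOCKED = 0b01; // assumed encoding
    static final long LOCKED = 0b00;   // assumed encoding

    /** Stand-in for a heap object whose only state is its 64-bit header. */
    static final class Obj {
        final AtomicLong header;
        Obj(long upperBits) {
            header = new AtomicLong((upperBits & ~TAG_MASK) | UNLOCKED);
        }
    }

    // Per-thread record of the objects this thread currently holds.
    static final ThreadLocal<ArrayDeque<Obj>> LOCK_STACK =
        ThreadLocal.withInitial(ArrayDeque::new);

    /** Tries to lock without displacing the header; false means a slower path is needed. */
    static boolean tryLock(Obj o) {
        long h = o.header.get();
        if ((h & TAG_MASK) != UNLOCKED) {
            return false; // already locked; a real VM would fall back to a monitor
        }
        long locked = (h & ~TAG_MASK) | LOCKED;
        if (!o.header.compareAndSet(h, locked)) {
            return false; // lost a race with another thread
        }
        LOCK_STACK.get().push(o);
        return true;
    }

    /** Unlocks the most recently locked object of the current thread. */
    static void unlock(Obj o) {
        ArrayDeque<Obj> stack = LOCK_STACK.get();
        if (stack.peek() != o) {
            throw new IllegalMonitorStateException("not the most recent lock of this thread");
        }
        stack.pop();
        long h = o.header.get();
        o.header.set((h & ~TAG_MASK) | UNLOCKED);
    }

    public static void main(String[] args) {
        Obj o = new Obj(0x1234_5678_9ABC_DEF0L);
        System.out.println("locked: " + tryLock(o));
        System.out.println("upper bits intact: 0x" + Long.toHexString(o.header.get() & ~TAG_MASK));
        unlock(o);
    }
}
```

Because the upper bits never leave the header, another thread can read the class information at any time without following a pointer into the locking thread's stack.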

Monitor locking

Monitor locking is the final heavy-weight step in HotSpot's object-locking subsystem. It is invoked when the monitor is contended or thread control methods are used, or any light-weight locking step fails. Monitor locking creates a separate ObjectMonitor structure, stores the original header there, and overwrites the header with the pointer to that ObjectMonitor.

Compared to stack locking, monitor locking is not as racy. Once a monitor has been created and installed, i.e., inflated, it remains until it is disposed of, i.e., deflated. Java threads coordinate with monitor deflation so that, when a Java thread loads a header that carries a monitor pointer, it can safely access that monitor without risk of accessing a stale monitor. Once a lock has been inflated to a full ObjectMonitor it is safe to load the displaced header from the ObjectMonitor and decode the class from there.

However, not all GC threads coordinate with monitor deflation, and in some cases they cannot be made to coordinate without introducing a lot of complexity. Therefore, a new way to map heap objects to ObjectMonitors is introduced which, together with the new light-weight locking, will eliminate the concept of 'displaced headers' altogether. This also removes the need for threads that access object headers to coordinate with monitor deflation.

GC forwarding

Moving GCs perform object relocations in two steps: Move the object and record the mapping between its old and new locations (i.e., forwarding), and then use this mapping to update the references in the entire heap or in just a particular generation.
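
The two steps can be illustrated on a toy heap. In the sketch below, every object is two words (a header and one reference field), the forwarding is recorded by overwriting the old copy's header with the new address plus a tag, and references are then fixed up through those headers. The tag value 11, the fixed object size, and all names are assumptions made for the sketch. It also assumes that live objects refer only to other live objects.

```java
// A toy model of relocation with forwarding recorded in old headers; names are hypothetical.
final class ForwardingSketch {
    static final long FORWARDED_TAG = 0b11; // assumed tag marking a forwarded header
    static final int WORDS = 2;             // each object: [header, one reference field]
    static final long NULL_REF = -1;

    static boolean isForwarded(long header) { return (header & 0b11) == FORWARDED_TAG; }
    static long forwardee(long header) { return header >>> 2; }
    static long forwardingHeader(long newAddr) { return (newAddr << 2) | FORWARDED_TAG; }

    /** Moves the live objects of 'from' into 'to', then updates their references. */
    static void relocate(long[] from, int[] liveAddrs, long[] to) {
        int free = 0;
        // Step 1: copy each live object and record its new location in the old header.
        for (int old : liveAddrs) {
            to[free] = from[old];          // original header travels with the copy
            to[free + 1] = from[old + 1];  // field still holds an old address
            from[old] = forwardingHeader(free);
            free += WORDS;
        }
        // Step 2: rewrite each reference by reading the forwarding from the old header.
        for (int addr = 0; addr < free; addr += WORDS) {
            long ref = to[addr + 1];
            if (ref != NULL_REF) {
                to[addr + 1] = forwardee(from[(int) ref]);
            }
        }
    }

    public static void main(String[] args) {
        // Words 0-1 are a dead object; live objects at 2 and 4 refer to each other.
        long[] from = {0x001, NULL_REF, 0x101, 4, 0x201, 2};
        long[] to = new long[4];
        relocate(from, new int[] {2, 4}, to);
        System.out.println(java.util.Arrays.toString(to));
    }
}
```

Note that after step 1 the old header no longer contains the original header bits, which is why losing direct access to the class pointer matters for collectors that forward in this way.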

Of the current HotSpot GCs, only ZGC uses a separate forwarding table to record forwardings. All other GCs record forwarding information by overwriting the header of the old object with the location of the new object. There are two distinct scenarios that involve headers.

GC walking

Garbage collectors frequently walk the heap by scanning objects linearly. This requires determining the size of each object, which requires access to each object's class pointer.

When the class pointer is encoded in the header, some simple arithmetic is required to decode it. The cost of doing this is low compared to the cost of the memory accesses involved in a GC walk. No additional implementation work is needed here, since the GCs already access class pointers via a common VM interface.
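
A linear walk with the class encoded in the header might look like the sketch below. It models the heap as an array of 64-bit words, takes the class identifier from the top 22 bits of each header, and looks up the object size in a table. The size table, the fixed per-class sizes (real array sizes also depend on the array length), and all names are simplifications assumed for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// A toy linear heap walk; the class table and names are hypothetical.
final class HeapWalkSketch {
    static final int CLASS_SHIFT = 42; // class bits occupy the top 22 bits of the header

    static long header(int classId) { return (long) classId << CLASS_SHIFT; }

    /**
     * Returns the start address (word index) of every object in heap[0..top).
     * sizeInWords[classId] is the total size of an instance, header included.
     */
    static List<Integer> walk(long[] heap, int top, int[] sizeInWords) {
        List<Integer> starts = new ArrayList<>();
        int addr = 0;
        while (addr < top) {
            // The "simple arithmetic": a shift extracts the class from the header.
            int classId = (int) (heap[addr] >>> CLASS_SHIFT);
            int size = sizeInWords[classId];
            if (size < 1) {
                throw new IllegalStateException("corrupt header at word " + addr);
            }
            starts.add(addr);
            addr += size; // the next object begins right after this one
        }
        return starts;
    }

    public static void main(String[] args) {
        int[] sizeInWords = {0, 2, 3}; // class 1 is 2 words, class 2 is 3 words
        long[] heap = new long[7];
        heap[0] = header(1);
        heap[2] = header(2);
        heap[5] = header(1);
        System.out.println(walk(heap, 7, sizeInWords));
    }
}
```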

Identity hash codes

The current implementation of identity hash codes stores only 31 bits on 64-bit platforms and only 25 bits on 32-bit platforms. With compact object headers, this is not going to change.
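
One visible consequence of the 31-bit limit is that identity hash codes on 64-bit HotSpot fit in the non-negative range of an `int`. The sketch below checks this for a batch of fresh objects. This is an observation about the HotSpot implementation described here, not a guarantee of the Java SE specification, and the class and method names are ours.

```java
// Checks that identity hash codes fit in 31 bits; HotSpot-specific behavior.
final class IdentityHashBits {
    static final int HASH_BITS = 31; // from the text, for 64-bit platforms

    /** True if no bit at or above position HASH_BITS is set. */
    static boolean fitsInHashBits(int hash) {
        return (hash >>> HASH_BITS) == 0;
    }

    /** Checks identity hash codes of 'count' freshly allocated objects. */
    static boolean allFit(int count) {
        for (int i = 0; i < count; i++) {
            Object o = new Object();
            if (!fitsInHashBits(System.identityHashCode(o))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("all identity hashes fit in 31 bits: " + allFit(100_000));
    }
}
```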

Alternatives

Testing

Changing the header layout of Java heap objects touches many HotSpot JVM subsystems: the runtime, all garbage collectors, all just-in-time compilers, the interpreters, the serviceability agent, and the architecture-specific code for all supported platforms. Such massive changes warrant massive testing.

Compact object headers will be tested by:

All of these tests will be executed with the feature turned on and off, with multiple combinations of GCs and JIT compilers, and on several hardware targets.

We will also deliver a new set of tests which measure the size of a variety of objects, e.g., plain objects, primitive type arrays, reference arrays, and their headers.

The ultimate test for performance and correctness will be real-world workloads once this experimental feature is delivered.

Risks and Assumptions

Dependencies