JEP draft: Low-level Object layout introspection methods

Owner: Aleksey Shipilev
Type: Feature
Scope: JDK
Status: Draft
Component: core-libs / java.lang
Effort: S
Duration: S
Created: 2020/07/10 17:23
Updated: 2023/01/05 15:21
Issue: 8249196

Summary

Provide methods for polling implementation-dependent object sizes, field offsets, and object addresses, to enable low-level introspection of JVM behavior.

Goals

Extend publicly available introspection APIs to shrink the dependency on Unsafe for read-only JVM introspection. A successful implementation should provide enough APIs to let introspection tools drop Unsafe completely, while making no security or reliability compromises. A stretch goal is to provide reliability and performance advantages over the similar Unsafe methods, thus providing an iron-clad reason to migrate.

Non-Goals

It is not a goal to provide overly deep JVM introspection APIs. It is not a goal to provide APIs that can be abused to force the JVM into behaving in an incompatible manner. It is not a goal to change Unsafe APIs to fit the new methods.

Motivation

While Java implementations deliberately avoid the implementation-dependent questions about object sizes, object placement, and field layouts, the answers are still useful for low-level performance work. This is why there are multiple libraries in the ecosystem that provide this kind of introspection, and there are many other internal utilities that provide the same kind of info.

To name a few:

All these tools use sun.misc.Unsafe to get the low-level runtime information, either as the primary mechanism or as a fallback. The ever-shrinking opportunities to use Unsafe in modern JDKs make supporting these libraries an ongoing hassle with an unclear future. Even though Unsafe is used, the data those tools get is usually safe. Introducing a common API to poll this kind of data eliminates this part of the dependency on Unsafe.

There are a few common use cases that make these APIs useful:

Existing APIs

Some APIs already exist to cover the purpose above.

The alternatives for object sizes:

The alternatives for field offsets:

The alternatives for address information:

Description

For the sake of discussion, let us assume the methods go to Runtime class. The proof-of-concept implementation can be found in "JDK-8249196-low-level-object" branch in jdk-sandbox repo:

$ git clone https://github.com/openjdk/jdk-sandbox
$ cd jdk-sandbox
$ git checkout --track origin/JDK-8249196-low-level-object
$ sh ./configure 
$ make images

One can see the difference between the baseline and the patched runtime by using:

$ git diff master..JDK-8249196-low-level-object

Comparison for that branch can be found here, and nightly builds for that branch can be found here. The easiest way to get the feel of the APIs is using jshell from nightly builds:

$ curl https://builds.shipilev.net/openjdk-jdk-jep-8249196/openjdk-jdk-jep-8249196-latest-linux-x86_64-release.tar.xz | tar xJf -
$ jdk/bin/jshell 
|  Welcome to JShell -- Version 16-testing
|  For an introduction type: /help intro

jshell> Runtime.sizeOf(new Object());
$1 ==> 16

The prototype is used to explore the API choices and get the basic performance characteristics for prospective implementations.

Informal API specification, implementation and performance guidelines, safety discussion follows in this section.

Suggested Methods

sizeOf

First, the object size method:

/**
 * Returns the runtime estimate of storage taken by a given object.
 * ...
 */
public static long sizeOf(Object obj) { ... }

This method returns the byte size of the instance, pretty much like C-style sizeof() would do. Note this method is only a "shallow" sizeOf, and its result does not include the sizes of the objects referenced from the given object. Implementations are free to return a special value (to be decided, for example -1) if the size estimate is not available, or the JVM refuses to provide one.

(Bike-shedding question: should this be called shallowSizeOf?)

Example that mirrors JOL's "internals" view that is done with Unsafe:

$ java -jar jol-cli.jar internals java.util.ArrayList
java.util.ArrayList object internals:
OFFSET  SIZE                 TYPE DESCRIPTION
     0     4                      (object header)
     4     4                      (object header)
     8     4                      (object header)
    12     4                  int AbstractList.modCount
    16     4                  int ArrayList.size
    20     4   java.lang.Object[] ArrayList.elementData
Instance size: 24 bytes

$ jshell
jshell> Runtime.sizeOf(new ArrayList<>())
$1 ==> 24

The basic implementation of this method goes to native VM code and executes obj->size(). The size() method decodes the so-called class "layout helper" that carries the full object size information for instances, and part of the size information for arrays (the other part being the array length itself). The majority of the cost is the JNI transition to the native JDK method. This implementation takes ~25 ns per call on a modern desktop.

C1 and C2 intrinsics can be used to inline the layout helper decoding and save the cost of the JNI transition. The currently implemented C2 intrinsic cuts the cost to ~4 ns per call.

The largest object one could allocate is new long[Integer.MAX_VALUE-epsilon], which takes about 16 GB of space (roughly 2^31 elements of 8 bytes each), which requires a long result for sizeOf. This matches what Instrumentation.getObjectSize() is doing.
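The need for a long result can be checked with back-of-envelope arithmetic. The sketch below assumes 8-byte long elements and ignores the array header; the class and method names are illustrative only and are not part of the proposal.

```java
// Back-of-envelope check: why sizeOf needs a long result.
// Assumes 8 bytes per long element and ignores the (small) array header.
class LargestArraySize {
    static long approxBytes(int length) {
        // Widen to long before multiplying to avoid int overflow.
        return (long) length * Long.BYTES;
    }

    public static void main(String[] args) {
        long bytes = approxBytes(Integer.MAX_VALUE);
        System.out.println(bytes + " bytes");
        System.out.println("fits in int: " + (bytes <= Integer.MAX_VALUE));
    }
}
```

The payload alone is several times larger than Integer.MAX_VALUE, so an int return type could not represent it.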

deepSizeOf

There is a temptation to provide the deepSizeOf method that would give the size taken by the entire reachable subgraph starting from the given object:

Example that mirrors JOL's GraphLayout, which does it with Reflection and Unsafe:

$ jshell
jshell> Object o = List.of("1", "2", "3");
o ==> [1, 2, 3]

jshell> GraphLayout.parseInstance(o).toFootprint()
$2 ==> "java.util.ImmutableCollections$ListN@254989ffd footprint:
        COUNT       AVG       SUM   DESCRIPTION
            3        24        72   [B
            1        32        32   [Ljava.lang.Object;
            3        24        72   java.lang.String
            1        16        16   java.util.ImmutableCollections$ListN
            8                 192   (total)"

jshell> Runtime.deepSizeOf(o)
$1 ==> 192

It does run into a bunch of implementation problems:

Additionally, there is a possibility to add the "callback" function to deepSizeOf, which would enable users to filter some of the walked objects. It comes with the major problem: that callback would see every object walked in the object graph, including those normally only reachable through the private fields. This can be somewhat mitigated by requiring the special security privileges, but it still breaks encapsulation in a major way.

The current prototype handles most of the work on the JDK side. It maintains the wavefront in the IdentityHashSet, records the current object queue with ArrayDeque, and uses sizeOf (implemented with intrinsics, see previous section) to weigh the objects.

The VM is called through the JDK native method to peel the references from the given object. Again, the significant cost there is the JNI transition. The method itself calls into VM APIs that poke deep into VM internals, and so intrinsifying the whole "peeling" method might prove problematic. It is possible, however, to call the known JDK native method directly from C1/C2 generated code, as the current prototype demonstrates.
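The walk described above can be sketched roughly as follows. This is not the prototype's code: the sizer and peeler are passed in as hypothetical stand-ins for sizeOf and the VM-side reference peeling, which are not available in released JDKs, and the identity set is built from IdentityHashMap as an assumption.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.function.ToLongFunction;

// Sketch of a breadth-first deep-size walk with an identity-based visited set.
class DeepSizeSketch {
    static long deepSizeOf(Object root,
                           ToLongFunction<Object> sizer,             // stand-in for sizeOf
                           Function<Object, List<Object>> peeler) {  // stand-in for VM peeling
        if (root == null) return 0;
        // Objects are distinguished by reference, not by equals().
        Set<Object> visited = Collections.newSetFromMap(new IdentityHashMap<>());
        ArrayDeque<Object> queue = new ArrayDeque<>();
        visited.add(root);
        queue.add(root);
        long total = 0;
        while (!queue.isEmpty()) {
            Object cur = queue.poll();
            total += sizer.applyAsLong(cur);
            for (Object ref : peeler.apply(cur)) {
                // Skip nulls and already-seen objects (handles cycles and sharing).
                if (ref != null && visited.add(ref)) {
                    queue.add(ref);
                }
            }
        }
        return total;
    }

    // Tiny linked node used only for the demonstration.
    static final class Node {
        Node next;
    }

    public static void main(String[] args) {
        Node a = new Node(), b = new Node(), c = new Node();
        a.next = b; b.next = c; c.next = a; // a cycle of three nodes
        long size = deepSizeOf(a,
                o -> 16L, // assumed flat size per node, for illustration only
                o -> Collections.<Object>singletonList(((Node) o).next));
        System.out.println(size); // each node is counted once despite the cycle
    }
}
```

Keeping the visited set identity-based matters: two distinct objects that are equals() to each other still occupy separate storage and must both be counted.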

The upside of doing this whole thing on the JDK side is accuracy and speed. It is accurate, because JDK/JVM is able to access all (even private) fields without access controls getting in the way. It is fast, because it can use intrinsics and direct VM calls that are not available to 3rd party libraries.

Taking as an example weighing a linked list of 1K nodes, the current prototype deepSizeOf takes about 40 us and 30 KB of heap memory on a modern desktop. The nearest contender, JOL, does the same in about 90 us and 300 KB of heap memory.

fieldOffsetOf

The field layout method is something like this:

/**
 * Returns the offset of the field within the object.
 * ...
 */
public static long fieldOffsetOf(Field field) { ... }

This method returns the byte offset of the field within its declaring class. Again, the implementation may return a special "don't know" value when the offset is not available, or the JVM refuses to provide one.

Example that mirrors JOL's "internals" view, which is done with Unsafe:

$ java -jar jol-cli.jar internals java.util.ArrayList
java.util.ArrayList object internals:
OFFSET  SIZE                 TYPE DESCRIPTION
     0     4                      (object header)
     4     4                      (object header)
     8     4                      (object header)
    12     4                  int AbstractList.modCount
    16     4                  int ArrayList.size
    20     4   java.lang.Object[] ArrayList.elementData
Instance size: 24 bytes

$ jshell
jshell> Runtime.fieldOffsetOf(ArrayList.class.getDeclaredField("size"))
$1 ==> 16

The awkward part: Field also describes static fields, which gives the interesting offsets:

jshell> Runtime.fieldOffsetOf(Integer.class.getDeclaredField("MIN_VALUE"))
$3 ==> 144

This offset is the offset within the java.lang.Class mirror. This may be accepted with caveat specification that offset is from the head of the holding container, not necessarily the object itself.

Implementation-wise, this method goes to native VM code that resolves the field offset. This data is always available in VM, because it is needed for field access linkage and efficient code generation. This method deals with reflective Field and the result can be used for all instances of a given class (unless class redefinition happens; see the discussion below). Therefore, this method does not have to be intrinsified.

fieldSizeOf

The field size method is something like this:

/**
 * Returns the size of the field within the object.
 * ...
 */
public static long fieldSizeOf(Field field) { ... }

This method returns the size taken by a field in the given object. Once again, implementation may return special "don't know" value.

Example that mirrors JOL's "internals" view, which is done with Unsafe:

$ java -jar jol-cli.jar internals java.util.ArrayList
java.util.ArrayList object internals:
OFFSET  SIZE                 TYPE DESCRIPTION
     0     4                      (object header)
     4     4                      (object header)
     8     4                      (object header)
    12     4                  int AbstractList.modCount
    16     4                  int ArrayList.size
    20     4   java.lang.Object[] ArrayList.elementData
Instance size: 24 bytes

$ jshell
jshell> Runtime.fieldSizeOf(ArrayList.class.getDeclaredField("elementData"))
$1 ==> 4

In current JVMs, the return values are pretty boring: primitive field type sizes are almost always the same across all JDK releases, and reference field sizes depend on JVM bitness and compressed references mode. The story gets much more interesting with inline classes, where Reflection replies that the declared field is the object projection of the inline type. This method allows JVM to answer the full size of the inline-type-bearing class, including all flattened fields. See more discussion in the further sections.
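For comparison, a tool without fieldSizeOf has to guess field sizes from declared types. The sketch below shows such a guess, assuming the usual HotSpot primitive sizes and taking the reference size as a parameter, since it depends on bitness and the compressed references mode. The names are hypothetical, and this guess is exactly what goes wrong once fields are flattened.

```java
// Sketch: guessing a field's size from its declared type alone.
// Primitive sizes are the typical HotSpot ones (an assumption);
// referenceSize would be 4 or 8 depending on the JVM mode.
class FieldSizeGuess {
    static int guessFieldSize(Class<?> type, int referenceSize) {
        if (!type.isPrimitive()) return referenceSize;
        if (type == boolean.class || type == byte.class) return 1;
        if (type == char.class || type == short.class) return 2;
        if (type == int.class || type == float.class) return 4;
        if (type == long.class || type == double.class) return 8;
        // void.class is also "primitive" but never a field type.
        throw new IllegalArgumentException("Unexpected type: " + type);
    }

    public static void main(String[] args) {
        System.out.println(guessFieldSize(int.class, 4));
        System.out.println(guessFieldSize(Object[].class, 4));
        System.out.println(guessFieldSize(long.class, 8));
    }
}
```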

fieldOffsetOf and fieldSizeOf should be enough to understand where the fields are in the objects, and by exclusion where the field gaps are.
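Finding gaps "by exclusion" could look roughly like the sketch below. The (offset, size) pairs would come from fieldOffsetOf and fieldSizeOf; here they are hard-coded, and the Slot class, the gaps method, and the header size parameter are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch: derive layout gaps from known field offsets and sizes.
class FieldGapSketch {
    static final class Slot {
        final String name;
        final long offset;
        final long size;
        Slot(String name, long offset, long size) {
            this.name = name;
            this.offset = offset;
            this.size = size;
        }
    }

    // Returns [start, end) byte ranges between headerSize and instanceSize
    // that are not covered by any field.
    static List<long[]> gaps(List<Slot> slots, long headerSize, long instanceSize) {
        List<Slot> sorted = new ArrayList<>(slots);
        sorted.sort(Comparator.comparingLong(s -> s.offset));
        List<long[]> result = new ArrayList<>();
        long cursor = headerSize;
        for (Slot s : sorted) {
            if (s.offset > cursor) {
                result.add(new long[] {cursor, s.offset});
            }
            cursor = Math.max(cursor, s.offset + s.size);
        }
        if (cursor < instanceSize) {
            result.add(new long[] {cursor, instanceSize}); // trailing alignment padding
        }
        return result;
    }

    public static void main(String[] args) {
        // The ArrayList layout from the JOL output above: fields are densely packed.
        List<Slot> arrayList = List.of(
                new Slot("AbstractList.modCount", 12, 4),
                new Slot("ArrayList.size", 16, 4),
                new Slot("ArrayList.elementData", 20, 4));
        System.out.println("ArrayList gaps: " + gaps(arrayList, 12, 24).size());

        // A made-up layout with an aligned long field, to show non-empty output.
        List<Slot> madeUp = List.of(new Slot("x", 16, 8), new Slot("y", 24, 4));
        for (long[] g : gaps(madeUp, 12, 32)) {
            System.out.println("gap: [" + g[0] + ", " + g[1] + ")");
        }
    }
}
```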

addressOf

The address resolution method:

/**
 * Returns the current memory address taken by a given object.
 * ...
 */
public static long addressOf(Object obj) { ... }

The implementation always deals with uncompressed references, and so the result does not depend on the JVM mode.

Example that mirrors JOL's GraphLayout, which uses Unsafe to get the same address:

$ jshell
jshell> Object o = new ArrayList<>()
o ==> []

jshell> GraphLayout.parseInstance(o).toPrintable()
$1 ==> "java.util.ArrayList@5910e440d object externals:
 ADDRESS       SIZE TYPE                PATH                  VALUE
ff016ec8         16 [Ljava.lang.Object; .elementData          []
ff016ed8    1113008 (something else)    (somewhere else)      (something else)
ff126a88         24 java.util.ArrayList                       (object)"

jshell> Long.toHexString(Runtime.addressOf(o))
$2 ==> "19c47dc08"

This method might look unsafe since it returns something like a pointer, but the Java type system does not know about pointers, and so no pointer-like operations are possible with it. From the perspective of the Java program, it is just a primitive long value, and there is no way to access the object behind it. As such, it is only something of diagnostic value. We know it is useful, because the JOL "externals" view that exposes object addresses is used for performance diagnostics in the wild.

If the method replies the true object address, then there are ways to misuse this value by feeding it into already unsafe APIs:

This misuse can be further mitigated by mixing up the value with the random per-JVM cookie. It would make the relative address comparisons useful, without exposing the true address for accidental or intentional misuse. Current prototype does this, and therefore the result of addressOf differs from the result that JOL returns (see the example above).
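One possible way to salt addresses while keeping relative comparisons meaningful is to add a random per-process offset. The sketch below is an assumption for illustration, not the prototype's actual code: the class name, the 8-byte granularity, and the use of SecureRandom are all hypothetical choices.

```java
import java.security.SecureRandom;

// Sketch of salting a raw address with a per-process random offset.
class SaltedAddressSketch {
    // Chosen once per process; a multiple of 8 so salted values keep 8-byte alignment.
    private static final long COOKIE = (new SecureRandom().nextLong() >>> 16) & ~7L;

    static long salt(long rawAddress) {
        // Addition preserves the distance between any two addresses.
        return rawAddress + COOKIE;
    }

    public static void main(String[] args) {
        // Two addresses taken from the JOL output above.
        long a = 0xff016ec8L;
        long b = 0xff016ed8L;
        System.out.println((salt(b) - salt(a)) == (b - a));
    }
}
```

An additive offset keeps "how far apart are these two objects" answerable, which a XOR-style mix would not.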

This method can be trivially intrinsified, as it only does the "cast" of Java reference to raw long, plus mixes up the cookie. Current prototype indicates the performance characteristics similar to that of sizeOf: ~25 ns with JNI call, ~1 ns with C1/C2 intrinsics.

Interaction With Other JVM/JDK Features

@Contended

@Contended is known to modify field layout strategies to insert large gaps around the protected field blocks, which affects instance sizes and field offsets.

sizeOf replies the true instance size as JVM sees it, which includes all @Contended gaps, without any specific treatment or code adjustments needed.

fieldOffsetOf and fieldSizeOf reply the true field offset/size as JVM sees them. Offset is normally affected by @Contended, and size is not affected. There is no special handling needed.

addressOf has no interaction with @Contended.

Stack Allocation

Hotspot does not do Stack Allocation for Java objects yet. But the future interaction with it looks safe.

sizeOf replies the storage size, without having an opinion where that storage is taken. fieldOffsetOf replies the offset from the object start, which is the same in both stack and heap allocations.

addressOf replies the value that does not have to be a Java heap address, but can rather be the thread stack interior pointer; both are safe, since they are coerced to "long"-s.

fieldOffsetOf and fieldSizeOf have no interaction with stack allocation.

Scalar Replacement

Hotspot does Scalar Replacement, i.e. exploding the non-escaping object into fields and then allocating them as plain operands. It does pose an interesting question. Which one of these roads should be taken?

fieldOffsetOf and fieldSizeOf have no interaction with scalar replacement.

Off-heap Allocation

Off-heap allocation still implies non-Java objects (DirectByteBuffers are on heap, but their native contents are not), so neither sizeOf nor addressOf is applicable to it.

Even if we imagine the future where we get Java objects off the normal heap, the new APIs are not doing anything on their own. Rather, they ask runtime to provide the information. So, if runtime tolerates off-heap Java objects, it must provide the VM-side accessors to their sizes and offsets, and then new APIs should work automagically.

Inline Types

The advent of Valhalla's Inline Types would pose problems with these APIs, like they pose problems with Unsafe. Inline type fields can be "flattened" in the holder object, without being reflected as the holder's declared fields.

sizeOf poses no problems: it replies the object size as JVM sees it, which includes flattened fields.

fieldOffsetOf poses no problem, except we cannot reflect over the flattened field, and so it would be opaque to introspectors.

fieldSizeOf is the thing that promises to bridge some of the gap here. See for example what JOL replies on current Valhalla prototype:

public class Holder {
   int myF;
   Line line;
}

inline class Line {
   Point p1, p2;
   Line() { p1 = new Point(); p2 = new Point(); }
}

inline class Point {
   int x, y, z;
   Point() { x = 0; y = 0; z = 0; }
}

$ java -jar jol-cli.jar internals Holder -cp .

Holder object internals:
 OFFSET  SIZE   TYPE DESCRIPTION
      0     4        (object header)
      4     4        (object header)
      8     4        (object header)
     12     4   Line Holder.line
     16    20        (alignment/padding gap)
     36     4    int Holder.myF
 Instance size: 40 bytes
 Space losses: 20 bytes internal + 0 bytes external = 20 bytes total

Since Reflection does not see the flattened type (yet?), the introspector thinks the reference to Line is a normal Java reference of 4 bytes. The remaining 20 bytes are deduced to be the alignment/padding gap. But that space is actually taken by the flattened fields of Line and Point. In this case, fieldSizeOf can actually reply the full flattened size for Line (24 bytes), letting the tools know that the field is flattened.
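The 24-byte figure can be cross-checked against the JOL output with simple arithmetic. The sketch below assumes 4-byte ints, the 12-byte header shown in the listing, and fully flattened Point and Line with no internal padding; the class and method names are illustrative only.

```java
// Worked arithmetic for the Holder layout above.
class FlattenedSizeSketch {
    static final int INT_SIZE = Integer.BYTES;
    static final int HEADER = 12; // three 4-byte header slots in the JOL output

    static int pointSize() { return 3 * INT_SIZE; }    // x, y, z
    static int lineSize() { return 2 * pointSize(); }  // p1, p2, both flattened
    static int holderSize() { return HEADER + lineSize() + INT_SIZE; } // header, line, myF

    public static void main(String[] args) {
        System.out.println("Point:  " + pointSize());
        System.out.println("Line:   " + lineSize());
        System.out.println("Holder: " + holderSize());
        // Line starts at offset 12, so myF should start right after it.
        System.out.println("myF offset: " + (HEADER + lineSize()));
    }
}
```

The computed Holder size and myF offset agree with the 40-byte instance size and offset 36 reported by JOL, which supports reading the "gap" as flattened payload.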

addressOf has the usual interaction with inline types: the attempt to call it on value type "reference" may end up being called on the reference projection of the value object.

Unsafe

sizeOf has no interactions with Unsafe.

fieldOffsetOf appears to duplicate Unsafe.staticFieldOffset and Unsafe.objectFieldOffset, with a few improvements. First, it is accessible without the security privileges required for Unsafe. Second, it replies the true offset, in contrast to Unsafe that replies a cookie that looks like an offset in the current implementation. Because of this, the offset that is given by fieldOffsetOf might be incompatible with Unsafe accessor methods like Unsafe::putInt(Object, long, int) that expect the cookie given by Unsafe.*FieldOffset, not the naked offset. This is well within the Unsafe contract.

addressOf has no direct alternatives in Unsafe, but can be emulated with it. There are Unsafe methods that can be used to write to raw memory. Since addressOf normally replies some Java heap address and that address is not guaranteed to still point to the legitimate object later, trying to write with Unsafe using the address from addressOf may corrupt the Java heap. This is the risk that is similar to having access to Unsafe in the first place.

Garbage Collectors

Ultimately, garbage collectors are responsible for managing Java heap memory. This means methods that introspect Java objects without involving the garbage collector directly would need to coordinate with GC, or guarantee no ill effects.

sizeOf and fieldOffsetOf have no direct interaction with GC. At least in Hotspot, the object size is reachable through the klass word and its associated layout helper, and field offsets are discoverable by polling the class and its field maps. This means that as long as a proper reference to a Java object and/or Field is passed to these methods, no further treatment is needed. Since both are Java methods, the reference to the passed object is correct at all times, maintained by the usual GC-mutator interaction protocols.

addressOf has no direct interaction with GC, but the address result it replies may be outdated very early, even before the return from the method: for example, when STW GC hits at safepoint poll before method exit. This is not a problem for correctness, because users can do nothing with the long address that would treat it as the pointer to the object.

It does not seem worthwhile to try and mitigate this, because: a) every other STW GC opportunity could invalidate it (technically only doable at safepoints); b) concurrent GC can move the object without notifying the user code holding the "naked address" at any time. In both cases, the "solution" would be to block GC until user releases the "naked address". This does not seem worthwhile for a diagnostic API.

Class Redefinition

Current class redefinition only allows changing the method bodies, so it does not affect the object/field layout.

When JEP 159 ("Enhanced Class Redefinition") is implemented, some methods would be sensitive to the redefinitions: sizeOf and fieldOffsetOf results might change when fields are added or removed. A caveat in the specification that sizeOf and fieldOffsetOf values might change during runtime might be needed to future-proof them against advancements in this area. Intrinsifying both methods may help to make the costs of polling the actual sizes/offsets negligible.

Panama

Panama does not seem to have interactions with sizeOf and fieldOffsetOf.

Current Panama code provides the ability to access the memory for the entire native heap, see MemoryAccess and MemorySegment.ofNativeRestricted ("everything" segment). That heap includes JVM's own Java heap, and it is technically possible to use the addressOf result to access object internals without language checks. This is already a problem with Panama, and so it is mitigated by Panama itself requiring an opt-in for use of the "everything" segment with a special runtime flag. The API-side mitigation is to return the addresses salted with the random offset, making sure pointers do not point to the Java heap.

Alternatives

Continue using Unsafe. This contradicts the goal for the JEP. This is still doable, but requires inventing more Unsafe hacks as the language and runtime are changing. This is the path JOL and other libraries take today. While it does work today, it is not as future-proof as the public API from the JDK itself.

Introduce new methods in Unsafe. This contradicts the goal for the JEP. Introducing new shortcut methods in Unsafe would make some tools happier, since they would not need to implement heavy-weight tricks (for example doing awkward type-converting load/stores for addressOf, or polling all field offsets to figure out sizeOf). However, there seems to be no interest in extending sun.misc.Unsafe, as the intent is to minimize Unsafe usages outside of the JDK. The best case is extending jdk.internal.misc.Unsafe, but then it is specifically protected by the module system, and not readily exposed to external tools.

Continue using Instrumentation, adding methods there. Currently, Instrumentation only covers the sizeOf case, but it can be extended to handle other cases too. It may appear more protected to make these APIs available only to applications that explicitly opt in to use Java agents. This implicitly relies on the assumption that Java Agents would continue to be available for libraries that want these capabilities.

Continue using JVMTI, adding methods there. This has the same advantages as Instrumentation alternative above, plus one disadvantage: forcing power users to ship platform-specific native code.

Include JOL wholesale into JDK. Bringing the entire JOL as the JDK module might give it the access to VM internals without exposing it as public API. However, unmodified JOL would still expose the same kind of data the primitives in this JEP expose: object sizes, field offsets, object addresses. This comes with a major maintenance disadvantage: JOL development process would be tied to JDK development process, and will require separate backports to support previous JDKs, etc. It does not look worthwhile to pursue.

Extending Panama. While Panama deals with interoperability, maybe folding the new APIs there is beneficial. For example, having the ObjectLayout to go alongside the MemoryLayout APIs. That might be interesting to explore. The downside is making more or less independent primitives depend on a larger project.

Testing

Normal unit testing is needed to make sure the API does what it is supposed to do. Since the code gives implementation-dependent answers, testing on multiple platforms would be required to verify it works as expected. We expect no OS-specific or arch-specific code to handle new functionality, but some shared code is known to handle things differently on 32- and 64-bit code paths. Current prototype is being tested on x86_64 and x86_32 to verify this.

The API completeness can be verified by special versions of JAMM and JOL that use the proposed APIs.

Risks and Assumptions

Unforeseen interaction with new language/VM features. It might so happen that some new language/VM feature shows up and turns out to conflict with one of the new API methods. This risk is mitigated by having the APIs reply "don't know, don't care" values for the corner cases where the VM is unable to figure something out.

Implementability in other VMs. The prototype implementation proves that Hotspot can implement the new API methods without much problem, without relying on much of Hotspot specifics. It provides weak evidence that the new methods are implementable in other VMs: they have to know object sizes, addresses, field offsets and sizes anyway. In case some VM cannot implement a method, it should be allowed to return the "don't know, don't care" value.

Dependencies

This work has no dependencies.