JEP draft: Value Objects (Preview)

Owner: Dan Smith
Type: Feature
Scope: SE
Status: Closed / Withdrawn
Discussion: valhalla dash dev at openjdk dot java dot net
Effort: XL
Duration: XL
Relates to: JEP 401: Value Classes and Objects (Preview)
Reviewed by: Brian Goetz
Created: 2021/11/16 00:14
Updated: 2023/09/23 00:43
Issue: 8277163

Summary

Enhance the Java object model with value objects, class instances that have only final instance fields and lack object identity. This is a preview language and VM feature.

Goals

This JEP provides for the declaration of identity-free value classes and specifies the behavior of their instances, called value objects, with respect to equality, synchronization, and other operations that traditionally depend upon identity. To facilitate safe construction of value objects, value classes make use of regulated constructors.

Certain value-based classes in the standard library will become value classes when preview features are enabled.

At runtime, the HotSpot JVM will prefer inlining value objects where feasible. An inlined value object is encoded directly with its field values, avoiding any overhead from object headers, indirections, or heap allocation.

Non-Goals

This JEP allows for limited inlining of value objects in field and array storage, but doesn't attempt to optimize the storage footprint by excluding null from the variable's value set. It also doesn't propose inlined storage for variables whose encoding would exceed a natural atomic read/write size (such as 64 bits). Improvements to value object storage will be pursued in a separate JEP.

Values of the primitive types behave like value objects in many ways, but continue on as a distinct concept in the language model. Enhancements to the treatment of primitive types will be explored in Enhanced Primitive Boxing.

Future enhancements to the JVM are anticipated to support inlining of value objects within generic APIs. For now, generic APIs work with erased types and heap-allocated objects, as usual.

Motivation

Java's objects and classes offer powerful abstractions for representing data, including fields, methods, constructors, access control, and nominal subtyping. Every object also comes with identity, enabling features such as field mutation and locking.

Many classes don't take advantage of all of these features. In particular, a significant subset of classes don't have any use for identity—their field values can be permanently set on instantiation, their instances don't need to act as synchronization locks, and their preferred notion of equality makes no distinction between separately-allocated instances with matching field values. (For example, in the standard library, certain classes that discourage any dependency on identity have been designated value-based.)

At run time, support for identity can be expensive. It generally requires that an object's data be located at a particular memory location, packaged with metadata to support the full range of object functionality. As objects are shared between program components, data structures and garbage collectors end up with tangled, non-local webs of objects created at different times. Sometimes, JVM implementations can optimize around these constraints, but the resulting performance improvements can be unpredictable.

An alternative is to encode program data with primitive types. Primitive values don't have identity, and so can be copied freely and encoded as compact bit sequences. But programs that represent their data with primitive types give up all the other abstractions provided by objects and classes. (For example, if a geographic location is encoded as two floats, there's no way to restrict the valid range of values, keep matching pairs of floats together, prevent re-interpreting the values with the wrong units, or compatibly switch to a double-based encoding.)

Value classes provide programmers with a mechanism to opt out of object identity without giving up the other features of Java classes. This avoids unwanted degrees of freedom (like surprising == mismatches) and enables many of the performance benefits of primitive types.

Description

The features described below are preview features, enabled with the --enable-preview compile-time and runtime flags.

Overview

A value object is a class instance that does not have identity. That is, a value object does not have any particular memory address or any other property to distinguish it from other instances of the same class whose fields have the same values. Value objects cannot mutate their fields or be used for synchronization. The == operator on value objects compares their fields. A value class declaration introduces a class whose instances are value objects.

An identity object is a class instance or array that does have identity—the traditional behavior of objects in Java. An identity object can mutate its non-final fields and is associated with a synchronization monitor. The == operator on identity objects compares their identities. An identity class declaration—the default for a concrete class—introduces a class whose instances are identity objects.

Value class declarations

A concrete class can be declared a value class with the value contextual keyword.

value class Substring implements CharSequence {
    private String str;
    private int start;
    private int end;
    
    public Substring(String str, int start, int end) {
        checkBounds(start, end, str.length());
        this.str = str;
        this.start = start;
        this.end = end;
    }
    
    public int length() {
        return end - start;
    }
    
    public char charAt(int i) {
        checkBounds(i, i + 1, length()); // rejects i < 0 and i >= length()
        return str.charAt(start + i);
    }
    
    public Substring subSequence(int s, int e) {
        checkBounds(s, e, length());
        return new Substring(str, start + s, start + e);
    }
    
    public String toString() {
        return str.substring(start, end);
    }
    
    private static void checkBounds(int start, int end, int length) {
        if (start < 0 || end < start || length < end)
            throw new IndexOutOfBoundsException();
    }
}

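A concrete value class declaration is subject to the following restrictions:

- The class is implicitly final, so it cannot be extended.
- All instance fields are implicitly final, so each must be assigned exactly once, by a constructor or initializer.
- The class cannot declare synchronized instance methods.
- The class's constructors are implicitly regulated (see below).
- The superclass, if not Object, must be an abstract class that is not an identity class and that declares a regulated constructor.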

In most other ways, a value class declaration is just like an identity class declaration. It implicitly extends Object if it has no explicit superclass type. It can be an inner class. It can declare superinterfaces, type parameters, member classes and interfaces, overloaded constructors, static members, and the full range of access restrictions on its members.

A concrete class can be declared an identity class with the identity contextual keyword. In the absence of the value and identity modifiers, a concrete class (other than Object) is implicitly an identity class.

identity class Id1 {
    int counter = 0;
    void increment() { counter++; }
}

class Id2 { // implicitly 'identity'
    synchronized void m() {}
}

The value and identity modifiers are supported by record classes. Records are often good candidates to be value classes, because their fields are already required to be final.

value record Name(String first, String last) {
    public String full() {
        return "%s %s".formatted(first, last);
    }
}

identity record Node(String label, Node next) {
    public String list() {
        return label + (next == null ? "" : ", " + next.list());
    }
}

Just like regular classes, identity is the default modifier for record classes.

Working with value objects

Value objects are created and operated on just like normal objects:

Substring s1 = new Substring("abc", 0, 2);
Substring s2 = null;
if (s1.length() == 2)
    s2 = s1.subSequence(1, 2);
CharSequence cs = s2;
System.out.println(cs.toString()); // prints "b"

The == operator compares value objects of the same class in terms of their field values, not object identity. Fields with primitive types are compared by their bit patterns. Other field values—both identity and value objects—are recursively compared with ==.

assert new Substring("abc", 1, 2) == s2;
assert new Substring("abcd", 1, 2) != s2;
assert s1.subSequence(0, 2) == s1;

The equals, hashCode, and toString methods, if inherited from Object, along with System.identityHashCode, behave consistently with this definition of equality.

Substring s3 = s1.subSequence(0, 2);
assert s1.equals(s3);
assert s1.hashCode() == s3.hashCode();
assert System.identityHashCode(s1) == System.identityHashCode(s3);
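The equality rules above can be emulated in today's Java for a class with two double fields. This is an illustrative sketch only: BitwiseEquality, Point, sameValue, and valueHash are hypothetical names, Point is an ordinary record (so its own == still compares identity), and reading "bit patterns" as raw IEEE 754 bits via Double.doubleToRawLongBits is an assumption about how NaN encodings are treated. It shows the practical consequence: two NaN fields match, while 0.0 and -0.0 do not, and a hash built from the same bits stays consistent with that equality.

```java
// Hypothetical emulation of the proposed value-object == for a class with two double
// fields, written in ordinary (non-preview) Java. Not the JEP's implementation.
final class BitwiseEquality {
    private BitwiseEquality() {}

    // Stand-in for a value class; as a plain record, its own == compares identity.
    record Point(double x, double y) {}

    // Emulates the proposed ==: primitive fields are compared by their (raw) bit patterns.
    static boolean sameValue(Point a, Point b) {
        if (a == b) return true;                  // same reference, or both null
        if (a == null || b == null) return false;
        return Double.doubleToRawLongBits(a.x()) == Double.doubleToRawLongBits(b.x())
            && Double.doubleToRawLongBits(a.y()) == Double.doubleToRawLongBits(b.y());
    }

    // Hash derived from the same bits, so sameValue(a, b) implies valueHash(a) == valueHash(b).
    static int valueHash(Point p) {
        if (p == null) return 0;
        return 31 * Long.hashCode(Double.doubleToRawLongBits(p.x()))
                + Long.hashCode(Double.doubleToRawLongBits(p.y()));
    }

    public static void main(String[] args) {
        Point nanA = new Point(Double.NaN, 1.0);
        Point nanB = new Point(Double.NaN, 1.0);
        System.out.println(nanA.x() == nanB.x());               // false: IEEE 754 comparison
        System.out.println(sameValue(nanA, nanB));              // true: identical bits
        System.out.println(valueHash(nanA) == valueHash(nanB)); // true

        Point pos = new Point(0.0, 1.0);
        Point neg = new Point(-0.0, 1.0);
        System.out.println(pos.x() == neg.x());                 // true: 0.0 == -0.0
        System.out.println(sameValue(pos, neg));                // false: sign bit differs
    }
}
```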

Synchronization is disallowed on value objects: the compiler disallows synchronization on any value class type, and attempting to synchronize on a value object at run time results in an exception.

Object obj = s1;
synchronized (obj) { } // IllegalMonitorStateException

Other low-level APIs that depend on identity, like java.lang.ref.Reference, will similarly either reject value objects or simulate identity using value objects' field values.

Interfaces and Abstract Classes

By default, an interface may be implemented by both value classes and identity classes. In a special case where the interface is only meant for one kind of class or the other, the value or identity modifier can be used to declare a value interface or an identity interface.

value interface JsonValue {
    String toJsonString();
}

identity interface Counter {
    int currentValue();
    void increment();
}

It is an error for a value class or interface to extend an identity class or interface, or vice versa. This applies to both direct and indirect superclasses and superinterfaces. For example, an interface with no modifiers may extend an identity interface, but its implementing classes still must not be value classes.

Similarly, it is an error for any class or interface to implement, either directly or indirectly, both a value superclass or superinterface and an identity superclass or superinterface.
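Because conflicts count through indirect supertypes, the check amounts to a walk over the whole supertype graph. This sketch models declarations with a hypothetical TypeDecl record and Kind enum (neither is a real API) and assumes an acyclic hierarchy; its example mirrors the case above of an unmodified interface that extends an identity interface.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the identity/value conflict rule; not a javac or JVM API.
final class IdentityConflictCheck {
    private IdentityConflictCheck() {}

    enum Kind { IDENTITY, VALUE, UNSPECIFIED }

    // A class or interface declaration with its direct supertypes (assumed acyclic).
    record TypeDecl(String name, Kind kind, List<TypeDecl> supertypes) {}

    // True if the type, together with all direct and indirect supertypes,
    // mixes the identity and value modifiers.
    static boolean hasConflict(TypeDecl type) {
        Set<Kind> seen = EnumSet.noneOf(Kind.class);
        Deque<TypeDecl> work = new ArrayDeque<>();
        work.push(type);
        while (!work.isEmpty()) {
            TypeDecl t = work.pop();
            if (t.kind() != Kind.UNSPECIFIED) {
                seen.add(t.kind());
            }
            for (TypeDecl s : t.supertypes()) {
                work.push(s);
            }
        }
        return seen.contains(Kind.IDENTITY) && seen.contains(Kind.VALUE);
    }

    public static void main(String[] args) {
        TypeDecl counter = new TypeDecl("Counter", Kind.IDENTITY, List.of());
        TypeDecl sub = new TypeDecl("SubCounter", Kind.UNSPECIFIED, List.of(counter));
        TypeDecl bad = new TypeDecl("ValueCounter", Kind.VALUE, List.of(sub));
        System.out.println(hasConflict(sub)); // false: only identity is present
        System.out.println(hasConflict(bad)); // true: identity reached indirectly
    }
}
```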

(To be a functional interface, compatible with lambda expressions, an interface must allow for both value and identity implementations. This rule avoids constraining the language runtime, and may be relaxed in the future.)

An abstract class can similarly be extended by both value classes and identity classes by default, or can use the identity or value modifier to restrict its subclasses. In addition, an abstract class that declares a non-final instance field or a synchronized instance method is implicitly an identity class.

The class Object is special. Despite being a concrete class, it is not an identity class and supports both identity and value subclasses. However, calls to new Object() continue to create direct identity object instances of the class (suitable, e.g., as synchronization locks).
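Since synchronizing on a value object fails at run time, code that needs a monitor can keep a private lock created with new Object(), which, as described above, is still an identity object. LockedCounter is a hypothetical example of this pattern, not an API introduced by this JEP.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: mutable state guarded by a dedicated lock object.
final class LockedCounter {
    private final Object lock = new Object(); // a direct Object instance: an identity object
    private long count;

    void increment() {
        synchronized (lock) {
            count++;
        }
    }

    long get() {
        synchronized (lock) {
            return count;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        LockedCounter counter = new LockedCounter();
        List<Thread> threads = new ArrayList<>();
        for (int t = 0; t < 4; t++) {
            Thread thread = new Thread(() -> {
                for (int i = 0; i < 10_000; i++) {
                    counter.increment();
                }
            });
            threads.add(thread);
            thread.start();
        }
        for (Thread thread : threads) {
            thread.join();
        }
        System.out.println(counter.get()); // 40000
    }
}
```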

Regulated constructors

The regulated keyword (name subject to change) indicates that a constructor must not make any use of this in its body, except to write to an instance field. This is a useful property that ensures an object does not "leak" to the rest of the program during construction.

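Within the body of a regulated constructor, each of the following is a compile-time error:

- Any use of this, explicit or implicit (such as an unqualified instance method call or instance field read), other than as the target of an instance field assignment.
- Any use of super to access a field or invoke a method.
- Creation of an instance of an inner class whose enclosing instance would be this.
- A super(...) or this(...) call that invokes a constructor that is not regulated.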

Local and anonymous classes may be declared, but (as in a static context) they have no enclosing instance. Inner classes may refer to enclosing instances or captured enclosing variables from their own regulated constructors without error.

These rules coincide with the restrictions imposed in a pre-construction context, as described by JEP 447, except that they allow for writes to instance fields.

Any constructor of any class may be marked regulated. Value class constructors are implicitly regulated. The implicitly declared constructor of a non-value class is also regulated, as long as the no-arg constructor of the superclass is regulated. (On extremely rare occasions, this rule may cause an existing class to fail to compile, due to a use of this in its initializers.) The constructor of the class Object is regulated.

Because of the rule about super() calls, an abstract class may not act as a superclass of a value class unless it declares at least one regulated constructor. Due to value classes' use of regulated constructors, value objects will never be observed to mutate, and will never participate in circular object graphs. (A value object under construction is referred to as a "larval value object", and is unusable for any other purpose.)
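The guarantee that objects are never observed to mutate is exactly what a leaked this breaks in today's Java. In this sketch (ConstructionLeak, Point, and peek are hypothetical names), an ordinary, non-regulated constructor passes this to another method before assigning a final field, so that method sees the field change from its default value. Each peek(this) call is a use of this that a regulated constructor would reject.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demonstration of a "this" leak in an ordinary (non-regulated) constructor.
final class ConstructionLeak {
    private ConstructionLeak() {}

    // Values of Point.x as seen by code outside the constructor, in order.
    static final List<Integer> observed = new ArrayList<>();

    static final class Point {
        final int x;

        Point(int x) {
            peek(this);  // leaks 'this' before x is assigned: disallowed if regulated
            this.x = x;
            peek(this);  // also a use of 'this' that a regulated constructor rejects
        }
    }

    static void peek(Point p) {
        observed.add(p.x); // reads the final field through another reference
    }

    public static void main(String[] args) {
        new Point(42);
        System.out.println(observed); // [0, 42]: the "final" field appeared to mutate
    }
}
```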

Migration of existing classes

If an existing concrete class meets the requirements of value class declarations, it may be declared as a value class without breaking binary compatibility.

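There are some behavioral changes that users of the class may notice:

- The == operator compares instances by their field values, so separately created instances with matching field values are now ==.
- System.identityHashCode returns a result consistent with that definition of equality.
- Attempts to synchronize on an instance throw IllegalMonitorStateException.
- Identity-sensitive APIs, such as java.lang.ref.Reference, may reject instances or treat them specially.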

Developers are encouraged to identify and migrate value class candidates in their code, where appropriate.
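As a rough first pass at finding candidates, a reflective check can test the structural requirements stated in this document: a final class, final instance fields, and no synchronized instance methods. ValueCandidates.looksLikeValueCandidate is a hypothetical helper; it simplifies the superclass rule to Object or java.lang.Record, cannot detect uses of this in constructors, and says nothing about whether existing callers depend on identity.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical heuristic for spotting value class candidates; not a JDK API.
final class ValueCandidates {
    private ValueCandidates() {}

    static boolean looksLikeValueCandidate(Class<?> c) {
        if (c.isInterface() || c.isArray() || c.isPrimitive()) return false;
        if (!Modifier.isFinal(c.getModifiers())) return false; // value classes are final
        // Simplification: accept only direct subclasses of Object, plus records.
        if (c.getSuperclass() != Object.class && !c.isRecord()) return false;
        for (Field f : c.getDeclaredFields()) {
            int fm = f.getModifiers();
            if (!Modifier.isStatic(fm) && !Modifier.isFinal(fm)) return false; // mutable state
        }
        for (Method m : c.getDeclaredMethods()) {
            int mm = m.getModifiers();
            if (!Modifier.isStatic(mm) && Modifier.isSynchronized(mm)) return false;
        }
        return true;
    }

    record Name(String first, String last) {}

    public static void main(String[] args) {
        System.out.println(looksLikeValueCandidate(Name.class));                // true
        System.out.println(looksLikeValueCandidate(java.util.Optional.class));  // true
        System.out.println(looksLikeValueCandidate(java.util.ArrayList.class)); // false: not final
    }
}
```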

Value Classes in the Standard Library

Some classes in the standard library have been designated value-based, with the understanding that they would become value classes in a future release.

Under this JEP, when preview features are enabled, the following standard library classes are considered to be value classes, despite not having been declared or compiled with the value modifier:

Class file representation & interpretation

The identity and value modifiers are encoded in a class file using the ACC_IDENTITY (0x0020) and ACC_VALUE (0x0040) flags. In older-versioned class files, ACC_IDENTITY is considered to be set in classes and unset in interfaces.

(Historically, 0x0020 represented ACC_SUPER, and all classes, but not interfaces, were encouraged to set it. The flag is no longer meaningful, but coincidentally will tend to match this implicit behavior.)

Format checking ensures that identity and value are not both set, and that every class has at least one of identity, value, or abstract set.
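The flag rules above reduce to a few bit tests. In this sketch, ACC_IDENTITY and ACC_VALUE use the values given in this draft, while ACC_PUBLIC, ACC_INTERFACE, and ACC_ABSTRACT are the existing JVMS values; ClassFlagCheck and its methods are hypothetical. Clearing ACC_VALUE when interpreting older class files is an assumption, based on the JVMS rule that unassigned class access flag bits are ignored.

```java
// Hypothetical sketch of the class-level flag rules in this draft; not a JVM API.
final class ClassFlagCheck {
    private ClassFlagCheck() {}

    static final int ACC_PUBLIC    = 0x0001; // existing JVMS value
    static final int ACC_IDENTITY  = 0x0020; // per this draft (formerly ACC_SUPER)
    static final int ACC_VALUE     = 0x0040; // per this draft
    static final int ACC_INTERFACE = 0x0200; // existing JVMS value
    static final int ACC_ABSTRACT  = 0x0400; // existing JVMS value

    // Older class files: ACC_IDENTITY counts as set for classes, unset for interfaces.
    // Clearing ACC_VALUE (an unassigned bit in older versions) is an assumption here.
    static int interpretLegacyFlags(int flags) {
        boolean isInterface = (flags & ACC_INTERFACE) != 0;
        int result = flags & ~ACC_VALUE;
        return isInterface ? (result & ~ACC_IDENTITY) : (result | ACC_IDENTITY);
    }

    // Format check: not both identity and value, and at least one of identity/value/abstract.
    static boolean passesFormatCheck(int flags) {
        boolean identity = (flags & ACC_IDENTITY) != 0;
        boolean value = (flags & ACC_VALUE) != 0;
        boolean isAbstract = (flags & ACC_ABSTRACT) != 0;
        return !(identity && value) && (identity || value || isAbstract);
    }

    public static void main(String[] args) {
        System.out.println(passesFormatCheck(ACC_VALUE));                        // true
        System.out.println(passesFormatCheck(ACC_IDENTITY | ACC_VALUE));         // false
        System.out.println(passesFormatCheck(ACC_PUBLIC));                       // false
        System.out.println(passesFormatCheck(interpretLegacyFlags(ACC_PUBLIC))); // true
    }
}
```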

The regulated modifier is encoded in a class file using the ACC_REGULATED flag (value TBD). Format checking ensures that this flag is only applied to methods named <init>.

Format checking fails if a value class is not final, has a non-final instance field, has a synchronized instance method, or declares a non-regulated <init> method. Similarly, format checking fails if a non-identity abstract class has a non-final instance field or a synchronized instance method.

At class load time, superclasses and superinterfaces are checked for conflicting identity or value modifiers; if a conflict is detected, the class fails to load.

When verifying a regulated <init> method, the type uninitializedThis is not replaced with a standard class type after the super/this call. Instead, references to this have type uninitializedThis throughout the method body. The verifier also ensures that the constructor named by the super/this call is regulated.

A value class's type is represented using the usual L descriptor (LSubstring;). To facilitate inlining optimizations, a Preload attribute can be provided by any class, communicating to the JVM that a set of referenced CONSTANT_Class entries should be eagerly loaded to locate potentially-useful layout information.

Preload_attribute {
    u2 attribute_name_index;
    u4 attribute_length;
    u2 number_of_classes;
    u2 classes[number_of_classes];
}
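A reader for the Preload attribute follows directly from the structure above. This sketch (PreloadAttributeReader and Preload are hypothetical names) takes the attribute's bytes, starting at attribute_name_index, and returns the constant pool indices it names. It does not resolve those indices, so checking that each refers to a CONSTANT_Class entry is left to the caller. As with other class file attributes, attribute_length excludes the initial six bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical parser for the Preload_attribute layout; not a JDK API.
final class PreloadAttributeReader {
    private PreloadAttributeReader() {}

    record Preload(int attributeNameIndex, int[] classIndices) {
        @Override
        public String toString() {
            return "Preload[name=#" + attributeNameIndex + ", classes=" + Arrays.toString(classIndices) + "]";
        }
    }

    static Preload read(byte[] bytes) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            int nameIndex = in.readUnsignedShort();             // u2 attribute_name_index
            long length = Integer.toUnsignedLong(in.readInt()); // u4 attribute_length
            int count = in.readUnsignedShort();                 // u2 number_of_classes
            // attribute_length excludes the first six bytes, so it must equal 2 + 2 * count.
            if (length != 2L + 2L * count) {
                throw new IOException("attribute_length " + length + " does not match " + count + " entries");
            }
            int[] classes = new int[count];
            for (int i = 0; i < count; i++) {
                classes[i] = in.readUnsignedShort();            // u2 classes[i]: constant pool index
            }
            return new Preload(nameIndex, classes);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] attr = {0, 5, 0, 0, 0, 6, 0, 2, 0, 10, 0, 11};
        System.out.println(read(attr)); // Preload[name=#5, classes=[10, 11]]
    }
}
```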

Each class file generated by javac includes a Preload attribute naming any concrete value class that appears in the descriptor of any declared or referenced field or method.

The if_acmpeq and if_acmpne operations implement the == test for value objects, as described above. The monitorenter instruction throws an exception if applied to a value object.

API & tool support

A new preview API method, java.util.Objects.isValueObject, indicates whether an object is a value object or an identity object. It always returns false for arrays and direct instances of the class Object. (We may consider a similar method as a member of class Object.)

java.lang.reflect.Modifier adds support for the identity, value, and regulated flags; these are also exposed via new isIdentity and isValue methods in java.lang.Class, and isRegulated in java.lang.reflect.Constructor.

java.lang.ref recognizes value objects and treats them specially (details TBD).

The java.lang.invoke.LambdaMetafactory class rejects identity and value superinterfaces.

javax.lang.model supports the identity, value, and regulated modifiers.

The javadoc tool surfaces the identity, value, and regulated modifiers.

The class file API JEP may need updates to support new class file features.

Performance model

Because value objects lack identity, JVMs may freely duplicate and re-encode them in an effort to improve computation time, memory footprint, and garbage collector performance.

Implementations are free to use different encodings in different contexts, such as stack vs. heap, as long as the values of the objects' fields are preserved. However, these encodings must account for the possibility of a null value, and must ensure that fields and arrays storing value objects are read and written atomically.

In practice, this means that local variables, method parameters, and expression results can regularly use inline encodings, while fields and array components might only store small objects inline (e.g., objects whose fields total 56 bits or less). Assigning a value object to a variable of a polymorphic supertype will typically require heap allocation, if allocation has been avoided up to that point.

Inlining of value objects in stack code execution will tend to minimize heap allocations and garbage collection activities. Inlining of value objects in heap field and array storage will additionally reduce memory footprint and increase data locality.

Previously, JVMs have used similar optimization techniques to inline identity objects in local code when the JVM is able to prove that an object's identity is never used. Developers can expect more predictable and widespread optimizations for value objects.

HotSpot implementation

This section describes implementation details of this release of the HotSpot virtual machine, for the information of OpenJDK engineers. These details are subject to change in future releases and should not be assumed by users of HotSpot or other JVMs.

Value objects in HotSpot are encoded as follows:

Optimizations rely on the Preload attribute to identify value class types at preparation time. If a value class is not named by Preload (for example, if the class was an identity class at compile time), fields and methods may end up using a heap object encoding instead. In the case of a method overriding mismatch—a method and its super methods disagree about scalarization of a particular type—the overriding method may dynamically force callers to de-opt and use the pointer-based entry point.

To facilitate the special behavior of instructions like if_acmpeq, value objects in the heap are identified with a new flag in their object header.

Alternatives

JVMs have long performed escape analysis to identify objects that never rely on identity throughout their lifespan and can be inlined. These optimizations are somewhat unpredictable, and do not help with objects that escape the scope of the optimization, including storage in fields and arrays.

Hand-coded optimizations via primitive values are possible to improve performance, but as noted in the "Motivation" section, these techniques require giving up valuable abstractions.

The C language and its relatives support inline storage for structs and similar class-like abstractions. For example, the C# language has value types. Unlike value objects, instances of these abstractions have identity, meaning they support operations such as field mutation. As a result, the semantics of copying on assignment, invocation, etc., must be carefully specified, leading to a more complex user model and less flexibility for runtime implementations. We prefer an approach that leaves these low-level details to the discretion of JVM implementations.

Risks and Assumptions

The feature makes significant changes to the Java object model. Developers may be surprised by, or encounter bugs due to, changes in the behavior of operations such as == and synchronized. It will be important to validate that such disruptions are rare and tractable.

Some changes could potentially affect the performance of identity objects. The if_acmpeq instruction, for example, typically only costs one instruction cycle, but will now need an additional check to detect value objects. The identity class case should be optimized as the fast path, and we will need to minimize any performance regressions.

There is a security risk that == and hashCode can indirectly expose private field values. Further, two large trees of value objects can take unbounded time to compute ==, potentially a DoS attack risk. Developers need to understand these risks.

Dependencies

In anticipation of this feature, we have already added warnings about potentially incompatible changes to value class candidates in javac and HotSpot, via JEP 390.

Null-Restricted Value Object Storage (Preview) will build on this JEP, allowing programmers to manage nulls and atomicity, enabling additional optimizations for value objects stored in fields and arrays.

Enhanced Primitive Boxing (Preview) will enhance the language's use of primitive types, taking advantage of the lighter-weight characteristics of boxing to value objects.

JVM class and method specialization (JEP 218, with revisions) will allow generic classes and methods to specialize field, array, and local variable layouts when parameterized by value class types.