JEP 401: Value Classes and Objects (Preview)

Owner: Dan Smith
Type: Feature
Scope: SE
Status: Submitted
Discussion: valhalla dash dev at openjdk dot java dot net
Effort: XL
Duration: XL
Reviewed by: Brian Goetz
Created: 2020/08/13 19:31
Updated: 2023/10/09 23:59
Issue: 8251554

Summary

Enhance the Java object model with value objects, class instances that have only final instance fields and lack object identity. This is a preview language and VM feature.

Goals

Non-Goals

Motivation

Java developers often declare classes that represent simple values in their business domain. These classes have immutable state, and instances can be considered interchangeable if their state matches, regardless of when or how they were created. For these objects, the field values are meaningful, but the object wrapper can be ignored.

For example, a class to encapsulate currency values in a finance application might be a simple wrapper around a final int field:

final class USDollars implements Comparable<USDollars> {
    private final int cs;
    private USDollars(int cs) { this.cs = cs; }

    public USDollars(int dollars, int cents) {
        this(dollars * 100 + (dollars < 0 ? -cents : cents));
    }

    public int dollars() { return cs/100; }
    public int cents() { return Math.abs(cs%100); }

    public USDollars plus(USDollars that) {
        return new USDollars(cs + that.cs);
    }

    public int compareTo(USDollars that) { ... }
    public String toString() { ... }
    public boolean equals(Object o) { ... }
    public int hashCode() { ... }
}

In Java, every object that is created is given a unique identity, distinguishing it from any other object in the system. The Object.toString method hints at this unique identity, and the == operator compares objects by their identities, as illustrated in JShell:

jshell> new Object()
$1 ==> java.lang.Object@b1bc7ed

jshell> new Object()
$2 ==> java.lang.Object@30dae81

jshell> new Object() == new Object()
$3 ==> false

For classes like USDollars that represent simple values, identity is unneeded and even counter-productive. The presence of identity can be a distraction to users of the class:

USDollars x = new USDollars(23,95);
USDollars y = x.plus(new USDollars(0,0));
if (x == y) ... // false, even though x.equals(y)

Confusion related to object identity is extremely common. Some of the most frequently-asked questions about Java on StackOverflow relate to object identity, including whether to use == for object comparisons and the meaning of "pass by value" for object references. Identity is especially unintuitive and needlessly complex for objects that represent simple values.

At run time, support for identity is expensive. Identity means that a newly created object can be distinguished from every object already in the system. To achieve this, each newly created object requires the allocation of a fresh region of memory in the JVM's heap. This region stores the object's fields, and is not shared with any other object. Heap-allocated objects will be managed and eventually deallocated by a garbage collector. These objects flow through program code indirectly as heap pointers. An array of objects may include pointers to scattered locations throughout the heap, frustrating memory caches as the program iterates over the array.

For a class like USDollars that doesn't care about identity, run-time performance would be better if the JVM could just pass an int value around, and only allocate an object in memory when the use site required it (e.g., when assigned to a variable of type Comparable). Developers could "allocate" instances of USDollars freely without any impact on memory usage or garbage collection. Arrays could store USDollars instances directly as int values, avoiding extra pointers.

Indeed, modern JVMs will often perform such an optimization if they can prove that an object's identity is unused. A repeated invocation of plus in a loop to find a sum, for example, would probably not cause any heap allocation for intermediate USDollars results in optimized code. Unfortunately, such optimizations are limited, and there is little prospect of improving them. For example, if a developer makes use of ==, it is generally impossible to tell whether that dependency on identity was intentional, or whether the program would behave the same if the developer had used equals. And even if no code uses == locally, once an object has been written to a field or array, it is generally impossible to tell whether the object will need to support == at some point in the future.

Often, developers work around these limitations by avoiding some class declarations altogether, instead using bare primitive types to represent simple values in their business domain. But this strategy gives up all the abstractions provided by objects and classes: methods, access control, data validation, subtyping, etc. A developer of the finance application operating on int currency values might forget to divide by 100 to get a dollar value, or might accidentally interpret the int as a price in euros.

It would be ideal if, instead, developers could declare classes that represent simple values but that explicitly opt out of unneeded identity-based behavior, like identity-sensitive == operations. This would provide the best of both worlds: the abstractions of objects and classes with much of the simplicity and performance benefits of primitive types.

This new property should be applicable to existing classes, including records—which are designed to carry immutable data and often don't need identity—and various value-based classes defined by the Java platform.

In particular, the classes Integer, Double, Byte, etc., that represent boxed primitives, are classic examples of classes modeling simple values that do not need identity. Many developers have been tripped up when pairs of boxed Integers representing identical small values are considered == to each other, while other pairs of boxed Integers representing identical larger values are not. Without identity, confusion about the meaning of == applied to Integer would go away, and the run time overhead of boxed Integer objects would be significantly reduced.
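The boxed Integer behavior described above can be reproduced on current JDKs without any preview features. The following is a minimal sketch; the class and method names are illustrative only. The small-value result is guaranteed by the language specification's boxing cache for values from -128 to 127, while the large-value result assumes a default JVM configuration and may differ if the cache range has been enlarged.

```java
// Sketch only: shows how == on boxed Integers depends on identity today.
class BoxedIdentitySketch {
    // Boxes the same int twice and compares the two references with ==.
    static boolean sameReference(int n) {
        Integer a = Integer.valueOf(n);
        Integer b = Integer.valueOf(n);
        return a == b;
    }

    // Boxes the same int twice and compares the two objects with equals.
    static boolean sameValue(int n) {
        return Integer.valueOf(n).equals(Integer.valueOf(n));
    }

    public static void main(String[] args) {
        // Within the guaranteed cache range, both boxes are the same object.
        System.out.println("100 == 100:       " + sameReference(100));
        // Outside the cache range, a default JVM typically creates two objects.
        System.out.println("1000 == 1000:     " + sameReference(1000));
        // equals compares the wrapped values, so it does not depend on identity.
        System.out.println("1000 equals 1000: " + sameValue(1000));
    }
}
```

If Integer instances had no identity, the first two comparisons could not give different answers.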

Description

The features described below are preview features, enabled with the --enable-preview compile-time and runtime flags. More comprehensive requirements and implementation details for the language, JVM, and standard libraries can be found in subtasks of this JEP.

Overview

A value object is a class instance that does not have identity. That is, a value object does not have any particular memory address or any other property to distinguish it from other instances of the same class whose fields have the same values. Value objects cannot mutate their fields or be used for synchronization. The == operator on value objects compares their fields. A value class declaration introduces a class whose instances are value objects.

An identity object is a class instance or array that does have identity—the traditional behavior of objects in Java. An identity object can mutate its non-final fields and is associated with a synchronization monitor. The == operator on identity objects compares their identities. An identity class declaration—the default for a concrete class—introduces a class whose instances are identity objects.

At runtime, uses of value objects may be optimized in ways that are difficult or impossible for identity objects.

Value classes

A value class can be declared with the value contextual keyword.

value class USDollars implements Comparable<USDollars> {
    private int cs;
    private USDollars(int cs) { this.cs = cs; }

    public USDollars(int dollars, int cents) {
        this(dollars * 100 + (dollars < 0 ? -cents : cents));
    }

    public int dollars() { return cs/100; }
    public int cents() { return Math.abs(cs%100); }

    public USDollars plus(USDollars that) {
        return new USDollars(cs + that.cs);
    }

    public int compareTo(USDollars that) { ... }
    public String toString() { ... }
}

The class and its instance fields are implicitly final. Constructor bodies are subject to some additional constraints, as described later. In most other respects, a value class declaration works just like any other class declaration.

Instances of a value class are called value objects, while all other objects are called identity objects. Value objects are created and manipulated just like identity objects. Value class types are reference types, and may be null. Value objects can be assigned to supertypes, including the type Object.

USDollars d1 = new USDollars(100,25);
USDollars d2 = null;
if (d1.dollars() >= 100)
    d2 = d1.plus(new USDollars(-100,0));
Object o = d2;
String s = o.toString(); // "$0.25"
Comparable<USDollars> c = d2;
int i = c.compareTo(d1); // -1

Value classes may have multiple fields. While many useful value classes wrap primitive-typed fields, value classes can have reference-typed fields as well, including fields of identity class types or value class types.

value class Item {
    public String name; // identity class type
    public USDollars price; // value class type

    public Item(String name, USDollars price) {
        this.name = name;
        this.price = price;
    }

    ...
}

Identity-sensitive operations

Because their instance fields are final, value objects cannot be mutated.

The == operator applied to value objects has no identity to compare, so instead compares the objects' classes and the values of their instance fields. Fields with primitive types are compared by their bit patterns. Other field values—both identity and value objects—are recursively compared with ==.

USDollars d1 = new USDollars(3,95);
USDollars d2 = new USDollars(3,95).plus(new USDollars(0,0));
assert d1 == d2;

Object o1 = d1;
Object o2 = d2;
assert o1 == o2;

String s1 = "hamburger";
String s2 = new String(s1); // new identity
assert s1 != s2;

assert new Item(s1, d1) == new Item(s1, d2);
assert new Item(s1, d1) != new Item(s2, d1);
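The field-wise comparison described above can be approximated in ordinary Java to make the rules concrete. This is a sketch under stated assumptions, not the JVM's implementation: Price and Item are hypothetical records standing in for value classes, and samePrice and sameItem are illustrative helpers that play the role of ==. It highlights two consequences of the rules: primitive fields are compared by bit pattern rather than numeric equality, and identity objects stored in fields are compared by reference.

```java
// Sketch only: emulates the described field-wise == for two hypothetical classes.
class SubstitutabilitySketch {
    // Ordinary records stand in for value classes so this runs without preview features.
    // Records are final, so two non-null arguments of the same type share a class.
    record Price(double amount) {}
    record Item(String name, Price price) {}

    // Primitive fields: compare raw bit patterns, not numeric equality.
    static boolean sameBits(double a, double b) {
        return Double.doubleToRawLongBits(a) == Double.doubleToRawLongBits(b);
    }

    static boolean samePrice(Price a, Price b) {
        if (a == null || b == null) return a == b;
        return sameBits(a.amount(), b.amount());
    }

    static boolean sameItem(Item a, Item b) {
        if (a == null || b == null) return a == b;
        return a.name() == b.name()                  // identity object: reference comparison
            && samePrice(a.price(), b.price());      // value-like field: recursive comparison
    }

    public static void main(String[] args) {
        String s1 = "hamburger";
        String s2 = new String(s1); // equal contents, different identity

        // NaN has the same bit pattern as itself, although NaN != NaN numerically.
        System.out.println(samePrice(new Price(Double.NaN), new Price(Double.NaN))); // true
        // 0.0 and -0.0 are numerically equal but have different bit patterns.
        System.out.println(samePrice(new Price(0.0), new Price(-0.0)));             // false
        System.out.println(sameItem(new Item(s1, new Price(3.95)),
                                    new Item(s1, new Price(3.95))));                // true
        System.out.println(sameItem(new Item(s1, new Price(3.95)),
                                    new Item(s2, new Price(3.95))));                // false
    }
}
```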

The Object.equals method, when not overridden, is defined in terms of == and matches this behavior. The Object.hashCode and System.identityHashCode methods are similarly defined in terms of a value object's field values. The default Object.toString behavior continues to make use of the object's hash code.

assert new Item(s1, d1).equals(new Item(s1, d2));
assert new Item(s1, d1).hashCode() == new Item(s1, d2).hashCode();

Like any class, a value class may distinguish between its internal state and external state (that is, the data it stores vs. the data it represents). Thus, as usual, it is sometimes necessary to override the default equals method.

value class Substring implements CharSequence {
   private String str;
   private int start;
   private int end;

   public int length() {
       return end - start;
   }

   public char charAt(int i) {
       return str.charAt(start + i);
   }

   public String toString() {
       return str.substring(start, end);
   }
   
   public boolean equals(Object o) {
      return o instanceof Substring &&
             toString().equals(o.toString());
   }

   ...
}

Substring s1 = new Substring("abc", 0, 1);
Substring s2 = new Substring("ab", 0, 1);
assert s1 != s2;
assert s1.equals(s2);

Also note that the == operator does not perform a "deep equals" comparison on identity objects stored in fields; it is even possible that an identity object stored in a field will be mutated, but this does not impact ==.

For these reasons, the usual advice for users of a class to prefer equals tests over the == operator still applies to value classes. However, many value classes will be happy with the default == and Object.equals behavior.

Synchronization is disallowed on value objects: the compiler prevents synchronization on any value class type, and attempting to synchronize on a value object at run time results in an exception.

Other identity-sensitive APIs, like java.lang.ref, either reject value objects or use the objects' field values when an identity is needed.

A preview API method, java.util.Objects.isValueObject, can be used to dynamically detect whether an object is a value object or an identity object.

Value object scalarization

Because value objects lack identity, JVMs may freely duplicate and re-encode them in an effort to improve computation time, memory footprint, and garbage collector performance.

Scalarization is one important optimization enabled by this new freedom. A scalarized reference to a value object is encoded as a set of the object's field values, with no enclosing container. A scalarized object is essentially "free" at runtime, having no impact on the normal object allocation and garbage collection processes.

In HotSpot, scalarization is a JIT compilation technique, affecting the representation of references to value objects in the bodies and signatures of JIT-optimized methods.

To illustrate, the plus method of USDollars could be scalarized by a JIT compiler:

public USDollars plus(USDollars that) {
    return new USDollars(cs + that.cs);
}
// effectively:
// public static int USDollars$plus(int this$cs, int that$cs) {
//     return this$cs + that$cs;
// }

new USDollars(1,23).plus(new USDollars(4,56));

// effectively USDollars$plus(123, 456);

In reality, scalarization is more complex because each variable of a value class type can be scalarized to multiple field values. And these variables actually store references which may be null, so the scalarized encoding needs an extra flag, say, to track the nullness of the reference. We'll use a { ... } notation below to represent these sets of fields and null flags, with the understanding that the set is only notational—there is no wrapper at run time.

public USDollars plus(USDollars that) {
    return new USDollars(cs + that.cs);
}
// more realistically:
// static { boolean, int } USDollars$plus(
//         { boolean this$null, int this$cs },
//         { boolean that$null, int that$cs }) {
//     $checkNull(this$null);
//     $checkNull(that$null);
//     return { false, this$cs + that$cs };
// }
//
// new USDollars(1,23).plus(new USDollars(4,56));
//
// effectively USDollars$plus({ false, 123 }, { false, 456 });

JVMs have used similar techniques to scalarize identity objects in local code when the JVM is able to prove that an object's identity is never used. But scalarization of value objects is more predictable and far-reaching, even across non-inlinable method invocation boundaries.

One limitation of scalarization is that it is not typically applied to a variable with a type that is a supertype of a value class type. Notably, this includes method parameters of generic code whose erased type is Object. Instead, when an assignment to a supertype occurs, an ordinary heap object may be allocated. But this allocation occurs only when necessary, and as late as possible.

Value object heap flattening

Heap flattening is another important optimization enabled by value classes. A flattened reference to a value object is encoded as a compact bit vector of the object's field values, without a pointer to a different memory location. This bit vector can then be stored directly in a field or an array of a value class type.

Heap flattening is useful because a flattened value object requires less memory than an ordinary object on the heap, and because the data is stored locally, avoiding expensive cache misses. These benefits can significantly improve some programs' memory footprint and execution time.

To illustrate, an array of USDollars references could directly store 64-bit encodings of the referenced objects. Note that, as for scalarization, an extra flag is needed to keep track of null references.

USDollars[] ds = new USDollars[100];
ds[5] = new USDollars(1,23);
USDollars d1 = ds[5];
USDollars d2 = ds[6];

// effectively:
// long[] ds = new long[100];
// ds[5] = USDollars$flatten({ false, 123 });
// { boolean d1$null, int d1$cs } = USDollars$inflate(ds[5]);
// { boolean d2$null, int d2$cs } = USDollars$inflate(ds[6]);
//
// where:
// long USDollars$flatten({ boolean val$null, int val$cs }) {
//     if (val$null) return 0;
//     else return (1L << 32) | val$cs;
// }
//
// { boolean, int } USDollars$inflate(long vector) {
//     if (vector == 0) return { true, 0 };
//     else return { false, (int) vector };
// }
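The encoding in the listing above can be written as runnable Java to check that it round-trips. This is a sketch of the notation only, not of how a JVM lays out memory: FlattenSketch and its methods are hypothetical names, and a nullable Integer stands in for the { null flag, cs } pair. One detail is added as an assumption: the cents value is masked to its low 32 bits so that a negative amount cannot overwrite the flag bit through sign extension.

```java
// Sketch only: a hand-written version of the flatten/inflate encoding.
class FlattenSketch {
    // Bit 32 marks a non-null reference; bits 0..31 hold the int field.
    static final long NON_NULL_FLAG = 1L << 32;

    // Encodes a nullable amount in cents as one long; 0 represents null.
    static long flatten(Integer cs) {
        if (cs == null) return 0L;
        return NON_NULL_FLAG | (cs & 0xFFFF_FFFFL);
    }

    // Decodes the long back into a nullable amount in cents.
    static Integer inflate(long vector) {
        if (vector == 0L) return null;
        return (int) vector; // keeps the low 32 bits, dropping the flag
    }

    public static void main(String[] args) {
        long[] ds = new long[100];          // every slot starts as 0, meaning null
        ds[5] = flatten(123);
        System.out.println(inflate(ds[5])); // 123
        System.out.println(inflate(ds[6])); // null
        System.out.println(inflate(flatten(-5))); // -5
    }
}
```

Because a freshly allocated long array is zero-filled, every element decodes to null without any initialization, matching the default value of a reference array.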

Heap flattening is limited by the integrity requirements of objects: the flattened data must be small enough to read and write atomically, or else the encoded data may become corrupted. On common platforms, "small enough" may mean as few as 32 or 64 bits. So while many small value classes can be flattened, most value classes that declare 2 or more fields will have to be encoded as ordinary heap objects (unless the fields store primitives of types boolean, char, byte, or short).

In the future, 128-bit flattened encodings should be possible on platforms that support atomic reads and writes of that size. And the Null-Restricted Value Class Types JEP will enable heap flattening for even larger value classes in use cases that are willing to opt out of atomicity guarantees.

Identity classes

The identity contextual keyword complements value and may be used to indicate that a class is not a value class, and that its instances are identity objects.

identity class SimpleCounter {
    private int count = 0;
    public int currentValue() { return count; }
    public void increment() { count++; }
}

identity record Node(String label, Node next) {
    public String list() {
        return label + ((next == null) ? "" : ", " + next.list());
    }
}

A concrete class that lacks either modifier is an identity class by default.

Value records

Record classes support the value modifier. Records are often good candidates to be value classes, because their fields are already required to be final.

value record Name(String first, String last) {
    public String full() {
        return "%s %s".formatted(first, last);
    }
}

assert new Name("Amy", "Adams") == new Name("Amy", "Adams");

The record class and value class features are similar, in that both are useful for working with immutable data. However, record classes are used to opt out of separate internal state, while value classes are used to opt out of identity. Each of these choices can be made orthogonally; sometimes, an identity record is the right combination of choices.

As for other concrete classes, record classes are identity classes by default.

Superclasses and superinterfaces

A value class cannot extend an identity class. However, many abstract classes and most interfaces are fully supported as supertypes of value classes.

Extension is controlled via the value and identity modifiers. These modifiers can be applied to any class or interface. They cannot mix: it is illegal for a value class or interface to extend an identity class or interface, or vice versa.

A value interface, then, is an interface whose instances are all value objects, while an identity interface is an interface whose instances are all identity objects.

value interface JsonValue {
    String toJsonString();
}

identity interface Counter {
    int currentValue();
    void increment();
}

Most interfaces are declared with neither modifier and are unconstrained. The List interface, for example, may be implemented by both identity and value classes.

Abstract classes fall into a few different categories:

The class Object is special: as the superclass of all other classes, it must be unconstrained. However, it is a concrete class, and calls to new Object() continue to create direct identity object instances of the class (suitable, e.g., as synchronization locks).

Constraints on value class constructors

The constructor of a value class is regulated, meaning that its body must not make any use of this, except to write to an instance field. This ensures a value object does not "escape" to the rest of the program during construction.

value class Rational extends Number {
    int num;
    int denom;
    
    public Rational(int numerator, int denominator) {
        super();
        if (denominator == 0)
            throw new IllegalArgumentException();

        // gcf method must be static
        int factor = gcf(numerator, denominator);
        int n = numerator/factor;
        int d = denominator/factor;
        this.num = n;
        this.denom = d;

        // Cannot refer to 'this' in logging
        System.out.printf("%s/%s-->%s/%s%n",
                          numerator, denominator, n, d);
    }
    
    static int gcf(int num, int denom) { ... }

    ...
}

Value objects can be thought of as being in a larval, "write only" state until construction is complete. They will never be observed to mutate, and will never participate in circular object graphs.
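To see what the constructor restriction rules out, consider an ordinary identity class whose constructor passes this to other code before writing its field. The sketch below runs on current JDKs; EscapeSketch, Leaky, and observe are hypothetical names chosen for illustration. The observer sees the final field's default value, and the field is later seen with a different value, which is exactly the kind of observable mutation a value object must never exhibit.

```java
// Sketch only: an identity class whose constructor lets 'this' escape early.
class EscapeSketch {
    // Records the field value seen by the observer during construction.
    static int lastObserved = -1;

    // Hypothetical outside code that receives the partially constructed object.
    static void observe(Leaky obj) {
        lastObserved = obj.value;
    }

    static final class Leaky {
        final int value;

        Leaky(int value) {
            observe(this);      // 'this' escapes before the field is written
            this.value = value; // the final field now changes from 0 to its real value
        }
    }

    public static void main(String[] args) {
        Leaky l = new Leaky(42);
        System.out.println(lastObserved); // 0
        System.out.println(l.value);      // 42
    }
}
```

Under the rules described above, the call to observe(this) would be rejected in a value class constructor, because it uses this for something other than writing to an instance field.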

Constructors of other classes may be explicitly marked regulated (modifier subject to change) to impose the same restrictions. Any constructor invoked by super() or this() from a regulated constructor must itself be regulated, so any superclass of a value class must declare at least one regulated constructor.

public abstract class Number implements Serializable {

    public regulated Number() { }

    ...
}

If a class doesn't declare a constructor, it gets a default constructor that simply calls super(). This constructor is usually regulated and thus, in the case of an unconstrained abstract class, can support value subclasses. (The exception is when the super() call invokes a non-regulated constructor of the superclass; then the default constructor is also non-regulated.)

Migration of existing classes

Some classes in the standard library have been designated value-based, with the understanding that they would become value classes in a future release.

Under this JEP, when preview features are enabled, the following standard library classes are considered to be value classes, despite not having been declared or compiled with the value modifier:

The migration of the classes used by boxing should significantly reduce boxing-related overhead (although Long and Double may be too large for heap flattening).

Developers are encouraged to identify and migrate value class candidates in their own code, where appropriate. An existing class that meets the requirements of a value class declaration may be migrated simply by applying the value modifier. This is a binary compatible change.

There are some behavioral changes that users of migrated classes may notice:

Similarly, when preview features are enabled, the constructors of java.lang.Object, java.lang.Number, and java.lang.Record are considered to be regulated, despite not having been declared with that modifier.

Developers should scrutinize other existing abstract classes as potential value class superclasses. This primarily involves ensuring that any declared constructors are marked regulated. This is straightforward for most existing constructors; occasionally, a new compiler error will occur, but these can often be worked around with a simple refactoring, such as storing a computed value locally rather than reading it from a field.

Alternatives

As discussed, JVMs have long performed escape analysis to identify objects that never rely on identity throughout their lifespan and can be inlined. These optimizations are somewhat unpredictable, and do not help with objects that escape the scope of the optimization, including storage in fields and arrays.

Hand-coded optimizations via primitive values are possible to improve performance, but as noted in the "Motivation" section, these techniques require giving up valuable abstractions.

The C language and its relatives support flattened storage for structs and similar class-like abstractions. For example, the C# language has value types. Unlike value objects, instances of these abstractions have identity, meaning they support operations such as field mutation. As a result, the semantics of copying on assignment, invocation, etc., must be carefully specified, leading to a more complex user model and less flexibility for runtime implementations. We prefer an approach that leaves these low-level details to the discretion of JVM implementations.

Risks and Assumptions

The feature makes significant changes to the Java object model. Developers may be surprised by, or encounter bugs due to, changes in the behavior of operations such as == and synchronized. It will be important to validate that such disruptions are rare and tractable.

Some changes could potentially affect the performance of identity objects. The if_acmpeq instruction, for example, typically only costs one instruction cycle, but will now need an additional check to detect value objects. The identity class case should be optimized as the fast path, and we will need to minimize any performance regressions.

There is a security risk that == and hashCode can indirectly expose private field values. Further, two large trees of value objects can take unbounded time to compute ==, potentially a DoS attack risk. Developers need to understand these risks.

The restrictions on regulated constructors may create problems for instrumentation tools, such as those that inject code into the constructor of java.lang.Object. It may be necessary to provide workarounds to these tools.

Dependencies

In anticipation of this feature we already added warnings about potential behavioral incompatibilities for value class candidates in javac and HotSpot, via JEP 390.

Null-Restricted Value Class Types (Preview) will build on this JEP, allowing programmers to manage the storage of nulls and enable more frequent and more dense heap flattening in fields and arrays.

Enhanced Primitive Boxing (Preview) will enhance the language's use of primitive types, taking advantage of the lighter-weight characteristics of boxing to value objects.

JVM class and method specialization (JEP 218, with revisions) will allow generic classes and methods to specialize field, array, and local variable layouts when parameterized by value class types.

Statements before super() (Preview) clarifies the constraints imposed in the pre-construction context of a constructor. These constraints are similar to those imposed on the entire bodies of regulated constructors.

The Class-File API (Preview) will need to track new modifiers and attributes defined by this JEP.