JEP 401: Value Classes and Objects (Preview)

Owner: Dan Smith
Type: Feature
Scope: SE
Status: Draft
Component: specification
Discussion: valhalla dash dev at openjdk dot java dot net
Effort: XL
Duration: XL
Reviewed by: Brian Goetz
Created: 2020/08/13 19:31
Updated: 2024/04/23 19:25
Issue: 8251554

Summary

Enhance the Java Platform with value objects, class instances that have only final fields and lack object identity. This is a preview language and VM feature.

Goals

Non-Goals

Motivation

Java developers often need to represent simple domain values: the shipping address of an order, a log entry from an application, and so on. To do this, developers typically declare classes whose main purpose is to "wrap" data, stored in final fields. For example, a simple RGB color value could be represented with a record, whose fields are final by default:

var orange = new Color(237, 139, 0);
var blue   = new Color(0, 115, 150);
...
record Color(byte red, byte green, byte blue) {
    public Color(int r, int g, int b) {
        this(checkByte(r), checkByte(g), checkByte(b));
    }
    
    private static byte checkByte(int x) {
        if (x < 0 || x > 255) throw new IllegalArgumentException();
        return (byte) (x & 0xff);
    }

    // Provided automatically: red(), green(), blue(),
    //     toString(), equals(Object), hashCode()

    public Color mix(Color that) {
        return new Color(avg(red, that.red),
                         avg(green, that.green),
                         avg(blue, that.blue));
    }
    
    private static byte avg(byte b1, byte b2) {
        return (byte) (((b1 & 0xff) + (b2 & 0xff)) / 2);
    }
}

Developers will regard the "essence" of a Color object as a red-green-blue triple, but to Java, the essence of an object is its identity. Each execution of new Color(...) creates an object with a unique identity, making it distinguishable from every other object in the system. An object's identity means that developers can share references to an object between different parts of a program, and changes to an object's fields in one part of the program can be observed in other parts.

Object identity is problematic for simple domain values

Object identity is at best irrelevant and at worst harmful to simple domain values:

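Confusion around == for objects is so widespread that Java gives special treatment to objects of fundamental classes: identical string literals always evaluate to the same String object, and boxing an int value between -128 and 127 always produces the same cached Integer object for that value.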

This special treatment minimizes the role of object identity for string literals and integer literals, but fails to address the confusion around == for strings and integers in general. The most viewed Java question on Stack Overflow concerns the use of == with String objects, and another high-visibility question concerns the use of == with Integer objects.

All Java developers would benefit if == ignored object identity and focused on the "essence" of the object, whether for String objects, Integer objects, Color objects, or any other simple domain value.
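The following sketch, runnable on any current JDK (16 or later, for records), shows the identity-based behavior of == described above. The class name IdentityEqualityDemo and its simplified int-based Color record are hypothetical, introduced only for this illustration.

```java
public class IdentityEqualityDemo {
    // Hypothetical, simplified Color record (int components) used only for this demo.
    record Color(int red, int green, int blue) {}

    public static void main(String[] args) {
        String lit1 = "purple";
        String lit2 = "purple";
        String copy = new String(lit1);
        System.out.println(lit1 == lit2);      // true: identical string literals share one object
        System.out.println(lit1 == copy);      // false: 'new String' creates a new identity
        System.out.println(lit1.equals(copy)); // true: same characters

        Integer small1 = 100;
        Integer small2 = 100;
        System.out.println(small1 == small2);  // true: boxing -128..127 reuses cached objects

        Integer big1 = 1000;
        Integer big2 = 1000;
        // Outside -128..127 the JLS does not guarantee a shared object,
        // so this result may differ between JVM configurations.
        System.out.println(big1 == big2);

        Color c1 = new Color(128, 0, 128);
        Color c2 = new Color(128, 0, 128);
        System.out.println(c1 == c2);          // false: each 'new' creates a distinct identity
        System.out.println(c1.equals(c2));     // true: record equals compares components
    }
}
```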

Object identity is expensive at run time

Java's insistence that every object has identity, even if simple domain values don't want it, means worse performance. Typically, the JVM has to allocate memory for each newly created object, distinguishing it from every object already in the system, and reference that memory location wherever the object is used or stored. This causes the garbage collector to work harder, taking cycles away from the application, and it means worse locality of reference—for example, an array may refer to objects scattered around memory, frustrating the CPU cache as the program iterates over the array.

Modern JVMs have an optimization called escape analysis that can mitigate these performance concerns. For example, instead of allocating memory for a Color x with three byte fields, the JVM can pass the three byte values around the program directly. An inlined call to x.mix(...) could run without any memory being allocated, even though the mix method performs new Color(...). This optimization is valid as long as the code never depends on the identity of the object in question. Unfortunately, the optimization must be unraveled if the program performs an identity-sensitive operation such as x == y, or if the object "escapes" into code that the optimization can't observe, because the unseen code may perform an identity-sensitive operation.

In some application domains, developers routinely program for speed by creating as few objects as possible, thus de-stressing the garbage collector and improving locality. For example, they might encode their RGB colors as three int values rather than as Color objects. Unfortunately, this approach gives up the functionality of classes that makes Java code so maintainable: meaningful names, private state, data validation by constructors, convenience methods, etc. A developer operating on colors represented as int values might accidentally interpret the bits with a BGR encoding, swapping the red and blue components and corrupting the resulting image.
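As a hedged illustration of that hazard, the sketch below uses one common int encoding that packs all three components into a single int laid out as 0xRRGGBB; the class and helper names are hypothetical. Nothing in the type system distinguishes an RGB-packed int from a BGR-packed one, so a caller who reaches for the wrong accessor gets the wrong component with no error.

```java
public class PackedColorDemo {
    // Packs three 0..255 components into one int laid out as 0xRRGGBB.
    static int pack(int r, int g, int b) {
        return (r << 16) | (g << 8) | b;
    }

    // Extract components from an 0xRRGGBB-encoded int.
    static int red(int rgb)   { return (rgb >> 16) & 0xff; }
    static int green(int rgb) { return (rgb >> 8) & 0xff; }
    static int blue(int rgb)  { return rgb & 0xff; }

    public static void main(String[] args) {
        int orange = pack(237, 139, 0);
        System.out.println(red(orange));  // 237
        // A caller assuming a BGR layout would read the low byte as "red"
        // and silently get the blue component instead.
        System.out.println(blue(orange)); // 0
    }
}
```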

Programming without identity

Trillions of Java objects are created every day, each one bearing a unique identity. We believe the time has come to let Java developers choose which objects in the program need identity, and which do not. A class like Color that represents simple domain values could opt out of identity, so that there would never be two distinct Color objects representing the HTML color purple, just as there are never two distinct int values that both represent the number 4.

By opting out of identity, developers are opting in to a programming model that provides the best of both worlds: the abstraction of classes with the simplicity and performance benefits of primitives.

Important classes in the JDK, such as the wrapper classes used for boxing, are already designed to be "value-based", meaning they discourage depending on the identity of instances. With this JEP, these classes can opt out of identity entirely. For example, in the case of the class Integer, instances will have no identity, == will compare all Integer objects by value, and the run-time overhead of the Integer type can dramatically shrink. Even when stored in arrays, Integer[] can be made nearly as efficient as int[].

Description

A value object is an object that does not have identity. A value object is an instance of a value class. Two value objects are the same according to == if they have the same field values, regardless of when or how they were created. Two variables of a value class type may hold references that point to different memory locations, but refer to the same value object -- much like two variables of type int may hold the same int value.

An identity object is an object that does have identity: a unique property associated with the object when it is created. Prior to value classes, every object in Java was an identity object. Two identity objects are the same according to == if they have the same identity. Two variables of an identity class type refer to the same identity object only if they hold references pointing to the same memory location.

At run time, the use of value objects may be optimized in ways that are difficult or impossible for identity objects. This is because value objects, untethered from any canonical memory location, can be duplicated or re-used whenever it is convenient for the JVM to do so. This freedom allows for smaller memory footprint, fewer memory allocations, and better data locality.

Existing classes that represent simple domain values and that have followed best practices to avoid identity dependencies can be easily migrated to be value classes, with minimal compatibility impact. This JEP migrates a handful of commonly-used classes in the Java Platform, including the primitive wrapper classes such as Integer.

Value classes are a preview language feature, disabled by default.

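To try the examples below in JDK NN, you must enable preview features: compile the program with javac --release NN --enable-preview Main.java and run it with java --enable-preview Main; or, when using the source-code launcher, run the program with java --source NN --enable-preview Main.java; or, when using jshell, start it with jshell --enable-preview.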

Programming with value objects

Programs create value objects by instantiating a class that has been declared with the value modifier. In most respects, value objects behave just like any other object, but there are some special behaviors that programmers should be aware of.

Value classes

A class that has no need for identity-related features can opt out of those features with the value modifier. Classes with the value modifier are value classes; classes without the modifier are identity classes.

The Color record introduced earlier could be declared a value record. Nothing else about the declaration changes.

value record Color(byte red, byte green, byte blue) {
    public Color(int r, int g, int b) {
        this(checkByte(r), checkByte(g), checkByte(b));
    }
    
    private static byte checkByte(int x) {
        if (x < 0 || x > 255) throw new IllegalArgumentException();
        return (byte) (x & 0xff);
    }

    public Color mix(Color that) {
        return new Color(avg(red, that.red),
                         avg(green, that.green),
                         avg(blue, that.blue));
    }
    
    private static byte avg(byte b1, byte b2) {
        return (byte) (((b1 & 0xff) + (b2 & 0xff)) / 2);
    }
}

A simple class representing US dollar currency values (to two decimal places) might also be a good value class candidate. In this case, the author might prefer to declare a regular (non-record) class to more closely control the internal state. But because the class does not depend on identity-sensitive features like unique instance creation, field mutation, or synchronization, it can be declared a value class.

value class USDCurrency implements Comparable<USDCurrency> {
    private int cs; // implicitly final
    private USDCurrency(int cs) { this.cs = cs; }

    public USDCurrency(int dollars, int cents) {
        this(dollars * 100 + (dollars < 0 ? -cents : cents));
    }

    public int dollars() { return cs/100; }
    public int cents() { return Math.abs(cs%100); }

    public USDCurrency plus(USDCurrency that) {
        return new USDCurrency(cs + that.cs);
    }

    public int compareTo(USDCurrency that) { ... }
    public String toString() { ... }
}

The instance fields of a value class are implicitly final. (Special rules apply to the initialization of value class fields in constructors, as described later.) The instance methods of a value class must not be synchronized.

Many abstract classes are also good value class candidates. The class java.lang.Number, for example, has no fields, nor any code that depends on identity-sensitive features.

abstract value class Number implements Serializable {
    public abstract int intValue();
    public abstract long longValue();
    public byte byteValue() { return (byte) intValue(); }
    ...
}

Abstract value classes may be extended by both value and identity classes; in the body of an abstract value class, this may be a value object or an identity object, depending on which kind of subclass is being used. On the other hand, if a value class is not declared abstract, it is assumed to be final and may have no subclasses.

Identity classes may only be extended by other identity classes; in the body of an identity class, this is always guaranteed to be an identity object. Once a class has expressed a dependency on object identity, its subclasses cannot undo this dependency. (Object is a special exception: as the identity class at the top of the class hierarchy, it must permit value subclasses.)

Beyond the restrictions described above, a value class declaration is just like any other class declaration. The class can declare methods and implement interfaces. Users of the class will not typically notice anything unusual about the class—aside from identity-sensitive behaviors, everything about the objects is the same.

// value objects are created with 'new'
USDCurrency d1 = new USDCurrency(100,25);

// value class types may be 'null'
USDCurrency d2 = null;

// method invocations work as usual
if (d1.dollars() >= 100)
    d2 = d1.plus(new USDCurrency(-100,0));

// objects can be viewed as superclass instances
Object o = d2;
String s = o.toString(); // "$0.25"

// objects can be viewed as interface instances
Comparable<USDCurrency> c = d2;
int i = c.compareTo(d1); // -1
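Because the value modifier requires a preview-enabled JDK, the following sketch declares the same API as an ordinary final identity class, so the calls above can be run on any current JDK. The compareTo and toString bodies are elided in the declaration above; the versions here are assumptions chosen to produce the results shown in the comments ("$0.25" and -1).

```java
public final class USDCurrency implements Comparable<USDCurrency> {
    private final int cs; // total value in cents

    private USDCurrency(int cs) { this.cs = cs; }

    public USDCurrency(int dollars, int cents) {
        this(dollars * 100 + (dollars < 0 ? -cents : cents));
    }

    public int dollars() { return cs / 100; }
    public int cents() { return Math.abs(cs % 100); }

    public USDCurrency plus(USDCurrency that) {
        return new USDCurrency(cs + that.cs);
    }

    // Assumed implementation: order by total cents.
    @Override
    public int compareTo(USDCurrency that) {
        return Integer.compare(cs, that.cs);
    }

    // Assumed implementation: "$D.CC", with a leading '-' for negative amounts.
    @Override
    public String toString() {
        String sign = cs < 0 ? "-" : "";
        return String.format("%s$%d.%02d", sign, Math.abs(cs / 100), Math.abs(cs % 100));
    }

    public static void main(String[] args) {
        USDCurrency d1 = new USDCurrency(100, 25);
        USDCurrency d2 = d1.plus(new USDCurrency(-100, 0));
        System.out.println(d2);               // $0.25
        System.out.println(d2.compareTo(d1)); // -1
    }
}
```

As an identity class, new USDCurrency(3, 95) == new USDCurrency(3, 95) is false in this sketch; declared with the value modifier and run with preview features enabled, that comparison would instead be true.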

Value object construction

Field mutation is closely tied to identity: an object whose field is being updated is the same object before and after the update, so the object needs some way to be uniquely identified separately from the state of its fields. Usually, object identity addresses this need.

But field mutation is also a necessary part of value object construction: the fields of a value object are always final, yet they still start out storing zeros and nulls, and some code must be executed to update this state to appropriate values. Because value objects have no identity, the JVM is responsible for managing some sort of value object "buffer" that can be written into to set up the object. Value class constructors do not need any special new syntax, but they are required to carefully initialize the class's fields without exposing developers to observable field mutation and object identity.

To concretely illustrate the problem, recall that final fields of identity classes may be initialized at any point during construction, and nothing prevents attempts to read those fields beforehand, revealing their pre-initialization values. In the following identity class, fields x and y are declared final, yet for a short window during construction they can be observed to mutate, illustrated by repeatedly logging the sum() value.

class IdentityTest {
    final int x;
    final int y;
    
    public int sum() { return x + y; }
    
    public IdentityTest(int x, int y) { // output below assumes new IdentityTest(1, 2)
        System.out.println(sum()); // 0
        this.x = x;
        System.out.println(sum()); // 1
        this.y = y;
        System.out.println(sum()); // 3
    }
}

Were the IdentityTest constructor to share this with another thread, any code in that thread would be able to observe an identity-dependent, mutable object.
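The sketch below makes that hazard concrete on any current JDK. Rather than starting a second thread, which would make the output timing-dependent, it passes this to a hypothetical observe method that stands in for code running elsewhere; each call records sum() at that moment.

```java
import java.util.ArrayList;
import java.util.List;

public class ThisEscapeDemo {
    // Records what outside code saw; a stand-in for another thread's observations.
    static final List<Integer> observed = new ArrayList<>();

    static void observe(Point p) {
        observed.add(p.sum());
    }

    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            observe(this); // 'this' escapes before any field is assigned
            this.x = x;
            observe(this); // escapes again with only x assigned
            this.y = y;
            observe(this); // fully initialized
        }

        int sum() { return x + y; }
    }

    public static void main(String[] args) {
        new Point(1, 2);
        System.out.println(observed); // [0, 1, 3]
    }
}
```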

To avoid this situation, value classes must set all of their instance fields in the earliest stages of construction, before the super(...) call. At this stage, the object is not yet fully formed, its instance fields can't be read, and references to this are illegal.

The Flexible Constructor Bodies JEP enhances the Java programming language to allow field assignments before an explicit super(...) call in a constructor. This capability can be used to initialize value class fields, setting the field values before any superclass construction code is executed.
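Superclass construction code is itself a way to observe unset fields. In today's Java, where fields are typically assigned after super(...) returns, a superclass constructor that calls an overridable method sees the subclass's final field in its default state, as this sketch shows (the class names are hypothetical). Assigning fields before super(...) is what closes this window for value classes.

```java
import java.util.ArrayList;
import java.util.List;

public class SuperCallDemo {
    static final List<Integer> seen = new ArrayList<>();

    static abstract class Base {
        Base() {
            // Runs during Derived's super() call, before Derived assigns x.
            seen.add(describe());
        }

        abstract int describe();
    }

    static final class Derived extends Base {
        final int x;

        Derived(int x) {
            super();      // Base() runs here and calls describe()
            this.x = x;
            seen.add(describe());
        }

        @Override
        int describe() { return x; }
    }

    public static void main(String[] args) {
        new Derived(42);
        System.out.println(seen); // [0, 42]
    }
}
```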

private USDCurrency(int cs) {
    this.cs = cs;
    // call super() after all fields are set
    super();
}

Further, as a special rule for value classes, if a value class constructor has no explicit super(...) or this(...), then the entire constructor body is run before the implicit super() call. Similarly, instance field initializers in a value class are always executed at the start of the constructor body, before any super(...) call.

private USDCurrency(int cs) {
    // field initializers, if any, run here
    this.cs = cs;
    // implicit super() goes here
}

References to this (explicit or implicit) during value object construction are only allowed after all fields have been set and an explicit super(...) or this(...) call has occurred. Before that, the "larval" value object under construction is not observable by the program.

In the following test, the fields of a value class are mutated during construction, much like the identity class above. But the assignments occur earlier during construction, and it is impossible to observe any mutation—the first opportunity to log the sum() of the fields is after the super() call, when all field values have already been set.

value class ValueTest {
    final int x;
    final int y;
    
    public int sum() { return x + y; }
    
    public ValueTest(int x, int y) { // output below assumes new ValueTest(1, 2)
        this.x = x;
        this.y = y;
        super();
        System.out.println(sum()); // 3
    }
}

References between objects

Value class types are reference types. In Java, any code that operates on an object is really operating on a reference to that object; member accesses must resolve the reference to locate the object (throwing an exception in the case of a null reference). Value objects are no different in this respect.

It might seem odd to talk about references to objects that have no identity, since it is natural to think of an object's memory address as the run time representation of its identity. Indeed, stable memory addresses are not essential for value objects, and JVM implementations will often try to optimize away any indirections to the object data. However, when reasoning about a Java program, it's best to imagine all objects continuing to be handled and operated on via references.

Objects can store references to other objects in their fields, creating complex relationship graphs. There is no restriction on the types of references between value and identity objects. The following value class, for example, stores one reference to an identity object and two references to value objects. The third field, predecessor, recursively references another object of the same value class type (or null).

value class Item {
    private String name; // identity class type
    private USDCurrency cost; // value class type
    private Item predecessor; // this value class type

    public Item(String n, USDCurrency c) {
        this(n, c, null);
    }
    
    public Item(String n, USDCurrency c, Item p) {
        ...
    }

    ...
}

There is, however, one important limitation on references between objects: due to value classes' strict construction requirements, when a value object's fields are initialized, they cannot refer back to the object itself—this is not yet referenceable at that point. So it is impossible, for example, to create an Item whose predecessor is that same Item.

More generally, imagine a directed graph whose nodes are objects and whose edges are references stored in instance fields. For any program running on a JVM, if the object graph contains a cycle, at least one node in the cycle must be an identity object. A cycle can never exist among value objects exclusively.

Comparing value objects with ==

The == operator traditionally tests whether two references are the same. But this capability depends on object identity: only identity objects can be reliably referenced at a stable location.

With the introduction of value objects, the == operator must instead test whether two referenced objects are the same—that is, one is "substitutable" for the other. For identity objects, this is just a different way of describing the same test. But in the case of value objects, this means testing that the objects, wherever located, represent the same value. The result is true if the objects being compared belong to the same class and have the same field values. (Fields with primitive types are compared by their bit patterns. Other field values—both identity and value objects—are recursively compared with ==.)

// value objects with the same field values are the same
USDCurrency d1 = new USDCurrency(3,95);
USDCurrency d2 = new USDCurrency(3,95).plus(new USDCurrency(0,0)); 

assert d1 == d2; // true

// objects are still the same when viewed as supertypes
Object o1 = d1;
Object o2 = d2;
assert o1 == o2; // true

// identity objects are unique when created separately
String s1 = "hamburger";
String s2 = new String(s1); // new identity
assert s1 != s2; // true

// == recursively compares identity object fields
assert new Item(s1, d1) != new Item(s2, d1); // true

// == recursively compares value object fields
assert new Item(s1, d1) == new Item(s1, d2); // true

Notice three things about the recursive use of ==:

When declaring a value class, it's important to keep each of these factors in mind. In some cases, an identity class may be a better fit.
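The comparison rule stated earlier in this section (same class; primitive fields by bit pattern; all other fields recursively with ==) can be modeled with reflection on a current JDK. In this hypothetical sketch, records stand in for value classes and every other class, including String, is treated as an identity class; comparing floating-point fields by raw bits is an assumption about what "bit patterns" means here. It models the semantics only, not how a JVM implements ==.

```java
import java.lang.reflect.Method;
import java.lang.reflect.RecordComponent;

public final class Substitutability {
    private Substitutability() {}

    // Hypothetical model: records play the role of value classes; all other
    // classes are treated as identity classes and compared by reference.
    public static boolean isSubstitutable(Object a, Object b) {
        if (a == b) return true;                    // same reference, including null == null
        if (a == null || b == null) return false;
        if (a.getClass() != b.getClass()) return false;
        if (!a.getClass().isRecord()) return false; // distinct identity objects
        for (RecordComponent rc : a.getClass().getRecordComponents()) {
            Object va = read(rc, a);
            Object vb = read(rc, b);
            boolean same = rc.getType().isPrimitive()
                    ? sameBits(va, vb)              // primitives: compare bit patterns
                    : isSubstitutable(va, vb);      // references: recurse
            if (!same) return false;
        }
        return true;
    }

    private static Object read(RecordComponent rc, Object target) {
        try {
            Method accessor = rc.getAccessor();
            accessor.setAccessible(true);
            return accessor.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Arguments are boxed primitives of the same type. Floating-point values are
    // compared by raw bits, so 0.0 and -0.0 differ even though 0.0 == -0.0.
    private static boolean sameBits(Object va, Object vb) {
        if (va instanceof Double d1 && vb instanceof Double d2)
            return Double.doubleToRawLongBits(d1) == Double.doubleToRawLongBits(d2);
        if (va instanceof Float f1 && vb instanceof Float f2)
            return Float.floatToRawIntBits(f1) == Float.floatToRawIntBits(f2);
        return va.equals(vb); // integral, char, boolean: equal values have equal bits
    }
}
```

For example, given record Money(int cents) and record Item(String name, Money cost), two Items built from the same String object and equal Money components are substitutable, while replacing the name with new String(name) makes them not substitutable, matching the Item examples above.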

The equals method

While == tests whether two value objects are the same object, the equals method tests whether two objects represent the same data. As with identity classes, two value objects may be != but still be considered equal by the class author.

// distinct identity objects may be equal
String s1 = "hamburger";
String s2 = new String(s1); // new identity
assert s1 != s2; // true
assert s1.equals(s2); // true

// distinct value objects may be equal
assert new Item(s1, d1) != new Item(s2, d1); // true
assert new Item(s1, d1).equals(new Item(s2, d1)); // should be true

The problem of defining what constitutes "the same data" is left to the class author when they implement their equals method. For convenience, the default Object.equals implementation aligns with ==, testing whether two objects are the same; for simple value classes, this is often good enough. Value records are able to provide an even more convenient default implementation, comparing record components recursively with equals. But these are just starting points, and it's ultimately up to the class author to provide an appropriate equals implementation.
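The record-style default can be tried with ordinary records on any current JDK (16 or later). In the sketch below, the record Item is a hypothetical simplification of the Item class above; its generated equals compares components with equals, so two Items holding distinct but equal String objects are equal even though they are not ==.

```java
public class RecordEqualsDemo {
    // Hypothetical simplified Item: a name plus a cost in cents.
    record Item(String name, int costCents) {}

    public static void main(String[] args) {
        String s1 = "hamburger";
        String s2 = new String(s1); // equal contents, distinct identity

        Item a = new Item(s1, 395);
        Item b = new Item(s2, 395);
        System.out.println(a == b);      // false: two separately created identity objects
        System.out.println(a.equals(b)); // true: components compared with equals
    }
}
```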

When thinking about equals and ==, it's important to remember that a value object's internal state (the data it stores) is not always the same as its external state (the data it represents). An == test compares internal state, which is often not what you're after. Instead, the best advice for developers in most cases is to use equals whenever they need to compare objects.

In the following example, the value class Substring implements CharSequence. A Substring represents a string lazily, without allocating a char[] in memory. Naturally, then, two Substring objects should be considered equal if they represent the same string, regardless of differences in their internal state.

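import java.util.Objects;

value class Substring implements CharSequence {
    private String str;
    private int start, end;

    public Substring(String str, int start, int end) {
        Objects.checkFromToIndex(start, end, str.length());
        this.str = str;
        this.start = start;
        this.end = end;
    }

    public int length() {
        return end - start;
    }

    public char charAt(int i) {
        return str.charAt(start + Objects.checkIndex(i, length()));
    }

    public CharSequence subSequence(int from, int to) {
        Objects.checkFromToIndex(from, to, length());
        return new Substring(str, start + from, start + to);
    }

    public String toString() {
        return str.substring(start, end);
    }

    // equality is based on the represented string (external state)
    public boolean equals(Object o) {
        return o instanceof Substring && toString().equals(o.toString());
    }

    // consistent with equals
    public int hashCode() {
        return toString().hashCode();
    }
}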

Substring s1 = new Substring("ionization", 0, 3);
Substring s2 = new Substring("ionization", 7, 10);
assert s1 != s2; // true
assert s1.equals(s2); // true

The distinction between internal state and external state helps to explain why not all value classes are records, and not all records are value classes: records are used to opt out of separate internal state, while value classes are used to opt out of identity. Each of these choices can be made orthogonally.

Other identity-sensitive operations

In addition to ==, a handful of specialized operations supported by the Java platform have historically relied on object identity. When encountering a value object, these operations behave as follows:

For developers who need to detect value objects for special treatment in their own code, a new method java.util.Objects.isValueObject is defined.

Run-time optimizations for value objects

Because there is no need to preserve identity, Java Virtual Machine implementations have a lot of freedom to encode value objects at run time in ways that optimize memory footprint, locality, and garbage collection efficiency. Optimization techniques will typically either duplicate or re-use value objects to achieve these goals. Duplication might be useful, for example, to convert a value object to an encoding that requires fewer memory loads when accessing the object's data.

This section describes abstractly some of the JVM optimization techniques implemented by HotSpot. It is not comprehensive or prescriptive, but offers a taste of how value objects enable improved performance.

Value object scalarization

Scalarization is one important optimization enabled by the lack of identity. A scalarized reference to a value object is encoded as a set of the object's field values, with no enclosing container. A scalarized object is essentially "free" at run time, having no impact on the normal object allocation and garbage collection processes.

In HotSpot, scalarization is a JIT compilation technique, affecting the representation of references to value objects in the bodies and signatures of JIT-compiled methods.

To illustrate, the plus method of USDCurrency could be scalarized by a JIT compiler. All USDCurrency references could essentially be encoded as int values.

// original method:
public USDCurrency plus(USDCurrency that) { 
    return new USDCurrency(cs + that.cs); 
} 

// effectively:
public static int $plus(int this$cs, int that$cs) {
    return this$cs + that$cs;
}

// original invocation:
new USDCurrency(1,23).plus(new USDCurrency(4,56));

// effectively:
$plus(123, 456);

The reality of scalarization is more complicated, however, due to two additional requirements:

The following illustrates how the Color.mix method might be scalarized with these requirements in mind:

// original method:
public Color mix(Color that) {
    return new Color(avg(red, that.red),
                     avg(green, that.green),
                     avg(blue, that.blue));
}

// effectively:
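static { boolean, byte, byte, byte }
    $mix(boolean this$null, byte this$r,
         byte this$g, byte this$b,
         boolean that$null, byte that$r,
         byte that$g, byte that$b) {

    // preserve the NullPointerException the original code throws
    // for a null receiver or argument
    $checkNull(this$null);
    $checkNull(that$null);
    // the result is a non-null Color, returned as scalar components
    return { false,
             avg(this$r, that$r),
             avg(this$g, that$g),
             avg(this$b, that$b) };
}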

// original invocation:
new Color(0x80, 0x00, 0x80).mix(new Color(0xff, 0xff, 0xff));

// effectively:
$mix(false, 0x80, 0x00, 0x80, false, 0xff, 0xff, 0xff);

JVMs have used similar techniques to scalarize identity objects in local code when the JVM is able to prove that an object's identity is never used. But scalarization of value objects is more predictable and far-reaching, even across non-inlinable method invocation boundaries.

One limitation of scalarization is that it is not typically applied to a variable with a type that is a supertype of a value class type. Notably, this includes method parameters of generic code whose erased type is Object. Instead, when an assignment to a supertype occurs, a scalarized value object must be converted to an ordinary heap object encoding. But this allocation occurs only when necessary, and as late as possible.

Value object heap flattening

Heap flattening is another important optimization enabled by value objects' lack of identity. A flattened reference to a value object is encoded as a compact bit vector of the object's field values, without a pointer to a different memory location. This bit vector can then be stored directly in heap storage, in a field or an array of a value class type.

Heap flattening is useful because a flattened value object requires less memory than an ordinary object on the heap, and because the data is stored locally, avoiding expensive cache misses. These benefits can significantly improve some programs' memory footprint and execution time.

To illustrate, an array of Color references could directly store 32-bit encodings of the referenced objects. Note that, as for scalarization, an extra flag is needed to keep track of null references.

// original code:
Color[] cs = new Color[100];
cs[5] = new Color(0x80, 0x00, 0x80);
Color c1 = cs[5];
Color c2 = cs[6];

// effectively:
int[] cs = new int[100];
cs[5] = $flatten(false, 0x80, 0x00, 0x80);
{ boolean c1$null, byte c1$r, byte c1$g, byte c1$b } =
    $inflate(cs[5]);
{ boolean c2$null, byte c2$r, byte c2$g, byte c2$b } =
    $inflate(cs[6]);

// where:
int $flatten(boolean val$null, byte val$r,
              byte val$g, byte val$b) {
    if (val$null) return 0;
    else return (1 << 24) | ((val$r & 0xff) << 16) |
                ((val$g & 0xff) << 8) | (val$b & 0xff);
}

{ boolean, byte, byte, byte } $inflate(int vector) {
    if (vector == 0) return { true, 0, 0, 0 };
    else return { false,
                  vector >> 16 & 0xff,
                  vector >> 8 & 0xff,
                  vector & 0xff };
}

The details of heap flattening will vary, of course, at the discretion of the JVM implementation.
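The flattened encoding from the pseudocode above can be emulated by hand on any current JDK. In this hypothetical sketch, FlatColorArray stores each element as a 32-bit int and uses bit 24 as the non-null flag, so an all-zero slot (the array default) reads back as null while a black Color with all-zero components does not. A JVM that flattens a Color[] would do something like this internally and invisibly; user code would simply declare a Color[].

```java
public final class FlatColorArray {
    // Stand-in for the value record Color; an ordinary record runs on any JDK 16+.
    public record Color(byte red, byte green, byte blue) {}

    private static final int NON_NULL = 1 << 24; // flag bit; 0 in this bit means null
    private final int[] bits;

    public FlatColorArray(int length) {
        bits = new int[length]; // all zeros: every element starts out null
    }

    public void set(int i, Color c) {
        bits[i] = (c == null) ? 0
                : NON_NULL | ((c.red() & 0xff) << 16)
                           | ((c.green() & 0xff) << 8)
                           | (c.blue() & 0xff);
    }

    public Color get(int i) {
        int v = bits[i];
        if ((v & NON_NULL) == 0) return null;
        // Narrowing casts keep the low 8 bits of each shifted component.
        return new Color((byte) (v >> 16), (byte) (v >> 8), (byte) v);
    }

    public static void main(String[] args) {
        FlatColorArray cs = new FlatColorArray(100);
        cs.set(5, new Color((byte) 0x80, (byte) 0x00, (byte) 0x80));
        System.out.println(cs.get(5)); // Color[red=-128, green=0, blue=-128]
        System.out.println(cs.get(6)); // null
    }
}
```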

Heap flattening must maintain the integrity of objects. For example, the flattened data must be small enough to read and write atomically, or else it may become corrupted. On common platforms, "small enough" may mean as few as 64 bits, plus a null flag that can be managed separately. So while many small value classes can be flattened, larger classes that declare, say, 3 int fields or 2 long fields, might have to be encoded as ordinary heap objects.

In the future, 128-bit flattened encodings should be possible on platforms that support atomic reads and writes of that size. And the Null-Restricted Value Types JEP will enable heap flattening for even larger value classes in use cases that are willing to opt out of atomicity guarantees.

Migration of existing classes

Existing classes that represent simple domain values and that have followed best practices to avoid identity dependencies can be easily migrated to be value classes, with minimal compatibility impact. When preview features are enabled, a handful of commonly-used classes in the JDK, outlined below, are migrated to be value classes.

Preparing for migration

Developers are encouraged to identify and eventually migrate value class candidates in their own code. Records and other classes that represent "simple domain values" are potential candidates, along with interface-like abstract classes that declare no fields.

The author of an identity class that is intended to become a value class in a future release should consider the following:

Impact of migration

In most respects, an identity class that has addressed the risks outlined in the previous section can be compatibly made a value class by simply adding the value modifier.

All existing binaries will continue to link successfully. The only new compiler errors will be attempts to synchronize on the value class type.

There are some behavioral changes that users of the migrated classes may notice:

Value classes in the standard library

Some classes in the standard library have been designated value-based, with the understanding that they would become value classes in a future release.

Under this JEP, when preview features are enabled, the following standard library classes are considered to be value classes, despite not having been declared or compiled with the value modifier:

The migration of the classes used by boxing should significantly reduce boxing-related overhead.

Alternatives

As discussed, JVMs have long performed escape analysis to identify objects that never rely on identity throughout their lifespan and can be inlined. These optimizations are somewhat unpredictable, and do not help with objects that escape the scope of the optimization, including storage in fields and arrays.

Hand-coded optimizations via primitive values are possible to improve performance, but as noted in the "Motivation" section, these techniques require giving up valuable abstractions.

The C language and its relatives support flattened storage for structs and similar class-like abstractions. For example, the C# language has value types. Unlike value objects, instances of these abstractions have identity, meaning they support operations such as field mutation. As a result, the semantics of copying on assignment, invocation, etc., must be carefully specified, leading to a more complex user model and less flexibility for runtime implementations. We prefer an approach that leaves these low-level details to the discretion of JVM implementations.

Risks and Assumptions

The feature makes significant changes to the Java object model. Developers may be surprised by, or encounter bugs due to, changes in the behavior of operations such as == and synchronized. We expect such disruptions to be rare and tractable.

Some changes could potentially affect the performance of identity objects. The if_acmpeq test, for example, typically only costs one instruction cycle, but will now need an additional check to detect value objects. But the identity class case can be optimized as a fast path, and we believe we have minimized any performance regressions.

There is a security risk that == and hashCode can indirectly expose private field values. Further, two large trees of value objects can take unbounded time to compute ==, potentially a DoS attack risk. Developers need to understand these risks.

Dependencies

Prerequisites:

Future work: