JEP 401: Value Classes and Objects (Preview)
Owner | Dan Smith |
Type | Feature |
Scope | SE |
Status | Submitted |
Discussion | valhalla dash dev at openjdk dot java dot net |
Effort | XL |
Duration | XL |
Reviewed by | Brian Goetz |
Created | 2020/08/13 19:31 |
Updated | 2023/10/09 23:59 |
Issue | 8251554 |
Summary
Enhance the Java object model with value objects, class instances that have only final instance fields and lack object identity.
This is a preview language and VM feature.
Goals
- Allow developers to "opt in" to a programming model for simple values in which objects are distinguished solely by their field values, much as the int value 3 is distinguished from the int value 4.
- Migrate popular classes that represent simple values in the JDK, such as Integer, to this programming model. Support compatible migration of user-defined classes.
- Maximize the freedom of the JVM to encode simple values in ways that improve memory footprint, locality, and garbage collection efficiency.
Non-Goals
- It is not a goal to introduce a struct feature in the Java language. Java continues to operate on just two kinds of data: primitives and objects.
- It is not a goal to change the treatment of primitive types. Primitive types behave like value classes in many ways, but are a distinct concept. A separate JEP will provide enhancements to make primitive types more class-like and compatible with generics.
- It is not a goal to guarantee any particular optimization strategy or memory layout. This JEP enables many potential optimizations; only some will be implemented initially. Future JEPs will pursue optimizations related to null exclusion and generic specialization.
- It is not a goal to automatically treat existing classes as value classes, even if they meet the requirements for how value classes are declared and used. The behavioral changes require an explicit opt-in.
Motivation
Java developers often declare classes that represent simple values in their business domain. These classes have immutable state, and instances can be considered interchangeable if their state matches, regardless of when or how they were created. For these objects, the field values are meaningful, but the object wrapper can be ignored.
For example, a class to encapsulate currency values in a finance application might simply be a wrapper around a final int field:
final class USDollars implements Comparable<USDollars> {
private final int cs;
private USDollars(int cs) { this.cs = cs; }
public USDollars(int dollars, int cents) {
this(dollars * 100 + (dollars < 0 ? -cents : cents));
}
public int dollars() { return cs/100; }
public int cents() { return Math.abs(cs%100); }
public USDollars plus(USDollars that) {
return new USDollars(cs + that.cs);
}
public int compareTo(USDollars that) { ... }
public String toString() { ... }
public boolean equals(Object o) { ... }
public int hashCode() { ... }
}
In Java, every object that is created is given a unique identity, distinguishing
it from any other object in the system. The Object.toString
method hints at
this unique identity, and the ==
operator compares objects by their
identities, as illustrated in JShell:
jshell> new Object()
$1 ==> java.lang.Object@b1bc7ed
jshell> new Object()
$2 ==> java.lang.Object@30dae81
jshell> new Object() == new Object()
$3 ==> false
For classes like USDollars
that represent simple values, identity is unneeded
and even counter-productive. The presence of identity can be a distraction to
users of the class:
USDollars x = new USDollars(23,95);
USDollars y = x.plus(new USDollars(0,0));
if (x == y) ... // false, even though x.equals(y)
Confusion related to object identity is extremely common. Some of the
most frequently-asked questions
about Java on StackOverflow relate to object identity, including whether to use
==
for object comparisons and the meaning of "pass by value" for object
references. Identity is especially unintuitive and needlessly complex for
objects that represent simple values.
At run time, support for identity is expensive. Identity means that a newly created object can be distinguished from every object already in the system. To achieve this, each newly created object requires the allocation of a fresh region of memory in the JVM's heap. This region stores the object's fields, and is not shared with any other object. Heap-allocated objects will be managed and eventually deallocated by a garbage collector. These objects flow through program code indirectly as heap pointers. An array of objects may include pointers to scattered locations throughout the heap, frustrating memory caches as the program iterates over the array.
For a class like USDollars
that doesn't care about identity, run-time
performance would be better if the JVM could just pass an int
value around,
and only allocate an object in memory when the use site required it (e.g.,
when assigned to a variable of type Comparable
). Developers could "allocate"
instances of USDollars
freely without any impact on memory usage or garbage
collection. Arrays could store USDollars
instances directly as int
values,
avoiding extra pointers.
Indeed, modern JVMs will often perform such an optimization if they can prove that an object's identity is unused. A repeated invocation of plus in a loop to find a sum, for example, would probably not cause any heap allocation for intermediate USDollars results in optimized code. Unfortunately, such optimizations are limited, and there is little prospect of improving them. For example, if a developer makes use of ==, it is generally impossible to tell whether that dependency on identity was intentional, or whether the program would behave the same if the developer had used equals. And even if no code uses == locally, once an object has been written to a field or array, it is generally impossible to tell whether the object will need to support == at some point in the future.
Often, developers work around these limitations by avoiding some class
declarations altogether, instead using bare primitive types to represent simple
values in their business domain. But this strategy gives up all the abstractions
provided by objects and classes: methods, access control, data validation,
subtyping, etc. A developer of the finance application operating on int
currency values might forget to divide by 100 to get a dollar value, or might
accidentally interpret the int
as a price in euros.
It would be ideal if, instead, developers could declare classes that represent
simple values but that explicitly opt out of unneeded identity-based behavior,
like identity-sensitive ==
operations. This would provide the best of both
worlds: the abstractions of objects and classes with much of the simplicity and
performance benefits of primitive types.
This new property should be applicable to existing classes, including records—which are designed to carry immutable data and often don't need identity—and various value-based classes defined by the Java platform.
In particular, the classes Integer, Double, Byte, etc., that represent boxed primitives, are classic examples of classes modeling simple values that do not need identity. Many developers have been tripped up when pairs of boxed Integers representing identical small values are considered == to each other, while other pairs of boxed Integers representing identical larger values are not. Without identity, confusion about the meaning of == applied to Integer would go away, and the run time overhead of boxed Integer objects would be significantly reduced.
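The Integer surprise described above can be reproduced on a current JDK. The following is a minimal sketch; the class name BoxedIdentityDemo and the helper sameIdentity are hypothetical. It relies on Integer.valueOf returning cached instances for values from -128 to 127. The result for larger values depends on JVM settings, so no fixed outcome is claimed for that case.

```java
class BoxedIdentityDemo {
    // True when both references point to the same identity object.
    static boolean sameIdentity(Integer a, Integer b) {
        return a == b;
    }

    public static void main(String[] args) {
        Integer small1 = Integer.valueOf(127);
        Integer small2 = Integer.valueOf(127);
        Integer large1 = Integer.valueOf(1000);
        Integer large2 = Integer.valueOf(1000);

        // Small values come from a shared cache, so the identities match.
        System.out.println(sameIdentity(small1, small2));
        // Larger values are usually boxed into distinct objects, so this
        // is typically false with default settings (not guaranteed).
        System.out.println(sameIdentity(large1, large2));
        // equals compares the wrapped int and is true in both cases.
        System.out.println(large1.equals(large2));
    }
}
```

With Integer as a value class, both == comparisons would depend only on the wrapped int value.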
Description
The features described below are preview features, enabled with the --enable-preview compile-time and runtime flags. More comprehensive requirements and implementation details for the language, JVM, and standard libraries can be found in subtasks of this JEP.
Overview
A value object is a class instance that does not have identity. That is, a
value object does not have any particular memory address or any other property
to distinguish it from other instances of the same class whose fields have the
same values. Value objects cannot mutate their fields or be used for
synchronization. The ==
operator on value objects compares their fields. A
value class declaration introduces a class whose instances are value objects.
An identity object is a class instance or array that does have identity—the
traditional behavior of objects in Java. An identity object can mutate its
non-final
fields and is associated with a synchronization monitor. The ==
operator on identity objects compares their identities. An identity class
declaration—the default for a concrete class—introduces a class whose instances
are identity objects.
At runtime, uses of value objects may be optimized in ways that are difficult or impossible for identity objects.
Value classes
A value class can be declared with the value
contextual keyword.
value class USDollars implements Comparable<USDollars> {
private int cs;
private USDollars(int cs) { this.cs = cs; }
public USDollars(int dollars, int cents) {
this(dollars * 100 + (dollars < 0 ? -cents : cents));
}
public int dollars() { return cs/100; }
public int cents() { return Math.abs(cs%100); }
public USDollars plus(USDollars that) {
return new USDollars(cs + that.cs);
}
public int compareTo(USDollars that) { ... }
public String toString() { ... }
}
The class and its instance fields are implicitly final. Constructor bodies are subject to some additional constraints, as described later. In most other respects, a value class declaration works just like any other class declaration.
Instances of a value class are called value objects, while all other objects are called identity objects. Value objects are created and manipulated just like identity objects. Value class types are reference types, and may be null. Value objects can be assigned to supertypes, including the type Object.
USDollars d1 = new USDollars(100,25);
USDollars d2 = null;
if (d1.dollars() >= 100)
d2 = d1.plus(new USDollars(-100,0));
Object o = d2;
String s = o.toString(); // "$0.25"
Comparable<USDollars> c = d2;
int i = c.compareTo(d1); // -1
Value classes may have multiple fields. While many useful value classes wrap primitive-typed fields, value classes can have reference-typed fields as well, including fields of identity class types or value class types.
value class Item {
public String name; // identity class type
public USDollars price; // value class type
public Item(String name, USDollars price) {
this.name = name;
this.price = price;
}
...
}
Identity-sensitive operations
Because their instance fields are final, value objects cannot be mutated.
The == operator applied to value objects has no identity to compare, so instead compares the objects' classes and the values of their instance fields. Fields with primitive types are compared by their bit patterns. Other field values, whether identity objects or value objects, are recursively compared with ==.
USDollars d1 = new USDollars(3,95);
USDollars d2 = new USDollars(3,95).plus(new USDollars(0,0));
assert d1 == d2;
Object o1 = d1;
Object o2 = d2;
assert o1 == o2;
String s1 = "hamburger";
String s2 = new String(s1); // new identity
assert s1 != s2;
assert new Item(s1, d1) == new Item(s1, d2);
assert new Item(s1, d1) != new Item(s2, d1);
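Comparing primitive fields by bit pattern differs from the primitive == operator for floating-point values. The sketch below runs on a current JDK and emulates that rule for double. The helper name sameBits is hypothetical, and the use of Double.doubleToRawLongBits is an assumption about what "bit patterns" means for a double field.

```java
class BitPatternCompare {
    // Hypothetical helper: compares two doubles by their raw 64-bit encodings.
    static boolean sameBits(double a, double b) {
        return Double.doubleToRawLongBits(a) == Double.doubleToRawLongBits(b);
    }

    public static void main(String[] args) {
        // NaN is never == to itself, but its bit pattern matches itself.
        System.out.println(Double.NaN == Double.NaN);         // false
        System.out.println(sameBits(Double.NaN, Double.NaN)); // true
        // 0.0 and -0.0 are == as primitives, but differ in the sign bit.
        System.out.println(0.0 == -0.0);                      // true
        System.out.println(sameBits(0.0, -0.0));              // false
    }
}
```

Under this reading, two value objects that each wrap NaN in a double field would be == to each other, while objects wrapping 0.0 and -0.0 would not.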
The Object.equals method, when not overridden, is defined in terms of == and matches this behavior. The Object.hashCode and System.identityHashCode methods are similarly defined in terms of a value object's field values. The default Object.toString behavior continues to make use of the object's hash code.
assert new Item(s1, d1).equals(new Item(s1, d2));
assert new Item(s1, d1).hashCode() == new Item(s1, d2).hashCode();
Like any class, a value class may distinguish between its internal state and
external state (that is, the data it stores vs. the data it represents). Thus,
as usual, it is sometimes necessary to override the default equals
method.
value class Substring implements CharSequence {
private String str;
private int start;
private int end;
public int length() {
return end - start;
}
public char charAt(int i) {
return str.charAt(start + i);
}
public String toString() {
return str.substring(start, end);
}
public boolean equals(Object o) {
return o instanceof Substring &&
toString().equals(o.toString());
}
...
}
Substring s1 = new Substring("abc", 0, 1);
Substring s2 = new Substring("ab", 0, 1);
assert s1 != s2;
assert s1.equals(s2);
Also note that the == operator does not perform a "deep equals" comparison on identity objects stored in fields; it is even possible that an identity object stored in a field will be mutated, but this does not impact ==.
For these reasons, the usual advice for users of a class to prefer equals tests over the == operator still applies to value classes. However, many value classes will be happy with the default == and Object.equals behavior.
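The shallow nature of the comparison can be emulated on a current JDK. In this sketch, Label is a hypothetical ordinary class standing in for a value class with one identity-typed field, and sameFields is a hand-written approximation of the field-wise test. It is not the actual == implementation.

```java
class ShallowCompareDemo {
    // Hypothetical stand-in for a value class with one identity-typed field.
    static final class Label {
        final StringBuilder text;

        Label(StringBuilder text) { this.text = text; }

        // Emulates the field-wise test: the stored reference is compared
        // with ==, not with equals, so contents are never inspected.
        boolean sameFields(Label other) {
            return this.text == other.text;
        }
    }

    public static void main(String[] args) {
        StringBuilder shared = new StringBuilder("a");
        Label l1 = new Label(shared);
        Label l2 = new Label(shared);
        Label l3 = new Label(new StringBuilder("a"));

        System.out.println(l1.sameFields(l2)); // true: same StringBuilder
        shared.append("b");                    // mutate the referenced object
        System.out.println(l1.sameFields(l2)); // true: unaffected by mutation
        System.out.println(l1.sameFields(l3)); // false: different identity
    }
}
```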
Synchronization is disallowed on value objects: the compiler prevents synchronization on any value class type, and attempting to synchronize on a value object at run time results in an exception.
Other identity-sensitive APIs, like java.lang.ref
, either reject value objects
or use the objects' field values when an identity is needed.
A preview API method, java.util.Objects.isValueObject
, can be used to
dynamically detect whether an object is a value object or an identity object.
Value object scalarization
Because value objects lack identity, JVMs may freely duplicate and re-encode them in an effort to improve computation time, memory footprint, and garbage collector performance.
Scalarization is one important optimization enabled by this new freedom. A scalarized reference to a value object is encoded as a set of the object's field values, with no enclosing container. A scalarized object is essentially "free" at runtime, having no impact on the normal object allocation and garbage collection processes.
In HotSpot, scalarization is a JIT compilation technique, affecting the representation of references to value objects in the bodies and signatures of JIT-optimized methods.
To illustrate, the plus
method of USDollars
could be scalarized by a JIT
compiler:
public USDollars plus(USDollars that) {
return new USDollars(cs + that.cs);
}
// effectively:
// public static int USDollars$plus(int this$cs, int that$cs) {
// return this$cs + that$cs;
// }
new USDollars(1,23).plus(new USDollars(4,56));
// effectively USDollars$plus(123, 456);
In reality, scalarization is more complex because each variable of a value class type can be scalarized to multiple field values. And these variables actually store references which may be null, so the scalarized encoding needs an extra flag, say, to track the nullness of the reference. We'll use a { ... } notation below to represent these sets of fields and null flags, with the understanding that the set is only notational: there is no wrapper at run time.
public USDollars plus(USDollars that) {
return new USDollars(cs + that.cs);
}
// more realistically:
// static { boolean, int } USDollars$plus(
// { boolean this$null, int this$cs },
// { boolean that$null, int that$cs }) {
// $checkNull(this$null);
// $checkNull(that$null);
// return { false, this$cs + that$cs };
// }
//
// new USDollars(1,23).plus(new USDollars(4,56));
//
// effectively USDollars$plus({ false, 123 }, { false, 456 });
JVMs have used similar techniques to scalarize identity objects in local code when the JVM is able to prove that an object's identity is never used. But scalarization of value objects is more predictable and far-reaching, even across non-inlinable method invocation boundaries.
One limitation of scalarization is that it is not typically applied to a variable with a type that is a supertype of a value class type. Notably, this includes method parameters of generic code whose erased type is Object. Instead, when an assignment to a supertype occurs, an ordinary heap object may be allocated. But this allocation occurs only when necessary, and as late as possible.
Value object heap flattening
Heap flattening is another important optimization enabled by value classes. A flattened reference to a value object is encoded as a compact bit vector of the object's field values, without a pointer to a different memory location. This bit vector can then be stored directly in a field or an array of a value class type.
Heap flattening is useful because a flattened value object requires less memory than an ordinary object on the heap, and because the data is stored locally, avoiding expensive cache misses. These benefits can significantly improve some programs' memory footprint and execution time.
To illustrate, an array of USDollars references could directly store 64-bit encodings of the referenced objects. Note that, as for scalarization, an extra flag is needed to keep track of null references.
USDollars[] ds = new USDollars[100];
ds[5] = new USDollars(1,23);
USDollars d1 = ds[5];
USDollars d2 = ds[6];
// effectively:
// long[] ds = new long[100];
// ds[5] = USDollars$flatten({ false, 123 });
// { boolean d1$null, int d1$cs } = USDollars$inflate(ds[5]);
// { boolean d2$null, int d2$cs } = USDollars$inflate(ds[6]);
//
// where:
// long USDollars$flatten({ boolean val$null, int val$cs }) {
// if (val$null) return 0;
// else return (1L << 32) | val$cs;
// }
//
// { boolean, int } USDollars$inflate(long vector) {
// if (vector == 0) return { true, 0 };
// else return { false, (int) vector };
// }
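The encoding above can be tried out in ordinary Java. The following is only a sketch of the idea, not how any JVM implements flattening. FlattenSketch, flatten, and inflate are hypothetical names, and a null Integer stands in for a null USDollars reference. Unlike the pseudocode, it masks the payload to 32 bits so that the sign extension of a negative value does not spill into the flag bits.

```java
class FlattenSketch {
    // Packs a nullable cents value into 64 bits: 0 means null; otherwise
    // bit 32 is the non-null flag and the low 32 bits hold the int payload.
    static long flatten(Integer cents) {
        if (cents == null) return 0L;
        return (1L << 32) | (cents & 0xFFFFFFFFL);
    }

    // Recovers the nullable cents value from its 64-bit encoding.
    static Integer inflate(long bits) {
        if (bits == 0L) return null;
        return (int) bits; // keep only the low 32 bits
    }

    public static void main(String[] args) {
        long[] ds = new long[100];   // stands in for USDollars[100]
        ds[5] = flatten(123);
        System.out.println(inflate(ds[5]));         // 123
        System.out.println(inflate(ds[6]));         // null
        System.out.println(inflate(flatten(-250))); // -250
    }
}
```

A freshly allocated long[] is all zeros, which matches the requirement that every element of a new USDollars[] starts out null.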
Heap flattening is limited by the integrity requirements of objects: the flattened data must be small enough to read and write atomically, or else the encoded data may become corrupted. On common platforms, "small enough" may mean as few as 32 or 64 bits. So while many small value classes can be flattened, most value classes that declare 2 or more fields will have to be encoded as ordinary heap objects (unless the fields store primitives of types boolean, char, byte, or short).
In the future, 128-bit flattened encodings should be possible on platforms that support atomic reads and writes of that size. And the Null-Restricted Value Types JEP will enable heap flattening for even larger value classes in use cases that are willing to opt out of atomicity guarantees.
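The size limit can be made concrete with some simple arithmetic. The sketch below assumes one extra bit for the null flag, as in the earlier examples, and ignores any padding or alignment a real JVM might add. FlattenBudget, encodedBits, and fits are hypothetical names.

```java
class FlattenBudget {
    // Total bits for an encoding: the field payloads plus one null flag bit
    // (an assumption taken from the examples, not a specified layout).
    static int encodedBits(int... fieldBits) {
        int total = 1;
        for (int bits : fieldBits) total += bits;
        return total;
    }

    // True when the encoding fits in a single atomic read or write.
    static boolean fits(int atomicLimitBits, int... fieldBits) {
        return encodedBits(fieldBits) <= atomicLimitBits;
    }

    public static void main(String[] args) {
        System.out.println(fits(64, Integer.SIZE));                // true: 33 bits
        System.out.println(fits(64, Integer.SIZE, Integer.SIZE));  // false: 65 bits
        System.out.println(fits(64, Short.SIZE, Character.SIZE));  // true: 33 bits
        System.out.println(fits(64, Long.SIZE));                   // false: 65 bits
        System.out.println(fits(128, Integer.SIZE, Integer.SIZE)); // true: 65 bits
    }
}
```

Under these assumptions a single long field already exceeds a 64-bit budget once the null flag is counted, while two int fields would fit in a 128-bit encoding.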
Identity classes
The identity
contextual keyword complements value
and may be used to
indicate that a class is not a value class, and that its instances are identity
objects.
identity class SimpleCounter {
private int count = 0;
public int currentValue() { return count; }
public void increment() { count++; }
}
identity record Node(String label, Node next) {
public String list() {
return label + ((next == null) ? "" : ", " + next.list());
}
}
A concrete class that lacks either modifier is an identity
class by default.
Value records
Record classes support the value modifier. Records are often good candidates to be value classes, because their fields are already required to be final.
value record Name(String first, String last) {
public String full() {
return "%s %s".formatted(first, last);
}
}
assert new Name("Amy", "Adams") == new Name("Amy", "Adams");
The record class and value class features are similar, in that both are useful for working with immutable data. However, record classes are used to opt out of separate internal state, while value classes are used to opt out of identity. Each of these choices can be made orthogonally; sometimes, an identity record is the right combination of choices.
As for other concrete classes, record classes are identity classes by default.
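For contrast, the same record declared without the value modifier behaves as an identity class, which is how records work on a current JDK. A minimal sketch follows, assuming a JDK with record support; RecordIdentityDemo is a hypothetical wrapper class.

```java
class RecordIdentityDemo {
    // An ordinary record: an identity class by default.
    record Name(String first, String last) { }

    public static void main(String[] args) {
        Name a = new Name("Amy", "Adams");
        Name b = new Name("Amy", "Adams");
        // == compares identities, and each 'new' creates a distinct object.
        System.out.println(a == b);      // false
        // The record's generated equals compares the components.
        System.out.println(a.equals(b)); // true
    }
}
```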
Superclasses and superinterfaces
A value class cannot extend an identity class. However, many abstract classes and most interfaces are fully supported as supertypes of value classes.
Extension is controlled via the value and identity modifiers. These modifiers can be applied to any class or interface. They cannot mix: it is illegal for a value class or interface to extend an identity class or interface, or vice versa.
A value interface, then, is an interface whose instances are all value objects, while an identity interface is an interface whose instances are all identity objects.
value interface JsonValue {
String toJsonString();
}
identity interface Counter {
int currentValue();
void increment();
}
Most interfaces are declared with neither modifier and are unconstrained. The List interface, for example, may be implemented by both identity and value classes.
Abstract classes fall into a few different categories:
- Any abstract class that declares a non-final instance field or a synchronized instance method, or that was compiled for a previous Java version, is implicitly an identity class and may not be extended by a value class.
- An abstract class may also be explicitly declared an identity class.
- An abstract class may be explicitly declared a value class, preventing extension by identity classes. These classes are subject to the same constraints as concrete value classes (but are not final, of course).
- Otherwise, the abstract class is unconstrained and permits both kinds of subclasses. An unconstrained abstract class can declare final instance fields. As discussed in the next section, its constructor may need modification to allow value subclasses to invoke it.
The class Object is special: as the superclass of all other classes, it must be unconstrained. However, it is a concrete class, and calls to new Object() continue to create direct identity object instances of the class (suitable, e.g., as synchronization locks).
Constraints on value class constructors
The constructor of a value class is regulated, meaning that its body must not
make any use of this
, except to write to an instance field. This ensures a
value object does not "escape" to the rest of the program during construction.
value class Rational extends Number {
int num;
int denom;
public Rational(int numerator, int denominator) {
super();
if (denominator == 0)
throw new IllegalArgumentException();
// gcf method must be static
int factor = gcf(numerator, denominator);
int n = numerator/factor;
int d = denominator/factor;
this.num = n;
this.denom = d;
// Cannot refer to 'this' in logging
System.out.printf("%s/%s-->%s/%s%n",
numerator, denominator, n, d);
}
static int gcf(int num, int denom) { ... }
...
}
Value objects can be thought of as being in a larval, "write only" state until construction is complete. They will never be observed to mutate, and will never participate in circular object graphs.
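The problem that these constraints rule out can be seen in an ordinary identity class today. In the sketch below, which runs on a current JDK, a constructor passes this to other code before writing a final field, so the field is observed first at its default value and later at its assigned value. LarvalEscapeDemo, Point, and peek are hypothetical names.

```java
class LarvalEscapeDemo {
    // Records what an outside observer saw while the constructor was running.
    static int seenDuringConstruction = -1;

    static final class Point {
        final int x;

        Point(int x) {
            // 'this' escapes before the final field is written. A regulated
            // constructor, as described above, would reject this use of 'this'.
            seenDuringConstruction = peek(this);
            this.x = x;
        }
    }

    // Code outside the constructor reading the partially built object.
    static int peek(Point p) {
        return p.x;
    }

    public static void main(String[] args) {
        Point p = new Point(7);
        System.out.println(seenDuringConstruction); // 0: default value leaked
        System.out.println(p.x);                    // 7: value after construction
    }
}
```

For an identity object this is merely surprising. For a value object, whose == depends on field values, an observable change in a final field would undermine the comparison.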
Constructors of other classes may be explicitly marked regulated (modifier subject to change) to impose the same restrictions. Any constructor invoked by super() or this() from a regulated constructor must itself be regulated, so any superclass of a value class must declare at least one regulated constructor.
public abstract class Number implements Serializable {
public regulated Number() { }
...
}
If a class doesn't declare a constructor, it gets a default constructor that simply calls super(). This constructor is usually regulated and thus, in the case of an unconstrained abstract class, can support value subclasses. (The exception is when the super() call invokes a non-regulated constructor of the superclass; then the default constructor is also non-regulated.)
Migration of existing classes
Some classes in the standard library have been designated value-based, with the understanding that they would become value classes in a future release.
Under this JEP, when preview features are enabled, the following standard
library classes are considered to be value classes, despite not having been
declared or compiled with the value
modifier:
java.lang.Byte
java.lang.Short
java.lang.Integer
java.lang.Long
java.lang.Float
java.lang.Double
java.lang.Boolean
java.lang.Character
java.util.Optional
The migration of the classes used by boxing should significantly reduce
boxing-related overhead (although Long
and Double
may be too large for heap
flattening).
Developers are encouraged to identify and migrate value class candidates in
their own code, where appropriate. An existing class that meets the requirements
of a value class declaration may be migrated simply by applying the value
modifier. This is a binary compatible change.
There are some behavioral changes that users of migrated classes may notice:
- The == operator may treat two instances as the same, where previously they were considered different (preferably, the existing class overrides equals in a way that does not depend on identity).
- Attempts to synchronize on an instance will fail, either at compile time or run time.
- The results of toString, equals, and hashCode, if they haven't been overridden, may be different.
- Assumptions about unique ownership of an instance may be violated (for example, an identical instance may be created at two different program points).
- Performance will generally improve, but may have different characteristics that are surprising.
Similarly, when preview features are enabled, the constructors of java.lang.Object, java.lang.Number, and java.lang.Record are considered to be regulated, despite not having been declared with that modifier.
Developers should scrutinize other existing abstract classes as potential value
class superclasses. This primarily involves ensuring that any declared
constructors are marked regulated
. This is straightforward for most existing
constructors; occasionally, a new compiler error will occur, but these can often
be worked around with a simple refactoring, such as storing a computed value
locally rather than reading it from a field.
Alternatives
As discussed, JVMs have long performed escape analysis to identify objects that never rely on identity throughout their lifespan and can be inlined. These optimizations are somewhat unpredictable, and do not help with objects that escape the scope of the optimization, including storage in fields and arrays.
Hand-coded optimizations via primitive values are possible to improve performance, but as noted in the "Motivation" section, these techniques require giving up valuable abstractions.
The C language and its relatives support flattened storage for structs and similar class-like abstractions. For example, the C# language has value types.
Unlike value objects, instances of these abstractions have identity, meaning
they support operations such as field mutation. As a result, the semantics of
copying on assignment, invocation, etc., must be carefully specified, leading to
a more complex user model and less flexibility for runtime implementations. We
prefer an approach that leaves these low-level details to the discretion of JVM
implementations.
Risks and Assumptions
The feature makes significant changes to the Java object model. Developers may
be surprised by, or encounter bugs due to, changes in the behavior of operations
such as ==
and synchronized
. It will be important to validate that such
disruptions are rare and tractable.
Some changes could potentially affect the performance of identity objects. The
if_acmpeq
instruction, for example, typically only costs one instruction
cycle, but will now need an additional check to detect value objects. The
identity class case should be optimized as the fast path, and we will need to
minimize any performance regressions.
There is a security risk that ==
and hashCode
can indirectly expose
private
field values. Further, two large trees of value objects can take
unbounded time to compute ==
, potentially a DoS attack risk. Developers need
to understand these risks.
The restrictions on regulated constructors may create problems for instrumentation tools, such as those that inject code into the constructor of java.lang.Object. It may be necessary to provide workarounds for these tools.
Dependencies
In anticipation of this feature we already added warnings about potential
behavioral incompatibilities for value class candidates in javac
and HotSpot,
via JEP 390.
Null-Restricted Value Class Types (Preview) will build on this JEP, allowing programmers to manage the storage of nulls and enable more frequent and more dense heap flattening in fields and arrays.
Enhanced Primitive Boxing (Preview) will enhance the language's use of primitive types, taking advantage of the lighter-weight characteristics of boxing to value objects.
JVM class and method specialization (JEP 218, with revisions) will allow generic classes and methods to specialize field, array, and local variable layouts when parameterized by value class types.
Statements before super() (Preview) clarifies the constraints imposed
in the pre-construction context of a constructor. These constraints are
similar to those imposed on the entire bodies of regulated
constructors.
The Class-File API (Preview) will need to track new modifiers and attributes defined by this JEP.