JEP draft: Value Objects (Preview)
Owner | Dan Smith |
Type | Feature |
Scope | SE |
Status | Closed / Withdrawn |
Discussion | valhalla dash dev at openjdk dot java dot net |
Effort | XL |
Duration | XL |
Relates to | JEP 401: Value Classes and Objects (Preview) |
Reviewed by | Brian Goetz |
Created | 2021/11/16 00:14 |
Updated | 2023/09/23 00:43 |
Issue | 8277163 |
Summary
Enhance the Java object model with value objects, class instances that have only final instance fields and lack object identity.
This is a preview language and VM feature.
Goals
This JEP provides for the declaration of identity-free value classes and specifies the behavior of their instances, called value objects, with respect to equality, synchronization, and other operations that traditionally depend upon identity. To facilitate safe construction of value objects, value classes make use of regulated constructors.
Certain value-based classes in the standard library will become value classes when preview features are enabled.
At runtime, the HotSpot JVM will prefer inlining value objects where feasible. An inlined value object is encoded directly with its field values, avoiding any overhead from object headers, indirections, or heap allocation.
Non-Goals
This JEP allows for limited inlining of value objects in field and array storage, but doesn't attempt to optimize the storage footprint by excluding null from the variable's value set. It also doesn't propose inlined storage for variables whose encoding would exceed a natural atomic read/write size (such as 64 bits). Improvements to value object storage will be pursued in a separate JEP.
Values of the primitive types behave like value objects in many ways, but continue on as a distinct concept in the language model. Enhancements to the treatment of primitive types will be explored in Enhanced Primitive Boxing.
Future enhancements to the JVM are anticipated to support inlining of value objects within generic APIs. For now, generic APIs work with erased types and heap-allocated objects, as usual.
Motivation
Java's objects and classes offer powerful abstractions for representing data, including fields, methods, constructors, access control, and nominal subtyping. Every object also comes with identity, enabling features such as field mutation and locking.
Many classes don't take advantage of all of these features. In particular, a significant subset of classes don't have any use for identity—their field values can be permanently set on instantiation, their instances don't need to act as synchronization locks, and their preferred notion of equality makes no distinction between separately-allocated instances with matching field values. (For example, in the standard library, certain classes that discourage any dependency on identity have been designated value-based.)
At run time, support for identity can be expensive. It generally requires that an object's data be located at a particular memory location, packaged with metadata to support the full range of object functionality. As objects are shared between program components, data structures and garbage collectors end up with tangled, non-local webs of objects created at different times. Sometimes, JVM implementations can optimize around these constraints, but the resulting performance improvements can be unpredictable.
An alternative is to encode program data with primitive types. Primitive values don't have identity, and so can be copied freely and encoded as compact bit sequences. But programs that represent their data with primitive types give up all the other abstractions provided by objects and classes. (For example, if a geographic location is encoded as two floats, there's no way to restrict the valid range of values, keep matching pairs of floats together, prevent re-interpreting the values with the wrong units, or compatibly switch to a double-based encoding.)
Value classes provide programmers with a mechanism to opt out of object identity without giving up the other features of Java classes. This avoids unwanted degrees of freedom (like surprising == mismatches) and enables many of the performance benefits of primitive types.
Description
The features described below are preview features, enabled with the --enable-preview compile-time and runtime flags.
Overview
A value object is a class instance that does not have identity. That is, a value object does not have any particular memory address or any other property to distinguish it from other instances of the same class whose fields have the same values. Value objects cannot mutate their fields or be used for synchronization. The == operator on value objects compares their fields. A value class declaration introduces a class whose instances are value objects.
An identity object is a class instance or array that does have identity, the traditional behavior of objects in Java. An identity object can mutate its non-final fields and is associated with a synchronization monitor. The == operator on identity objects compares their identities. An identity class declaration (the default for a concrete class) introduces a class whose instances are identity objects.
Value class declarations
A concrete class can be declared a value class with the value contextual keyword.
value class Substring implements CharSequence {
    // Instance fields of a value class are implicitly final.
    private String str;
    private int start;
    private int end;

    public Substring(String str, int start, int end) {
        checkBounds(start, end, str.length());
        this.str = str;
        this.start = start;
        this.end = end;
    }

    public int length() {
        return end - start;
    }

    public char charAt(int i) {
        // Valid indices are 0 <= i < length(); i == length() is rejected.
        checkBounds(i, i + 1, length());
        return str.charAt(start + i);
    }

    public Substring subSequence(int s, int e) {
        checkBounds(s, e, length());
        return new Substring(str, start + s, start + e);
    }

    public String toString() {
        return str.substring(start, end);
    }

    // Requires 0 <= start <= end <= length.
    private static void checkBounds(int start, int end, int length) {
        if (start < 0 || end < start || length < end)
            throw new IndexOutOfBoundsException();
    }
}
A concrete value class declaration is subject to the following restrictions:
- The class is implicitly final, so cannot be extended.
- All instance fields are implicitly final, so must be assigned exactly once by constructors or initializers, and cannot be assigned outside of a constructor or initializer.
- The class does not extend an identity class or an identity interface (see below).
- All constructors are implicitly regulated, as described later, limiting use of this in constructor bodies.
- No instance methods are declared synchronized.
- (Possibly) The class does not declare a finalize() method.
In most other ways, a value class declaration is just like an identity class declaration. It implicitly extends Object if it has no explicit superclass type. It can be an inner class. It can declare superinterfaces, type parameters, member classes and interfaces, overloaded constructors, static members, and the full range of access restrictions on its members.
A concrete class can be declared an identity class with the identity contextual keyword. In the absence of the value and identity modifiers, a concrete class (other than Object) is implicitly an identity class.
identity class Id1 {
    int counter = 0;
    void increment() { counter++; }
}

class Id2 { // implicitly 'identity'
    synchronized void m() {}
}
The value and identity modifiers are supported by record classes. Records are often good candidates to be value classes, because their fields are already required to be final.
value record Name(String first, String last) {
    public String full() {
        return "%s %s".formatted(first, last);
    }
}

identity record Node(String label, Node next) {
    public String list() {
        // Parenthesize the conditional: + binds tighter than ?:
        return label + ((next == null) ? "" : ", " + next.list());
    }
}
Just like regular classes, identity is the default modifier for record classes.
Working with value objects
Value objects are created and operated on just like normal objects:
Substring s1 = new Substring("abc", 0, 2);
Substring s2 = null;
if (s1.length() == 2)
    s2 = s1.subSequence(1, 2);
CharSequence cs = s2;
System.out.println(cs.toString()); // prints "b"
The == operator compares value objects of the same class in terms of their field values, not object identity. Fields with primitive types are compared by their bit patterns. Other field values, both identity and value objects, are recursively compared with ==.
assert new Substring("abc", 1, 2) == s2;
assert new Substring("abcd", 1, 2) != s2;
assert s1.subSequence(0, 2) == s1;
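The comparison rule can be approximated in today's Java with a hand-written helper. The following is only a sketch: Sample and sameValue are hypothetical names, "bit patterns" is assumed to mean the raw IEEE 754 bits of a float, and the real == on value objects would be performed by the JVM rather than by user code.

```java
// Hypothetical stand-in for a value class with one primitive and one reference field.
final class Sample {
    final float weight;
    final String label;

    Sample(float weight, String label) {
        this.weight = weight;
        this.label = label;
    }

    // Emulates the proposed == on two instances of the same class, field by field.
    static boolean sameValue(Sample a, Sample b) {
        if (a == b) return true;                    // same reference, or both null
        if (a == null || b == null) return false;
        // Primitive field: compare raw bits, so NaN matches NaN and 0.0f differs from -0.0f.
        return Float.floatToRawIntBits(a.weight) == Float.floatToRawIntBits(b.weight)
            // Reference field: compared with ==, which means identity for identity objects.
            && a.label == b.label;
    }

    public static void main(String[] args) {
        String s = "abc";
        System.out.println(sameValue(new Sample(Float.NaN, s), new Sample(Float.NaN, s)));        // true
        System.out.println(sameValue(new Sample(0.0f, s), new Sample(-0.0f, s)));                 // false
        System.out.println(sameValue(new Sample(1.0f, s), new Sample(1.0f, new String("abc")))); // false
    }
}
```

The last case differs from equals-based comparison: the two String fields hold equal text, but String is an identity class, so the recursive == sees two distinct objects.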
The equals, hashCode, and toString methods, if inherited from Object, along with System.identityHashCode, behave consistently with this definition of equality.
Substring s3 = s1.subSequence(0, 2);
assert s1.equals(s3);
assert s1.hashCode() == s3.hashCode();
assert System.identityHashCode(s1) == System.identityHashCode(s3);
Synchronization is disallowed on value objects: the compiler disallows synchronization on any value class type, and attempting to synchronize on a value object at run time results in an exception.
Object obj = s1;
synchronized (obj) { } // IllegalMonitorStateException
Other low-level APIs that depend on identity, like java.lang.ref.Reference, will similarly either reject value objects or simulate identity using value objects' field values.
Interfaces and Abstract Classes
By default, an interface may be implemented by both value classes and identity classes. In a special case where the interface is only meant for one kind of class or the other, the value or identity modifier can be used to declare a value interface or an identity interface.
value interface JsonValue {
    String toJsonString();
}

identity interface Counter {
    int currentValue();
    void increment();
}
It is an error for a value class or interface to extend an identity class or interface, or vice versa. This applies to both direct and indirect superclasses and superinterfaces. For example, an interface with no modifiers may extend an identity interface, but its implementing classes still must not be value classes.
Similarly, it is an error for any class or interface to implement, either directly or indirectly, both a value superclass or superinterface and an identity superclass or superinterface.
(To be a functional interface, compatible with lambda expressions, an interface must allow for both value and identity implementations. This rule avoids constraining the language runtime, and may be relaxed in the future.)
An abstract class can similarly be extended by both value classes and identity classes by default, or can use the identity or value modifier to restrict its subclasses. In addition, an abstract class that declares a non-final instance field or a synchronized instance method is implicitly an identity class.
The class Object is special. Despite being a concrete class, it is not an identity class and supports both identity and value subclasses. However, calls to new Object() continue to create direct identity object instances of the class (suitable, e.g., as synchronization locks).
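Code that needs a monitor can avoid depending on the identity of its data by locking on a dedicated instance of Object. A minimal sketch in today's Java follows; GuardedCounter is a hypothetical name used only for illustration.

```java
// Hypothetical counter that never synchronizes on a potentially identity-free object.
final class GuardedCounter {
    // A direct instance of Object remains an identity object, so it can serve as a lock.
    private final Object lock = new Object();
    private int count;

    int incrementAndGet() {
        synchronized (lock) {
            return ++count;
        }
    }
}
```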
regulated Constructors
The regulated keyword (name subject to change) indicates that a constructor must not make any use of this in its body, except to write to an instance field. This is a useful property that ensures an object does not "leak" to the rest of the program during construction.
Within the body of a regulated constructor, each of the following is a compile-time error:
- Reading an instance field of the class
- Invoking an instance method of the class
- Direct use of this to refer to the current class instance, except in an assignment of the form this.f = expr
- Use of super to access a field or method of the current class's superclass or superinterface
- Construction of an inner class with this as an implicit enclosing instance
- Any of the above occurring in an instance field initializer or instance initializer block of the class
- An explicit constructor invocation (super() or this()) that invokes a non-regulated constructor
Local and anonymous classes may be declared, but (as in a static context) they have no enclosing instance. Inner classes may refer to enclosing instances or captured enclosing variables from their own regulated constructors without error.
These rules coincide with the restrictions imposed in a pre-construction context, as described by JEP 447, except that they allow for writes to instance fields.
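To make the restriction concrete, the sketch below contrasts two ordinary Java constructors. It compiles today because neither constructor is marked regulated. Registry, Leaky, and Contained are hypothetical names, and the comments describe how the rules listed above would presumably apply.

```java
import java.util.ArrayList;
import java.util.List;

final class Registry {
    static final List<Object> SEEN = new ArrayList<>();
}

// Would be rejected under the regulated rules: 'this' escapes before 'limit' is assigned.
final class Leaky {
    final int limit;

    Leaky(int limit) {
        Registry.SEEN.add(this);   // direct use of 'this' other than 'this.f = expr'
        this.limit = limit;
    }
}

// Stays within the regulated rules: the body only writes an instance field.
final class Contained {
    final int limit;

    Contained(int limit) {
        this.limit = checkPositive(limit);   // static helper, no instance access
    }

    private static int checkPositive(int n) {
        if (n <= 0) throw new IllegalArgumentException("limit must be positive");
        return n;
    }
}
```

Validation through a static helper mirrors the checkBounds call in the Substring constructor shown earlier.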
Any constructor of any class may be marked regulated. Value class constructors are implicitly regulated. The implicitly-declared constructor of a non-value class is also regulated, as long as the no-arg constructor of the superclass is regulated. (On extremely rare occasions, this rule may cause an existing class to fail to compile, due to a use of this in its initializers.) The constructor of the class Object is regulated.
Because of the rule about super() calls, an abstract class may not act as a superclass of a value class unless it declares at least one regulated constructor. Due to value classes' use of regulated constructors, value objects will never be observed to mutate, and will never participate in circular object graphs. (A value object under construction is referred to as a "larval value object", and is unusable for any other purpose.)
Migration of existing classes
If an existing concrete class meets the requirements of value class declarations, it may be declared as a value class without breaking binary compatibility.
There are some behavioral changes that users of the class may notice:
- The == operator may treat two instances as the same, where previously they were considered different (preferably, the existing class overrides equals in a way that does not depend on identity)
- Attempts to synchronize on an instance will fail, either at compile time or run time
- The results of toString, equals, and hashCode, if they haven't been overridden, may be different
- Assumptions about unique ownership of an instance may be violated (for example, an identical instance may be created at two different program points)
- Performance will generally improve, but may have different characteristics that are surprising
Developers are encouraged to identify and migrate value class candidates in their code, where appropriate.
Value Classes in the Standard Library
Some classes in the standard library have been designated value-based, with the understanding that they would become value classes in a future release.
Under this JEP, when preview features are enabled, the following standard library classes are considered to be value classes, despite not having been declared or compiled with the value modifier:
java.lang.Byte
java.lang.Short
java.lang.Integer
java.lang.Long
java.lang.Float
java.lang.Double
java.lang.Boolean
java.lang.Character
java.util.Optional
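Code that compares instances of these classes with == is sensitive to the change. A sketch of an identity-independent comparison follows; BoxedCompare and sameNumber are hypothetical names.

```java
import java.util.Objects;

final class BoxedCompare {
    // Compares by content, so the answer does not depend on whether Integer has identity.
    static boolean sameNumber(Integer a, Integer b) {
        return Objects.equals(a, b);
    }

    public static void main(String[] args) {
        Integer x = Integer.valueOf(1000);
        Integer y = Integer.valueOf(1000);
        System.out.println(sameNumber(x, y)); // prints "true"
        // By contrast, x == y is an identity test today and may be false for values
        // outside the Integer cache; with Integer as a value class it would compare fields.
    }
}
```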
class file representation & interpretation
The identity and value modifiers are encoded in a class file using the ACC_IDENTITY (0x0020) and ACC_VALUE (0x0040) flags. In older-versioned class files, ACC_IDENTITY is considered to be set in classes and unset in interfaces.
(Historically, 0x0020 represented ACC_SUPER, and all classes, but not interfaces, were encouraged to set it. The flag is no longer meaningful, but coincidentally will tend to match this implicit behavior.)
Format checking ensures that identity and value are not both set, and that every class has at least one of identity, value, or abstract set.
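The flag rules can be sketched as a pair of bit tests. This is an illustration only, not the JVM's actual format checker: ClassFlagCheck and its methods are hypothetical, ACC_IDENTITY and ACC_VALUE use the values given above, and ACC_INTERFACE (0x0200) and ACC_ABSTRACT (0x0400) are the existing class file flags.

```java
final class ClassFlagCheck {
    static final int ACC_IDENTITY  = 0x0020; // from the text above (formerly ACC_SUPER)
    static final int ACC_VALUE     = 0x0040; // from the text above
    static final int ACC_INTERFACE = 0x0200; // existing class file flag
    static final int ACC_ABSTRACT  = 0x0400; // existing class file flag

    // In older-versioned class files, identity is implied for classes and absent for interfaces.
    static int normalize(int flags, boolean olderVersion) {
        if (!olderVersion) return flags;
        boolean isInterface = (flags & ACC_INTERFACE) != 0;
        return isInterface ? (flags & ~ACC_IDENTITY) : (flags | ACC_IDENTITY);
    }

    // identity and value are mutually exclusive; at least one of identity, value, abstract is set.
    static boolean isWellFormed(int flags) {
        boolean identity = (flags & ACC_IDENTITY) != 0;
        boolean value = (flags & ACC_VALUE) != 0;
        boolean isAbstract = (flags & ACC_ABSTRACT) != 0;
        if (identity && value) return false;
        return identity || value || isAbstract;
    }
}
```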
The regulated modifier is encoded in a class file using the ACC_REGULATED flag (value TBD). Format checking ensures that this flag is only applied to methods named <init>.
Format checking fails if a value class is not final, has a non-final instance field, has a synchronized instance method, or declares a non-regulated <init> method. Similarly, format checking fails if a non-identity abstract class has a non-final instance field or a synchronized instance method.
At class load time, superclasses and superinterfaces are checked for conflicting identity or value modifiers; if a conflict is detected, the class fails to load.
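One way to picture this check is as a walk over the supertype graph that collects every explicit modifier. The sketch below uses a simplified, hypothetical model (TypeInfo, Kind, hasConflict) rather than any real JVM data structure.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

final class ModifierConflict {
    enum Kind { IDENTITY, VALUE, UNSPECIFIED }

    // Hypothetical, simplified view of a loaded class or interface.
    record TypeInfo(String name, Kind kind, List<TypeInfo> supertypes) {}

    // Collects the explicit modifiers of the type and of its direct and indirect supertypes.
    static Set<Kind> declaredKinds(TypeInfo type) {
        Set<Kind> kinds = EnumSet.noneOf(Kind.class);
        if (type.kind() != Kind.UNSPECIFIED) kinds.add(type.kind());
        for (TypeInfo s : type.supertypes()) kinds.addAll(declaredKinds(s));
        return kinds;
    }

    // A conflict exists when both identity and value appear anywhere in the hierarchy.
    static boolean hasConflict(TypeInfo type) {
        Set<Kind> kinds = declaredKinds(type);
        return kinds.contains(Kind.IDENTITY) && kinds.contains(Kind.VALUE);
    }
}
```

With this model, a value class that implements an unmodified interface which in turn extends an identity interface is reported as a conflict, matching the indirect-supertype rule for interfaces.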
When verifying a regulated <init> method, the type uninitializedThis is not replaced with a standard class type after the super/this call. Instead, references to this have type uninitializedThis throughout the method body. The verifier also ensures that the constructor named by the super/this call is regulated.
A value class's type is represented using the usual L descriptor (LSubstring;). To facilitate inlining optimizations, a Preload attribute can be provided by any class, communicating to the JVM that a set of referenced CONSTANT_Class entries should be eagerly loaded to locate potentially-useful layout information.
Preload_attribute {
    u2 attribute_name_index;
    u4 attribute_length;
    u2 number_of_classes;
    u2 classes[number_of_classes];
}
Each class file generated by javac includes a Preload attribute naming any concrete value class that appears in the descriptor of any declared or referenced field or method.
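As an illustration of the Preload_attribute layout, the following sketch decodes the structure from raw bytes. PreloadReader is a hypothetical helper, not part of any JDK API, and it assumes the usual class file convention that attribute_length counts the bytes after the first six.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class PreloadReader {
    // Decoded form of the structure; the indices refer to constant pool entries.
    record Preload(int attributeNameIndex, long attributeLength, int[] classes) {}

    static Preload parse(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            int nameIndex = in.readUnsignedShort();               // u2 attribute_name_index
            long length = Integer.toUnsignedLong(in.readInt());   // u4 attribute_length
            int count = in.readUnsignedShort();                   // u2 number_of_classes
            if (length != 2L + 2L * count) {
                throw new IOException("attribute_length does not match number_of_classes");
            }
            int[] classes = new int[count];
            for (int i = 0; i < count; i++) {
                classes[i] = in.readUnsignedShort();              // u2 CONSTANT_Class index
            }
            return new Preload(nameIndex, length, classes);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```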
The if_acmpeq and if_acmpne operations implement the == test for value objects, as described above. The monitorenter instruction throws an exception if applied to a value object.
API & tool support
A new preview API method, java.util.Objects.isValueObject, indicates whether an object is a value object or an identity object. It always returns false for arrays and direct instances of the class Object. (We may consider a similar method as a member of class Object.)
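Because the method exists only when this preview API is present, code that must also run on other JDKs could probe for it reflectively. This is a sketch under that assumption: ValueObjectProbe is a hypothetical helper, and the fallback treats every object as an identity object when the method cannot be found.

```java
import java.lang.reflect.Method;
import java.util.Objects;

final class ValueObjectProbe {
    // Looks the preview method up by name so this class compiles on JDKs that lack it.
    static boolean isValueObject(Object o) {
        try {
            Method m = Objects.class.getMethod("isValueObject", Object.class);
            return (Boolean) m.invoke(null, o);
        } catch (ReflectiveOperationException e) {
            // Method absent or not invocable: assume all objects have identity.
            return false;
        }
    }
}
```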
java.lang.reflect.Modifier adds support for the identity, value, and regulated flags; these are also exposed via new isIdentity and isValue methods in java.lang.Class, and isRegulated in java.lang.reflect.Constructor.
java.lang.ref recognizes value objects and treats them specially (details TBD).
The java.lang.invoke.LambdaMetafactory class rejects identity and value superinterfaces.
javax.lang.model supports the identity, value, and regulated modifiers.
The javadoc tool surfaces the identity, value, and regulated modifiers.
The class file API JEP may need updates to support new class file features.
Performance model
Because value objects lack identity, JVMs may freely duplicate and re-encode them in an effort to improve computation time, memory footprint, and garbage collector performance.
Implementations are free to use different encodings in different contexts, such as stack vs. heap, as long as the values of the objects' fields are preserved. However, these encodings must account for the possibility of a null value, and must ensure that fields and arrays storing value objects are read and written atomically.
In practice, this means that local variables, method parameters, and expression results can regularly use inline encodings, while fields and array components might only store small objects inline (e.g., with fields of 56 bits or less). Assignments to polymorphic supertypes will typically require heap allocation if it has been avoided to that point.
Inlining of value objects in stack code execution will tend to minimize heap allocations and garbage collection activities. Inlining of value objects in heap field and array storage will additionally reduce memory footprint and increase data locality.
Previously, JVMs have used similar optimization techniques to inline identity objects in local code when the JVM is able to prove that an object's identity is never used. Developers can expect more predictable and widespread optimizations for value objects.
HotSpot implementation
This section describes implementation details of this release of the HotSpot virtual machine, for the information of OpenJDK engineers. These details are subject to change in future releases and should not be assumed by users of HotSpot or other JVMs.
Value objects in HotSpot are encoded as follows:
- In the interpreter and C1, value objects on the stack are encoded as regular heap objects.
- In C2, value objects on the stack are typically scalarized when stored or passed with concrete value class types. Scalarization effectively encodes each field as a separate variable, with an additional variable encoding null; no heap allocation is needed. Methods with value-class-typed parameters support both a pointer-based entry point (for interpreter and C1 calls) and a scalarized entry point (for C2-to-C2 calls). Value objects are allocated on the heap when they need to be viewed as values of a supertype of the value class.
- Tentatively, in fields and arrays with a concrete value class type, for 64-bit builds of HotSpot, the variable stores a 64-bit word encoding a value object's field values and a boolean null flag. If the variable has a superclass or superinterface type, or the build uses 32-bit words, or the field values cannot fit in this encoding, the variable stores a regular heap object pointer.
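To give a feel for the field and array encoding, the sketch below packs a null flag and two small fields into one long, which can be read and written as a single 64-bit word. The bit layout is invented for illustration and is not HotSpot's actual layout; PackedSlot and its methods are hypothetical.

```java
final class PackedSlot {
    static final long NULL_SLOT = 0L;   // flag bit clear: the slot holds null

    // Assumed layout: bit 0 = non-null flag, bits 1-32 = int field, bits 33-48 = short field.
    static long encode(int a, short b) {
        return 1L | ((a & 0xFFFF_FFFFL) << 1) | ((b & 0xFFFFL) << 33);
    }

    static boolean isNull(long slot) {
        return (slot & 1L) == 0;
    }

    static int fieldA(long slot) {
        return (int) (slot >>> 1);     // low 32 bits after dropping the flag
    }

    static short fieldB(long slot) {
        return (short) (slot >>> 33);  // next 16 bits
    }
}
```

A class with two int fields would need 64 bits plus the flag and so would not fit this scheme. That is the kind of case in which the variable would hold a regular heap object pointer instead.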
Optimizations rely on the Preload attribute to identify value class types at preparation time. If a value class is not named by Preload (for example, if the class was an identity class at compile time), fields and methods may end up using a heap object encoding instead. In the case of a method overriding mismatch (a method and its super methods disagree about scalarization of a particular type), the overriding method may dynamically force callers to de-opt and use the pointer-based entry point.
To facilitate the special behavior of instructions like if_acmpeq, value objects in the heap are identified with a new flag in their object header.
Alternatives
JVMs have long performed escape analysis to identify objects that never rely on identity throughout their lifespan and can be inlined. These optimizations are somewhat unpredictable, and do not help with objects that escape the scope of the optimization, including storage in fields and arrays.
Hand-coded optimizations via primitive values are possible to improve performance, but as noted in the "Motivation" section, these techniques require giving up valuable abstractions.
The C language and its relatives support inline storage for structs and similar class-like abstractions. For example, the C# language has value types. Unlike value objects, instances of these abstractions have identity, meaning they support operations such as field mutation. As a result, the semantics of copying on assignment, invocation, etc., must be carefully specified, leading to a more complex user model and less flexibility for runtime implementations. We prefer an approach that leaves these low-level details to the discretion of JVM implementations.
Risks and Assumptions
The feature makes significant changes to the Java object model. Developers may be surprised by, or encounter bugs due to, changes in the behavior of operations such as == and synchronized. It will be important to validate that such disruptions are rare and tractable.
Some changes could potentially affect the performance of identity objects. The if_acmpeq instruction, for example, typically only costs one instruction cycle, but will now need an additional check to detect value objects. The identity class case should be optimized as the fast path, and we will need to minimize any performance regressions.
There is a security risk that == and hashCode can indirectly expose private field values. Further, two large trees of value objects can take unbounded time to compute ==, potentially a DoS attack risk. Developers need to understand these risks.
Dependencies
In anticipation of this feature we already added warnings about potential incompatible changes to value class candidates in javac and HotSpot, via JEP 390.
Null-Restricted Value Object Storage (Preview) will build on this JEP, allowing programmers to manage nulls and atomicity, enabling additional optimizations for value objects stored in fields and arrays.
Enhanced Primitive Boxing (Preview) will enhance the language's use of primitive types, taking advantage of the lighter-weight characteristics of boxing to value objects.
JVM class and method specialization (JEP 218, with revisions) will allow generic classes and methods to specialize field, array, and local variable layouts when parameterized by value class types.