JEP 401: Value Classes and Objects (Preview)
Owner | Dan Smith |
Type | Feature |
Scope | SE |
Status | Draft |
Component | specification |
Discussion | valhalla dash dev at openjdk dot java dot net |
Effort | XL |
Duration | XL |
Reviewed by | Brian Goetz |
Created | 2020/08/13 19:31 |
Updated | 2025/03/13 18:39 |
Issue | 8251554 |
Summary
Enhance the Java Platform with value objects, class instances that have
only final fields and lack object identity.
This is a preview language and VM feature.
Goals
- Allow developers to opt in to a programming model for simple values in which objects are distinguished solely by their field values, much as the int value 3 is distinguished from the int value 4.
- Migrate popular classes that represent simple values in the JDK, such as Integer, to this programming model. Support compatible migration of user-defined classes.
- Maximize the freedom of the JVM to encode simple values in ways that improve memory footprint, locality, and garbage collection efficiency.
Non-Goals
- It is not a goal to introduce a struct feature in the Java language. Java continues to operate on just two kinds of data: primitives and objects.
- It is not a goal to change the treatment of primitive types. Primitive types behave like value classes in many ways, but are a distinct concept. A separate JEP will provide enhancements to make primitive types more class-like and compatible with generics.
- It is not a goal to guarantee any particular optimization strategy or memory layout. This JEP enables many potential optimizations; only some will be implemented initially. Future JEPs will pursue optimizations related to null exclusion and generic specialization.
- It is not a goal to automatically treat existing classes as value classes, even if they meet the requirements for how value classes are declared and used. The behavioral changes require an explicit opt-in.
- It is not a goal to "fix" the == operator so that programmers can use it in place of equals. This JEP redefines == only as much as necessary to cope with a new kind of identity-free object. The usual advice to compare objects in most contexts using the equals method still applies.
Motivation
Java developers often need to represent simple domain values: the shipping
address of an order, a log entry from an application, and so on. To do this,
developers typically declare classes whose main purpose is to "wrap" data,
stored in final fields. For example, a simple RGB color value could be
represented with a record, whose fields are final by default:
var orange = new Color(237, 139, 0);
var blue = new Color(0, 115, 150);
record Color(byte red, byte green, byte blue) {
public Color(int r, int g, int b) {
this(checkByte(r), checkByte(g), checkByte(b));
}
private static byte checkByte(int x) {
if (x < 0 || x > 255) throw new IllegalArgumentException();
return (byte) (x & 0xff);
}
// Provided automatically: red(), green(), blue(),
// toString(), equals(Object), hashCode()
public Color mix(Color that) {
return new Color(avg(red, that.red),
avg(green, that.green),
avg(blue, that.blue));
}
private static byte avg(byte b1, byte b2) {
return (byte) (((b1 & 0xff) + (b2 & 0xff)) / 2);
}
}
Developers will regard the "essence" of a Color
object as a red-green-blue
triple, but to Java, the essence of an object is its identity. Each execution
of new Color(...)
creates an object with a unique identity, making it
distinguishable from every other object in the system. An object's identity
means that developers can share references to an object between different parts
of a program, and changes to an object's fields in one part of the program can
be observed in other parts.
Object identity is problematic for simple domain values
Object identity is at best irrelevant and at worst harmful to simple domain values.
The ==
operator can be used to compare object identities.
While normal programs should avoid using this operator, those that do so will
observe that two objects with the same "essence" may have distinct identities.
For example, two Color objects that represent the same red-green-blue triple
are not == if they were created by different executions of new Color(...).
This inconsistency is a frequent source of confusion for developers.
var c = new Color(255, 0, 0);
var d = c.mix(c); // creates a new Color for the same red-green-blue triple
if (c == d) ... // false, even though c.equals(d)
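The snippet above can already be reproduced in today's Java, where every record is still an identity class. This sketch copies the Color record from the start of this section unchanged; the enclosing class name ColorIdentityDemo is an illustrative assumption used only to make the file self-contained.

```java
class ColorIdentityDemo {

    // The Color record from the Motivation section, unchanged (an identity class today).
    record Color(byte red, byte green, byte blue) {
        public Color(int r, int g, int b) {
            this(checkByte(r), checkByte(g), checkByte(b));
        }
        private static byte checkByte(int x) {
            if (x < 0 || x > 255) throw new IllegalArgumentException();
            return (byte) (x & 0xff);
        }
        public Color mix(Color that) {
            return new Color(avg(red, that.red),
                             avg(green, that.green),
                             avg(blue, that.blue));
        }
        private static byte avg(byte b1, byte b2) {
            return (byte) (((b1 & 0xff) + (b2 & 0xff)) / 2);
        }
    }

    public static void main(String[] args) {
        var c = new Color(255, 0, 0);
        var d = c.mix(c);                  // same red-green-blue triple, new object
        System.out.println(c == d);        // false: distinct identities
        System.out.println(c.equals(d));   // true: record equals compares components
    }
}
```

Making Color a value record, as shown later in this JEP, is what would turn the first result into true.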
Confusion around ==
for objects is so widespread that Java gives special
treatment to objects of fundamental classes:
- String literals are interned automatically. This means that a string literal with a given character sequence always produces the same String object, no matter where the string literal is used. For example, given String s = "hello"; and String t = "hello";, only one String object for "hello" is created, so s == t is true.
- Small integer literals are autoboxed in a predictable way. This means that a given integer literal always produces the same Integer object, no matter where the integer literal is used. For example, given Integer x = 5; and Integer y = 5;, only one Integer object for 5 is created, so x == y is true.
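A minimal sketch of the special treatment just described, runnable on current JDKs. The Integer guarantee covers only boxed values from -128 to 127 (JLS 5.1.7); beyond that range the result of == depends on JVM configuration, so the sketch relies only on equals there. The class name LiteralIdentityDemo is hypothetical.

```java
class LiteralIdentityDemo {

    public static void main(String[] args) {
        String s = "hello";
        String t = "hello";
        String u = new String("hello");      // explicitly creates a new identity
        System.out.println(s == t);          // true: string literals are interned
        System.out.println(s == u);          // false: separately created String
        System.out.println(s.equals(u));     // true: same character sequence

        Integer x = 5;                       // autoboxing of a small constant
        Integer y = 5;
        System.out.println(x == y);          // true: -128..127 box to the same object

        Integer big1 = 1000;
        Integer big2 = 1000;
        // Outside the guaranteed range, == is typically false with default
        // settings but is not guaranteed either way; equals is reliable.
        System.out.println(big1.equals(big2)); // true
    }
}
```

The last comparison is the kind of identity dependency this JEP removes: once Integer is a value class, == compares Integer objects by value at every magnitude.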
This special treatment minimizes the role of object identity for string literals
and integer literals, but fails to address the confusion around == for
strings and integers in general. The most viewed Java question on StackOverflow
concerns the use of == with String objects, and another high-visibility question
concerns the use of == with Integer objects.
This sort of confusion could be avoided for simple domain values if the language did not insist that separately-created objects with the same "essence" have distinct identities.
Object identity is expensive at run time
Java's requirement that every object has identity, even if simple domain values don't want it, means worse performance. Typically, the JVM has to allocate memory for each newly created object, distinguishing it from every object already in the system, and reference that memory location whenever the object is used or stored. This causes the garbage collector to work harder, taking cycles away from the application, and it means worse locality of reference—for example, an array may refer to objects scattered around memory, frustrating the CPU cache as the program iterates over the array.
Modern JVMs have an optimization called escape analysis that can mitigate these
performance concerns. For example, instead of allocating memory for a Color x
with three byte fields, the JVM can pass the three byte values around the
program directly. An inlined call to x.mix(...) could run without any memory
being allocated, even though the mix method performs new Color(...). This
optimization is valid as long as the code never depends on the identity of the
object in question. Unfortunately, if the program performs an identity-sensitive
operation such as x == y, or if the object might "escape" into code that the
optimization can't observe, the optimization must be unraveled.
In some application domains, developers routinely program for speed by creating
as few objects as possible, thus de-stressing the garbage collector and
improving locality. For example, they might encode their RGB colors as three
byte values rather than as Color objects. Unfortunately, this approach gives
up the functionality of classes that makes Java code so maintainable:
meaningful names, private state, data validation by constructors, convenience
methods, etc. A developer operating on colors represented as byte values might
accidentally interpret the bits with a BGR encoding, swapping the red and blue
components and corrupting the resulting image.
Programming without identity
Trillions of Java objects are created every day, each one bearing a unique
identity. We believe the time has come to let Java developers choose which
objects in the program need identity, and which do not. A class like Color
that represents simple domain values could opt out of identity, so that there
would never be two distinct Color objects representing the HTML color purple,
just as there are never two distinct int values that both represent the number 4.
By opting out of identity, developers are opting in to a programming model that provides the best of both worlds: the abstraction of classes with the simplicity and performance benefits of primitives.
Important classes in the JDK, such as the wrapper classes used for boxing, are
already designed to be "value-based", meaning they discourage
depending on the identity of instances. With this JEP, these classes can opt
out of identity entirely. For example, in the case of the class Integer,
instances will have no identity, == will compare all Integer objects by
value, and the run-time overhead of the Integer type can dramatically shrink.
Even when stored in arrays, Integer[] can approach the efficiency of int[].
Description
A value object is an object that does not have identity. A value object is an
instance of a value class. Two value objects are the same according to ==
if they have the same field values, regardless of when or how they were
created. Two variables of a value class type may hold references that point to
different memory locations, but refer to the same value object—much like two
variables of type int
may hold the same int
value.
An identity object is an object that does have identity—a unique property
associated with the object when it is created. Prior to value classes, every
object in Java was an identity object. Two identity objects are the same
according to ==
if they have the same identity. Two variables of an identity
class type refer to the same identity object only if they hold references
pointing to the same memory location.
At run time, the use of value objects may be optimized in ways that are difficult or impossible for identity objects. This is because value objects, untethered from any canonical memory location, can be duplicated, re-encoded, or re-used whenever it is convenient for the JVM to do so. This freedom allows for smaller memory footprint, fewer memory allocations, and better data locality.
Existing classes that represent simple domain values and that have followed best
practices to avoid identity dependencies can be easily migrated to be value
classes, with minimal compatibility impact. This JEP migrates a handful of
commonly-used classes in the Java Platform, including the primitive wrapper
classes such as Integer.
Enabling preview features
Value classes are a preview language feature, disabled by default.
To try the examples below in JDK NN you must enable preview features:
- Compile the program with javac --release NN --enable-preview Main.java and run it with java --enable-preview Main; or,
- When using the source code launcher, run the program with java --enable-preview Main.java; or,
- When using jshell, start it with jshell --enable-preview.
Programming with value objects
Programs create value objects by instantiating a class that has been declared
with the value
modifier. In most respects, value objects behave just like any
other object, but there are some special behaviors that programmers should be
aware of.
Value classes
A class that has no need for identity-related features can opt out of those
features with the value
modifier. Classes with the value
modifier
are value classes; classes without the modifier are identity classes.
The Color
record introduced earlier could be declared a value record. Nothing
else about the declaration changes.
value record Color(byte red, byte green, byte blue) {
public Color(int r, int g, int b) {
this(checkByte(r), checkByte(g), checkByte(b));
}
private static byte checkByte(int x) {
if (x < 0 || x > 255) throw new IllegalArgumentException();
return (byte) (x & 0xff);
}
public Color mix(Color that) {
return new Color(avg(red, that.red),
avg(green, that.green),
avg(blue, that.blue));
}
private static byte avg(byte b1, byte b2) {
return (byte) (((b1 & 0xff) + (b2 & 0xff)) / 2);
}
}
A simple class representing US dollar currency values (to two decimal places) might also be a good value class candidate. In this case, the author might prefer to declare a regular (non-record) class to more closely control the internal state. But because the class does not depend on identity-sensitive features like unique instance creation, field mutation, or synchronization, it can be declared a value class.
value class USDCurrency implements Comparable<USDCurrency> {
private int cs; // implicitly final
private USDCurrency(int cs) { this.cs = cs; }
public USDCurrency(int dollars, int cents) {
this(dollars * 100 + (dollars < 0 ? -cents : cents));
}
public int dollars() { return cs/100; }
public int cents() { return Math.abs(cs%100); }
public USDCurrency plus(USDCurrency that) {
return new USDCurrency(cs + that.cs);
}
public int compareTo(USDCurrency that) { ... }
public String toString() { ... }
}
The instance fields of a value class are implicitly final. (Special rules
apply to the initialization of value class fields in constructors, as described
later.) The instance methods of a value class must not be synchronized.
Many abstract classes have no need for identity-related features and so are also
good value class candidates. The class java.lang.Number, for example, has no
fields, nor any code that depends on identity-sensitive features.
abstract value class Number implements Serializable {
public abstract int intValue();
public abstract long longValue();
public byte byteValue() { return (byte) intValue(); }
...
}
The following rules apply to subclassing relationships involving value classes:
- A concrete value class is implicitly final and may have no subclasses.
- An abstract value class has chosen not to depend on identity, but this choice does not constrain its subclasses: the abstract class may have both value and identity subclasses. (And so a variable of the abstract value class type may or may not refer to a value object.)
- Identity classes may only be extended by other identity classes. Once a class has expressed a dependency on object identity, its subclasses cannot undo this dependency. (Thus, a variable of an identity class type always refers to an identity object.)
- Interfaces may be implemented by both value and identity classes, and have no way to express a dependency on object identity.
- The class Object, which sits at the top of the class hierarchy, is considered an identity class and has identity instances, but in most respects behaves more like an interface and permits value subclasses.
Beyond the constraints outlined in this section, a value class declaration is just like any other class declaration. The class can declare methods and implement interfaces. Users of the class will not typically notice anything unusual about the class—aside from identity-sensitive behaviors, everything about the objects is the same.
// value objects are created with 'new'
USDCurrency d1 = new USDCurrency(100,25);
// value class types may be 'null'
USDCurrency d2 = null;
// method invocations work as usual
if (d1.dollars() >= 100)
d2 = d1.plus(new USDCurrency(-100,0));
// objects can be viewed as superclass instances
Object o = d2;
String s = o.toString(); // "$0.25"
// objects can be viewed as interface instances
Comparable<USDCurrency> c = d2;
int i = c.compareTo(d1); // -1
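Because value classes require preview features, the usage above can be approximated on a current JDK with an ordinary final identity class. In this sketch the compareTo and toString bodies, which the JEP elides, are hypothetical implementations chosen to reproduce the "$0.25" and -1 results shown above; only identity-sensitive behavior such as == and synchronization would differ from the value class version.

```java
import java.util.Locale;

class USDCurrencyDemo {

    public static void main(String[] args) {
        USDCurrency d1 = new USDCurrency(100, 25);
        USDCurrency d2 = null;
        if (d1.dollars() >= 100)
            d2 = d1.plus(new USDCurrency(-100, 0));
        Object o = d2;
        System.out.println(o.toString());        // $0.25
        Comparable<USDCurrency> c = d2;
        System.out.println(c.compareTo(d1));     // -1
    }
}

// Identity-class stand-in for the value class USDCurrency.
// compareTo and toString are hypothetical: the JEP elides their bodies.
final class USDCurrency implements Comparable<USDCurrency> {
    private final int cs;                        // total amount in cents

    private USDCurrency(int cs) { this.cs = cs; }

    public USDCurrency(int dollars, int cents) {
        this(dollars * 100 + (dollars < 0 ? -cents : cents));
    }

    public int dollars() { return cs / 100; }
    public int cents() { return Math.abs(cs % 100); }

    public USDCurrency plus(USDCurrency that) {
        return new USDCurrency(cs + that.cs);
    }

    @Override
    public int compareTo(USDCurrency that) {
        return cs < that.cs ? -1 : (cs == that.cs ? 0 : 1);
    }

    @Override
    public String toString() {
        String sign = cs < 0 ? "-" : "";
        return String.format(Locale.ROOT, "%s$%d.%02d",
                sign, Math.abs(cs / 100), Math.abs(cs % 100));
    }
}
```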
References between objects
Value class types are reference types. In Java, any code that operates on an
object is really operating on a reference to that object; member accesses
must resolve the reference to locate the object (throwing an exception in the
case of a null
reference). Value objects are no different in this respect.
It might seem odd to talk about references to objects that have no identity, since it is natural to think of an object's memory address as the run time representation of its identity. Indeed, stable memory addresses are not essential for value objects, and JVM implementations will often try to optimize away any indirections to the object data. However, when reasoning about a Java program, it's best to imagine all objects continuing to be handled and operated on via references.
Objects can store references to other objects in their fields, creating complex
relationship graphs. There is no restriction on the types of references between
value and identity objects. The following value class, for example, stores one
reference to an identity object and two references to value objects. The third
field, predecessor, recursively references another object of the same value
class type (or stores null).
value class Item {
private String name; // identity class type
private USDCurrency cost; // value class type
private Item predecessor; // this value class type
public Item(String n, USDCurrency c) {
this(n, c, null);
}
public Item(String n, USDCurrency c, Item p) {
...
}
...
}
There is, however, one important limitation on references between objects: due
to value classes' construction requirements (covered later), when a value
object's fields are initialized, they cannot refer back to the object itself.
So it is impossible, for example, to create an Item whose predecessor is
that same Item.
More generally, the instance fields of a value object can never be used to
create a cycle—at least one object in any cycle would have to be an identity
object.
Comparing value objects with ==
The ==
operator traditionally tests whether two references are the same. But
this capability depends on object identity: only identity objects can be
reliably referenced at a stable location.
With the introduction of value objects, the ==
operator must instead test
whether two referenced objects are the same—that is, one is "substitutable"
for the other. For identity objects, this is just a different way of describing
the same test. But in the case of value objects, this means testing that the
objects, wherever located, represent the same value. The result is true if
the objects being compared belong to the same class and have the same field
values, and false otherwise. (Fields with primitive types are compared by
their bit patterns. Other field values, both identity and value objects, are
recursively compared with ==.)
// value objects with the same field values are the same
USDCurrency d1 = new USDCurrency(3,95);
USDCurrency d2 = new USDCurrency(3,95).plus(new USDCurrency(0,0));
assert d1 == d2; // true
// objects are still the same when viewed as supertypes
Object o1 = d1;
Object o2 = d2;
assert o1 == o2; // true
// identity objects are unique when created separately
String s1 = "hamburger";
String s2 = new String(s1); // new identity
assert s1 != s2; // true
// == recursively compares identity object fields
assert new Item(s1, d1) != new Item(s2, d1); // true
// == recursively compares value object fields
assert new Item(s1, d1) == new Item(s1, d2); // true
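The comparison rule can be modeled in plain Java to make the recursion concrete. This is a minimal sketch under stated assumptions: records stand in for value classes, the hypothetical substitutable method plays the role of the proposed ==, float and double fields are compared by raw bit patterns, other boxed primitives by value, and any other object by identity. It is not how the JVM implements ==.

```java
import java.lang.reflect.Method;
import java.lang.reflect.RecordComponent;
import java.util.Set;

class SubstitutabilityDemo {

    // Hypothetical stand-ins: records play the role of value classes in this model.
    record Money(int cents) {}
    record Item(String name, Money cost, Item predecessor) {}

    // Boxed forms of non-floating primitive fields; equal values mean equal bits.
    private static final Set<Class<?>> NON_FLOAT_BOXES = Set.of(
            Integer.class, Long.class, Short.class, Byte.class,
            Character.class, Boolean.class);

    // Models the proposed == for value objects: same class, then field by field.
    static boolean substitutable(Object a, Object b) {
        if (a == b) return true;                       // same reference, or both null
        if (a == null || b == null) return false;
        if (a.getClass() != b.getClass()) return false;
        if (a instanceof Double x) {                   // double field: raw bit pattern
            return Double.doubleToRawLongBits(x) == Double.doubleToRawLongBits((Double) b);
        }
        if (a instanceof Float x) {                    // float field: raw bit pattern
            return Float.floatToRawIntBits(x) == Float.floatToRawIntBits((Float) b);
        }
        if (NON_FLOAT_BOXES.contains(a.getClass())) {
            return a.equals(b);
        }
        if (!a.getClass().isRecord()) {
            return false;                              // identity objects: distinct identities
        }
        try {
            for (RecordComponent rc : a.getClass().getRecordComponents()) {
                Method accessor = rc.getAccessor();
                accessor.setAccessible(true);
                if (!substitutable(accessor.invoke(a), accessor.invoke(b))) return false;
            }
            return true;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String s1 = "hamburger";
        String s2 = new String(s1);                    // new identity
        Money d1 = new Money(395);
        Money d2 = new Money(395);
        System.out.println(substitutable(d1, d2));     // true
        System.out.println(substitutable(new Item(s1, d1, null),
                                         new Item(s2, d1, null)));  // false
        System.out.println(substitutable(new Item(s1, d1, null),
                                         new Item(s1, d2, null)));  // true
    }
}
```

Because 0.0 and -0.0 have different bit patterns, this model treats them as different, even though 0.0 == -0.0 is true for primitive doubles.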
Notice three things about the recursive use of ==:
- Recursion on identity objects does not perform a "deep" equality test. It compares identities. The referenced identity object may even be mutated (by, say, adding a value to a referenced List), but if two value objects are ==, the nested mutation would not impact the == test.
- Recursion on value objects does perform a deep comparison of the nested objects' fields. The resulting number of comparisons is unbounded: if an Item has a predecessor, and that Item has a predecessor, and so on, using == on the Item may require a full traversal of the chain of references. (Fortunately, as noted in the previous section, this chain will never be cyclical.)
- The ability to compare value objects' fields means that a value object's private data is a little more exposed than it might be in an identity object: someone who wants to determine a value object's field values can (with sufficient time and access) guess at those values, create a new object wrapping their guess, and use == to test whether the guess was correct.
When declaring a value class, it's important to keep each of these factors in mind. In some cases, an identity class may be a better fit.
The equals method
While == tests whether two value objects are the same object, the equals
method tests whether two objects represent the same data. As for identity
classes, two value objects may be !=, but still be considered by the class
author to be equal.
// distinct identity objects may be 'equals'
String s1 = "hamburger";
String s2 = new String(s1); // new identity
assert s1 != s2; // true
assert s1.equals(s2); // true
// distinct value objects may be 'equals'
assert new Item(s1, d1) != new Item(s2, d1); // true
assert new Item(s1, d1).equals(new Item(s2, d1)); // should be true
The problem of defining what constitutes "the same data" is left to the class
author when they implement their equals method. For convenience, the default
Object.equals implementation aligns with ==, testing whether two objects
are the same; for simple value classes, this is often good enough. Value
records are able to provide an even more convenient default implementation,
comparing record components recursively with equals. But these are just
starting points, and it's ultimately up to the class author to provide an
appropriate equals implementation.
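Today's records already supply the component-wise equals described above, so the contrast between == and equals can be tried without preview features. In this sketch the Item record is a hypothetical simplification (an int cost in cents instead of USDCurrency), and Label is a hypothetical plain class that inherits Object.equals; in current Java every class is an identity class, so that inherited equals is an identity test.

```java
class RecordEqualsDemo {

    // Hypothetical simplified Item. The generated equals compares reference
    // components as if by Objects.equals and primitive ones as if by the
    // wrapper class's compare method.
    record Item(String name, int costCents) {}

    // Hypothetical plain class: inherits Object.equals, an identity test today.
    static final class Label {
        final String text;
        Label(String text) { this.text = text; }
    }

    public static void main(String[] args) {
        String s1 = "hamburger";
        String s2 = new String(s1);              // new identity, same characters
        Item a = new Item(s1, 395);
        Item b = new Item(s2, 395);
        System.out.println(a == b);              // false: distinct objects
        System.out.println(a.equals(b));         // true: components are equal
        Label l1 = new Label(s1);
        Label l2 = new Label(s1);
        System.out.println(l1.equals(l2));       // false: inherited Object.equals
    }
}
```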
When thinking about equals and ==, it's important to remember that a value
object's internal state (the data it stores) is not always the same as
its external state (the data it represents). An == test compares internal
state. This is often not what you're after. Instead, the best advice for
developers in most cases is to use equals whenever they need to compare
objects.
In the following example, the value class Substring implements CharSequence.
A Substring represents a string lazily, without allocating a char[] in
memory. Naturally, then, two Substring objects should be considered equal
if they represent the same string, regardless of differences in their internal
state.
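value class Substring implements CharSequence {
private String str;
private int start, end;
public Substring(String str, int start, int end) {
this.str = str;
this.start = start;
this.end = end;
}
public int length() {
return end - start;
}
public char charAt(int i) {
return str.charAt(start + i);
}
// required by CharSequence; stays lazy by returning another Substring
public CharSequence subSequence(int s, int e) {
return new Substring(str, start + s, start + e);
}
public String toString() {
return str.substring(start, end);
}
public boolean equals(Object o) {
return o instanceof Substring && toString().equals(o.toString());
}
// must agree with equals, so it hashes the represented string
public int hashCode() {
return toString().hashCode();
}
}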
Substring s1 = new Substring("ionization", 0, 3);
Substring s2 = new Substring("ionization", 7, 10);
assert s1 != s2; // true
assert s1.equals(s2); // true
The distinction between internal state and external state helps to explain why not all value classes are records, and not all records are value classes: records are used to opt out of separate internal state, while value classes are used to opt out of identity. Each of these choices can be made orthogonally.
Other identity-sensitive operations
In addition to ==, a handful of specialized operations supported by the Java
platform have historically relied on object identity. When encountering a value
object, these operations behave as follows:
- System.identityHashCode: The "identity hash code" of a value object is computed by combining the hash codes of the value object's fields. The default implementation of Object.hashCode continues to return the same value as identityHashCode. (Note that, like ==, this hash code exposes information about a value object's private fields that might otherwise be hidden by an identity object. Developers should be cautious about storing sensitive secrets in value object fields.)
- Synchronization: Value objects do not have synchronization monitors. At compile time, the operand of a synchronized statement must not have a concrete value class type. At run time, if an attempt is made to synchronize on a value object (for example, where the operand of a synchronized statement has type Object), an IdentityException is thrown. Invocations of the wait and notify methods of Object will similarly fail at run time, because they require callers to first synchronize on the object's monitor.
- Garbage collection: Value objects do not have a traditional life cycle: an object may already exist before new, and may appear again after it becomes unreachable. So operations that manage the end of an object's lifetime are not relevant to value objects. A garbage collector will never call the finalize method of a value object. The classes of java.lang.ref throw an IdentityException when asked to wrap or operate on a value object.
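The JEP does not specify how field hash codes are combined, so the following sketch assumes the common 31 * h + x scheme and models value classes with records: records are hashed field by field, float and double fields by their raw bit patterns, other boxed primitives by value, and identity objects by System.identityHashCode. The hypothetical valueIdentityHash method illustrates the property any real combining function needs so that Object.hashCode stays consistent with an Object.equals that aligns with ==: objects that == treats as the same get the same hash.

```java
import java.lang.reflect.Method;
import java.lang.reflect.RecordComponent;
import java.util.Set;

class ValueHashDemo {

    // Hypothetical stand-ins: records play the role of value classes.
    record Money(int cents) {}
    record Item(String name, Money cost) {}

    private static final Set<Class<?>> NON_FLOAT_BOXES = Set.of(
            Integer.class, Long.class, Short.class, Byte.class,
            Character.class, Boolean.class);

    // Hypothetical model of System.identityHashCode for value objects.
    // The 31 * h + x combining rule is an assumption; the JEP does not specify one.
    static int valueIdentityHash(Object o) {
        if (o == null) return 0;
        if (o instanceof Double d) return Long.hashCode(Double.doubleToRawLongBits(d));
        if (o instanceof Float f) return Float.floatToRawIntBits(f);
        if (NON_FLOAT_BOXES.contains(o.getClass())) return o.hashCode();   // value-based
        if (!o.getClass().isRecord()) return System.identityHashCode(o);   // identity object
        int h = o.getClass().hashCode();           // == also requires the same class
        try {
            for (RecordComponent rc : o.getClass().getRecordComponents()) {
                Method accessor = rc.getAccessor();
                accessor.setAccessible(true);
                h = 31 * h + valueIdentityHash(accessor.invoke(o));
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        return h;
    }

    public static void main(String[] args) {
        String s = "hamburger";
        Item a = new Item(s, new Money(395));
        Item b = new Item(s, new Money(395));      // created separately, same field values
        System.out.println(valueIdentityHash(a) == valueIdentityHash(b)); // true
        System.out.println(a == b);                // false today: records are identity objects
    }
}
```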
For developers who need to dynamically require identity in their own code, an
IdentityException may be thrown, and the java.util.Objects class provides
convenience methods hasIdentity and requireIdentity.
Safe construction
Constructors initialize newly-created objects, including setting the values of the objects' fields. Because value objects do not have identity, their initialization requires special care.
Larval object leakage
An object being constructed is "larval"—it has been created, but it is not yet fully-formed. Larval objects must be handled carefully, because the expected properties and invariants of the object may not yet hold.
For example, in the following class, name is expected to hold a valid
String, and length is expected to hold the length of that string.
But when the constructor begins, the larval object's name field is null;
immediately after name gets set, the larval object's length is incorrect.
class Name {
final String name;
final int length;
Name(String n) {
name = n;
length = computeLength();
}
int computeLength() {
return name.length();
}
}
Notice that the computeLength method is asked to run with a larval object as
a receiver.
The larval object has "leaked" out of the constructor and might be expected to
behave like a fully-initialized Name.
Fortunately, the larval object's name field has already been set, and the
computeLength method doesn't depend on the length field, so an appropriate
value is returned.
But if, say, the fields were initialized in the opposite order, a
NullPointerException would occur.
Also notice that the name and length fields are marked final. Yet despite
this modifier, if a larval Name is leaked to unsuspecting code, that code may
be surprised to observe these final fields mutating!
In a toy example, these risks may seem minor. But in a complex initialization process involving multiple constructors and class hierarchies that span maintenance domains, larval object leakage can become a significant risk to correctness and security.
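The risk is easiest to see when a superclass constructor makes a virtual call. In the following sketch (class names hypothetical, runnable on current JDKs without preview features) the Base constructor calls report() before the Name subclass has assigned its final name field, so the leaked larval object shows null while the fully-initialized one shows "ionization".

```java
import java.util.ArrayList;
import java.util.List;

class LarvalLeakDemo {

    // Collects what leaked code saw in the final 'name' field.
    static final List<String> observed = new ArrayList<>();

    abstract static class Base {
        Base() {
            report();                    // virtual call: leaks the larval subclass object
        }
        abstract void report();
    }

    static final class Name extends Base {
        final String name;

        Name(String n) {
            super();                     // traditional order: superclass constructor first
            name = n;                    // final field assigned only after super() returns
        }

        @Override
        void report() {
            observed.add(String.valueOf(name));
        }
    }

    public static void main(String[] args) {
        Name n = new Name("ionization");
        n.report();                      // same object, now fully initialized
        System.out.println(observed);    // [null, ionization]
    }
}
```

With the early construction approach described in the next section, Name could assign name before invoking super(), and the leaked call would observe the final value.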
Early & late construction
Traditionally, a constructor begins the initialization process by invoking a
superclass constructor, super(...).
After the superclass returns, the subclass then proceeds to set its declared
instance fields and perform other initialization tasks.
This pattern exposes a completely uninitialized subclass to any larval object
leakage occurring in a superclass constructor.
The Flexible Constructor Bodies preview feature enables an
alternative approach to initialization, in which fields can be set and other
code executed before the super(...) invocation.
There is a two-phase initialization process: early construction before
the super(...) invocation, and late construction afterwards.
class Name {
final String name;
final int length;
Name(String n) {
// early construction:
name = n;
super();
// late construction:
length = computeLength();
}
int computeLength() {
return name.length();
}
}
During the early construction phase, larval object leakage is impossible: the
constructor may set the fields of this, but may not invoke instance methods
or otherwise make use of this.
Fields that are initialized early are set before they can ever be read, even if
a superclass leaks the larval object.
Final fields, in particular, can never be observed to mutate.
Value object initialization
Early initialization of instance fields is mandatory for value classes.
Value objects lack identity, so there is no canonical memory location in which the late mutation of a field could be observed. JVM implementations need to be free to make copies of value objects—including leaked larval value objects—whenever it is convenient for them to do so. Thus, the values of the object's fields must be provided early, before there is any risk of larval object leakage.
To facilitate early initialization of fields, construction code in value classes prefers early construction wherever possible:
- If a value class constructor has no super(...) or this(...) call, an implicit super() call is placed at the end of the constructor body, and the entire body is part of early construction (in identity classes, the implicit super() is placed at the start).
- Each value class instance field initializer is placed at the start of the constructor, as part of early construction (in identity classes, instance field initializers run after the super(...) call).
- Instance initializer blocks (a rarely-used feature) continue to run in the late phase, and so may not assign to value class instance fields.
- For convenience, fields that have been assigned may be read by subsequent early construction code (that is, early construction may freely access the fields of this, but may not invoke instance methods or share this with other code).
In practice, value class authors may notice errors when they attempt to use
this in a value class constructor or field initializer.
These errors can be addressed by either (i) refactoring the code so that it no
longer depends on this, or (ii) placing the code that depends on this after
an explicit super(...) call in a constructor.
In the following example, the Name class has become a value class, and the
author has eliminated the dependency on this by making the computeLength
method static and passing the input string as an argument.
value class Name {
String name;
int length;
Name(String n) {
// early construction:
name = n;
length = computeLength(name);
}
static int computeLength(String n) {
return n.length();
}
}
Encouraging early initialization of identity classes
Ultimately, we think developers should shift as much of their construction code as possible to the early phase. This is especially important for value classes, but many identity classes would also benefit.
In the future, we anticipate that identity classes will have a way to adopt the
constructor timing of value classes: field initializers run first, in the early
phase, and implicit super()
calls run last.
(Unlike value classes, identity classes would not be required to initialize all
of their fields before an explicit super(...)
call.)
In the meantime, for this JEP javac provides lint warnings indicating
this dependencies in instance field initializers and implicit-super()
constructors of identity classes.
These warnings can be addressed, as for value classes, by (i) refactoring the
code so that it no longer depends on this, or (ii) placing the code that
depends on this after an explicit super(...) call in a constructor.
A class that compiles without warning will likely be able to cleanly transition
to the constructor timing of value classes in the future.
(If there are indirect timing dependencies between a subclass and a
superclass, say both classes must interact with a mutable static field in a
specific order, javac will not warn about that dependency, but as a best
practice, the class author should place code that must run late after an
explicit super(...) call.)
As a special case, in an identity record class, a constructor dependency on
this is likely a bug, and this JEP specifies either an error or a mandatory
warning (TBD) to address the issue.
The constraints on record constructors are relaxed so that a constructor can use
an explicit super()
call to indicate code that must run in the late phase.
The enhancement that allows fields to be read during early construction applies to both value classes and identity classes.
Run-time optimizations for value objects
Because there is no need to preserve identity, Java Virtual Machine implementations have a lot of freedom to encode value objects at run time in ways that optimize memory footprint, locality, and garbage collection efficiency. Optimization techniques will typically duplicate, re-encode, or re-use value objects to achieve these goals. Re-encoding might be useful, for example, to copy a value object into a variable that requires fewer memory loads to access the object's data.
This section describes abstractly some of the JVM optimization techniques implemented by HotSpot. It is not comprehensive or prescriptive, but offers a taste of how value objects enable improved performance.
Value object scalarization
Scalarization is one important optimization enabled by the lack of identity. A scalarized reference to a value object is reduced to its "essence", a set of the object's field values without any enclosing container. A scalarized object is essentially "free" at run time, having no impact on the normal object allocation and garbage collection processes.
In HotSpot, scalarization is a JIT compilation technique, affecting the representation of references to value objects in the bodies and signatures of JIT-compiled methods.
The following illustrates how the JIT compiler might translate the Color.mix method to scalarize its input and output. The "essence" of a Color reference is 3 bytes, r, g, and b, along with a boolean to indicate whether the reference is null, in which case the other 3 bytes can be ignored. (In this pseudocode, the notation { ... } refers to a vector of multiple values that can be returned from a scalarized method. Importantly, this is purely notational: there is no wrapper at run time.)
// original method:
public Color mix(Color that) {
return new Color(avg(red, that.red),
avg(green, that.green),
avg(blue, that.blue));
}
// effectively:
static { boolean, byte, byte, byte }
$mix(boolean this_null, byte this_r,
byte this_g, byte this_b,
boolean that_null, byte that_r,
byte that_g, byte that_b) {
$nullCheck(this_null);
$nullCheck(that_null);
return { false,
avg(this_r, that_r),
avg(this_g, that_g),
avg(this_b, that_b) };
}
// original invocation:
new Color(237, 139, 0).mix(new Color(0, 0, 0));
// effectively:
$mix(false, 237, 139, 0, false, 0, 0, 0);
JVMs have used similar techniques to scalarize identity objects in local code when the JVM is able to prove that an object's identity is never used. But scalarization of value objects is more predictable and far-reaching, even across non-inlinable method invocation boundaries.
One limitation of scalarization is that it is not typically applied to a variable with a type that is a supertype of a value class type. Notably, this includes method parameters of generic code whose erased type is Object.
Instead, when an assignment to a supertype occurs, a scalarized value object must be converted to an ordinary heap object encoding. But this allocation occurs only when necessary, and as late as possible.
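The sketch below marks the two kinds of sites the text distinguishes. It uses a hypothetical stand-in for Color, written as an ordinary identity record (with an assumed of factory) so that it compiles on a standard JDK; with preview features it might instead be a value record. Nothing here observes scalarization directly; the comments only indicate where a JIT could scalarize and where, per the text, a heap encoding would typically be needed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the document's Color class (identity record here).
record Color(byte red, byte green, byte blue) {
    static Color of(int r, int g, int b) {
        return new Color(checkByte(r), checkByte(g), checkByte(b));
    }

    private static byte checkByte(int x) {
        if (x < 0 || x > 255) throw new IllegalArgumentException("not a byte: " + x);
        return (byte) x; // 128..255 are stored as negative bytes
    }

    // Average of two unsigned 8-bit channel values
    private static int avg(byte x, byte y) {
        return ((x & 0xff) + (y & 0xff)) / 2;
    }

    Color mix(Color that) {
        return of(avg(red, that.red), avg(green, that.green), avg(blue, that.blue));
    }
}

class ScalarizationSites {
    // Parameter and return types are Color: the kind of site where a JIT
    // could pass and return a null flag plus three bytes directly.
    static Color darken(Color c) {
        return c.mix(Color.of(0, 0, 0));
    }

    // Parameter type is Object, a supertype of Color: not typically
    // scalarized, so a value object reaching it would need a heap encoding.
    static String describe(Object o) {
        return String.valueOf(o);
    }

    public static void main(String[] args) {
        Color dark = darken(Color.of(237, 139, 0));
        List<Color> palette = new ArrayList<>(); // erased element type is Object
        palette.add(dark);                       // another supertype assignment
        System.out.println(describe(palette.get(0)));
    }
}
```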
Value object heap flattening
Heap flattening is another important optimization enabled by value objects' lack of identity. The "essence" of a reference to a value object is encoded as a compact bit vector, without any pointer to a different memory location. This bit vector can then be stored directly in heap storage, in a field or an array of a value class type.
Heap flattening is useful because a flattened value object requires less memory than an ordinary object on the heap, and because the data is stored locally, avoiding expensive cache misses. These benefits can significantly improve some programs' memory footprint and execution time.
To illustrate, an array of Color references could directly store 32-bit encodings of the referenced objects. Note that, as for scalarization, an extra flag is needed to keep track of null references.
// original code:
Color[] cs = new Color[100];
cs[5] = new Color(237, 139, 0);
Color c1 = cs[5];
Color c2 = cs[6];
// effectively:
int[] cs = new int[100];
cs[5] = $flatten(false, 237, 139, 0);
{ boolean c1_null, byte c1_r, byte c1_g, byte c1_b } =
$inflate(cs[5]);
{ boolean c2_null, byte c2_r, byte c2_g, byte c2_b } =
$inflate(cs[6]);
// where:
int $flatten(boolean val_null, byte val_r,
byte val_g, byte val_b) {
if (val_null) return 0;
else return (1 << 24) | ((val_r & 0xff) << 16) |
((val_g & 0xff) << 8) | (val_b & 0xff);
}
{ boolean, byte, byte, byte } $inflate(int vector) {
if (vector == 0) return { true, 0, 0, 0 };
else return { false,
vector >> 16 & 0xff,
vector >> 8 & 0xff,
vector & 0xff };
}
The details of heap flattening will vary, of course, at the discretion of the JVM implementation.
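For readers who want to run the encoding, here is a hand-written analogue of the $flatten/$inflate pseudocode in plain Java. FlatColorArray is a hypothetical class; it only imitates in user code what a JVM might do internally, and it is not how HotSpot lays out arrays.

```java
// Hypothetical "flattened" storage for colors: each element is one int holding
// a non-null flag (bit 24) and three 8-bit channels. The int value 0 is null.
final class FlatColorArray {
    private static final int NON_NULL = 1 << 24;
    private final int[] data;

    FlatColorArray(int length) {
        data = new int[length]; // every element starts as 0, i.e. null
    }

    void set(int i, int r, int g, int b) {
        // Mask each channel before shifting: in Java, << binds tighter than &
        data[i] = NON_NULL | ((r & 0xff) << 16) | ((g & 0xff) << 8) | (b & 0xff);
    }

    void setNull(int i) { data[i] = 0; }

    boolean isNull(int i) { return data[i] == 0; }

    int red(int i)   { return (bits(i) >> 16) & 0xff; }
    int green(int i) { return (bits(i) >> 8) & 0xff; }
    int blue(int i)  { return bits(i) & 0xff; }

    // Reading a channel of a null element throws, like dereferencing null
    private int bits(int i) {
        int v = data[i];
        if (v == 0) throw new NullPointerException("element " + i + " is null");
        return v;
    }
}

class FlatColorDemo {
    public static void main(String[] args) {
        FlatColorArray cs = new FlatColorArray(100);
        cs.set(5, 237, 139, 0);
        System.out.println(cs.red(5) + " " + cs.green(5) + " " + cs.blue(5));
        System.out.println(cs.isNull(6));
        cs.set(7, 0, 0, 0); // black is not null: the flag bit keeps it nonzero
        System.out.println(cs.isNull(7));
    }
}
```

The flag bit is what lets black (0, 0, 0) be distinguished from null; without it, both would encode as 0.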
Heap flattening must maintain the integrity of objects. For example, the flattened data must be small enough to read and write atomically, or else it may become corrupted. On common platforms, "small enough" may mean as few as 64 bits, including the null flag. So while many small value classes can be flattened, classes that declare, say, 2 int fields or a double field might have to be encoded as ordinary heap objects.
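The arithmetic behind that example can be made explicit. The following is only a sketch under stated assumptions: a 64-bit atomic access limit, a single bit for the null flag, and no padding or alignment overhead. FlatteningBudget and fitsAtomically are hypothetical names, and real JVM layout policies are more involved.

```java
// Sketch of the size check described above. Assumptions (not an actual JVM
// policy): 64-bit atomic limit, 1-bit null flag, no padding or alignment.
final class FlatteningBudget {
    static final int ATOMIC_LIMIT_BITS = 64;
    static final int NULL_FLAG_BITS = 1;

    // True if the given field widths plus a null flag fit in one atomic access
    static boolean fitsAtomically(int... fieldBits) {
        int total = NULL_FLAG_BITS;
        for (int bits : fieldBits) total += bits;
        return total <= ATOMIC_LIMIT_BITS;
    }

    public static void main(String[] args) {
        // Color: 3 * 8 + 1 = 25 bits
        System.out.println(fitsAtomically(Byte.SIZE, Byte.SIZE, Byte.SIZE));
        // Two int fields: 2 * 32 + 1 = 65 bits
        System.out.println(fitsAtomically(Integer.SIZE, Integer.SIZE));
        // One double field: 64 + 1 = 65 bits
        System.out.println(fitsAtomically(Double.SIZE));
    }
}
```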
In the future, 128-bit flattened encodings may be possible on platforms that support atomic reads and writes of that size. And the Null-Restricted Value Types JEP will enable heap flattening for even larger value classes in use cases that are willing to opt out of atomicity guarantees.
Migration of existing classes
Existing classes that represent simple domain values and that have followed best practices to avoid identity dependencies can be easily migrated to be value classes, with minimal compatibility impact. When preview features are enabled, a handful of commonly-used classes in the JDK, outlined below, are migrated to be value classes.
Preparing for migration
Developers are encouraged to identify and eventually migrate value class candidates in their own code. Records and other classes that represent "simple domain values" are potential candidates, along with interface-like abstract classes.
The author of an identity class that is intended to become a value class in a future release should consider the following:
- On migration, all instance fields of the class will implicitly be made final and will need to be initialized without any reference to this. If that presents difficulties, the class may not be a good migration candidate. If there are any non-private, non-final fields, the change will need to be coordinated with any users who might attempt to mutate the fields.
- Similarly, a concrete, non-final class will become final on migration. If users have been allowed to both extend and create instances of the class, the author must choose to either break subclasses (by adding final), break instance creations (by adding abstract along with, say, factory methods and a private implementation class), or conclude that the class is not a good migration candidate.
- The equals and hashCode methods should be overridden by the class so that their results are consistent before and after migration.
- Users of the class will be able to observe different == behavior after migration. If this is a concern, an ideal migration candidate might declare private constructors and provide a factory method that explicitly advertises the possibility of results that are == to a previous result. (See, for example, the Integer.valueOf factory method.)
- As described in previous sections, the == and identityHashCode operations may allow users to guess or infer the values of private fields, and may be noticeably slow for value objects that (probably recursively) encode very large structures. If these are concerns for the class, it may not be a good migration candidate.
- Attempts to synchronize on instances or use the java.lang.ref API will fail after migration. Of course, the class itself should not declare synchronized methods or otherwise use these features. There's not much that can be done to prevent users from doing so, but it may be helpful to advertise the risk in the class's documentation.
- If the superclass is not Object, it must be made a value class before this class can be migrated. All of the considerations in this section apply to the superclass.
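As an illustration of the checklist, here is a hypothetical class shaped to be a low-risk migration candidate. Money, its fields, and its of factory are invented for this sketch. As written it is an ordinary identity class that compiles on a standard JDK (16 or later, for the instanceof pattern); the assumption is that, with preview features enabled, adding the value modifier would not change its API.

```java
import java.util.Objects;

// Hypothetical migration candidate following the checklist above.
final class Money {                      // already final; superclass is Object
    // Final fields, assigned directly from constructor parameters
    private final long cents;
    private final String currency;

    // Private constructor: clients must go through the factory method
    private Money(long cents, String currency) {
        this.cents = cents;
        this.currency = Objects.requireNonNull(currency);
    }

    /** Creates a Money; the result may be == to a previously returned result. */
    static Money of(long cents, String currency) {
        return new Money(cents, currency);
    }

    long cents() { return cents; }
    String currency() { return currency; }

    // State-based equals and hashCode give the same answers before and after migration
    @Override
    public boolean equals(Object o) {
        return o instanceof Money m && m.cents == cents && m.currency.equals(currency);
    }

    @Override
    public int hashCode() {
        return Objects.hash(cents, currency);
    }

    @Override
    public String toString() {
        return cents + " " + currency;
    }
    // Deliberately no synchronized methods and no use of java.lang.ref.
}

class MoneyDemo {
    public static void main(String[] args) {
        Money a = Money.of(1999, "EUR");
        Money b = Money.of(1999, "EUR");
        System.out.println(a.equals(b) + " " + (a.hashCode() == b.hashCode()));
    }
}
```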
Impact of migration
In most respects, an identity class that has addressed the risks outlined in the previous section can be compatibly made a value class by simply adding the value modifier.
All existing binaries will continue to link successfully. The only new compiler errors will be attempts to synchronize on the value class type.
There are some behavioral changes that users of the migrated classes may notice:
- The == operator may treat two instances as the same, where previously they were considered different.
- Attempts to synchronize on an instance or use the java.lang.ref API will fail with an exception.
- Assumptions about unique ownership of an instance may be violated (for example, an identical instance may be created at two different program points).
- Performance will generally improve, but may have different characteristics that are surprising.
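The first behavioral change can be seen in a small sketch. Point is a hypothetical record, shown as an ordinary identity record so that the code runs on a standard JDK; the assumption is that a migrated version would be declared as a value record. Today, == distinguishes two separately created Points with equal components; after migration it would compare their field values and report them as the same.

```java
// Hypothetical identity record; a migrated version might be "value record Point".
record Point(int x, int y) {}

class EqualityDemo {
    // Identity-sensitive check: code like this changes behavior after migration
    static boolean sameInstance(Point a, Point b) {
        return a == b;
    }

    public static void main(String[] args) {
        Point p = new Point(1, 2);
        Point q = new Point(1, 2);
        System.out.println(p.equals(q));        // record equals compares components
        System.out.println(sameInstance(p, q)); // two distinct identity objects today
    }
}
```

Code that uses equals, as record and value-based class users are already encouraged to do, is unaffected by this change.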
Value classes in the standard library
Some classes in the standard library have been designated value-based, with the understanding that they would become value classes in a future release.
Under this JEP, when preview features are enabled, the following standard library classes are considered to be value classes, despite not having been declared or compiled with the value modifier:
- java.lang.Number and the 8 primitive wrapper classes used for boxing
- java.lang.Record
- java.util.Optional, java.util.OptionalInt, etc.
- Most of the public classes of java.time, including java.time.LocalDate and java.time.ZonedDateTime
The migration of the primitive wrapper classes should significantly reduce boxing-related overhead.
Alternatives
As discussed, JVMs have long performed escape analysis to identify objects that never rely on identity throughout their lifespan and can be inlined. These optimizations are somewhat unpredictable, and do not help with objects that escape the scope of the optimization, including storage in fields and arrays.
Hand-coded optimizations via primitive values are possible to improve performance, but as noted in the "Motivation" section, these techniques require giving up valuable abstractions.
The C language and its relatives support flattened storage for structs and similar class-like abstractions. For example, the C# language has value types. Unlike value objects, instances of these abstractions have identity, meaning they support operations such as field mutation. As a result, the semantics of copying on assignment, invocation, etc., must be carefully specified, leading to a more complex user model and less flexibility for runtime implementations. We prefer an approach that leaves these low-level details to the discretion of JVM implementations.
Risks and Assumptions
The feature makes significant changes to the Java object model. Developers may be surprised by, or encounter bugs due to, changes in the behavior of operations such as == and synchronized. We expect such disruptions to be rare and tractable.
Some changes could potentially affect the performance of identity objects. The if_acmpeq test, for example, typically costs only one instruction cycle, but will now need an additional check to detect value objects. However, the identity class case can be optimized as a fast path, and we believe we have minimized any performance regressions.
There is a security risk that == and hashCode can indirectly expose private field values. Further, two large trees of value objects can take unbounded time to compute ==, potentially a DoS attack risk. Developers need to understand these risks.
Dependencies
Prerequisites:
- In anticipation of this feature, we already added warnings about potential behavioral incompatibilities for value class candidates in javac and HotSpot, via Warnings for Value-Based Classes and Warnings for Identity-Sensitive Libraries.
- Flexible Constructor Bodies (Third Preview) allows constructors to execute statements before a super(...) call and allows assignments to instance fields in this context. These changes facilitate the construction protocol required by value classes.
- Strict Field Initialization in the JVM (Preview) provides the JVM mechanism necessary to require, through verification, that value class instance fields are initialized during early construction.
Future work:
- Null-Restricted Value Class Types (Preview) will build on this JEP, allowing programmers to manage the storage of nulls and enable more dense heap flattening in fields and arrays.
- Enhanced Primitive Boxing (Preview) will enhance the language's use of primitive types, taking advantage of the lighter-weight characteristics of boxing to value objects.
- JVM class and method specialization (JEP 218, with revisions) will allow generic classes and methods to specialize field, array, and local variable layouts when parameterized by value class types.