JEP draft: Null-Restricted Value Class Types (Preview)
Owner | Dan Smith |
Type | Feature |
Scope | SE |
Status | Draft |
Discussion | valhalla dash dev at openjdk dot org |
Effort | XL |
Duration | XL |
Created | 2023/09/22 23:57 |
Updated | 2024/09/24 22:01 |
Issue | 8316779 |
Summary
Allow the type of a variable storing value objects to exclude null, enabling more compact storage and other optimizations at run time.
This is a preview language and VM feature.
Goals
- Introduce a new kind of type for value classes that excludes null from its value set, much like primitive types that cannot be null.
- Allow a value class to "opt in" to the automatic creation of an appropriate default value used to initialize fields and arrays that don't store null.
- Allow larger value classes to further "opt in" to non-atomic encodings in fields and arrays that don't store null.
- Support compatible migration of existing classes. Apply these properties to value classes in the Java platform, including the classes used for primitive boxing.
Non-Goals
- It is not a goal to support null-restricted types for identity classes or classes that do not provide a default value.
Motivation
Value objects are special objects that lack identity, and so can be freely duplicated and re-encoded by the JVM at run time. One especially useful optimization that can be applied to these objects is heap flattening, in which a reference to a value object is encoded as a compact bit vector of the object's field values, without a pointer to a different memory location. The bit vector can be stored directly in a field or an array of a value class type. This encoding strategy usually leads to smaller memory footprint and better locality than a standard encoding using heap-allocated objects and pointers.
However, in comparison to primitive types, heap flattening of value class types can be inefficient, because it must account for null references. These are typically encoded by reserving some bits for a "null flag", and those bits are then unavailable to encode the object's field values. So, for example, a boxed Integer requires 32 bits to encode the int value, and at least 1 more bit for the null flag, probably leading to a 64-bit encoding.
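The cost of the null flag can be made concrete with a small sketch in current Java. The class below is hypothetical and only mimics, by hand, the kind of encoding a JVM might choose internally: it packs a nullable Integer into a long, using bit 32 as the flag and the low 32 bits as the int payload. The names NullFlagEncoding, encode, and decode are assumptions made for illustration, not part of any API.

```java
// Hypothetical illustration only: a hand-written model of a flattened,
// nullable Integer. A real JVM is free to choose a different encoding.
final class NullFlagEncoding {
    // Bit 32 marks "non-null"; bits 0..31 hold the int payload.
    private static final long NON_NULL_FLAG = 1L << 32;

    // 32 payload bits + 1 flag bit = 33 bits, which no longer fits in an int.
    static final int BITS_NEEDED = Integer.SIZE + 1;

    static long encode(Integer value) {
        if (value == null) {
            return 0L; // flag bit clear: the slot holds null
        }
        // Mask the payload so sign extension cannot spill into the flag bit.
        return NON_NULL_FLAG | (value & 0xFFFF_FFFFL);
    }

    static Integer decode(long bits) {
        if ((bits & NON_NULL_FLAG) == 0) {
            return null;
        }
        return (int) bits; // the low 32 bits are the payload
    }

    public static void main(String[] args) {
        System.out.println(decode(encode(-7)));
        System.out.println(decode(encode(null)));
        System.out.println(BITS_NEEDED + " bits needed, " + Long.SIZE + " bits used");
    }
}
```

Because 33 bits do not fit in a 32-bit slot, the sketch falls back to the next convenient size, 64 bits, which is the doubling described above.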
Further, heap flattening of value class types is limited by the integrity requirements of objects and references: the flattened data must be small enough to read and write atomically, or else the encoded data may become corrupted. On common platforms, "small enough" may mean as few as 32 or 64 bits. So while many small value classes can be flattened, most value classes that declare 2 or more fields will have to be encoded as ordinary heap objects (unless the fields store primitives of types boolean, char, byte, or short). Even a boxed Double requires at least 65 bits (counting one for a null flag), which exceeds the atomic read/write capabilities of many systems.
Primitive types do not have these constraints: a primitive-typed field is implicitly initialized to a zero value (or the equivalent) on creation, rather than null; and large primitive variables, of types long or double, are allowed to be non-atomically updated (see JLS 17.7). Thus, for example, a large array of type int has half the memory footprint of a flattened array of type Integer.
If the Java language had a type representing references to instances of a value class but not null, then there would be no need for a null flag, and the flattened storage could have a footprint no larger than the footprint of the class's fields.
This storage would need to be initialized to something, and so classes that intend to support this feature would need to allow for a default value, something like the 0 value used to initialize int-typed storage.
Some value classes might further be willing to tolerate corrupt data created by non-atomic reads and writes. Without the need to track null flags, the JVM's data integrity requirements could be relaxed, allowing classes that opt in to mimic the specified behavior of long and double. This choice would shift responsibility to their users for managing concurrency and handling any bugs arising from races.
Description
The features described below are preview features, enabled with the --enable-preview compile-time and runtime flags. More comprehensive requirements and implementation details for the language, JVM, and standard libraries can be found in subtasks of this JEP.
Null-restricted types
A null-restricted type is a reference type expressed with the name of a value class followed by the ! symbol. It asserts that the value of a given variable or expression will not be null.
Null-restricted types may appear in variable declarations, array allocations (as the component type), and casts.
void printAll(Range! r) {
    // Range is declared as a record below, so its components are read via accessors
    for (int i = r.start(); i < r.end(); i++)
        System.out.println(i);
}
printAll(new Range(5, 50));
printAll(null); // compiler error
A normal class type can be converted to a null-restricted type, and vice versa, much like the type Integer can be converted to and from the type int. When converting to a null-restricted type, a null check occurs at run time.
Range r = new Range(1, 3);
printAll(r);
r = null;
printAll(r); // NullPointerException
Object o = null;
r = (Range!) o; // NullPointerException
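The comparison with Integer and int can be observed in current Java, without preview features. The sketch below shows the existing behavior that the conversion to a null-restricted type is compared to: unboxing a null Integer fails with a NullPointerException at run time. The class and method names are illustrative only.

```java
// Runs on current Java: shows the existing Integer-to-int null check
// that the conversion from Range to Range! is compared to.
final class UnboxingNullCheck {
    // Converting to the primitive type performs an implicit null check.
    static int toPrimitive(Integer boxed) {
        return boxed; // unboxing; throws NullPointerException if boxed is null
    }

    // Reports whether the conversion failed with a NullPointerException.
    static boolean throwsNpe(Integer boxed) {
        try {
            toPrimitive(boxed);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(throwsNpe(42));
        System.out.println(throwsNpe(null));
    }
}
```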
Arrays of null-restricted types can be assigned to non-restricted supertypes, and a null check continues to be enforced at run time, similarly to other array storage checks.
Range![] a1 = new Range![3];
a1[0] = new Range(-3, 0);
Range[] a2 = a1;
a2[1] = null; // ArrayStoreException
Object[] a3 = a2;
a3[2] = new Object(); // ArrayStoreException
a3[2] = null; // ArrayStoreException
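The "other array storage checks" mentioned above already exist for covariant arrays. As a point of comparison, the following sketch runs on current Java and shows a String[] viewed as an Object[] rejecting a store of the wrong type. The helper name storeRejected is an assumption for this example. Note that today a null store always succeeds; rejecting null is the new behavior described for arrays such as Range![].

```java
// Runs on current Java: the existing covariant-array store check.
final class ArrayStoreCheckDemo {
    // Attempts the store and reports whether the array rejected it.
    static boolean storeRejected(Object[] array, int index, Object value) {
        try {
            array[index] = value;
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Static type Object[], dynamic type String[].
        Object[] strings = new String[2];
        System.out.println(storeRejected(strings, 0, "ok"));
        System.out.println(storeRejected(strings, 1, new Object()));
        // With current array types, null is always accepted.
        System.out.println(storeRejected(strings, 1, null));
    }
}
```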
Zero instances
When objects and arrays are created, each enclosed field or array component is automatically initialized to an appropriate default value. This ensures that if the program attempts to read from the variable before its first write, a predictable value can be found (rather than, say, garbage data).
Each primitive type has a zero-like default value: 0, 0.0, false, etc. For normal reference types, the default value is null. But a variable with a null-restricted type cannot store null, so what is its default value?
Range![] a = new Range![100];
Range r = a[5]; // not null...
The answer is that the default value of a null-restricted type is a zero instance of the given value class. The zero instance is created by simply setting each of the class's instance fields to its own default value. Unlike null, the zero instance is a real, fully-functional object.
Range r = a[5];
System.out.println(r); // Range[start=0, end=0]
int size = r.size(); // 0
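The default values involved can be checked in current Java. In the sketch below an ordinary record stands in for the value record Range, and the size() method is an assumption (the text calls it but does not show its declaration). Since implicit zero-instance creation is not available without this feature, zeroLikeRange builds the equivalent object by hand from the default value of each field.

```java
// Runs on current Java. An ordinary record stands in for the value record
// Range; the body of size() is an assumption made for this sketch.
record Range(int start, int end) {
    int size() { return end - start; }
}

final class DefaultValuesDemo {
    // What the zero instance of Range would look like: every field holds
    // the default value of its own type (0 for int).
    static Range zeroLikeRange() {
        return new Range(0, 0);
    }

    public static void main(String[] args) {
        int[] ints = new int[3];
        Range[] ranges = new Range[3];   // ordinary, nullable component type
        System.out.println(ints[1]);     // default value for int
        System.out.println(ranges[1]);   // default value for a normal reference type
        System.out.println(zeroLikeRange());
        System.out.println(zeroLikeRange().size());
    }
}
```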
Implicit constructors
Notice that the zero instance of a value class is created automatically, without any execution of code in the class. Not all value classes will be comfortable with this behavior, or be willing to accept the zero instance as a valid object in their domain. For example, for a Name record with some String fields, the zero instance would be a name with all fields set to null.
value record Name(String first, String last) {
    public String toString() { return "%s %s".formatted(first, last); }
    // zero instance toString: 'null null'
}
For this reason, creation of a zero instance must be authorized by the class, and many value classes will choose not to opt in. We use a zero-argument constructor with the implicit modifier to allow the zero instance to be created automatically at run time.
value record Range(int start, int end) {
    public implicit Range();
    public Range(int start, int end) {
        if (start > end) throw new IllegalArgumentException();
        // a canonical record constructor must assign every field
        this.start = start;
        this.end = end;
    }
}
The implicit constructor must always be public and can be invoked directly, producing a zero instance without executing any code. At run time, the implicit constructor gives the JVM permission to create zero instances without invoking a constructor at all.
If a value class declares an implicit constructor, it must not be an inner class, and its zero instance must not contain itself through circular null-restricted field types.
value class ListNode {
    public implicit ListNode();
    Object val;
    ListNode! next; // error
}
If a value class with an implicit constructor extends an abstract class, that superclass must also declare an implicit constructor.
Because implicit constructors are necessary for the allocation of null-restricted fields and arrays, value classes that do not declare implicit constructors cannot be used as null-restricted types.
Non-atomic updates
A value class with an implicit constructor may also declare that it tolerates implicit creation of instances via non-atomic field and array updates. This means that, in a race condition, new class instances may be accidentally created by intermixing field values from other instances, without any code execution or other additional cooperation from the value class.
A value class opts in to allowing this behavior by implementing the LooselyConsistentValue interface:
value class Point implements LooselyConsistentValue {
    double x;
    double y;
    public implicit Point();
    public Point(double x, double y) {
        this.x = x;
        this.y = y;
    }
}
This is strawman syntax, subject to change.
Users of a LooselyConsistentValue class are responsible for maintaining the integrity of their data, and can avoid unwanted instance creation by limiting access to a single thread, enforcing a synchronization protocol, or declaring a field volatile. Otherwise, unexpected instances may be created:
Point![] ps = { new Point(0.0, 1.0) };
new Thread(() -> ps[0] = new Point(2.0, 3.0)).start();
Point p = ps[0]; // may be (2.0, 1.0), among other possibilities
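A real race is not reproducible on demand, so the following sketch replays one unlucky interleaving deterministically, on a single thread, in current Java. The two-slot double array is a hand-written stand-in for a flattened Point! array component, and all names (TornReadModel, writeX, writeY, observeMidWrite) are assumptions for illustration. It models only the idea of a two-step, non-atomic update, not how any particular JVM writes memory.

```java
// Deterministic, single-threaded model of a torn read. The two-slot array
// stands in for a flattened Point! component; this is an illustration, not
// a description of any particular JVM's memory layout or update strategy.
final class TornReadModel {
    record Point(double x, double y) {}

    private final double[] slot = new double[2]; // slot[0] = x, slot[1] = y

    void writeX(Point p) { slot[0] = p.x(); }    // first half of a non-atomic write
    void writeY(Point p) { slot[1] = p.y(); }    // second half of a non-atomic write

    Point read() { return new Point(slot[0], slot[1]); }

    // Replays one interleaving: the reader runs between the two halves
    // of the writer's update.
    static Point observeMidWrite(Point oldValue, Point newValue) {
        TornReadModel storage = new TornReadModel();
        storage.writeX(oldValue);
        storage.writeY(oldValue);
        storage.writeX(newValue);      // writer: first half
        Point seen = storage.read();   // reader: sees new x, old y
        storage.writeY(newValue);      // writer: second half
        return seen;
    }

    public static void main(String[] args) {
        // prints Point[x=2.0, y=1.0]
        System.out.println(observeMidWrite(new Point(0.0, 1.0), new Point(2.0, 3.0)));
    }
}
```

The observed point (2.0, 1.0) was never passed to a constructor by either party, which is the kind of unexpected instance a LooselyConsistentValue class has to tolerate.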
Some implicitly-constructible value classes have complex integrity constraints for non-zero field values (for example, the start index of a Range, declared above, must not exceed the end index). In that circumstance, it may not be appropriate for the class to implement the LooselyConsistentValue interface. This feature is designed for the subset of value classes that can comfortably operate on arbitrary combinations of field values.
Performance model
As described in the Value Objects JEP, the typical treatment of a standard value class is for local variables, method parameters, and expression results to use inline encodings, while fields and array components are only inlined if the value object, plus a null flag, can fit in an atomic word size (such as 64 bits).
Adding an implicit constructor to a value class enables null-restricted storage, avoiding the need to dedicate any bits to a null flag. So, for example, a variable of type Long might be too large to store inline, but the type Long! should be safely inlinable on a 64-bit JVM.
For larger classes (as determined by the JVM implementation), implementing LooselyConsistentValue may also be necessary to enable inlining of these null-restricted fields and array components.
When flattened, a null-restricted class type should have a heap storage footprint and execution time (when fully optimized) comparable to the primitive types. For example, a Point!, given the class declaration above, can be expected to directly occupy 128 bits in fields and array components, and to avoid any allocation in stack computations. A field access simply references the first or second 64 bits. There are no additional pointers.
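The expected layout can be approximated by hand in current Java. The sketch below is a hypothetical model, not the JVM's implementation: it stores each point as two adjacent doubles in one backing array, so element i occupies 128 bits and a field access is plain index arithmetic. The class name FlatPointArray and its methods are assumptions for illustration.

```java
// Hand-written model of a flattened Point![]: two doubles (128 bits) per
// element, stored back to back with no per-element object or pointer.
final class FlatPointArray {
    private static final int FIELDS_PER_POINT = 2; // x and y
    private final double[] data;

    FlatPointArray(int length) {
        // Every element starts out looking like the zero instance (0.0, 0.0).
        data = new double[length * FIELDS_PER_POINT];
    }

    void set(int i, double x, double y) {
        data[i * FIELDS_PER_POINT] = x;       // first 64 bits of element i
        data[i * FIELDS_PER_POINT + 1] = y;   // second 64 bits of element i
    }

    double x(int i) { return data[i * FIELDS_PER_POINT]; }
    double y(int i) { return data[i * FIELDS_PER_POINT + 1]; }

    // Payload size in bits, ignoring the array header.
    long payloadBits() { return (long) data.length * Double.SIZE; }

    public static void main(String[] args) {
        FlatPointArray ps = new FlatPointArray(4);
        ps.set(2, 1.5, -2.5);
        System.out.println(ps.x(2) + ", " + ps.y(2));
        System.out.println(ps.payloadBits() / 4 + " bits per element");
    }
}
```

Programs sometimes resort to this style today to avoid per-object overhead. The aim described in this section is to obtain a comparable footprint while still writing against an ordinary Point class.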
Notably, null-restricted uses of a value class with an implicit constructor and a single instance field can be expected to have minimal overhead compared to operating on a value of the field's type directly.
However, JVMs are ultimately free to encode class instances however they see fit. Some classes may be considered too large to represent inline. Certain JVM components, in particular those that are less performance-tuned, may prefer to interact with instances as heap-allocated objects. An encoding might carry with it a cached heap pointer to reduce the overhead of future allocations. Other strategies are possible as well.
Implicitly-Constructible Value Classes in the Standard Library
The following classes, which are considered value classes under JEP 401 when preview features are enabled, are further considered under this JEP to have an implicit constructor, despite not having declared such a constructor:
- java.lang.Byte
- java.lang.Short
- java.lang.Integer
- java.lang.Long
- java.lang.Float
- java.lang.Double
- java.lang.Boolean
- java.lang.Character
- java.util.Optional
Reflection and erasure
Like parameterized types, null-restricted types are erased in compiled field and method signatures. There is no instance of java.lang.Class to represent Range!, and adding or removing ! in APIs is a binary compatible refactoring.
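The erasure of parameterized types that this paragraph uses as its model can be observed in current Java. The sketch below shows only that existing behavior: two differently parameterized lists share a single java.lang.Class instance, in the same way that no separate Class is described for Range!. The class name ErasureDemo is illustrative.

```java
// Runs on current Java: demonstrates the existing erasure of parameterized
// types, which the erasure of null-restricted types is compared to.
final class ErasureDemo {
    // List<String> and List<Integer> are distinct static types, but at run
    // time both objects report the same Class (ArrayList).
    static boolean sameRuntimeClass() {
        java.util.List<String> strings = new java.util.ArrayList<>();
        java.util.List<Integer> ints = new java.util.ArrayList<>();
        return strings.getClass() == ints.getClass();
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass());
    }
}
```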
However, unlike parameterized types, null restrictions are still enforced at run time. This is achieved with compiler-generated null checks, and through a new mechanism, called a CheckedType, that performs a dynamic check when fields and arrays are written to.
The CheckedType of an array expresses the array's dynamic store check, including any null check.
String[] a1 = new String[100];
CheckedType t1 = Array.getComponentType(a1);
t1.cast("abc"); // success
t1.cast(new Range(8, 12)); // ClassCastException
t1.cast(null); // success
Range![] a2 = new Range![100];
CheckedType t2 = Array.getComponentType(a2);
t2.cast("abc"); // ClassCastException
t2.cast(new Range(8, 12)); // success
t2.cast(null); // NullPointerException
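CheckedType is a proposed API and is not available in current Java. The cast behavior shown above can nonetheless be approximated with existing calls. The sketch below is a hypothetical stand-in, named CheckSketch for this example, that combines Class.cast with Objects.requireNonNull. It is not the proposed interface, and it says nothing about how a JVM would implement the check.

```java
// Hypothetical stand-in for the cast behavior described for CheckedType.
// It is built only from existing APIs (Class.cast, Objects.requireNonNull)
// and is not the proposed interface itself.
final class CheckSketch<T> {
    private final Class<T> type;
    private final boolean nullRestricted;

    CheckSketch(Class<T> type, boolean nullRestricted) {
        this.type = type;
        this.nullRestricted = nullRestricted;
    }

    T cast(Object value) {
        if (nullRestricted) {
            java.util.Objects.requireNonNull(value); // NullPointerException on null
        }
        return type.cast(value);                     // ClassCastException on wrong type
    }

    public static void main(String[] args) {
        CheckSketch<String> nullable = new CheckSketch<>(String.class, false);
        CheckSketch<String> restricted = new CheckSketch<>(String.class, true);
        System.out.println(nullable.cast(null));     // null passes an unrestricted check
        System.out.println(restricted.cast("abc"));
        try {
            restricted.cast(null);
        } catch (NullPointerException e) {
            System.out.println("null rejected");
        }
    }
}
```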
New arrays can be created using a CheckedType, and this mechanism should be preferred over array allocations using Class objects to represent the component type.
Range[] a3 = (Range[]) Array.newInstance(t2, 100);
a3[10] = null; // ArrayStoreException
If a field is declared with a null-restricted type, the Field.getCheckedType method will return the corresponding checked type.
Alternatives
Making use of primitive types, rather than declaring value classes, will often produce a program with equivalent or slightly better performance. However, this approach gives up the valuable abstractions provided by classes. It's easy to, say, interpret a double with the wrong units, pass an out-of-range int to a library method, or fail to keep two boolean flags together in the right order.
Value classes provide useful performance benefits without needing implicit constructors and non-atomic/null-restricted storage. In some cases, field and array storage can already be inlined. But many classes cannot fit in an atomic word size, or have no room to spare for a null flag; and even if further engineering could increase that atomic word size to a comfortable level, null flags unnecessarily inflate memory footprint in many use cases. This JEP allows the memory footprint to match that of primitive types.
We considered many different approaches to the object model and type system before settling on a model in which compact flattened heap storage is simply a JVM optimization for a null-restricted reference type. This strategy avoids the conceptual overhead that comes from generalizing the existing model of primitive types. Developers already understand objects and classes, and null-restricted types are a simple language enhancement that is useful as a general-purpose feature.
Risks and Assumptions
There are security risks involved in allowing instance creation outside of constructors, via zero instances and non-atomic reads and writes. Developers will need to understand the implications, and recognize when it would be unsafe to declare an implicit constructor or implement the LooselyConsistentValue interface.
Dependencies
This JEP depends on Value Classes and Objects (Preview), which establishes the semantics of identity-free objects and implements value object inlining.
Building on this JEP, JEP 402: Enhanced Primitive Boxing (Preview) refactors the primitive wrapper classes as value classes with implicit constructors.
In the future, JVM class and method specialization (JEP 218, with revisions) will allow generic classes and methods to specialize field and array layouts when parameterized by null-restricted value class types.
More general support for nullness features will be explored in a future JEP.