JEP 401: Null-Restricted Value Object Storage (Preview)
Owner | Dan Smith |
Type | Feature |
Scope | SE |
Status | Draft |
Discussion | valhalla dash dev at openjdk dot java dot net |
Effort | XL |
Duration | XL |
Reviewed by | Brian Goetz |
Created | 2020/08/13 19:31 |
Updated | 2023/03/21 20:36 |
Issue | 8251554 |
Summary
Allow certain value classes to enable null-restricted, compact storage of their instances in variables. This is a preview language and VM feature.
Goals
Provide language features for value class authors to express that a value class permits instance creation outside of the normal construction protocol, including instances created as default values and as the result of non-atomic variable updates.
Support special treatment for null-restricted types used with these classes, setting fields and arrays initially to an appropriate default value.
Optimize HotSpot's treatment of these fields and arrays, supporting an inlined encoding that avoids any need for object headers, indirections, or heap allocation.
Non-Goals
The storage behavior of primitive values has provided inspiration for this JEP, but primitive types remain distinct from value class types and do not interact with these features. Enhancements to the treatment of primitive types will be explored in Enhanced Primitive Boxing.
Future enhancements to the JVM are anticipated to support inlining of value objects within generic APIs. For now, generic APIs work with erased types and heap-allocated objects, as usual.
Existing value-based classes in the standard libraries will not be affected by this JEP. Once the features of this JEP become final, they can be applied to classes in the standard libraries as a separate task.
Motivation
Value classes give up their instances' object identity in exchange for better performance. Specifically, the lack of identity enables inlined object encodings—instances directly encoded as sequences of field values, avoiding any overhead from object headers, indirections, or heap allocation.
Unfortunately, it is difficult for Java Virtual Machines to achieve maximal performance when working with inlined value objects in field and array storage. There are two significant constraints:
-
A variable of a value class type is initially set to null, so the inlined layout of a value object typically requires some additional bits to encode null. For example, a variable storing an int can fit in 32 bits, but for a value class with a single int field, a variable of that class type could use up to 64 bits.
-
A variable of a value class type must be modified atomically in order to respect encapsulation of its state. But inlined object layouts are often too large for efficient atomic modification: the overhead to guarantee atomicity exceeds typical benefits of the inlined layout.
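As a rough illustration of the first constraint, the sketch below computes how many bits a variable needs once a null flag is added to its payload. The class and method names are hypothetical, and the rounding to power-of-two slot sizes is an assumption made for illustration; real JVM layouts are implementation-specific.

```java
// Hypothetical helper for illustration only; not part of any JDK API.
final class NullEncodingCost {
    // Minimum bits required: the payload, plus one extra bit if null must be representable.
    static int minimumBits(int payloadBits, boolean nullable) {
        return nullable ? payloadBits + 1 : payloadBits;
    }

    // Assumption: storage slots come in power-of-two sizes (8, 16, 32, 64, ...).
    static int slotBits(int payloadBits, boolean nullable) {
        int needed = minimumBits(payloadBits, nullable);
        int slot = 8;
        while (slot < needed) {
            slot *= 2;
        }
        return slot;
    }

    public static void main(String[] args) {
        System.out.println(slotBits(32, false)); // plain int payload: 32
        System.out.println(slotBits(32, true));  // same payload plus a null flag: 64
    }
}
```

Under these assumptions, a single extra bit for null doubles the footprint of a one-int value class.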
Null-restricted types help solve this problem by allowing programmers to declare null-free field and array component types. But if we want to avoid the need for an underlying null representation in the JVM, newly-created fields and arrays need to be initialized to something else: a non-null default instance of the class. To allow for such instances, we would need value class authors to give up some control over instance creation.
Similarly, we can ask developers to manage concurrent variable accesses via external means, accepting the risk of bugs arising from non-atomic modification. But to allow for class instances that may be produced by race conditions, we would need class authors to give up further control over instance creation.
This JEP provides value class authors with the option to opt out of some guarantees provided by the normal object construction protocol, and in exchange get better-performing field and array storage.
Description
The features described below are preview features, enabled with the --enable-preview compile-time and runtime flags.
Optional construction and default instances
A concrete value class may declare an optional constructor, with no arguments and no body:
value class Range {
    int start;
    int end;

    public optional Range();

    public Range(int start, int end) {
        if (start > end) throw new IllegalArgumentException();
        this.start = start;
        this.end = end;
    }
}
The optional constructor must be public.
This is strawman syntax, subject to change.
A class with an optional constructor gives permission for class instances to be created outside of the normal instance creation process. The methods of such a class should be prepared to work with these instances.
In particular, the class has a default instance produced by setting all of its fields to their default values (null, 0, etc.). This default instance exists without executing any constructor code.
Value classes that declare an optional constructor are subject to additional restrictions:
-
The class must not be an inner class with an enclosing instance.
-
No instance field of the class may have a null-restricted type that depends—directly or indirectly—on the declaring class. In other words, the default instance of the class must not contain itself.
The default keyword can be used in conjunction with a value class name to access the default instance of the class.
Range zero = Range.default;
assert zero.start == 0;
assert zero.end == 0;
For many value classes, the default instance would violate the class's invariants (for example, a reference-typed field might be required to be initialized to something other than null). In that circumstance, it may not be appropriate for the class to declare an optional constructor. This feature is designed for the subset of value classes that can comfortably operate on their default instance.
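One informal way to judge whether a class can comfortably operate on its default instance is to build the all-zero field combination by hand and run the class's own invariant against it. The sketch below does this in today's Java, using ordinary records as stand-ins for value classes. The names `RangeModel`, `LabelModel`, and `isValid` are hypothetical and appear nowhere in this JEP.

```java
// Stand-ins for value classes, written as ordinary records (no preview features).
record RangeModel(int start, int end) {
    // The invariant that the Range example's ordinary constructor enforces.
    boolean isValid() {
        return start <= end;
    }
}

record LabelModel(String text) {
    // Hypothetical invariant: the text must be present.
    boolean isValid() {
        return text != null;
    }
}

final class DefaultInstanceCheck {
    public static void main(String[] args) {
        // All fields set to their default values, as a default instance would be.
        RangeModel zeroRange = new RangeModel(0, 0);
        LabelModel zeroLabel = new LabelModel(null);
        System.out.println(zeroRange.isValid()); // true: 0 <= 0 holds
        System.out.println(zeroLabel.isValid()); // false: text is null
    }
}
```

In this sketch the range-like class tolerates its all-zero state, while the label-like class does not and would be a poor candidate for an optional constructor.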
Optional construction and atomicity
A value class with an optional constructor and two or more instance fields may declare the optional constructor non-atomic:
value class Point {
    double x;
    double y;

    public non-atomic optional Point();

    public Point(double x, double y) {
        this.x = x;
        this.y = y;
    }
}
This is strawman syntax, subject to change.
The non-atomic keyword allows variables of the value class type to be updated one field at a time, leading to situations in which a partially-updated instance (or concurrently-updated instance) can be observed by another thread. (Compare the treatment of the primitive types double and long, as described in JLS 17.7.)
Like the default instance, a class instance produced in this way is created without executing any constructor code.
Point[] ps = new Point[]{ new Point(0.0, 1.0) };
new Thread(() -> ps[0] = new Point(1.0, 0.0)).start(); // start(), not run(), so the write races with the read below
Point p = ps[0]; // may be (1.0, 1.0), among other possibilities
Users of a value class with a non-atomic constructor are responsible for maintaining the integrity of their data, and can avoid unwanted instance creation by limiting access to a single thread, enforcing a synchronization protocol, or declaring a field volatile.
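Because a real race is nondeterministic, the sketch below replays one possible interleaving on a single thread to show how a field-at-a-time update can expose an instance that no constructor produced. It is written in today's Java. `FlatPointSlot` and its methods are hypothetical stand-ins for a flattened, non-atomic Point variable, not an actual JVM mechanism.

```java
// Emulates a flattened, non-atomic Point variable as two separately written fields.
final class FlatPointSlot {
    private double x;
    private double y;

    FlatPointSlot(double x, double y) {
        this.x = x;
        this.y = y;
    }

    // A non-atomic store proceeds one field at a time.
    void writeX(double newX) { x = newX; }
    void writeY(double newY) { y = newY; }

    // A read copies out whatever the fields hold at that moment.
    double[] read() {
        return new double[] { x, y };
    }

    public static void main(String[] args) {
        FlatPointSlot slot = new FlatPointSlot(0.0, 1.0);
        slot.writeX(1.0);             // first half of storing (1.0, 0.0)
        double[] torn = slot.read();  // a reader arriving here sees a mixed state
        slot.writeY(0.0);             // second half of the store
        double[] done = slot.read();
        System.out.println(torn[0] + ", " + torn[1]); // prints "1.0, 1.0"
        System.out.println(done[0] + ", " + done[1]); // prints "1.0, 0.0"
    }
}
```

A synchronization protocol that makes both field writes indivisible with respect to readers would rule out the intermediate observation.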
Some value classes with an optional constructor have complex integrity constraints for non-zero field values (for example, the start index of a Range, declared above, must not exceed the end index). In that circumstance, it may not be appropriate for the class to declare its constructor non-atomic. This feature is designed for the subset of value classes that can comfortably operate on arbitrary combinations of field values.
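To make the Range concern concrete, the following sketch enumerates the field-wise mixes of an old and a new value that a racing reader could, in principle, observe, and checks each against the start/end invariant. It uses an ordinary record in today's Java. `RangeFields` and `TornRangeCheck` are hypothetical names for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Ordinary record standing in for the Range value class.
record RangeFields(int start, int end) {
    boolean isValid() {
        return start <= end;
    }
}

final class TornRangeCheck {
    // Every combination of old/new start with old/new end.
    static List<RangeFields> mixes(RangeFields oldValue, RangeFields newValue) {
        List<RangeFields> result = new ArrayList<>();
        for (int s : new int[] { oldValue.start(), newValue.start() }) {
            for (int e : new int[] { oldValue.end(), newValue.end() }) {
                result.add(new RangeFields(s, e));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Both inputs are valid, yet one of the mixes (start=5, end=2) is not.
        for (RangeFields r : mixes(new RangeFields(1, 2), new RangeFields(5, 9))) {
            System.out.println(r + " valid=" + r.isValid());
        }
    }
}
```

Two individually valid ranges can therefore combine into an invalid one, which is why a class like Range may prefer to keep atomic updates.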
Use of null-restricted types
A variable with a null-restricted type prevents attempts to set the variable to null. Details of this behavior are described in the referenced JEP.
The details of general-purpose null-restricted types are still under development. The most relevant feature for this JEP is that the type of a variable or method return may use a ! suffix to indicate that it does not store null. This is enforced with runtime checks.
As a special feature of value classes with optional constructors, when a null-restricted field or array component with the class's type is created, it is initialized to the class's default instance.
class Cursor {
    private Point! position;

    public Cursor() {
    }

    public Cursor(Point! position) {
        this.position = position;
    }

    static void test() {
        Cursor c = new Cursor();
        assert c.position == Point.default;
        c = new Cursor(null); // NullPointerException
    }
}
Additionally, if an array was allocated with the class's null-restricted type, it will dynamically check for nulls at run time, even when viewed through an unrestricted compile-time type.
Object[] objs = new Point![10];
assert objs[2] == Point.default;
objs[2] = null; // NullPointerException
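The array behavior above can be approximated in today's Java with a small wrapper that fills its storage with a default instance and rejects null on every store. This is only an emulation under stated assumptions: `NullRestrictedArray` and `PointModel` are hypothetical names, and the default instance is passed in explicitly because no `default` expression exists outside the preview feature.

```java
import java.util.Arrays;
import java.util.Objects;

// Ordinary record standing in for the Point value class.
record PointModel(double x, double y) {}

// Hypothetical emulation of a null-restricted array.
final class NullRestrictedArray<T> {
    private final Object[] elements;

    // Every component starts out holding the supplied default instance.
    NullRestrictedArray(int length, T defaultInstance) {
        Objects.requireNonNull(defaultInstance, "defaultInstance");
        elements = new Object[length];
        Arrays.fill(elements, defaultInstance);
    }

    @SuppressWarnings("unchecked")
    T get(int index) {
        return (T) elements[index];
    }

    // The null check happens on every store, regardless of the caller's static types.
    void set(int index, T value) {
        elements[index] = Objects.requireNonNull(value, "null stored into null-restricted array");
    }

    public static void main(String[] args) {
        NullRestrictedArray<PointModel> points =
                new NullRestrictedArray<>(10, new PointModel(0.0, 0.0));
        System.out.println(points.get(2)); // the default instance
        try {
            points.set(2, null);
        } catch (NullPointerException e) {
            System.out.println("rejected null store");
        }
    }
}
```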
class file representation & interpretation
Concrete value classes with optional and non-atomic constructors encode these properties in a ClassFile attribute (details TBD). At preparation time, an error occurs if a value class with an optional constructor has an illegal circularity in its instance field types.
A field descriptor uses a special Q prefix to indicate that the field has a null-restricted value class type. The named class is loaded and validated during preparation (or at another point before the first access of the field), ensuring that it has optional constructors. Reads of the field (getfield, getstatic) will initially produce the default instance of the named class. Writes to the field (putfield, putstatic) will check for null, throwing a NullPointerException if found.
An anewarray or multianewarray instruction similarly uses a special Q descriptor to indicate that the element type of the array is null-restricted, with aaload producing default instances and aastore performing null checks.
Q descriptors are treated as if they were L descriptors for purposes of verification and field resolution. A field reference using an L descriptor may resolve to a field declared with a Q descriptor, or vice versa.
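The resolution rule can be pictured as comparing descriptors after rewriting a leading Q to L. The sketch below shows that idea with plain string handling. It is an illustration of the rule as stated here, not HotSpot code, and it only handles simple class descriptors such as QFoo; (array descriptors are out of scope). `DescriptorSketch` and its methods are hypothetical names.

```java
// Hypothetical illustration of Q/L descriptor handling; not a JDK API.
final class DescriptorSketch {
    // Builds a field descriptor for a class with the given internal name.
    static String descriptor(String internalName, boolean nullRestricted) {
        return (nullRestricted ? "Q" : "L") + internalName + ";";
    }

    // For verification and field resolution, a Q descriptor is treated like its L counterpart.
    static String normalize(String descriptor) {
        return descriptor.startsWith("Q") ? "L" + descriptor.substring(1) : descriptor;
    }

    // A reference matches a declaration if both name the same class after normalization.
    static boolean matchesForResolution(String referenced, String declared) {
        return normalize(referenced).equals(normalize(declared));
    }

    public static void main(String[] args) {
        System.out.println(descriptor("Point", true));                  // QPoint;
        System.out.println(matchesForResolution("LPoint;", "QPoint;")); // true
        System.out.println(matchesForResolution("LPoint;", "QRange;")); // false
    }
}
```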
The class Foo must be initialized whenever another class with a field of type QFoo; is initialized, and whenever an array with element type QFoo; is created.
Additional uses of Q descriptors may be supported in the future, but for now, these are the only places where the syntax is legal.
Java language compilation
If a field's type is null-restricted and names a value class with optional constructors, the type is compiled to a Q descriptor (both at the declaration and use sites). Otherwise, an L descriptor is used.
Similarly, an array creation with an appropriate element type compiles to anewarray or multianewarray using a Q descriptor. However, all variables storing arrays use L descriptors.
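The descriptor selection described above can be summarized as a small decision function. The sketch is a reading of these rules in plain Java, not compiler source. The `Use` enum and the `prefix` method are hypothetical names.

```java
// Hypothetical summary of the descriptor-selection rules; not javac code.
final class DescriptorChoice {
    enum Use { FIELD, ARRAY_CREATION, ARRAY_VARIABLE }

    // Returns 'Q' or 'L' for the given kind of use.
    static char prefix(Use use, boolean nullRestricted, boolean hasOptionalConstructor) {
        if (use == Use.ARRAY_VARIABLE) {
            return 'L'; // variables storing arrays always use L descriptors
        }
        // Fields and array creations use Q only when both conditions hold.
        return (nullRestricted && hasOptionalConstructor) ? 'Q' : 'L';
    }

    public static void main(String[] args) {
        System.out.println(prefix(Use.FIELD, true, true));           // Q
        System.out.println(prefix(Use.FIELD, true, false));          // L
        System.out.println(prefix(Use.ARRAY_CREATION, true, true));  // Q
        System.out.println(prefix(Use.ARRAY_VARIABLE, true, true));  // L
    }
}
```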
A default expression compiles to either aconst_init (if possible), a reflective call, or an array allocation/read combination (details TBD).
Core reflection
The Field.getType() and Class.getComponentType() methods return a special Class object representing a null-restricted type if the underlying field or array uses a Q descriptor.
In most respects, the null-restricted Foo Class object is identical to Foo.class. Preview methods of Class (details TBD) expose the null-restricted property and map from one Class to the other. There is no class literal for the null-restricted Foo type; instead, developers can reach it via an instance method of Foo.class.
Only value classes with optional constructors have these special Class objects modeling null-restricted types. Null-restricted types of other classes are erased and have no Class object representation.
Other API & tool support
java.lang.constant and java.lang.invoke support Q types in field descriptors.
javax.lang.model supports the optional and non-atomic modifiers.
The javadoc tool advertises whether a value class has optional constructors and non-atomic fields.
Performance model
In typical usage, for a value class with an optional constructor and no atomicity requirements, a null-restricted class type should have a heap storage footprint and execution time (when fully optimized) comparable to the primitive types. For example, a Point!, given the class declaration above, can be expected to directly occupy 128 bits in fields and array components, and to avoid any allocation in stack computations. A field access simply references the first or second 64 bits. There are no additional pointers.
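The layout described here can be mimicked by hand in today's Java by backing an array of points with a single double[], two 64-bit slots per element. The sketch below is such a manual emulation, intended only to make the 128-bit figure and the "first or second 64 bits" access pattern tangible. `FlatPointArray` is a hypothetical name, and real JVM layouts may add padding.

```java
// Emulates a flattened Point![] as one double[] with two 64-bit slots per element.
final class FlatPointArray {
    private static final int FIELDS_PER_POINT = 2;
    private final double[] storage;

    FlatPointArray(int length) {
        // A new double[] is all zeros, matching the all-zero default instance.
        storage = new double[length * FIELDS_PER_POINT];
    }

    double x(int i) { return storage[i * FIELDS_PER_POINT]; }     // first 64 bits
    double y(int i) { return storage[i * FIELDS_PER_POINT + 1]; } // second 64 bits

    void set(int i, double x, double y) {
        storage[i * FIELDS_PER_POINT] = x;
        storage[i * FIELDS_PER_POINT + 1] = y;
    }

    // Payload bits per element, ignoring the array header.
    static int bitsPerElement() {
        return FIELDS_PER_POINT * Double.SIZE;
    }

    public static void main(String[] args) {
        FlatPointArray points = new FlatPointArray(4);
        points.set(1, 3.0, 4.0);
        System.out.println(bitsPerElement()); // 128
        System.out.println(points.x(1) + ", " + points.y(1)); // prints "3.0, 4.0"
    }
}
```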
Notably, null-restricted uses of a value class with an optional constructor and a single instance field can be expected to have minimal overhead compared to operating on a value of the field's type directly.
However, JVMs are ultimately free to encode class instances however they see fit. Some classes may be considered too large to represent inline. Certain JVM components, in particular those that are less performance-tuned, may prefer to interact with instances as heap-allocated objects. An encoding might also carry with it a cached heap pointer to reduce the overhead of future allocations.
HotSpot implementation
This section describes implementation details of this release of the HotSpot virtual machine, for the information of OpenJDK engineers. These details are subject to change in future releases and should not be assumed by users of HotSpot or other JVMs.
Values of Q types in HotSpot are encoded as follows:
-
Value classes with field layouts exceeding a size threshold, that do not declare an optional constructor, or that require atomic updates are always encoded as regular heap objects. Fields marked volatile always store regular heap objects.
In this case, Q-typed fields initially store null, but this value is detected and lazily replaced by the class's default instance whenever a read operation occurs. In contrast, Q-typed arrays are eagerly filled with a default instance pointer at array creation time.
-
Otherwise, value objects are encoded in fields and arrays as a flattened sequence of field values. Array components may be padded to achieve good alignment.
In this case, the initial default instance encoding is achieved by setting all bits to 0 when the field or array is allocated. Reads and writes must adapt between the use site encoding and the field encoding, at times copying from heap storage on writes and allocating new heap storage on reads. Array accesses may need to dynamically check a flag to determine whether the underlying array is flattened.
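The lazy replacement used for heap-encoded fields in the first case can be pictured with a small holder written in ordinary Java: storage starts as null, reads substitute the default instance, and writes reject null. This is a conceptual sketch rather than HotSpot's implementation. `LazyDefaultField` is a hypothetical name, and the default instance is supplied explicitly.

```java
import java.util.Objects;

// Hypothetical model of a heap-encoded, null-restricted field.
final class LazyDefaultField<T> {
    private T stored; // starts out null, like a freshly allocated reference field
    private final T defaultInstance;

    LazyDefaultField(T defaultInstance) {
        this.defaultInstance = Objects.requireNonNull(defaultInstance, "defaultInstance");
    }

    // Reads never expose null: an unset field is replaced by the default instance.
    T read() {
        if (stored == null) {
            stored = defaultInstance;
        }
        return stored;
    }

    // Writes reject null, mirroring the check performed on field stores.
    void write(T value) {
        stored = Objects.requireNonNull(value, "null written to null-restricted field");
    }

    public static void main(String[] args) {
        LazyDefaultField<String> field = new LazyDefaultField<>("default");
        System.out.println(field.read()); // prints "default"
        field.write("updated");
        System.out.println(field.read()); // prints "updated"
    }
}
```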
Alternatives
Making use of primitive types, rather than declaring value classes, will often produce a program with equivalent or slightly better performance. However, this approach gives up the valuable abstractions provided by classes. It's easy to, say, interpret a double with the wrong units, pass an out-of-range int to a library method, or fail to keep two boolean flags together in the right order.
Value classes provide useful performance benefits without needing optional constructors and null-restricted storage. And with additional innovation in JVM implementation techniques and hardware capabilities, the performance costs of null encodings and atomic updates may shrink further. However, the limitations outlined in the "Motivation" section are fundamental. For example, a value class type wrapping a single long field and supporting the full range of long values for that field can never be encoded in fewer than 65 bits. This JEP gives programmers who need fine-grained control a more reliable performance model for heap storage.
We considered many different approaches to the object model and type system before settling on a model in which inlined heap storage is simply a JVM optimization for a null-restricted reference type. This strategy avoids the conceptual overhead that comes from generalizing the existing model for primitive types. Developers already understand objects and classes, and null-restricted types are a simple language enhancement that is useful as a general-purpose feature.
Risks and Assumptions
There are security risks involved in allowing instance creation outside of constructors, via default instances and non-atomic reads and writes. Developers will need to understand the implications, and recognize when it would be unsafe to use the optional or non-atomic keywords.
Dependencies
This JEP depends on Value Objects (Preview), which establishes the semantics of identity-free objects and applies many JVM optimizations.
This JEP also depends on Null-Restricted and Nullable Types (Preview), which introduces null-restricted types and defines their runtime behavior.
Building on this JEP, JEP 402: Enhanced Primitive Boxing (Preview) refactors the primitive wrapper classes as value classes with optional constructors.
In the future, JVM class and method specialization (JEP 218, with revisions) will allow generic classes and methods to specialize field and array layouts when parameterized by null-restricted value class types.