JEP 401: Primitive Classes (Preview)
Owner | Dan Smith |
Type | Feature |
Scope | SE |
Status | Candidate |
Discussion | valhalla dash dev at openjdk dot java dot net |
Effort | XL |
Duration | XL |
Reviewed by | Brian Goetz |
Created | 2020/08/13 19:31 |
Updated | 2022/05/19 17:57 |
Issue | 8251554 |
Summary
Support new, developer-declared primitive types in Java. This is a preview language and VM feature.
Goals
This JEP introduces primitive classes, special kinds of value classes that define new primitive types.
The Java programming language will be enhanced to recognize primitive class declarations and support new primitive types in its type system.
The Java Virtual Machine will be enhanced with a new Q carrier type to encode declared primitive types.
Non-Goals
This JEP is concerned with the core treatment of developer-declared primitives. Additional features to improve integration with the Java programming language are not covered here, but are expected to be developed in parallel. Specifically:
- JEP 402 will enhance the basic primitives (int, boolean, etc.) by giving them primitive class declarations.
- A separate JEP will update Java's generics so that primitive types can be used as type arguments.
Other followup efforts may enhance existing APIs to take advantage of primitive classes, or introduce new language features and APIs built on top of primitive classes.
Motivation
Java developers work with two kinds of values: primitives and objects.
Primitives offer better performance, because they are typically inlined—stored directly (without headers or pointers) in variables, on the computation stack, and, ultimately, in CPU registers. Hence, memory reads do not have additional indirections, primitive arrays are stored densely and contiguously in memory, primitive-typed fields can be similarly compact, primitive values do not require garbage collection, and primitive operations are performed within the CPU.
Objects offer better abstractions, including fields, methods, constructors, access control, and nominal subtyping. But objects traditionally perform poorly in comparison to primitives, because they are primarily stored in heap-allocated memory and accessed by reference.
Value objects, introduced by another JEP, significantly improve object performance in many contexts, providing a good fusion of the better abstractions of objects with the better performance of primitives.
However, certain invariant properties of objects limit how much they can be optimized—particularly when stored in fields and arrays. Specifically:
- A variable of a reference type may be null, so the inlined layout of a value object typically requires some additional bits to encode null. For example, a variable storing an int can fit in 32 bits, but for a value class with a single int field, a variable of that class type could use up to 64 bits.
- A variable of a reference type must be modified atomically. This often makes it impractical to inline a value object, because its layout would be too large for efficient atomic modification. Large primitive types (currently, double and long) make no such atomicity guarantees, so variables of these types can be modified efficiently without indirect representations (concurrency is instead managed at a higher level).
Primitive classes give developers the capability to define new primitive types that aren't subject to these limitations. Programs can make use of class features without giving up any of the performance benefits of primitives.
Applications of developer-declared primitives include:
- Numbers of varieties not supported by the basic primitives, such as unsigned bytes, 128-bit integers, and half-precision floats;
- Points, complex numbers, colors, vectors, and other multi-dimensional numerics;
- Numbers with units (sizes, rates of change, currency, etc.);
- Bitmasks and other compressed encodings of data;
- Map entries and other data structure internals;
- Data-carrying tuples and multiple returns;
- Aggregations of other primitive types, potentially multiple layers deep.
Description
The features described below are preview features, enabled with the --enable-preview compile-time and runtime flags.
Primitive classes
A primitive class is a special kind of value class that introduces a new primitive type.
As value classes, primitive classes have no identity. This allows their instances to be freely converted between value objects and simpler primitive values. A primitive value can be thought of as a bare sequence of field values, without any headers or extra pointers.
A primitive class is declared with the primitive contextual keyword.
primitive class Point implements Shape {
    private double x;
    private double y;

    public Point(double x, double y) {
        this.x = x;
        this.y = y;
    }

    public double x() { return x; }
    public double y() { return y; }

    public Point translate(double dx, double dy) {
        return new Point(x+dx, y+dy);
    }

    public boolean contains(Point p) {
        return equals(p);
    }
}

interface Shape {
    boolean contains(Point p);
}
(Alternatively, we might prefer the class to be declared as primitive Point.)
Primitive class declarations are subject to the same restrictions as other value class declarations. For example, the instance fields of a primitive class are implicitly final, so they cannot be assigned outside of a constructor or initializer.
In addition, no instance field of a primitive class declaration may have a primitive type that depends—directly or indirectly—on the declaring class. In other words, with the exception of reference-typed fields, the class must allow for flat, fixed-size layouts without cycles.
In most other ways, a primitive class declaration is just like any other class declaration. It can have superinterfaces, type parameters, enclosing instances (todo: maybe a bad idea, because it allows enclosing this to be null), inner classes, overloaded constructors, static members, and the full range of access restrictions on its members.
Primitive types
The name of a primitive class denotes that class's primitive type. Primitive types store instances of the named class as primitive values. Instances can be created with normal class instance creation expressions.
Point p1 = new Point(1.0, -0.5);
Field access and method invocation are supported by primitive types. The members of a primitive type are the same as the members of the class.
assert p1.x() == 1.0;
Point p2 = p1.translate(0.0, 1.0);
System.out.println(p2.toString());
Primitive types support the == and != operators when comparing two values of the same type. As is the case for value objects, the == comparison recursively compares the values' fields.
Point p3 = new Point(1.8, 3.6);
Point p4 = p3.translate(0.0, 0.0);
assert p3 == p4;
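The field-wise comparison described above can be approximated on current Java, which has no primitive classes, with a record. This is only a sketch under that assumption: PointRecord is a hypothetical stand-in for Point, and equals takes the role that == would play under this JEP. The treatment of floating-point edge cases is not modeled here.

```java
final class EqualitySketch {
    // Hypothetical stand-in for the Point class above, written as a record
    // so that it compiles on current Java (16+) without preview features.
    record PointRecord(double x, double y) {
        PointRecord translate(double dx, double dy) {
            return new PointRecord(x + dx, y + dy);
        }
    }

    // Field-wise comparison: the role '==' would play for a declared primitive type.
    static boolean sameFields(PointRecord a, PointRecord b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        PointRecord p3 = new PointRecord(1.8, 3.6);
        PointRecord p4 = p3.translate(0.0, 0.0);
        System.out.println(sameFields(p3, p4)); // true: same field values
        System.out.println(p3 == p4);           // false today: two distinct identities
    }
}
```

The contrast between the two printed results is the point: with identity, == distinguishes two objects that carry the same data, whereas a type without identity has nothing else to compare but its fields.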
Like a value class reference type, an expression of a primitive type cannot be used as the operand of a synchronized statement.
Unlike other value classes, a this expression in the body of a primitive class has a primitive type.
Default values and null
Like the basic primitive types (int, boolean, etc.), declared primitive types do not allow null.
Whenever a field or array component is created, the longstanding behavior is to set its initial value to the default value of its type. For reference types, this value is null, and for the basic primitive types, this value is 0 or false.
For a declared primitive type, the default value is the initial instance of the class: an instance whose fields are all set to their own default values.
Object[] os = new Object[5];
assert os[0] == null;
Point[] ps = new Point[5];
assert ps[0].x() == 0.0 && ps[0].y() == 0.0;
As shorthand, the default value of a primitive type can be expressed with the class name followed by the default keyword.
assert Point.default.x() == 0.0 &&
       Point.default.y() == 0.0;
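For comparison, the default-value behavior can be emulated on current Java. The sketch below is an assumption-laden illustration, not the proposed mechanism: PointRecord, its DEFAULT constant, and newPointArray are hypothetical names that stand in for Point, Point.default, and new Point[n].

```java
import java.util.Arrays;

final class DefaultValueSketch {
    // Hypothetical stand-in for the primitive class Point.
    record PointRecord(double x, double y) {
        // Hand-written substitute for Point.default: every field at its own default.
        static final PointRecord DEFAULT = new PointRecord(0.0, 0.0);
    }

    // Emulates 'new Point[n]': every component starts out as the default instance.
    static PointRecord[] newPointArray(int n) {
        PointRecord[] ps = new PointRecord[n]; // components are null on current Java
        Arrays.fill(ps, PointRecord.DEFAULT);
        return ps;
    }

    public static void main(String[] args) {
        Object[] os = new Object[5];
        double[] ds = new double[5];
        PointRecord[] ps = newPointArray(5);
        System.out.println(os[0]);     // null
        System.out.println(ds[0]);     // 0.0
        System.out.println(ps[0].x()); // 0.0
    }
}
```

Unlike this emulation, a declared primitive type would need no fill step and no shared object: the zeroed storage itself would be the default value.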
Note that the initial instance of a primitive class is created without invoking any constructors or instance initializers, and is available to anyone with access to the class (or its reflective Class object). Primitive classes are not able to specify an initial instance that sets fields to something other than their default values.
Methods of primitive classes should be designed to work on the initial instance. If this isn't feasible (for example, a reference-typed field is expected to be non-null), it may not be appropriate for the class to have a primitive type. Instead, it can be declared as a normal value class.
Multi-threaded reads and writes
As for the basic primitive types double and long, when a field or array component has a declared primitive type, reads and writes might not be atomic. As a result, in a multi-threaded program, unexpected instances may be encountered.
Point[] ps = new Point[]{ new Point(0.0, 1.0) };
new Thread(() -> ps[0] = new Point(1.0, 0.0)).start();
Point p = ps[0]; // may be (1.0, 1.0), among other possibilities
Like initial instances, primitive class instances produced by non-atomic reads and writes are created without invoking any constructors or instance initializers. There is no opportunity for the class to ensure that the field values of the new object are compatible with each other (for example, a start index may end up being greater than an end index).
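How a torn value such as (1.0, 1.0) can arise is easier to see in a deterministic, single-threaded simulation. The sketch below makes two assumptions that the JEP does not state: a flattened Point is modeled as two adjacent slots of a double[], and the writer stores x before y. The interleaving is forced by hand rather than produced by real threads.

```java
final class TornReadSketch {
    // Models one flattened Point component as two adjacent 64-bit slots: {x, y}.
    // Returns what a reader observes if it reads after only the first slot
    // of a two-step, non-atomic write has been stored.
    static double[] readBetweenWrites(double[] slot, double newX, double newY) {
        slot[0] = newX;                   // writer: first half of the write
        double[] observed = slot.clone(); // reader: sees the new x with the old y
        slot[1] = newY;                   // writer: second half of the write
        return observed;
    }

    public static void main(String[] args) {
        double[] slot = { 0.0, 1.0 };     // models Point(0.0, 1.0)
        double[] seen = readBetweenWrites(slot, 1.0, 0.0);
        System.out.println(seen[0] + ", " + seen[1]); // 1.0, 1.0
        System.out.println(slot[0] + ", " + slot[1]); // 1.0, 0.0
    }
}
```

The observed pair was never passed to any constructor, which is why constructor-enforced invariants cannot be relied on for such instances.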
To ensure that a particular primitive-typed field is always read from and written to atomically, the field can be declared volatile. But there is no mechanism for a primitive class to ensure that all fields and array components of its type are considered volatile.
A class with a complex integrity constraint in its constructor may not be a good candidate to be a primitive class. Instead, it can be declared as a normal value class.
Reference types
Primitive values are monomorphic—they belong to a single type with a specific set of fields known at compile time and runtime. Values of different primitive types can't be mixed.
To participate in the polymorphic reference type hierarchy, primitive values are converted to value objects with a value object conversion. This occurs implicitly when assigning from a primitive type to a reference type. The result is an instance of the same class, just in a different form.
Shape s = p1; // value object conversion
assert s.getClass() == Point.class;
When invoking an inherited method of a primitive type, the receiver value undergoes value object conversion to have the type expected by the method declaration.
Point p = new Point(0.3, 7.2);
// toString is declared by Object
p.toString(); // value object conversion
It is sometimes useful to talk about the reference type of a primitive class. This type is expressed with the class name followed by the ref contextual keyword. A variable with a primitive class reference type stores either a value object belonging to the named class or null.
Point.ref[] prs = new Point.ref[10];
prs[1] = new Point(1.0, 1.0);
prs[4] = new Point(4.0, 4.0);
for (Point.ref pr : prs) {
    if (pr != null)
        System.out.println(pr);
}
The ref type is useful when null is needed or when the runtime characteristics of reference types are preferred (for example, a large sparse array might be more efficiently encoded with references).
The relationship between the types Point and Point.ref is similar to the traditional relationship between the types int and Integer. However, Point and Point.ref both correspond to the same class declaration; the values of both types are instances of a single Point class. At run time, the conversion between a primitive value and a value object is more lightweight than traditional boxing conversion.
Value objects can be converted back to primitive values with a primitive value conversion. null cannot be converted to a primitive value, so attempts to convert it cause an exception.
Point p = prs[1]; // primitive value conversion
prs[1] = null;
p = prs[1]; // NullPointerException
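The closest behavior in current Java is unboxing, which fails on null in the same way. The sketch below uses Integer and int as an analogue of Point.ref and Point; it illustrates the existing rule, not the proposed conversion itself.

```java
final class NullUnboxingSketch {
    // Unboxing is the existing analogue of primitive value conversion:
    // converting null to a primitive value throws NullPointerException.
    static boolean unboxingNullThrows() {
        Integer[] refs = new Integer[2];
        refs[1] = 42;
        int ok = refs[1];          // unboxing a non-null reference succeeds
        refs[1] = null;
        try {
            int bad = refs[1];     // unboxing null
            return false;          // not reached
        } catch (NullPointerException e) {
            return ok == 42;
        }
    }

    public static void main(String[] args) {
        System.out.println(unboxingNullThrows()); // true
    }
}
```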
When invoking a method overridden by a primitive class, the receiver object undergoes primitive value conversion to have the type expected by the method declaration.
Shape s = new Point(0.7, 3.2);
// 'contains' is declared by Point
s.contains(Point.default); // primitive value conversion
Overload resolution and type arguments
Value object conversion and primitive value conversion are allowed in loose, but not strict, invocation contexts. This follows the pattern of boxing and unboxing: a method overload that is applicable without applying the conversions takes priority over one that requires them.
void m(Point p, int i) { ... }
void m(Point.ref pr, Integer i) { ... }
void test(Point.ref pr, Integer i) {
    m(pr, i); // prefers the second declaration
    m(pr, 0); // ambiguous
}
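The boxing pattern that this rule follows can be observed in current Java. In the sketch below, int and Integer are an analogue of Point and Point.ref; the class and method names are illustrative only.

```java
final class OverloadSketch {
    static String m(int i)     { return "m(int)"; }
    static String m(Integer i) { return "m(Integer)"; }

    // The argument is an int: m(int) applies without any conversion, so it wins.
    static String callWithPrimitive() { return m(0); }

    // The argument is an Integer: m(Integer) applies without unboxing, so it wins.
    static String callWithReference() { return m(Integer.valueOf(0)); }

    public static void main(String[] args) {
        System.out.println(callWithPrimitive()); // m(int)
        System.out.println(callWithReference()); // m(Integer)
    }
}
```

In both calls the overload that needs no conversion is selected in the strict phase, so the overload that needs boxing or unboxing is never considered.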
For now, Java's generics only work with reference types. Another JEP will enhance generics to interoperate with primitive types.
Thus, provisionally, type arguments must be inferred to be reference types. Type inference treats value object and primitive value conversions the same as boxing and unboxing—for example, a primitive value passed where an inferred type is expected will lead to a reference-typed inference constraint.
var list = List.of(new Point(1.0, 5.0));
// infers List<Point.ref>
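The same inference behavior exists today for the basic primitives, which gives a feel for the provisional rule. This sketch uses int and Integer as an analogue of Point and Point.ref, and is not a demonstration of the proposed feature.

```java
import java.util.List;

final class InferenceSketch {
    static List<Integer> boxedList() {
        var list = List.of(1, 2, 3); // int arguments; the type argument is inferred as Integer
        return list;                 // compiles only because list is a List<Integer>
    }

    public static void main(String[] args) {
        System.out.println(boxedList().get(0) + boxedList().get(2)); // 4
    }
}
```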
Array subtyping
Traditionally, primitive array types are not related to reference array types: an int[] cannot be assigned to an Object[] variable.
Arrays of declared primitive types are more flexible: the type Point[] is a subtype of Point.ref[], which is a subtype of Object[].
(Basic primitive array types like int[] will also gain this capability with JEP 402.)
When a reference is stored in an array of static type Object[], if the array's runtime component type is Point then the operation will perform both an array store check (checking that the object is an instance of class Point) and a primitive value conversion (converting the object to a primitive value).
Similarly, reading from an array of static type Object[] will cause a value object conversion if the array stores primitive values.
Object replace(Object[] objs, int i, Object val) {
    Object result = objs[i]; // may perform value object conversion
    objs[i] = val; // may perform primitive value conversion
    return result;
}
Point[] ps = new Point[]{ new Point(3.0, -2.1) };
replace(ps, 0, new Point(-2.1, 3.0));
replace(ps, 0, null); // NPE from primitive value conversion
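The array store check half of this behavior already exists for reference arrays and can be run today. The sketch below reuses the replace method with an Integer[] in place of a Point[]; tryStore is a hypothetical helper added for illustration.

```java
final class ArrayStoreSketch {
    static Object replace(Object[] objs, int i, Object val) {
        Object result = objs[i];
        objs[i] = val;       // the array store check happens here, at run time
        return result;
    }

    // Hypothetical helper: reports whether a store into slot 0 was accepted.
    static String tryStore(Object[] objs, Object val) {
        try {
            replace(objs, 0, val);
            return "stored";
        } catch (ArrayStoreException e) {
            return "ArrayStoreException";
        }
    }

    public static void main(String[] args) {
        Object[] ints = new Integer[] { 1 };      // runtime component type is Integer
        System.out.println(tryStore(ints, 2));     // stored
        System.out.println(tryStore(ints, "two")); // ArrayStoreException
        System.out.println(tryStore(ints, null));  // stored
    }
}
```

The last call marks the difference: a reference array accepts null, whereas an array whose runtime component type is a declared primitive type would reject it with a NullPointerException, as in the example above.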
class file representation & interpretation
A primitive class is declared in a class file using the ACC_PRIMITIVE modifier (0x0800). At class load time, an error occurs if a primitive class is not a value class (via ACC_VALUE, 0x0100). At preparation time, an error occurs if a primitive class has a primitive type circularity in its instance fields.
A declared primitive type is represented with a new Q descriptor prefix (QPoint;). The class's reference type is represented using the usual L descriptor (LPoint;).
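For orientation, the existing descriptor forms can be printed on current Java (12 or later) and set beside the proposed Q form. In this sketch, qDescriptor and lDescriptor are hypothetical string helpers; no released JDK produces or accepts Q descriptors.

```java
final class DescriptorSketch {
    // Hypothetical helper: builds the Q descriptor this JEP proposes for a
    // primitive type, given an internal class name such as "Point".
    static String qDescriptor(String internalName) {
        return "Q" + internalName + ";";
    }

    // The existing L descriptor form, used for the class's reference type.
    static String lDescriptor(String internalName) {
        return "L" + internalName + ";";
    }

    public static void main(String[] args) {
        System.out.println(String.class.descriptorString()); // Ljava/lang/String;
        System.out.println(int.class.descriptorString());    // I
        System.out.println(lDescriptor("Point"));            // LPoint;
        System.out.println(qDescriptor("Point"));            // QPoint;
    }
}
```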
Primitive values with Q types are one-slot stack values, even though they may represent aggregates of much more than 32 or 64 bits. No particular encoding of primitive values is mandated.
Verification treats a Q type as a subtype of the corresponding L type (e.g., QPoint; is a subtype of LPoint;). Conversions from primitive values to value objects occur implicitly, as needed.
The this parameter of a primitive class's instance method has a primitive type.
Classes mentioned by primitive types in field and method descriptors are loaded during linkage, before the first access of that field or method.
A CONSTANT_Class constant pool entry may refer to a primitive type using a Q descriptor as a "class name". A CONSTANT_Class using the plain name of a primitive class represents the class's reference type.
The aconst_init instruction may refer to either a primitive type or a reference type. This determines whether a primitive value or a value object is produced.
Similarly, a CONSTANT_Fieldref or CONSTANT_Methodref may refer to a field or method as a member of a primitive type or a reference type. In the case of withfield, this determines the result type of the operation.
The anewarray and multianewarray instructions can be used to create arrays of declared primitive types. Array subtyping allows these arrays to be viewed as instances of reference array types.
The checkcast, instanceof, and aastore opcodes support primitive value types, performing primitive value conversions (including null checks) when necessary.
Primitive classes may be initialized for the same reasons as other classes (for example, before a static method is invoked). In addition, primitive class initialization is triggered by the aconst_init instruction, by each of the anewarray and multianewarray instructions when used with a primitive type, and (recursively) by initialization of another class that declares a primitive-typed field mentioning the primitive class.
Core reflection
Every primitive class has a java.lang.Class object representing the class. For both primitive values and value objects, the getClass method of the class's instances returns this object. A class literal (Point.class) can also be used to express this object.
Tentatively: this Class object returns true from the isPrimitive method, and getModifiers shows its Modifier.PRIMITIVE flag set.
For uses that need to model types, there is one Class object representing the primitive type, and another representing the reference type. Each of these has the same behavior as the Class object representing the class in most respects, except for methods to explicitly tell them apart and map from one to the other.
Tentatively: the Class object representing the class doubles as a representation of the primitive type. A separate Class object exists for the purpose of representing the reference type.
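Current reflection already models a primitive type and a reference type with separate Class objects, which may help in picturing this arrangement. The sketch below uses int.class and Integer.class as a loose analogue only; unlike the Point and Point.ref mirrors described above, these two belong to different types with no shared class declaration.

```java
final class ReflectionSketch {
    // Checks that today's primitive and wrapper types have distinct Class
    // objects that can be told apart with isPrimitive().
    static boolean distinctTypeMirrors() {
        Class<?> prim = int.class;
        Class<?> ref = Integer.class;
        return prim != ref
            && prim.isPrimitive()
            && !ref.isPrimitive()
            && Integer.TYPE == prim; // the wrapper exposes a mapping to the primitive mirror
    }

    public static void main(String[] args) {
        System.out.println(distinctTypeMirrors()); // true
    }
}
```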
Other APIs
The following APIs also gain new behaviors:
- java.lang.constant encodes Q types in CONSTANT_Class structures and field and method descriptors
- java.lang.invoke recognizes Q types and supports L-to-Q conversions
- javax.lang.model recognizes primitive class declarations
Performance model
In typical usage, in heap storage and during fully-optimized code execution, declared primitive types should have a footprint and execution overhead comparable to the basic primitive types. For example, a Point, as declared above, can be expected to directly occupy 128 bits in local variables, parameters, fields, and array components. A field access simply extracts the first or second 64 bits. There are no additional pointers or metadata fields.
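The expected layout can be imitated by hand on current Java with a single double[]. This is a sketch of the layout idea only, under the assumption that x precedes y; FlatPointArraySketch is a hypothetical class, and a JVM remains free to choose a different encoding.

```java
final class FlatPointArraySketch {
    private final double[] slots; // x0, y0, x1, y1, ... with no headers or pointers

    FlatPointArraySketch(int length) {
        slots = new double[2 * length];
    }

    void set(int i, double x, double y) {
        slots[2 * i] = x;
        slots[2 * i + 1] = y;
    }

    double x(int i) { return slots[2 * i]; }     // "first 64 bits" of element i
    double y(int i) { return slots[2 * i + 1]; } // "second 64 bits" of element i

    static int bitsPerElement() { return 2 * Double.SIZE; }

    public static void main(String[] args) {
        FlatPointArraySketch ps = new FlatPointArraySketch(4);
        ps.set(2, 3.0, -2.1);
        System.out.println(ps.x(2));          // 3.0
        System.out.println(ps.y(2));          // -2.1
        System.out.println(ps.x(0));          // 0.0
        System.out.println(bitsPerElement()); // 128
    }
}
```

Writing this by hand gives up the Point abstraction entirely, which is the trade-off that declared primitive types are meant to remove.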
Notably, a primitive class with a single instance field can be expected to have minimal overhead compared to operating on a value of the field's type directly.
However, JVMs are ultimately free to encode primitive values however they see fit. Some classes may be considered too large to represent inline. Certain JVM components, in particular those that are less performance-tuned, may prefer to interact with primitive values as objects. A primitive value might carry with it a cached value object pointer to reduce the overhead of future conversions. Etc.
Value objects that are instances of primitive classes can be expected to behave much like instances of other value classes.
HotSpot implementation
This section describes implementation details of this release of the HotSpot virtual machine, for the information of OpenJDK engineers. These details are subject to change in future releases and should not be assumed by users of HotSpot or other JVMs.
Values of Q types in HotSpot are encoded as follows:
- Primitive classes whose field layouts exceed a size threshold are always encoded as regular heap objects. Fields marked volatile always store regular heap objects.
- Otherwise, primitive values are encoded in fields and arrays as a flattened sequence of field values. Array components may be padded to achieve good alignment.
- In the interpreter and C1, primitive values on the stack are represented as value objects. Each read of a primitive-typed field or array allocates a heap object.
- In C2, primitive values on the stack are scalarized, effectively encoding each field as a separate variable. Methods with Q-typed parameters support both a pointer-based entry point (for interpreter and C1 calls) and a scalarized entry point (for C2-to-C2 calls). Value objects are also scalarized when working with the primitive class's reference type. Heap allocations occur where any other supertype is used.
Default values are generally encoded as sequences of zeros, simplifying the task of field and array creation. However, in cases where a field or array encodes primitive values as heap pointers, the default value is a non-zero pointer. (Circularities may require this value to be null temporarily, but the null must be hidden from program code.)
Some array types, like [Ljava/lang/Object; and [LPoint;, allow for both pointer-based and flattened arrays. Reads and writes for these types dynamically check a flag and perform the necessary conversions when operating on flattened arrays.
Alternatives
Making use of the basic primitive types, rather than declaring new primitives, will often produce a program with equivalent or slightly better performance. However, this approach gives up the valuable abstractions provided by classes. It's easy to, say, interpret a double with the wrong units, pass an out-of-range int to a library method, or fail to keep two boolean flags together in the right order.
Normal value classes provide many of the benefits of primitive classes, without the substantial disruptions to the language and JVM type systems. With additional innovation in JVM implementation techniques and hardware capabilities, the gap may close further. However, the limitations outlined in the "Motivation" section are pretty fundamental. For example, a value class type wrapping a single long field and supporting the full range of long values for that field can never be encoded in fewer than 65 bits. Primitive classes give programmers who need fine-grained control a more reliable performance model.
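The 65-bit figure follows from counting states: 2^64 long values plus null is 2^64 + 1 distinct states, one more than 64 bits can distinguish. A small check of that arithmetic, with bitsFor as a hypothetical helper name:

```java
import java.math.BigInteger;

final class NullBitSketch {
    // Minimum number of bits needed to distinguish 'states' distinct states.
    static int bitsFor(BigInteger states) {
        return states.subtract(BigInteger.ONE).bitLength();
    }

    public static void main(String[] args) {
        BigInteger longValues = BigInteger.TWO.pow(64);                  // every long value
        System.out.println(bitsFor(longValues));                         // 64
        System.out.println(bitsFor(longValues.add(BigInteger.ONE)));     // 65: one extra state for null
    }
}
```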
We considered many different approaches to boxing and polymorphism before settling on a model in which primitive values and value objects are two different representations, with two different types, of the same class instances. This strategy balances the traditional understanding of primitive types, with familiar semantics, performance expectations, and conversions to objects, with the simplicity of a single named class declaration for modeling data in both the primitive and reference spaces. Strategies in which a primitive value is an object obscure some important differences between the types. Strategies in which conversions occur between two different class-like entities introduce distracting complexity.
Risks and Assumptions
There are security risks involved in allowing instance creation outside of constructors, via default instances and non-atomic reads and writes. Developers will need to understand the implications, and recognize when it would be unsafe to declare a class primitive.
This JEP does not address the interaction of primitive classes with the basic primitives or generics; these features will be addressed by other JEPs (see below). But, ultimately, all three JEPs will need to be completed to deliver a cohesive language design.
Dependencies
This JEP depends on Value Objects, which establishes the semantics of primitives when treated as objects. Primitive classes are a special case of value classes.
In support of this JEP, there are separate efforts to improve the JVM Specification (in particular its treatment of class file validation) and the Java Language Specification (in particular its treatment of types). These changes address technical debt and facilitate the specification of these new features.
In JEP 402 we propose to update the basic primitive types (int, boolean, etc.) to be represented by primitive classes, unifying the two kinds of primitive types. The existing wrapper classes will be repurposed to represent the corresponding types' primitive classes.
In another JEP we will propose modifying the generics model in Java to make type parameters universal—instantiable by all types, both reference and primitive.
In the future, JVM class and method specialization (JEP 218, with revisions) will allow generic classes and methods to specialize field, array, and local variable layouts when parameterized by primitive types.