JEP draft: enhanced checkcast for Valhalla type unification
Owner | John Rose
Type | Feature
Scope | JDK
Status | Draft
Created | 2022/11/18 02:39
Updated | 2022/11/18 21:18
Issue | 8297236
Motivation
Invisible conversions everywhere, dense in your classfile...
In the Java language, the Valhalla project unifies the type system so that all values (except possibly null) are typed as subclasses of Object, notably int and double and the other legacy primitive types (also known as the non-reference types in Java before Valhalla). This unification works by identifying more closely, as "the same thing", the boxed and the native forms of any given primitive value. This allows the legacy primitives to work in generic classes and methods (with suitably permissive type variable bounds).
But the JVM knows otherwise: the native form of a double occupies 64 bits and two stack slots, under the descriptor "D", while the boxed form stores the 64-bit value in one or more buffer objects in the heap, and its reference occupies just one stack slot, under the descriptor "Ljava/lang/Double;". There is no way, within the JVM, to retroactively unify those two representations of "the same" double value object.
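The two representations can be observed from ordinary Java code through their field descriptors. The sketch below is only an illustration: Class::descriptorString is a real API (JDK 12 and later), while slotsFor is a hypothetical helper that simply hard-codes the JVMS rule that long and double values occupy two slots.

```java
// Illustrative sketch: the two JVM representations of a double, seen through
// their field descriptors. Class.descriptorString() is available since JDK 12.
class DoubleRepresentations {
    // Hypothetical helper: JVM stack/local slots used by a value of this type
    // (per the JVMS, long and double take two slots; everything else takes one).
    static int slotsFor(Class<?> type) {
        return (type == double.class || type == long.class) ? 2 : 1;
    }

    public static void main(String[] args) {
        for (Class<?> c : new Class<?>[] { double.class, Double.class }) {
            // Prints the descriptor and slot count for the native and boxed forms.
            System.out.println(c.descriptorString() + " occupies " + slotsFor(c) + " slot(s)");
        }
    }
}
```

Run as-is, it should print "D occupies 2 slot(s)" followed by "Ljava/lang/Double; occupies 1 slot(s)".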
Instead, the Java static compiler (e.g., javac) needs to juggle at least two representations for double values: the two-slot native value and the one-slot boxed reference. It must supply the correct format to each operation: native arithmetic (like the dadd bytecode) requires the native "D" format, while communicating with generic code, or storing into an array of Object or Number, requires the boxed reference format.
The Java static compiler has historically done such juggling in order to implement the language rules for implicit auto-boxing and auto-unboxing. To do this, it makes implicit calls (not seen in the source code) to well-defined API points such as Double::valueOf (for boxing) and Double::doubleValue (for unboxing).
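To make the hidden calls concrete, the following sketch pairs a source-level use of auto-boxing with a hand-written version in which the Double::valueOf and Double::doubleValue calls are explicit. The class and method names are hypothetical; both methods should compute the same result.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: javac's implicit auto-boxing and auto-unboxing, with the hidden
// calls written out by hand in a second method for comparison.
class BoxingDesugar {
    // Source form: the conversions are invisible.
    static double implicitForm(double d) {
        List<Double> list = new ArrayList<>();
        list.add(d);              // javac inserts Double.valueOf(d)
        double x = list.get(0);   // javac inserts a checkcast to Double, then doubleValue()
        return x + 1.0;
    }

    // Roughly what javac emits, spelled out in source form.
    static double explicitForm(double d) {
        List<Double> list = new ArrayList<>();
        list.add(Double.valueOf(d));              // boxing
        double x = list.get(0).doubleValue();     // unboxing (javac still adds a checkcast
                                                  // to Double, because List::get is erased)
        return x + 1.0;
    }

    public static void main(String[] args) {
        System.out.println(implicitForm(2.5) + " " + explicitForm(2.5));
    }
}
```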
Note also that the static compiler often implicitly changes the reference type of values around the boundaries of (erased) Java generic APIs, for example by quietly casting the result of a call to List<String>::get from Object (the erased type variable bound, returned from the get method) to String (the type known at compile time). To do that, it emits implicit uses of the checkcast bytecode.
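The implicit checkcast is normally invisible, but it can be observed when heap pollution lets a value of the wrong class into a generic collection. In this sketch (names hypothetical; the raw-type write is deliberately unsafe), the failure surfaces only at the cast javac inserted after List::get, not at the point where the bad element was added.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: the implicit checkcast javac places after an erased generic call.
// Heap pollution via a raw type is used only to make the hidden cast observable.
class ErasureCast {
    @SuppressWarnings({"unchecked", "rawtypes"})
    static String firstAsString(Object element) {
        List<String> strings = new ArrayList<>();
        ((List) strings).add(element);   // raw add: no check, since the list is erased
        // Erased, List::get returns Object; javac emits "checkcast java/lang/String" here.
        return strings.get(0);
    }

    public static void main(String[] args) {
        System.out.println(firstAsString("hello"));
        try {
            firstAsString(42);           // the polluted element reaches the hidden checkcast
        } catch (ClassCastException e) {
            System.out.println("hidden checkcast failed: " + e.getClass().getSimpleName());
        }
    }
}
```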
In the case of a generic API point like List<double>::get, the Object value returned from the method must be implicitly retyped as double in two operational steps: a cast to Double as a reference, and then an unboxing to a native two-slot double with Double::doubleValue.
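Since List<double> is not expressible in today's Java, the sketch below approximates the two steps with a plain Object value; the method name unboxToDouble is hypothetical. Each step has its own failure mode: the cast rejects a non-Double, while a null passes the cast and is rejected only by doubleValue.

```java
import java.util.List;

// Sketch: the two operational steps for retyping an erased Object as a double,
// as javac would have to emit them for a hypothetical List<double>::get.
class TwoStepUnbox {
    static double unboxToDouble(Object o) {
        Double boxed = (Double) o;      // step 1: checkcast java/lang/Double
        return boxed.doubleValue();     // step 2: invokevirtual Double.doubleValue()D
    }

    public static void main(String[] args) {
        List<Object> erased = List.of(1.5, "not a double");
        System.out.println(unboxToDouble(erased.get(0)) * 2);  // 3.0
        try {
            unboxToDouble(erased.get(1));
        } catch (ClassCastException e) {
            System.out.println("step 1 rejected a non-Double");
        }
        try {
            unboxToDouble(null);        // null passes the cast...
        } catch (NullPointerException e) {
            System.out.println("step 2 rejected null");  // ...but not doubleValue
        }
    }
}
```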
For Valhalla user-defined primitives, the representation is the same for both boxed and unboxed values, but there will still be implicit changes in their types, and those changes will (probably in the unboxing direction only) be reflected by operational null-checks, using Objects::requireNonNull or the equivalent.
With Valhalla, we expect that the frequency (or density) of such conversions and checks may increase in some code, as users enjoy freedom from worrying about whether their values are boxed or not. But the JVM will have to worry all the more, especially for legacy primitives. Also, the static compiler will have to send the JVM the right guidance, in the form of implicit bytecode instructions that manage the implicit boxing and unboxing.
Valhalla does not plan to enhance the verifier type system beyond where it is today. In particular, we do not plan to propagate the results of null checks in the verifier type system. This means that if javac omits a call to Objects::requireNonNull, nulls will be checked later, if at all, when the value reaches a variable or operation that positively rejects null. (The receiver argument of Double::doubleValue is null-rejecting in that way, since a null receiver elicits a NullPointerException.)
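The following sketch (names hypothetical) shows the "later, if at all" behavior with today's boxed Double: the null passes through an unchecked method unchanged, and is reported only when the implicit doubleValue call finally dereferences it.

```java
// Sketch: without an explicit null check, a null Double travels unchecked
// until it reaches an operation that positively rejects null; here that is
// the receiver of the implicit Double::doubleValue call in "stored + 1.0".
class LateNullCheck {
    static Double passAlong(Double d) {
        return d;                 // no check: null flows straight through
    }

    static double useLater(Double d) {
        Double stored = passAlong(d);
        return stored + 1.0;      // javac inserts stored.doubleValue(); NPE surfaces here
    }

    public static void main(String[] args) {
        System.out.println(useLater(2.0));
        try {
            useLater(null);
        } catch (NullPointerException e) {
            System.out.println("null detected only at the unboxing site");
        }
    }
}
```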
In summary, this means that the following operations will effectively be used as virtual machine instructions for managing low-level type changes in code generated by javac:
- Double::doubleValue -- for unboxing Double to double
- Double::valueOf -- for boxing double to Double
- <Primtype>::<primtype>Value -- likewise for unboxing any legacy primitive type
- <Primtype>::valueOf -- likewise for boxing any legacy primitive type
- Objects::requireNonNull -- for unboxing any user-defined primitive type
- (no code) -- for boxing any user-defined primitive type (verifier sees no type change)
In addition, calls to requireNonNull will, in many cases, need to be followed by a checkcast to reassure the verifier that what came out of the method has the same type (in fact, is the same reference) as what went in. (This effect is not visible at the language level, since requireNonNull is generically declared to return its input type. But its erased return type is Object, so the JVM requires a checkcast here.)
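The mismatch between the source-level and JVM-level signatures can be sketched by writing out the erased view by hand. Objects.requireNonNull is declared as <T> T requireNonNull(T obj), which erases to a method taking and returning Object; erasedRequireNonNull below is a hypothetical stand-in for that erased view, and its explicit (String) cast plays the role of the checkcast javac emits.

```java
import java.util.Objects;

// Sketch: why a requireNonNull call needs a following checkcast at the JVM level.
// The real method's erased descriptor is (Ljava/lang/Object;)Ljava/lang/Object;.
class RequireNonNullCast {
    // Hypothetical hand-written equivalent of the erased requireNonNull.
    static Object erasedRequireNonNull(Object obj) {
        if (obj == null) throw new NullPointerException();
        return obj;
    }

    // Source form: no cast is visible, because T is inferred as String.
    static String sourceForm(String s) {
        return Objects.requireNonNull(s);
    }

    // Roughly the bytecode's view: the result comes back typed as Object,
    // so a cast to String is needed to restore the type the verifier lost.
    static String erasedForm(String s) {
        return (String) erasedRequireNonNull(s);
    }

    public static void main(String[] args) {
        System.out.println(sourceForm("a") + erasedForm("b"));
    }
}
```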
Similarly, calls to doubleValue (or any <primtype>Value) will, in many cases, need to be preceded by a checkcast to the box type. (These cases arise either from explicit user casts from a supertype like Object, or from implicit casts inserted around a generic API point.)
This prospect of a much greater volume of implicit conversion bytecodes, or pairs of such conversions, suggests that perhaps the translation strategy for Valhalla would benefit from new support in the JVM bytecode instruction set for expressing those conversions more simply.
(That is a big "perhaps"; it is expensive to add new bytecodes. This memo explores that expensive option. The fallback position, and plan of record at the moment, is to use as many library routine calls as it takes to get the job done, and call it a day.)
Description
Enhance the checkcast instruction in three directions:
- polymorphically produce legacy primitives as well as references (cf. getstatic for a precedent)
- polymorphically consume legacy primitives as well as references (cf. putstatic for a precedent)
- optionally perform null check operations
All three enhancements are enabled by a condition that has previously been illegal: the checkcast instruction's operand field, an index into the constant pool, refers to a CONSTANT_Utf8 item rather than to a CONSTANT_Class item (the only case that is legal today).
The spelling of the CONSTANT_Utf8 item selects the function:
- ">D" pops an Object reference, casts it to Double, and then calls doubleValue
- ">I" pops an Object reference, casts it to Integer, and then calls intValue
- "<D" pops a double (two slots) and calls Double::valueOf
- "<I" pops an int and calls Integer::valueOf
- (and so on for the other ">x" and "<x", where x is in [BSIJZCFD])
- "!" peeks at the value on the stack and throws NPE if it is null
Any other operand (any other spelling, or any constant pool entry type other than CONSTANT_Class or CONSTANT_Utf8) will fail verification of the checkcast instruction, and is thereby reserved for future use.
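As an illustration of the operand rules, here is a sketch of a decoder that a verifier or interpreter might use to turn the CONSTANT_Utf8 spelling into an operation. The names CheckboxDecoder, Kind, and CheckboxOp are hypothetical; only the accepted spellings come from the list above, and every other spelling is reported as reserved.

```java
import java.util.Optional;

// Hypothetical decoder for the proposed CONSTANT_Utf8 operand of checkcast.
// Accepted spellings: ">x", "<x" (x in BSIJZCFD) and "!".
class CheckboxDecoder {
    enum Kind { UNBOX, BOX, NULL_CHECK }

    // prim is the primitive descriptor char; unused (a space) for NULL_CHECK.
    record CheckboxOp(Kind kind, char prim) { }

    private static final String PRIMS = "BSIJZCFD";

    // Returns the decoded operation, or empty if the spelling is reserved
    // (which, per the proposal, makes the checkcast fail verification).
    static Optional<CheckboxOp> decode(String utf8) {
        if (utf8.equals("!")) {
            return Optional.of(new CheckboxOp(Kind.NULL_CHECK, ' '));
        }
        if (utf8.length() == 2 && PRIMS.indexOf(utf8.charAt(1)) >= 0) {
            switch (utf8.charAt(0)) {
                case '>': return Optional.of(new CheckboxOp(Kind.UNBOX, utf8.charAt(1)));
                case '<': return Optional.of(new CheckboxOp(Kind.BOX, utf8.charAt(1)));
                default:  break;
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        for (String s : new String[] { ">D", "<I", "!", ">V", "<Double", "?" }) {
            System.out.println(s + " -> " + decode(s));
        }
    }
}
```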
The descriptions above are carefully crafted to imply the following interactions with the verifier type system:
- ">x" requires a reference (Object) on the stack and leaves a primitive (x) on the stack
- "<x" requires a primitive (x) on the stack and leaves its box type (not merely Object) on the stack
- "!" requires a reference on the stack and leaves that reference alone, with the same verifier type
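These stack effects can be modeled with a small type-stack transfer function. This is a simplified sketch, not the JVMS verifier: types are plain strings, assignability is reduced to "is a reference", and a long or double is modeled as its type plus a "top" entry for its second slot. All names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified sketch of the verifier-type effects of the proposed checkcast forms.
// The head of the deque is the top of the stack.
class CheckboxVerifier {
    private static final String PRIMS = "BSIJZCFD";
    private static final String[] BOXES = {
        "Ljava/lang/Byte;", "Ljava/lang/Short;", "Ljava/lang/Integer;", "Ljava/lang/Long;",
        "Ljava/lang/Boolean;", "Ljava/lang/Character;", "Ljava/lang/Float;", "Ljava/lang/Double;"
    };

    // Verification type of primitive x: B, S, Z and C all verify as int.
    static String primType(char x) {
        switch (x) {
            case 'J': return "long";
            case 'F': return "float";
            case 'D': return "double";
            default:  return "int";
        }
    }

    static boolean isTwoSlot(char x) {
        return x == 'J' || x == 'D';
    }

    // Class/array descriptors and the null type count as references in this model.
    static boolean isReference(String type) {
        return type.startsWith("L") || type.startsWith("[") || type.equals("null");
    }

    // Applies an already-decoded operand (">x", "<x" or "!") to the type stack,
    // throwing VerifyError when the stack does not match the required input.
    static void apply(String op, Deque<String> stack) {
        if (stack.isEmpty()) throw new VerifyError(op + ": empty stack");
        if (op.equals("!")) {
            if (!isReference(stack.peek())) throw new VerifyError("! needs a reference");
            return;                                        // same reference, same verifier type
        }
        char x = op.charAt(1);
        if (op.charAt(0) == '>') {                         // reference in, primitive out
            if (!isReference(stack.pop())) throw new VerifyError(op + " needs a reference");
            stack.push(primType(x));
            if (isTwoSlot(x)) stack.push("top");           // second slot of long/double
        } else {                                           // '<': primitive in, box type out
            if (stack.size() < (isTwoSlot(x) ? 2 : 1)) throw new VerifyError(op + ": stack underflow");
            if (isTwoSlot(x) && !"top".equals(stack.pop())) throw new VerifyError(op + " needs " + primType(x));
            if (!primType(x).equals(stack.pop())) throw new VerifyError(op + " needs " + primType(x));
            stack.push(BOXES[PRIMS.indexOf(x)]);           // the specific box type, not merely Object
        }
    }

    public static void main(String[] args) {
        Deque<String> stack = new ArrayDeque<>();
        stack.push("Ljava/lang/Object;");
        apply(">D", stack);   // Object -> double (two entries)
        System.out.println(stack);
        apply("<D", stack);   // double -> Ljava/lang/Double;
        System.out.println(stack);
        apply("!", stack);    // unchanged
        System.out.println(stack);
    }
}
```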
An efficient interpreter would probably choose to rewrite these new UTF8-using forms of checkcast to an internal, otherwise unused bytecode, and use the operand field of that bytecode to efficiently select the behavior corresponding to the UTF8 string.
The javac compiler should probably emit these new forms of checkcast in some or all of the cases where it has previously emitted the method calls (whether implicit or explicit in the source code).
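One way to picture that choice is a table mapping the library calls javac emits today to the proposed operands. The following is a hypothetical sketch: the class, table layout, and key format are invented, while the method descriptors are the real ones for the box classes and Objects.requireNonNull. A real rewrite would also have to fold in the checkcast that usually precedes a <primtype>Value call, and the one that usually follows requireNonNull.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: map today's boxing/unboxing/null-check calls (keyed by
// owner.name + JVM method descriptor) to the proposed checkcast UTF8 operands.
class CheckboxPeephole {
    private static final String PRIMS = "BSIJZCFD";
    private static final String[] BOXES = {
        "java/lang/Byte", "java/lang/Short", "java/lang/Integer", "java/lang/Long",
        "java/lang/Boolean", "java/lang/Character", "java/lang/Float", "java/lang/Double"
    };
    private static final String[] NAMES = {
        "byte", "short", "int", "long", "boolean", "char", "float", "double"
    };

    static Map<String, String> buildTable() {
        Map<String, String> table = new HashMap<>();
        for (int i = 0; i < PRIMS.length(); i++) {
            char x = PRIMS.charAt(i);
            String box = BOXES[i];
            // invokestatic Box.valueOf(x)LBox;  becomes  checkcast "<x"
            table.put(box + ".valueOf(" + x + ")L" + box + ";", "<" + x);
            // [checkcast Box;] invokevirtual Box.xValue()x  becomes  checkcast ">x"
            table.put(box + "." + NAMES[i] + "Value()" + x, ">" + x);
        }
        // invokestatic Objects.requireNonNull [+ checkcast]  becomes  checkcast "!"
        table.put("java/util/Objects.requireNonNull(Ljava/lang/Object;)Ljava/lang/Object;", "!");
        return table;
    }

    public static void main(String[] args) {
        Map<String, String> table = buildTable();
        System.out.println(table.size() + " rewritable call sites");
        System.out.println(table.get("java/lang/Double.valueOf(D)Ljava/lang/Double;"));
    }
}
```

Because "!" leaves the verifier type unchanged, replacing a requireNonNull call with "!" would presumably also make the follow-up checkcast unnecessary; that follows from the verifier rules above rather than from any existing implementation.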
For presentations of this bytecode by other low-level tools, it is suggested that the name checkbox be used instead of checkcast, and that the instruction be presented with its string operand unchanged. However, the code point for checkcast (decimal 192) should be reused (overloaded) for this new purpose, rather than allocating a new code point.