JEP draft: low-level control of field initialization

Owner	John Rose
Type	Feature
Scope	JDK
Status	Draft
Created	2022/11/16 18:55
Updated	2024/02/27 18:04
Issue	8297156

It is occasionally necessary to adjust and control the initialization of class fields. In many cases, a read of an uninitialized field (RBFW = read-before-first-write) is a program error, but such errors are (again in many cases) not fully diagnosed.

Static analysis mandated by the JLS can and does catch many RBFW errors in source code. But all such mechanisms are leaky and cannot be made fully correct without new and complex restrictions of dynamic “Turing machine” behavior. A non-static field can be observed in the uninitialized state if this escapes from a constructor before the constructor sets the field; a static field can be observed in the uninitialized state if a lexically preceding static action calls a method which reads the field. Concurrency adds more complexity, and dynamic linking across compilation boundaries adds still more.
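The two leaks just described can be reproduced in a few lines of ordinary Java. The following is a minimal sketch; the class and member names (RbfwDemo, Base, Derived, Statics) are invented for illustration. The compiler accepts it, and both reads observe the type default value.

```java
// Two RBFW (read-before-first-write) cases that javac accepts.
// All names in this sketch are hypothetical.
class RbfwDemo {
    // Case 1: `this` escapes from the superclass constructor before the
    // subclass constructor has assigned the final field `name`.
    static class Base {
        final String observed;
        Base() {
            // Virtual call: runs subclass code on a partially built object.
            observed = describe();
        }
        String describe() { return "base"; }
    }

    static class Derived extends Base {
        final String name;
        Derived() {
            super();            // describe() runs here, before the next line
            name = "derived";
        }
        @Override String describe() { return "name=" + name; }
    }

    // Case 2: a lexically preceding static initializer calls a method
    // that reads a static field declared (and initialized) later.
    static class Statics {
        static final int EARLY = peek();     // runs first, sees the default 0
        static final int LATE = compute();   // not a constant expression
        static int peek() { return LATE; }
        static int compute() { return 42; }
    }

    public static void main(String[] args) {
        System.out.println(new Derived().observed);  // name=null
        System.out.println(Statics.EARLY);           // 0
        System.out.println(Statics.LATE);            // 42
    }
}
```

Neither case throws; the program simply proceeds with null or 0, which is why such errors tend to surface far from their cause.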

The language and VM specifications could add more explicit control of field initialization, including tracking and testing of field initialization states, to make these use cases more secure and efficient. This control could also be extended to array elements, at least for appropriately configured array types.

Here are the current mechanisms:

The JLS mandates static tracking of “definite assignment” for final fields.
The JLS also mandates a more limited ad hoc rule for all static fields.
The JVMS mandates that, if a field can be read before first write, a type-specific default value is always present.
The JVMS mandates that the default reference (null) self-diagnoses (with NPE) when it is used, except for a very limited set of operations.
The JVMS mandates that the first read of a static field trigger a class initialization process (via <clinit>) that can supply an initializing write that precedes the read.

These rules, taken together, catch many but not all RBFW errors, with varying degrees of specificity. (NPE errors are notoriously hard to trace to their root cause.) For Java at least, a dynamic check is necessary to complete the proof that all RBFW errors will be caught. For more discussion about the need for dynamic checks in Java, please see slides 39-46 in the Rose JVMLS17 talk, on the theme “Java safety checking is never finished”.

If a variable can be assigned the default value of its type (thereby confirming the default value initially stored there), then dynamically detecting RBFW errors requires that the JVM somewhere maintain some additional state tracking to distinguish between the initial default state and any similar state obtained by explicitly writing the default value. The effect of this would be a query that can always tell if the field has never yet been written.
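As a rough user-level model of that extra state, consider the sketch below. It is not a proposal for how the JVM would do it: the class TrackedInt and its methods are hypothetical, and the sketch ignores concurrency. It only shows that one extra bit is enough to tell "never written" apart from "explicitly written with 0".

```java
// User-level model of a field whose "never written" state is tracked
// separately from its value. All names here are hypothetical.
final class TrackedInt {
    private int value;        // starts at the type default, 0
    private boolean written;  // the additional tracking state

    void put(int v) {
        value = v;
        written = true;       // set even when v is the default value
    }

    // The query: has this variable ever been written?
    boolean everWritten() { return written; }

    int get() {
        if (!written) {
            throw new IllegalStateException("read before first write");
        }
        return value;
    }
}
```

After put(0) the stored value is indistinguishable from the initial default, yet everWritten() answers true and get() returns 0 instead of throwing.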

Sometimes RBFW conditions do not throw, but rather execute some sort of repair. This is true with classic static fields, which automatically initialize. It is also true of more subtle lazy-static fields. In those cases, there is a second kind of RBFW error, in which the evaluation of the repair action (such as evaluating a field initializer expression) has a cyclic dependency on the field value, so that the repair recursively requests itself; this is essentially a stack overflow. It would be useful for the JVM to record, for such fields, an additional state that indicates that the repair is pending. (More accurately, repair has started, but might not have finished; the previously described state indicates when the repair has finished.) A read of the field that detects this second condition, and verifies that there has not yet been a write, must report that the repair has failed.
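The two states can be modeled at user level as follows. This is a single-threaded sketch under stated assumptions: the class LazyCell, its State enum, the choice of IllegalStateException, and the decision to return to the unset state after a failed repair are all illustrative, not taken from any specification.

```java
// Hypothetical model of a lazily repaired variable with three states:
// UNSET, PENDING (repair started), and SET (repair finished).
final class LazyCell<T> {
    private enum State { UNSET, PENDING, SET }

    private final java.util.function.Supplier<T> initializer;
    private State state = State.UNSET;
    private T value;

    LazyCell(java.util.function.Supplier<T> initializer) {
        this.initializer = initializer;
    }

    // Single-threaded sketch; a real implementation must also handle races.
    T get() {
        switch (state) {
            case SET:
                return value;
            case PENDING:
                // The initializer is reading the variable it is initializing.
                throw new IllegalStateException("cyclic initialization");
            default:
                state = State.PENDING;
                try {
                    value = initializer.get();
                    state = State.SET;
                } finally {
                    // Assumption: a failed repair leaves the variable unset.
                    if (state == State.PENDING) state = State.UNSET;
                }
                return value;
        }
    }
}
```

An initializer that reads its own cell now fails promptly with a specific exception rather than recursing until the stack overflows.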

Within the current specifications, it might be useful to track variable initialization states. A JVM option like -Xcheck:fieldinit could cause the JVM to transparently allocate the tracking bits and/or sentinels, and adjust the field read and write “microcode” to do the extra bookkeeping. To remain spec-compliant, the JVM would allow RBFW conditions (since they are not really errors) but it would somehow report them to the user as possible problems.

The specifications (both JLS and JVMS) can also be extended to allow the language to request special handling of RBFW conditions.

With lazy statics (JDK-8209964), an RBFW of a lazy static final field is repaired, not solely by executing <clinit>, but by executing the initializer of that field.
A static configured like a lazy static, but without an initializer, could be specified to throw a known exception (e.g., IllegalStateException). This makes sense for both finals and non-finals.
Given per-instance state tracking, the behavior of non-statics can be adjusted in the same new degrees of freedom.
Given per-array-element state tracking, and special array factories, even array element behavior can be adjusted in similar ways.

Some of the above mechanisms could replace the @Stable annotation in the JDK code base. This annotation tells the JIT to constant-fold non-default values in marked fields and arrays. Use of @Stable is a high-wire act for experts, not backed up by actual state tracking, and so by itself cannot be standardized.

Some of the above mechanisms could assist the implementation of weakly null-excluding field types, such as String!, as proposed by Valhalla and elsewhere. They are “weak” in the sense that they erase to undecorated types (such as String), but they can be made more reliable by strengthening containers of those types. For example, a field of type String! would be declared in the JVM as Ljava/lang/String; but would also be marked for RBFW detection. It might also be marked for an additional detection, of writes of null.

For some use cases, these features pair well with customized read and write logic, to detect exceptional values. For example, marking a String! field so that both getfield and putfield would reject null provides a reasonable translation strategy for both final and non-final fields of that null-excluding type.

For some use cases with non-final variables, a way should be provided to gently query whether a variable is readable, without taking the exception. Again, for mutable variables, a state transition back to the unbound state might be useful. These operations could be encoded using overloadings of the getfield and putfield bytecodes.

Such mechanisms would provide a better foundation for Scala lazy values and similar features in other JVM-hosted languages.

The next level beyond initialization control is control of all state changes. This amounts to some fancy way to let the JVM user adjust the “microcode” of getfield and friends, so that a value moving between stack and field is adjusted by some sort of projection/embedding pair (related to a type-restriction, perhaps: this is a connection to reified generics). Or maybe, the value moving between stack and field, or some other state, is vetted or normalized to some correctness criterion.

In the use cases given earlier, one value-correctness criterion is null-exclusion, and/or RBFW exclusion. But others seem possible as well.

One interesting possibility is field-confinement, as in the Rose JVMLS17 talk, slide 56, “Making the best of the Object header”. In this scenario, reading or writing a field would always test a correctness condition that guarantees race-free access; specifically, the reading or writing would throw an exception if the object were not already locked by the current thread. Alternatively, it could repair the problem by waiting. Viewed a certain way, that is exactly what synchronized methods do: They repair any missing synchronization; there is no reason we couldn’t apply the same rule to field gets and puts. The feature of throwing (rather than repairing) if not synchronized would be new to Java, but such a fast-fail check is normal in Java API designs. Perhaps a variation like ensure-synchronized, applied as a modifier to both methods and fields, would provide a useful way to get this fast-fail behavior. The implementation of such a thing for fields could naturally use a programmable get/put feature in the JVM.
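The fast-fail variant can be approximated today with explicit accessors and the existing Thread.holdsLock method. The sketch below is only a user-level imitation of what an ensure-synchronized field might do; the class ConfinedCounter and the choice of IllegalMonitorStateException are assumptions made for illustration.

```java
// Hypothetical model of an "ensure-synchronized" field: every access
// fails fast unless the current thread holds the object's monitor.
final class ConfinedCounter {
    private int count;

    private void ensureLocked() {
        // Thread.holdsLock reports whether the current thread owns
        // the monitor of the given object.
        if (!Thread.holdsLock(this)) {
            throw new IllegalMonitorStateException(
                "field accessed without holding the object's lock");
        }
    }

    int get() {
        ensureLocked();
        return count;
    }

    void set(int v) {
        ensureLocked();
        count = v;
    }
}
```

A caller that writes synchronized (counter) { counter.set(5); } succeeds, while a bare counter.get() outside any synchronized block throws instead of silently racing.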

But this next level of adding logic to gets and puts, applied at the language level, takes us perilously close to some kind of “properties in Java” conversation, which would be very unlikely to end well. Yet the JVM may have legitimate use cases for programmable get/put actions on fields, even if the JLS does not commit to roll out properties (per se) in the language.

Implementation notes

The extra variable states can be represented in the JVM by using side bits, sentinel values, or both. The JVM can track writes of the type default value and, in that event, atomically set a side bit instead of writing the default value (if the side bit was in fact clear). The RBFW check then simply looks for the default value and, if that is seen, checks the side bit as well to see if the field was explicitly nulled out. The order of operations is important, to avoid races. Once set, the side bit is never cleared, and any given write changes either the side bit or the field, so there are no new race conditions in this scheme.

A simpler way to distinguish states, sometimes possible, is for the JVM to adjust reads and writes of the default value to read or write a non-zero sentinel, invisible to the user. No side bit is needed, but this does not work for types like int, which do not have enough bits to represent the required extra sentinel value.
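For reference types the sentinel idea can be imitated in plain Java. In the sketch below, which uses hypothetical names throughout (SentinelRef, NULL_WRITTEN), a private object stands in for the invisible sentinel: it is stored whenever the user writes null, so a raw null in the slot can only mean that no write has happened yet.

```java
// Hypothetical model: the raw slot stores a private sentinel whenever the
// user writes null, so a raw null can only mean "never written".
final class SentinelRef<T> {
    // Never exposed to callers; plays the role of the hidden sentinel.
    private static final Object NULL_WRITTEN = new Object();

    // One word of state; each read and write touches only this slot.
    private volatile Object raw;

    void put(T v) {
        raw = (v == null) ? NULL_WRITTEN : v;
    }

    @SuppressWarnings("unchecked")
    T get() {
        Object r = raw;
        if (r == null) {
            throw new IllegalStateException("read before first write");
        }
        return (r == NULL_WRITTEN) ? null : (T) r;
    }
}
```

Because the whole state lives in a single slot, no separate side bit has to be kept consistent with the value.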

Again, for tracking the “repair started” condition, a second side bit or second sentinel value can be used. Such schemes are well-understood; for example see Scala SIP-20, version V4.

Doing this for arrays is harder, since array layouts are somewhat less abstract than object layouts. The sentinel mechanism might apply to array elements. Otherwise, additional tracking bits have to go somewhere. This can be done without layout changes if the array object header points to an inflated lock, and that lock in turn points to a side array of tracking bits. Or, the array layout itself can be made polymorphic, and the side bits mixed in near the array elements. There are ways to do this without wasting a lot of space, by blocking the array into spans on the scale of cache lines.
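As a user-level analogy for the side array of tracking bits, the sketch below pairs an int array with a java.util.BitSet. It says nothing about object headers or polymorphic layouts, which only the JVM can arrange; the class TrackedIntArray is hypothetical and not thread-safe.

```java
// Hypothetical model of an int array with one tracking bit per element,
// kept in a side structure rather than in the array itself.
final class TrackedIntArray {
    private final int[] elements;
    private final java.util.BitSet written;   // side array of tracking bits

    TrackedIntArray(int length) {
        elements = new int[length];
        written = new java.util.BitSet(length);
    }

    void set(int i, int v) {
        elements[i] = v;      // performs the usual bounds check
        written.set(i);
    }

    int get(int i) {
        int v = elements[i];  // bounds check first, as for a plain array
        if (!written.get(i)) {
            throw new IllegalStateException("element " + i + " read before first write");
        }
        return v;
    }
}
```

A side bit is needed here because int has no spare bit pattern to serve as a sentinel.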

If a reference variable is marked for the JVM to reject RBFW errors, and if the field is also marked so that writes of null are also rejected, then it follows that the only way a null can be stored in that field is if it is not yet initialized. In that case, neither a tracking bit nor a special sentinel is needed; the normal null pointer serves as the sentinel for the state to reject. This combination of conditions gives a cheap way to implement String! fields and array elements.
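That combination can be imitated with explicit accessors. In the sketch below the slot has the erased type String, writes of null are rejected, and a null in the slot is therefore read as "not yet initialized". The class name and the particular exception types are assumptions made for illustration, not proposed behavior.

```java
// Hypothetical model of a String! field: an ordinary String slot in which
// null can only mean "not yet initialized", because null writes are rejected.
final class NonNullStringField {
    private String slot;   // erased type; no side bit, no extra sentinel

    void put(String v) {
        if (v == null) {
            throw new NullPointerException("null written to a null-excluding field");
        }
        slot = v;
    }

    String get() {
        String v = slot;
        if (v == null) {
            throw new IllegalStateException("read before first write");
        }
        return v;
    }
}
```

The cost per access is a single null check on a value that is being loaded or stored anyway.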