JEP 447: Statements before super() (Preview)
Owner | Archie Cobbs |
Type | Feature |
Scope | SE |
Status | Candidate |
Component | specification / language |
Discussion | amber dash dev at openjdk dot java dot net |
Reviewed by | Alex Buckley, Brian Goetz, Vicente Arturo Romero Zaldivar |
Created | 2023/01/20 17:33 |
Updated | 2023/05/18 23:52 |
Issue | 8300786 |
Summary
In constructors in the Java programming language, allow statements that do not reference the instance being created to appear before this() or super(). This is a preview language feature.
Goals
- In constructors, allow statements that do not access the instance being created to appear prior to invocations of this() or super().
- Correct an error in the specification which defines constructor invocations as a static context.
- Preserve existing safety and initialization guarantees for constructors.
- Do not change the behavior of any existing program.
Non-Goals
- Modifications to the Java Virtual Machine Specification (JVMS): This proposal may prompt reconsideration of the JVMS's current restrictions on constructors, but we do not intend here to revise the JVMS.
- Maximizing Java Language Specification (JLS) and JVMS alignment: These changes will bring the JLS and JVMS into closer alignment, but it is not a goal to harmonize them completely. The JLS and JVMS address different problem domains, and therefore it is reasonable for them to differ in what they allow. For example, the JVMS allows a constructor to write to the same final field multiple times, whereas the JLS does not.
- The introduction of new language concepts: The rules for code appearing before this() or super() should be familiar and intuitive to developers, similar to the rules for code appearing inside this() and super() parameter lists, which also executes before superclass construction.
Motivation
Before an object can be used, its state must be initialized. In order to ensure orderly object initialization, the JLS includes a variety of rules specifically related to object construction. For example, dataflow analysis ensures that every final field is assigned a definite value.
For classes in a non-trivial class hierarchy, object initialization does not occur in a single step. An object's state is the composition of groups of fields: the group of fields defined in the class itself plus the groups of fields defined in its superclasses. Each group of fields is initialized in a separate step by a corresponding constructor in those fields' defining class. An object is not fully initialized until every class in its hierarchy has initialized its own fields.
To keep this process orderly, the JLS requires that superclass constructors execute prior to subclass constructors. Thus objects are always initialized from the top down. This ensures that, at each level, a constructor can assume that the fields in all of its superclasses have been initialized. This guarantee is important because constructors often need to rely on functionality in a superclass, and the superclass would not be able to guarantee correct behavior without the assumption that its own initialization is complete. For example, it is common for a constructor to invoke superclass methods to configure or prepare the object for a specific task.
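The top-down ordering described above can be observed in a small program that compiles today. This is only an illustrative sketch: `Base`, `Derived`, and `InitOrderDemo` are hypothetical names that do not appear elsewhere in this proposal.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical classes, used only to observe constructor ordering.
class Base {
    protected final List<String> log = new ArrayList<>();

    Base() {
        log.add("Base");            // runs first: the superclass sets up its own state
    }

    int entries() {
        return log.size();
    }
}

class Derived extends Base {
    final int seenByDerived;

    Derived() {
        super();                    // currently required to be the first statement
        seenByDerived = entries();  // safe: Base has already initialized 'log'
        log.add("Derived");
    }
}

class InitOrderDemo {
    public static void main(String[] args) {
        Derived d = new Derived();
        System.out.println(d.log);            // [Base, Derived]
        System.out.println(d.seenByDerived);  // 1
    }
}
```

The call to `entries()` in `Derived` is an instance of a constructor relying on superclass functionality, which is only sound because the `Base` constructor has already completed.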
To enforce top-down initialization, the JLS requires that invocations of this() or super() in a constructor always appear as the first statement. This does indeed guarantee top-down initialization. It does so, however, in a heavy-handed way, by taking what is really a semantic requirement ("initialize the superclass before accessing the new instance") and enforcing it with a syntactic requirement ("super() or this() must be the first statement").
A rule that more carefully addresses the requirement to ensure top-down initialization would allow arbitrary statements prior to superclass construction as long as the instance's fields are not read until superclass construction completes. This would allow constructors to, for example, do housekeeping prior to superclass construction. Such a rule would closely follow the familiar existing rules for blank final fields. Those rules disallow reading prior to initialization, ensure that initialization happens exactly once, and allow full access afterward.
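For reference, the existing rules for blank final fields can be seen in code that compiles today. The sketch below uses a hypothetical `Temperature` class; the comments mark where the compiler would reject a read or a second assignment.

```java
// Hypothetical class illustrating today's definite-assignment rules
// for a blank final field.
class Temperature {
    private final double kelvin;    // blank final: declared without an initializer

    Temperature(double value, boolean celsius) {
        // Reading 'kelvin' here would be a compile-time error:
        // it is not yet definitely assigned.
        if (celsius) {
            kelvin = value + 273.15;
        } else {
            kelvin = value;
        }
        // Assigning 'kelvin' a second time here would also be rejected.
        // From this point on the field may be read freely.
        if (kelvin < 0) {
            throw new IllegalArgumentException("below absolute zero");
        }
    }

    double kelvin() {
        return kelvin;
    }

    public static void main(String[] args) {
        System.out.println(new Temperature(10.0, false).kelvin());  // 10.0
    }
}
```

The field is written exactly once on every path, never read before that write, and fully usable afterward. This is the same shape of rule the paragraph above suggests for access to the instance as a whole.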
The fact that the current rule is unnecessarily restrictive is, in itself, a reason to change it. There are also practical reasons to relax this restriction. For one, the current rule causes idioms commonly used within normal methods to be either difficult or impossible to use within constructors. Below are a few examples.
Implementing fail-fast
Sometimes we need to validate a constructor parameter that is passed up to the superclass constructor. Today we can only do this in-line, e.g., using static methods:
    public class PositiveBigInteger extends BigInteger {
        // This logic really belongs in the constructor
        private static long verifyPositive(long value) {
            if (value <= 0)
                throw new IllegalArgumentException("non-positive value");
            return value;
        }
        public PositiveBigInteger(long value) {
            super(PositiveBigInteger.verifyPositive(value));
        }
    }
or else after the fact, potentially doing useless work:
    public class PositiveBigInteger extends BigInteger {
        public PositiveBigInteger(long value) {
            super(value);               // potentially useless work here
            if (value <= 0)
                throw new IllegalArgumentException("non-positive value");
        }
    }
It would be more natural to validate parameters as the first order of business, just as in normal methods:
    public class PositiveBigInteger extends BigInteger {
        public PositiveBigInteger(long value) {
            if (value <= 0)
                throw new IllegalArgumentException("non-positive value");
            super(value);
        }
    }
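The cost of validating after the fact, compared with validating through a static helper, can be observed in code that compiles today. This is a minimal sketch: `Resource`, `LateChecked`, `EarlyChecked`, and `FailFastDemo` are hypothetical names, and a simple counter stands in for the superclass's potentially useless work.

```java
// Hypothetical superclass whose constructor stands in for costly setup.
class Resource {
    static int constructed = 0;     // counts how often the superclass constructor ran

    Resource(long size) {
        constructed++;              // placeholder for expensive work
    }
}

// Validates after super(): the superclass work happens even for bad input.
class LateChecked extends Resource {
    LateChecked(long size) {
        super(size);
        if (size <= 0)
            throw new IllegalArgumentException("non-positive size");
    }
}

// Validates inside the super() argument list via a static helper:
// the exception is thrown before any superclass work is done.
class EarlyChecked extends Resource {
    private static long verifyPositive(long size) {
        if (size <= 0)
            throw new IllegalArgumentException("non-positive size");
        return size;
    }

    EarlyChecked(long size) {
        super(verifyPositive(size));
    }
}

class FailFastDemo {
    // Returns how many times Resource's constructor ran during a failing attempt.
    static int wastedConstructions(Runnable attempt) {
        int before = Resource.constructed;
        try {
            attempt.run();
        } catch (IllegalArgumentException expected) {
            // the validation failure is the point of the demo
        }
        return Resource.constructed - before;
    }

    public static void main(String[] args) {
        System.out.println(wastedConstructions(() -> new LateChecked(-1)));   // 1
        System.out.println(wastedConstructions(() -> new EarlyChecked(-1)));  // 0
    }
}
```

Statements before super() would give the behavior of `EarlyChecked` while keeping the validation logic in the constructor body.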
Telescoping Constructors
This use of fail-fast is especially helpful with telescoping constructors, a pattern in which a class provides simplified constructors that delegate to increasingly general ones.
An example is the class java.lang.Thread, which has nine public constructors taking various combinations of String (thread name), ThreadGroup, Runnable, long (stack size), and boolean (inherit thread locals) parameters. The most general constructor takes them all:
    public Thread(ThreadGroup group, Runnable task, String name,
                  long stackSize, boolean inheritInheritableThreadLocals)
and all of the other constructors internally delegate up to it via this(). However, simplified constructors often need to apply additional restrictions and/or preparation to their parameter(s).
For example, while the most general Thread constructor allows a null name parameter (resulting in a default generated thread name), the Thread(String) constructor requires its name parameter to be non-null. Currently this check must be done by a private static null-check method (analogous to verifyPositive() in the above example), but it would be more natural if telescoping constructors could each contain their own required preparation logic, for example:
    public Thread(String name) {
        if (name == null)
            throw new NullPointerException("'name' is null");
        this(null, null, name, 0, false);
    }
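For comparison, here is how a stricter simplified constructor has to be written today, with the check carried inside the this() argument list. This is a minimal sketch that uses a hypothetical `Job` class rather than java.lang.Thread, and `Objects.requireNonNull` in place of a hand-written private static helper.

```java
import java.util.Objects;

// Hypothetical class with a two-level telescoping constructor.
class Job {
    private final String name;
    private final long stackSize;

    // Most general constructor: a null name selects a default.
    Job(String name, long stackSize) {
        this.name = (name != null) ? name : "job-default";
        this.stackSize = stackSize;
    }

    // Simplified constructor: stricter than the general one, so the null
    // check has to be squeezed into the this() argument list.
    Job(String name) {
        this(Objects.requireNonNull(name, "'name' is null"), 0);
    }

    String name() {
        return name;
    }

    long stackSize() {
        return stackSize;
    }

    public static void main(String[] args) {
        System.out.println(new Job("worker").name());   // worker
        System.out.println(new Job(null, 0).name());    // job-default
    }
}
```

This works for a single null check, but each further restriction needs another helper or a more deeply nested expression, which is the awkwardness the proposed form removes.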
Passing one value to a superclass constructor twice
Sometimes we need to compute a value and pass it to the superclass constructor twice, as two different arguments. Today the only way to do that is to add an intermediate constructor:
    public class MyExecutor extends ScheduledThreadPoolExecutor {
        private static class MyFactoryHandler
          implements ThreadFactory, RejectedExecutionHandler
        {
            ...
        }
        // Extra intermediate constructor we must hop through
        private MyExecutor(int corePoolSize, MyFactoryHandler factory) {
            super(corePoolSize, factory, factory);
        }
        public MyExecutor(int corePoolSize) {
            this(corePoolSize, new MyFactoryHandler());
        }
    }
A more straightforward implementation might look like this:
    public class MyExecutor extends ScheduledThreadPoolExecutor {
        private static class MyFactoryHandler
          implements ThreadFactory, RejectedExecutionHandler
        {
            ...
        }
        public MyExecutor(int corePoolSize) {
            MyFactoryHandler factory = new MyFactoryHandler();
            super(corePoolSize, factory, factory);
        }
    }
Complex preparation of superclass constructor arguments
Sometimes we must perform non-trivial computation in order to prepare superclass arguments. For example:
    public class MyBigInteger extends BigInteger {
        /**
         * Use the public key integer extracted from the given certificate.
         *
         * @param certificate public key certificate
         * @throws IllegalArgumentException if certificate type is unsupported
         */
        public MyBigInteger(Certificate certificate) {
            final byte[] bigIntBytes;
            PublicKey pubkey = certificate.getPublicKey();
            if (pubkey instanceof RSAKey rsaKey)
                bigIntBytes = rsaKey.getModulus().toByteArray();
            else if (pubkey instanceof DSAPublicKey dsaKey)
                bigIntBytes = dsaKey.getY().toByteArray();
            else if (pubkey instanceof DHPublicKey dhKey)
                bigIntBytes = dhKey.getY().toByteArray();
            else
                throw new IllegalArgumentException("unsupported cert type");
            super(bigIntBytes);
        }
    }
All of the examples above that show code before super() adhere to the semantic requirement of "initialize the superclass before accessing the new instance" and therefore preserve top-down initialization.
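Under the current rule, this kind of branching preparation has to be moved into a static helper that is called from the super() argument list. The sketch below is one possible shape of that workaround. `KeyBigInteger` and `KeyBigIntegerDemo` are hypothetical names, and for brevity the constructor is assumed to take a PublicKey directly instead of a Certificate.

```java
import java.math.BigInteger;
import java.security.KeyPairGenerator;
import java.security.NoSuchAlgorithmException;
import java.security.PublicKey;
import java.security.interfaces.DSAPublicKey;
import java.security.interfaces.RSAKey;
import javax.crypto.interfaces.DHPublicKey;

// Hypothetical variant of the example above, written so that it compiles today.
class KeyBigInteger extends BigInteger {

    // The branching logic has to live in a static helper ...
    private static byte[] keyBytes(PublicKey pubkey) {
        if (pubkey instanceof RSAKey rsaKey)
            return rsaKey.getModulus().toByteArray();
        if (pubkey instanceof DSAPublicKey dsaKey)
            return dsaKey.getY().toByteArray();
        if (pubkey instanceof DHPublicKey dhKey)
            return dhKey.getY().toByteArray();
        throw new IllegalArgumentException("unsupported key type");
    }

    // ... so that the constructor can call it from the super() argument list.
    KeyBigInteger(PublicKey pubkey) {
        super(keyBytes(pubkey));
    }
}

class KeyBigIntegerDemo {
    // Generates a throwaway RSA key and checks that the modulus round-trips.
    static boolean matchesModulus() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            PublicKey pubkey = generator.generateKeyPair().getPublic();
            return new KeyBigInteger(pubkey).equals(((RSAKey) pubkey).getModulus());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(matchesModulus());
    }
}
```

The helper and the constructor are two halves of one piece of logic; allowing statements before super() would let them be written as one.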
The JVMS already allows this
Fortunately, the JVMS already grants suitable flexibility to constructors:
- Multiple invocations of this() and super() may appear in a constructor as long as on any code path there is exactly one invocation.
- Arbitrary code may appear before this() and super() as long as that code does not reference the instance under construction except to assign fields.
- However, invocations of this() and super() may not appear within a try block, i.e., within a bytecode exception range.
These more permissive rules still ensure top-down initialization:
- Superclass initialization always happens exactly once, either directly via super() or indirectly via this(); and
- Uninitialized instances are off-limits except for field assignments, which do not affect outcomes, until superclass initialization is complete.
In fact, the current inconsistency between the JVMS and the JLS is an historical artifact. The original JVMS was more restrictive as well, but this led to issues with the initialization of compiler-generated fields for new language features such as inner classes and captured free variables. As a result the JVMS was relaxed to accommodate the compiler, but this new flexibility never made its way back up to the language level.
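The effect of this relaxation is visible from source code today. The sketch below assumes current javac behavior, in which the compiler-generated field holding an inner class's outer instance is assigned before the superclass constructor is invoked. `Widget`, `Panel`, and `Button` are hypothetical names.

```java
// Hypothetical superclass that calls into the subclass during construction.
abstract class Widget {
    Widget() {
        configure();                // runs before the subclass constructor body
    }

    abstract void configure();
}

class Panel {
    final String title;

    Panel(String title) {
        this.title = title;
    }

    // Inner class: javac adds a hidden field referring to the enclosing Panel.
    class Button extends Widget {
        String seenTitle;           // no initializer, so the value set below survives

        @Override
        void configure() {
            // Reads Panel.this.title through the compiler-generated field,
            // while Widget's constructor is still running.
            seenTitle = title;
        }
    }

    public static void main(String[] args) {
        Panel.Button button = new Panel("settings").new Button();
        System.out.println(button.seenTitle);   // settings
    }
}
```

If the hidden field were assigned only after the superclass constructor returned, `configure()` would fail when dereferencing the enclosing instance. Assigning it earlier is exactly the kind of field write before super() that the JVMS permits and the JLS has no way to express.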
The JLS contains a bug
JLS §8.1.3 defines static context and notes that
The purpose of a static context is to demarcate code that must not refer explicitly or implicitly to the current instance of the class whose declaration lexically encloses the static context.
The JLS naturally applies this concept to code inside a super() or this() invocation. Prior to the introduction of generics, inner classes, and captured free variables, this yielded the correct semantics for superclass constructor invocation.
However, as §8.1.3 notes, a static context prohibits:
- this expressions, whether unqualified or qualified,
- Unqualified references to instance variables of any lexically enclosing class or interface declaration, and
- References to type parameters, local variables, formal parameters, and exception parameters declared by methods or constructors of any lexically enclosing class or interface declaration that is outside the immediately enclosing class or interface.
These rules make this program illegal:
    import java.util.concurrent.atomic.AtomicReference;
    public class A<T> extends AtomicReference<T> {
        private int intval;
        public A(T obj) {
            super(obj);
        }
        public class B extends A<T> {
            public B() {
                super((T)null);         // illegal - 'T'
            }
        }
        public class C extends A<Object> {
            C() {
                super(A.this);          // illegal - 'this'
            }
        }
        public class D extends A<Integer> {
            D() {
                super(intval);          // illegal - 'intval'
            }
        }
        public static Object method(int x) {
            class E extends A<Float> {
                E() {
                    super((float)x);    // illegal - 'x'
                }
            }
            return new E();
        }
    }
Yet this program has compiled successfully since at least Java 8, and these idioms are in common use!
The operative mental model here is that code "must not refer explicitly or implicitly to the current instance of the class whose declaration lexically encloses" the code in question. However, the concept of static context, as defined, goes beyond that and forbids even references to generic type parameters, for example.
The underlying issue is that the JLS applies the concept of static context to two scenarios which are similar, but not equivalent:
- When there is no this instance defined, e.g., as within a static method, and
- When this is defined but must not be referenced, e.g., prior to superclass initialization.
The current definition of static context is appropriate for the first scenario. After the addition of generics, inner classes, and captured free variables to the language, however, it is no longer appropriate for the second scenario.
We therefore define a new concept, a pre-construction context, which is like a static context but is less restrictive. It still disallows accessing the current instance in any way but does not disallow, for example, using the class's generic type parameters or accessing an outer instance. This more accurately matches not only the underlying requirement but also developer expectations, common usage, and the compiler's behavior going back as far as Java 8. (This change will, effectively, fix 8301649 by codifying the compiler's current behavior.)
Description
Summary of JLS modifications
- Update the grammar to allow statements (other than return) to appear prior to super() or this().
- Define the statements up to and including a super() or this() call as a pre-construction context (this includes the evaluation of this() and super() parameters).
- Narrow the definition of static context to exclude pre-construction contexts.
- Update restrictions on static contexts to also restrict pre-construction contexts where appropriate.
Records
Record constructors are subject to more restrictions than normal constructors. In particular:
- Canonical record constructors may not contain any explicit super() or this() invocation, and
- Non-canonical record constructors may invoke this(), but not super().
These restrictions remain in place, but otherwise record constructors also benefit from these changes. The net result is that non-canonical record constructors may now contain prologue statements before this().
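To make the record restrictions concrete, here is how a non-canonical record constructor with its own validation has to be written today. This is only a sketch: `Range` and `requirePositive` are hypothetical names.

```java
// Hypothetical record with one canonical and one non-canonical constructor.
record Range(int lo, int hi) {

    // Compact canonical constructor: no explicit this() or super() is allowed here.
    Range {
        if (lo > hi)
            throw new IllegalArgumentException("lo > hi");
    }

    // Non-canonical constructor: must delegate via this(), so its extra
    // validation currently has to fit inside the argument list.
    Range(int hi) {
        this(0, requirePositive(hi));
    }

    private static int requirePositive(int value) {
        if (value <= 0)
            throw new IllegalArgumentException("non-positive value");
        return value;
    }

    public static void main(String[] args) {
        System.out.println(new Range(5));   // Range[lo=0, hi=5]
    }
}
```

With this proposal, the check in `requirePositive` could instead appear as an ordinary statement before the this() call in `Range(int)`, while the canonical constructor would remain unchanged.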
Testing
We will test the compiler changes with existing unit tests, unchanged except for those tests that verify changed behavior, plus new positive and negative test cases as appropriate.
We will compile all JDK classes using the previous and new versions of the compiler and verify that the resulting bytecode is identical.
No platform-specific testing should be required.
Risks and Assumptions
Risks associated with any language change can be divided into two categories: behavioral risk and environmental risk.
Behavioral risk includes any changes or unexpected behavior in the actual execution of programs. Because an explicit goal of this work is to not change the behavior of existing programs, the risk to the proper execution of existing code should be low. However, there is also the risk that new code doesn't behave as expected. Fortunately, pre-construction context semantics are already familiar to developers, even though that term is new to the language specification, because these semantics are what the compiler has been applying to this() and super() parameter evaluation since Java 8. So for example, one relatively natural mental model that developers can use is to think of statements before super() as an "unrolling" of code that previously had to be shoehorned into this() and super() parameter expressions; other than its being moved and granted additional flexibility (e.g., full statements and control flow instead of just expressions), it's treated the same by the compiler contextually.
The more widespread category of risk is environmental risk. Java programmers have understood since time immemorial that any superclass constructor invocation must always appear as the first statement in a constructor, and this requirement is deeply embedded into the code analyzers, style checkers, syntax highlighters, development environments, and other tools in the Java ecosystem. As with any language change, there may be a period of pain as the various environmental tools are updated. Java developers using tools that have not yet been updated may be confused when the compiler tells them one story while the syntax highlighter tells another.
The fact that this language change is backward compatible could actually work against some of these tools: instead of being tripped up during an initial lexical scan by an unfamiliar syntax, they might appear to function normally, while some internal code assumption is no longer valid, leading to bugs that are harder to detect.