JEP draft: Statements before super()

Author: acobbs
Type: Feature
Scope: JDK
Status: Draft
Component: specification / language
Discussion: amber dash dev at openjdk dot java dot net
Created: 2023/01/20 17:33
Updated: 2023/02/08 05:37
Issue: 8300786

Summary

Allow statements that do not reference the instance being created to appear before this() or super() in a constructor.

Goals

Non-Goals

Modifications to the JVMS. These changes may prompt reconsideration of the JVMS's current restrictions on constructors. However, to avoid unnecessary linkage between JLS and JVMS changes, any such modifications should be proposed in a follow-on JEP. This JEP assumes no change to the current JVMS.

Maximizing JLS and JVMS Alignment. Although these changes will bring the JLS and JVMS into closer alignment, it is not a goal to harmonize them. The JLS and JVMS address different problem domains, so it is reasonable for them to differ in what they allow. For example, the JVMS allows a constructor to write to the same final field multiple times, whereas the JLS does not.

Changes to Current Behavior. There is no intention to change the behavior of any program that conforms to the current JLS. This change strictly expands the universe of valid programs, without affecting existing ones.

Addressing Larger Language Concerns. Thinking about the interplay between superclass constructors and subclass initialization has evolved since the Java language was first created. This work should be considered a pragmatic tweak rather than a statement on language design.

Motivation

As in most object-oriented languages, Java defines an explicit "construction" step that occurs after memory allocation but before "regular" use of an object. This acknowledges the fact that, in general, some initialization and setup of object state is required before objects can be safely used. In order to ensure orderly object initialization, Java specifies a variety of rules specifically related to object construction. For example, dataflow analysis verifies that all final fields have definite values assigned during construction.

However, for classes in a non-trivial class hierarchy, object initialization does not occur in a single step. An object's state consists of the composition of groups of fields: the group of fields defined in the class itself, plus the groups of fields defined in each ancestor superclass. Initialization of each group of fields is performed as a separate step by a corresponding constructor in those fields' defining class. An object is not fully initialized until every class in the hierarchy has had its opportunity to initialize its own fields.

To keep this process orderly, Java requires that superclass constructors execute prior to subclass constructors. The result is that Java objects are always initialized "top down". This ensures that at each level, a constructor may assume that the fields in all of its superclasses have already been initialized. This guarantee is important, as constructors often need to rely on some functionality in the superclass, and the superclass wouldn't be able to guarantee correct behavior without the assumption that its own initialization were complete. For example, it's common for a constructor to invoke superclass methods to configure or prepare the object for some specific task.

In order to enforce this top down initialization, the Java language requires that invocations of this() or super() always appear as the first statement in a constructor. This does guarantee top down initialization, but it does so in a heavy-handed way, by taking what is really a semantic requirement ("Initialize the superclass before accessing the new instance") and enforcing it with a syntactic requirement ("super() or this() must literally be the first statement").

A rule that more precisely captures the requirement of top down initialization would allow arbitrary statements prior to superclass construction, as long as the instance being created (this) is left untouched until superclass construction completes. This would allow constructors to do any desired "housekeeping" prior to superclass construction. Such a rule would closely follow the familiar existing rules for blank final fields, where access is disallowed prior to initialization, the initialization must happen exactly once, and full access is permitted afterward.
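As a reminder of the blank final rules referred to above, here is a minimal sketch that compiles under the current language. The class BlankFinalDemo and its members are hypothetical names used only for illustration; the commented-out lines mark the places where javac would reject the code.

```java
// Hypothetical class, used only to illustrate the existing blank final rules.
class BlankFinalDemo {

    private final int limit;    // blank final: declared without an initializer

    BlankFinalDemo(int requested) {
        // int early = limit;   // rejected: limit is not definitely assigned yet
        if (requested > 0)
            limit = requested;  // assigned on this path...
        else
            limit = 1;          // ...or on this one, but never on both
        // limit = 5;           // rejected: limit might already have been assigned
        int copy = limit;       // full access is permitted once it is assigned
        assert copy == limit;
    }

    int limit() {
        return limit;
    }

    public static void main(String[] args) {
        System.out.println(new BlankFinalDemo(7).limit());
        System.out.println(new BlankFinalDemo(-3).limit());
    }
}
```

The analogy is that the instance under construction would be treated much like limit here: off limits before super() or this(), initialized exactly once by that invocation, and freely usable afterward.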

The fact that the current enforcement mechanism is unnecessarily restrictive is, in itself, a reason for change. There are also practical reasons to relax this restriction. For one, the current rules cause certain idioms commonly used within normal methods to be either difficult or impossible to use within constructors. Below are a few examples.

Implementing "Fail Fast"

A subclass constructor sometimes wishes to enforce a requirement on a parameter that is also passed up to the superclass constructor. Today such requirements can only be applied "inline" (e.g., using static methods) or after the fact, once the superclass constructor has already run.

For example:

public class PositiveBigInteger extends BigInteger {

    public PositiveBigInteger(long value) {
        super(PositiveBigInteger.verifyPositive(value));
    }

    // This logic really belongs in the constructor
    private static long verifyPositive(long value) {
        if (value <= 0)
            throw new IllegalArgumentException("non-positive value");
        return value;   // hand the validated value back for use in super()
    }
}

or:

public class PositiveBigInteger extends BigInteger {

    public PositiveBigInteger(long value) {
        super(value);   // potentially doing useless work here
        if (value <= 0)
            throw new IllegalArgumentException("non-positive value");
    }
}

It would be more natural to validate parameters as the first order of business, just as in normal methods:

public class PositiveBigInteger extends BigInteger {

    public PositiveBigInteger(long value) {
        if (value <= 0)
            throw new IllegalArgumentException("non-positive value");
        super(value);
    }
}

Passing Superclass Constructor the Same Parameter Twice

Sometimes you need to create a single object and pass it to the superclass constructor twice, as two different parameters.

Today the only way to do that requires adding an extra intermediate constructor:

public class MyExecutor extends ScheduledThreadPoolExecutor {

    public MyExecutor(int corePoolSize) {
        this(corePoolSize, new MyFactoryHandler());
    }

    // Extra intermediate constructor we must hop through
    private MyExecutor(int corePoolSize, MyFactoryHandler factory) {
        super(corePoolSize, factory, factory);
    }

    private static class MyFactoryHandler
      implements ThreadFactory, RejectedExecutionHandler {
        ...
    }
}

A more straightforward implementation might look like this:

public class MyExecutor extends ScheduledThreadPoolExecutor {

    public MyExecutor(int corePoolSize) {
        MyFactoryHandler factory = new MyFactoryHandler();
        super(corePoolSize, factory, factory);
    }

    private static class MyFactoryHandler
      implements ThreadFactory, RejectedExecutionHandler {
        ...
    }
}

Complex Preparation of Superclass Constructor Parameters

Sometimes, complex handling or preparation of superclass parameters is needed.

For example:

public class MyBigInteger extends BigInteger {

    /**
     * Use the public key integer extracted from the given certificate.
     *
     * @param certificate public key certificate
     * @throws IllegalArgumentException if certificate type is unsupported
     */
    public MyBigInteger(Certificate certificate) {
        final byte[] bigIntBytes;
        PublicKey pubkey = certificate.getPublicKey();
        if (pubkey instanceof RSAKey rsaKey)
            bigIntBytes = rsaKey.getModulus().toByteArray();
        else if (pubkey instanceof DSAPublicKey dsaKey)
            bigIntBytes = dsaKey.getY().toByteArray();
        else if (pubkey instanceof DHPublicKey dhKey)
            bigIntBytes = dhKey.getY().toByteArray();
        else
            throw new IllegalArgumentException("unsupported cert type");
        super(bigIntBytes);
    }
}
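For comparison, here is a sketch of how the same preparation has to be written under the current rules: the branching logic moves out of the constructor into a static helper that is called inline in the super() argument. The names MyBigIntegerToday and keyBytes are hypothetical, and classic instanceof checks with casts are used so that the sketch also compiles on older Java releases.

```java
import java.math.BigInteger;
import java.security.PublicKey;
import java.security.cert.Certificate;
import java.security.interfaces.DSAPublicKey;
import java.security.interfaces.RSAKey;
import javax.crypto.interfaces.DHPublicKey;

// Hypothetical present-day equivalent of the MyBigInteger example.
class MyBigIntegerToday extends BigInteger {

    MyBigIntegerToday(Certificate certificate) {
        // Everything must fit into a single expression inside super(...)
        super(keyBytes(certificate.getPublicKey()));
    }

    // Hypothetical helper: this logic would read more naturally in the constructor
    static byte[] keyBytes(PublicKey pubkey) {
        if (pubkey instanceof RSAKey)
            return ((RSAKey) pubkey).getModulus().toByteArray();
        if (pubkey instanceof DSAPublicKey)
            return ((DSAPublicKey) pubkey).getY().toByteArray();
        if (pubkey instanceof DHPublicKey)
            return ((DHPublicKey) pubkey).getY().toByteArray();
        throw new IllegalArgumentException("unsupported cert type");
    }
}
```

The helper works, but it separates the constructor's documentation and its logic, and it cannot share local variables with the rest of the constructor.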

All of the above examples showing code before super() still adhere to the principle of "initialize the superclass before accessing the new instance" and therefore preserve top down initialization.

What the JVMS Actually Allows

Fortunately, the JVMS already grants suitable flexibility to constructors:

As described above, these more permissive rules still ensure "top down" initialization:

In fact, the current inconsistency between the JLS and the JVMS is largely a historical artifact. The original JVMS was more restrictive as well, but this led to problems initializing the compiler-generated fields that support language features such as inner classes and captured free variables. As a result, the JVMS was relaxed to accommodate the compiler, but this new flexibility never made its way back up to the language level.
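One way to observe this relaxation from source code is sketched below. It assumes javac's usual translation, in which a captured local variable is stored in a compiler-generated field that is assigned before the superclass constructor is invoked. All class and method names here are hypothetical.

```java
// Hypothetical demo: a captured variable is already visible while the
// superclass constructor is still running.
class CapturedBeforeSuper {

    // Superclass whose constructor calls a method that the subclass overrides
    abstract static class Base {
        final String seen;

        Base() {
            seen = describe();  // runs before any subclass initialization
        }

        abstract String describe();
    }

    static String demo(final String captured) {
        // Local class capturing a free variable from the enclosing method
        class Local extends Base {
            @Override
            String describe() {
                return "captured=" + captured;
            }
        }
        return new Local().seen;
    }

    public static void main(String[] args) {
        System.out.println(demo("hello"));
    }
}
```

With javac this prints captured=hello rather than captured=null, which is only possible because the generated constructor stores the captured value before calling the superclass constructor, something the source language itself does not currently let a programmer write.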

Description

Language Changes

The JLS will be modified as follows:

(1) Modify the beginning of §8.8.7 "Constructor Body" to read:

A constructor body may contain an explicit invocation of another constructor of the same class or of the direct superclass (§8.8.7.1).

ConstructorBody:
    { [BlockStatements] } ;
    { [BlockStatements] ExplicitConstructorInvocation [BlockStatements] } ;

It is a compile-time error for a constructor to directly or indirectly invoke itself through a series of one or more explicit constructor invocations involving this.

If a constructor body does not contain an explicit constructor invocation and the constructor being declared is not part of the primordial class Object, then the constructor body implicitly begins with a superclass constructor invocation "super();", an invocation of the constructor of its direct superclass that takes no arguments.

Except for the possibility of explicit constructor invocations, and the prohibitions on return statements (§14.17), the body of a constructor is like the body of a method (§8.4.7).

If a constructor body contains an explicit constructor invocation, the BlockStatements preceding the explicit constructor invocation are called the prologue of the constructor body. The BlockStatements in a constructor with no explicit constructor invocation and the BlockStatements following the explicit constructor invocation in a constructor with an explicit constructor invocation are called the main body of the constructor.

A return statement (§14.17) may be used in the main body of a constructor if it does not include an expression. It is a compile-time error if a return statement appears in the prologue of a constructor body.

(2) Modify this sentence in §8.8.7.1 "Explicit Constructor Invocations":

An explicit constructor invocation statement introduces a pre-initialization context, which includes the prologue of the constructor and the explicit constructor invocation statement, and which prohibits the use of constructs that refer explicitly or implicitly to the current object. These include this or super referring to the current object, unqualified references to instance variables or instance methods of the current object, method references referring to instance methods of the current object, and instantiations of inner classes of the current object's class for which the current object is the enclosing instance (§8.1.3).

(3) Replace the steps for constructor processing in §12.5 with the following:

  1. Assign the arguments for the constructor to newly created parameter variables for this constructor invocation.
  2. If this constructor contains an explicit constructor invocation (§8.8.7.1), then execute the BlockStatements of the prologue of the constructor body. If execution of any statement completes abruptly, then execution of the constructor completes abruptly for the same reason; otherwise, continue with step 3.
  3. If this constructor contains an explicit constructor invocation (§8.8.7.1) of another constructor in the same class (using this), then evaluate the arguments and process that constructor invocation recursively using these same six steps. If that constructor invocation completes abruptly, then this procedure completes abruptly for the same reason; otherwise, continue with step 6.
  4. This constructor does not contain an explicit constructor invocation of another constructor in the same class (using this). If this constructor is for a class other than Object, then this constructor contains an explicit or implicit invocation of a superclass constructor (using super). Evaluate the arguments and process that superclass constructor invocation recursively using these same six steps. If that constructor invocation completes abruptly, then this procedure completes abruptly for the same reason. Otherwise, continue with step 5.
  5. Execute the instance initializers and instance variable initializers for this class, assigning the values of instance variable initializers to the corresponding instance variables, in the left-to-right order in which they appear textually in the source code for the class. If execution of any of these initializers results in an exception, then no further initializers are processed and this procedure completes abruptly with that same exception. Otherwise, continue with step 6.
  6. Execute the main body of this constructor. If that execution completes abruptly, then this procedure completes abruptly for the same reason. Otherwise, this procedure completes normally.
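The ordering in these steps can be traced with a small program. The sketch below compiles under the current language, so it has no prologue and step 2 does nothing; it exercises steps 3 through 6 only. The class InitTrace and everything in it are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tracer that records the order of constructor processing.
class InitTrace {

    static final List<String> LOG = new ArrayList<>();

    // Records an event and returns it, so it can be used inside expressions
    static String log(String event) {
        LOG.add(event);
        return event;
    }

    static class Parent {
        private final String parentField = log("Parent field initializer");

        Parent(String arg) {
            log("Parent main body");
        }
    }

    static class Child extends Parent {
        private final String childField = log("Child field initializer");

        {
            log("Child instance initializer");
        }

        Child() {
            this(log("evaluate this() argument"));      // step 3
            log("Child() main body");                   // step 6
        }

        Child(String arg) {
            super(log("evaluate super() argument"));    // step 4
            // step 5 (Child initializers) runs here, then step 6:
            log("Child(String) main body");
        }
    }

    // Creates one Child and returns a copy of the recorded events
    static List<String> run() {
        LOG.clear();
        new Child();
        return new ArrayList<>(LOG);
    }

    public static void main(String[] args) {
        for (String event : run())
            System.out.println(event);
    }
}
```

The recorded order is: evaluate this() argument, evaluate super() argument, Parent field initializer, Parent main body, Child field initializer, Child instance initializer, Child(String) main body, Child() main body. Note that the Child initializers run only once, in the constructor that invokes super(), because a constructor that invokes this() skips from step 3 directly to step 6.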

(4) In section §8.1.3:

(5) In section §15.8.3:

Note that the part of a constructor prior to super() is no longer a "static context", but is now a "pre-initialization context". This is a less restrictive version of "static context" that disallows accessing the current instance in any way, but doesn't disallow, for example, use of the class's generic type parameters, or accessing the outer instance via the expression Outer.this.

This wording not only describes the requirement more accurately, but also matches developer expectations, common usage, and the compiler's behavior going back as far as Java 8; see also JDK-8301649, which this change will effectively fix by codifying the current compiler behavior.
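The existing compiler behavior mentioned here can be seen in code that javac accepts today. The sketch below uses a class type parameter and Outer.this inside the arguments to super(); all names are hypothetical, and the commented-out line shows the kind of reference that remains prohibited.

```java
import java.util.ArrayList;

// Hypothetical demo of what is usable in the arguments to super().
class PreInitContextDemo {

    static class Base {
        final Object arg;

        Base(Object arg) {
            this.arg = arg;
        }
    }

    // The class's own type parameter T appears in the super() argument
    static class TypedList<T> extends Base {
        TypedList() {
            super(new ArrayList<T>());
        }
    }

    static class Outer {
        final String name;

        Outer(String name) {
            this.name = name;
        }

        // Inner (non-static) class: Outer.this is the enclosing instance,
        // not the instance being created
        class Inner extends Base {
            Inner() {
                // super(this.toString());  // rejected: uses the new instance
                super(Outer.this.name);
            }
        }

        Object innerArg() {
            return new Inner().arg;
        }
    }

    public static void main(String[] args) {
        System.out.println(new Outer("outer").innerArg());
        System.out.println(new TypedList<String>().arg);
    }
}
```

Neither the type parameter nor the enclosing instance touches the object under construction, which is why a "pre-initialization context" is a better description than a "static context" for this part of a constructor.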

Records

Record constructors are subject to more restrictions than normal constructors. In particular:

These restrictions remain in place, but otherwise record constructors also benefit from these changes. The net result is that non-canonical record constructors may now contain prologue statements before this().

Testing

Testing of compiler changes will be done using the existing unit tests, which are unchanged except for those tests that verify changed compiler behavior, plus new positive and negative test cases related to this new feature.

All existing JDK classes will be compiled using both the previous and new versions of the compiler, and the resulting bytecode compared, to verify that there is no change to existing bytecode.

No platform-specific testing should be required.

Risks and Assumptions

An explicit goal of this work is to not change the behavior of existing programs. Therefore, other than any newly created bugs, the risk to existing software should be low.

It's possible that compiling and/or executing newly valid code could trigger bugs in existing code that were not previously accessible.

Dependencies

Java compiler changes - JDK-8194743