JEP 447: Statements before super() (Preview)

Owner: Archie Cobbs
Type: Feature
Scope: SE
Status: Candidate
Component: specification / language
Discussion: amber dash dev at openjdk dot java dot net
Reviewed by: Alex Buckley, Brian Goetz, Vicente Arturo Romero Zaldivar
Created: 2023/01/20 17:33
Updated: 2023/05/18 23:52
Issue: 8300786

Summary

In constructors in the Java programming language, allow statements that do not reference the instance being created to appear before this() or super(). This is a preview language feature.

Goals

Non-Goals

Motivation

Before an object can be used, its state must be initialized. In order to ensure orderly object initialization, the JLS includes a variety of rules specifically related to object construction. For example, dataflow analysis ensures that every final field is assigned a definite value.

For classes in a non-trivial class hierarchy, object initialization does not occur in a single step. An object's state is the composition of groups of fields: the group of fields defined in the class itself plus the groups of fields defined in its superclasses. Each group of fields is initialized in a separate step by a corresponding constructor in those fields' defining class. An object is not fully initialized until every class in its hierarchy has initialized its own fields.

To keep this process orderly, the JLS requires that superclass constructors execute prior to subclass constructors. Thus objects are always initialized from the top down. This ensures that, at each level, a constructor can assume that the fields in all of its superclasses have been initialized. This guarantee is important because constructors often need to rely on functionality in a superclass, and the superclass would not be able to guarantee correct behavior without the assumption that its own initialization is complete. For example, it is common for a constructor to invoke superclass methods to configure or prepare the object for a specific task.
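As a minimal sketch of top-down initialization, consider the following. The classes Vehicle and Car are hypothetical and exist only to illustrate the ordering; the shared LOG list is just a way to record which constructor body ran first.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical classes, used only to illustrate initialization order.
class Vehicle {
    static final List<String> LOG = new ArrayList<>();
    protected final String model;

    Vehicle(String model) {
        this.model = model;
        LOG.add("Vehicle");
    }
}

class Car extends Vehicle {
    private final String description;

    Car(String model, int doors) {
        super(model);   // Vehicle's fields are initialized first
        // Safe: the superclass field 'model' is already initialized here.
        this.description = this.model + " (" + doors + " doors)";
        LOG.add("Car");
    }

    String description() { return description; }
}

class InitOrderDemo {
    public static void main(String[] args) {
        Car car = new Car("T", 4);
        // prints "[Vehicle, Car] T (4 doors)"
        System.out.println(Vehicle.LOG + " " + car.description());
    }
}
```

The Car constructor can use model only because the Vehicle constructor has already completed by the time the statement after super() runs.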

To enforce top-down initialization, the JLS requires that invocations of this() or super() in a constructor always appear as the first statement. This does indeed guarantee top-down initialization. It does so, however, in a heavy-handed way, by taking what is really a semantic requirement ("initialize the superclass before accessing the new instance") and enforcing it with a syntactic requirement ("super() or this() must be the first statement").

A rule that more carefully addresses the requirement to ensure top-down initialization would allow arbitrary statements prior to superclass construction as long as the instance's fields are not read until superclass construction completes. This would allow constructors to, for example, do housekeeping prior to superclass construction. Such a rule would closely follow the familiar existing rules for blank final fields. Those rules disallow reading prior to initialization, ensure that initialization happens exactly once, and allow full access afterward.
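For reference, the existing blank final rules mentioned above can be sketched as follows. The class Temperature is a hypothetical example; the commented-out lines are the ones a compiler would reject today.

```java
// Hypothetical class illustrating definite assignment of a blank final field.
class Temperature {
    private final double kelvin;   // blank final: declared without an initializer

    Temperature(double value, boolean isCelsius) {
        // double early = kelvin;  // rejected: kelvin might not have been initialized
        if (isCelsius)
            kelvin = value + 273.15;
        else
            kelvin = value;
        // kelvin = 0;             // rejected: kelvin might already have been assigned
    }

    double kelvin() {
        return kelvin;             // full read access once construction is complete
    }
}

class BlankFinalDemo {
    public static void main(String[] args) {
        System.out.println(new Temperature(300.0, false).kelvin());
    }
}
```

The three properties are visible here: no read before initialization, exactly one assignment on every path, and unrestricted reads afterward.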

The fact that the current rule is unnecessarily restrictive is, in itself, a reason to change it. There are also practical reasons to relax this restriction. For one, the current rule causes idioms commonly used within normal methods to be either difficult or impossible to use within constructors. Below are a few examples.

Implementing fail-fast

Sometimes we need to validate a constructor argument before it is passed up to the superclass constructor. Today we can do this only in-line, e.g., by calling a static helper method:

public class PositiveBigInteger extends BigInteger {

    // This logic really belongs in the constructor
    private static long verifyPositive(long value) {
        if (value <= 0)
            throw new IllegalArgumentException("non-positive value");
        return value;
    }

    public PositiveBigInteger(long value) {
        // BigInteger has no accessible long constructor, so pass the decimal string
        super(Long.toString(PositiveBigInteger.verifyPositive(value)));
    }

}

or else after the fact, potentially doing useless work:

public class PositiveBigInteger extends BigInteger {

    public PositiveBigInteger(long value) {
        super(Long.toString(value));   // potentially useless work here
        if (value <= 0)
            throw new IllegalArgumentException("non-positive value");
    }

}

It would be more natural to validate parameters as the first order of business, just as in normal methods:

public class PositiveBigInteger extends BigInteger {

    public PositiveBigInteger(long value) {
        if (value <= 0)
            throw new IllegalArgumentException("non-positive value");
        super(Long.toString(value));
    }

}

Telescoping Constructors

This use of fail-fast is especially helpful with telescoping constructors, a pattern in which a class provides simplified constructors that delegate to increasingly general constructors.

An example is the class java.lang.Thread, which has nine public constructors taking various combinations of String (thread name), ThreadGroup, Runnable, long (stack size), and boolean (inherit thread locals) parameters. The most general constructor takes them all:

public Thread(ThreadGroup group, Runnable task, String name,
              long stackSize, boolean inheritInheritableThreadLocals)

and all of the other constructors internally delegate up to it via this(). However, simplified constructors often need to apply additional restrictions and/or preparation to their parameter(s).

For example, while the most general Thread constructor allows a null name parameter (resulting in a default generated thread name), the Thread(String) constructor requires its name parameter to be non-null. Currently this check must be done by a private static null-check method (analogous to verifyPositive() in the above example), but it would be more natural if telescoping constructors could each contain their own required preparation logic, for example:

public Thread(String name) {
    if (name == null)
        throw new NullPointerException("'name' is null");
    this(null, null, name, 0, false);
}
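For comparison, the workaround that is required today can be sketched on a small class. The class Job and its helper requireName are hypothetical names chosen for illustration; this is not the actual java.lang.Thread source.

```java
import java.util.Objects;

// Hypothetical class; Job and requireName are illustrative names only.
class Job {
    private final String name;
    private final int priority;

    // Most general constructor: a null name selects a generated default.
    Job(String name, int priority) {
        this.name = (name != null) ? name : "job-" + priority;
        this.priority = priority;
    }

    // Simplified constructor: requires a non-null name, but the check
    // has to live in a static helper because this() must come first.
    Job(String name) {
        this(requireName(name), 0);
    }

    private static String requireName(String name) {
        return Objects.requireNonNull(name, "'name' is null");
    }

    String name() { return name; }
}

class JobDemo {
    public static void main(String[] args) {
        System.out.println(new Job(null, 3).name());
        System.out.println(new Job("backup").name());
    }
}
```

The null check is specific to the one-argument constructor, yet it cannot be written inside that constructor's body ahead of the this() call.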

Passing one value to a superclass constructor twice

Sometimes we need to compute a value and pass it to the superclass constructor twice, as two different arguments. Today the only way to do that is to add an intermediate constructor:

public class MyExecutor extends ScheduledThreadPoolExecutor {

    private static class MyFactoryHandler
        implements ThreadFactory, RejectedExecutionHandler
    {
        ...
    }

    // Extra intermediate constructor we must hop through
    private MyExecutor(int corePoolSize, MyFactoryHandler factory) {
        super(corePoolSize, factory, factory);
    }

    public MyExecutor(int corePoolSize) {
        this(corePoolSize, new MyFactoryHandler());
    }

}

A more straightforward implementation might look like this:

public class MyExecutor extends ScheduledThreadPoolExecutor {

    private static class MyFactoryHandler
        implements ThreadFactory, RejectedExecutionHandler
    {
        ...
    }

    public MyExecutor(int corePoolSize) {
        MyFactoryHandler factory = new MyFactoryHandler();
        super(corePoolSize, factory, factory);
    }

}

Complex preparation of superclass constructor arguments

Sometimes we must perform non-trivial computation in order to prepare superclass arguments. For example:

public class MyBigInteger extends BigInteger {

    /**
     * Use the public key integer extracted from the given certificate.
     *
     * @param certificate public key certificate
     * @throws IllegalArgumentException if certificate type is unsupported
     */
    public MyBigInteger(Certificate certificate) {
        final byte[] bigIntBytes;
        PublicKey pubkey = certificate.getPublicKey();
        if (pubkey instanceof RSAKey rsaKey)
            bigIntBytes = rsaKey.getModulus().toByteArray();
        else if (pubkey instanceof DSAPublicKey dsaKey)
            bigIntBytes = dsaKey.getY().toByteArray();
        else if (pubkey instanceof DHPublicKey dhKey)
            bigIntBytes = dhKey.getY().toByteArray();
        else
            throw new IllegalArgumentException("unsupported cert type");
        super(bigIntBytes);
    }

}

All of the examples above that show code before super() adhere to the semantic requirement of "initialize the superclass before accessing the new instance" and therefore preserve top-down initialization.

The JVMS already allows this

Fortunately, the JVMS already grants suitable flexibility to constructors:

These more permissive rules still ensure top-down initialization:

In fact, the current inconsistency between the JVMS and the JLS is an historical artifact. The original JVMS was more restrictive as well, but this led to issues with the initialization of compiler-generated fields for new language features such as inner classes and captured free variables. As a result the JVMS was relaxed to accommodate the compiler, but this new flexibility never made its way back up to the language level.
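The effect of this flexibility can be observed in ordinary code today. In the sketch below, which uses the hypothetical classes Widget, Panel, and Button, the superclass constructor calls an overridden method before Button's own field initializer has run. Assuming a current javac, the compiler-generated reference to the enclosing Panel is already set at that point, while Button's own field is not.

```java
// Hypothetical classes; the behavior shown is that of javac-generated code.
class Widget {
    Widget() {
        init();   // runs before any subclass field initializer or constructor body
    }

    void init() { }
}

class Panel {
    private final String title;
    String seenDuringSuper;   // records what Button.init() observed

    Panel(String title) {
        this.title = title;
    }

    class Button extends Widget {
        private String caption = "OK";   // assigned only after Widget() returns

        @Override
        void init() {
            // The enclosing Panel is reachable here; 'caption' is still null.
            seenDuringSuper = title + "/" + caption;
        }
    }
}

class OuterBeforeSuperDemo {
    public static void main(String[] args) {
        Panel panel = new Panel("main");
        panel.new Button();
        System.out.println(panel.seenDuringSuper);   // prints "main/null"
    }
}
```

The outer-instance reference is available during superclass construction because the compiler stores it in a synthetic field before invoking the superclass constructor, which is the kind of early field initialization the relaxed JVMS rules accommodate.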

The JLS contains a bug

JLS §8.1.3 defines static context and notes that

The purpose of a static context is to demarcate code that must not refer explicitly or implicitly to the current instance of the class whose declaration lexically encloses the static context.

The JLS naturally applies this concept to code inside a super() or this() invocation. Prior to the introduction of generics, inner classes, and captured free variables, this yielded the correct semantics for superclass constructor invocation.

However, as §8.1.3 notes, a static context prohibits

These rules make this program illegal:

import java.util.concurrent.atomic.AtomicReference;

public class A<T> extends AtomicReference<T> {

    private int intval;

    public A(T obj) {
        super(obj);
    }

    public class B extends A<T> {
        public B() {
            super((T)null);         // illegal - 'T'
        }
    }

    public class C extends A<Object> {
        C() {
            super(A.this);          // illegal - 'this'
        }
    }

    public class D extends A<Integer> {
        D() {
            super(intval);          // illegal - 'intval'
        }
    }

    public static Object method(int x) {
        class E extends A<Float> {
            E() {
                super((float)x);    // illegal - 'x'
            }
        }
        return new E();
    }
}

Yet this program has compiled successfully since at least Java 8, and these idioms are in common use!

The operative mental model here is that code "must not refer explicitly or implicitly to the current instance of the class whose declaration lexically encloses" the code in question. However, the concept of static context, as defined, goes beyond that and forbids, e.g., even references to generic type parameters.

The underlying issue is that the JLS applies the concept of static context to two scenarios which are similar, but not equivalent:

  1. When there is no this instance defined, e.g., as within a static method, and

  2. When this is defined but must not be referenced, e.g., prior to superclass initialization.

The current definition of static context is appropriate for the first scenario. After the addition of generics, inner classes, and captured free variables to the language, however, it is no longer appropriate for the second scenario.

We therefore define a new concept, a pre-construction context, which is like a static context but is less restrictive. It still disallows accessing the current instance in any way but does not disallow, for example, using the class's generic type parameters or accessing an outer instance. This more accurately matches not only the underlying requirement but also developer expectations, common usage, and the compiler's behavior going back as far as Java 8. (This change will, effectively, fix 8301649 by codifying the compiler's current behavior.)

Description

Summary of JLS modifications

Records

Record constructors are subject to more restrictions than normal constructors. In particular:

These restrictions remain in place, but otherwise record constructors also benefit from these changes. The net result is that non-canonical record constructors may now contain prologue statements before this().
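To illustrate what a prologue would replace, here is a sketch of a non-canonical record constructor as it has to be written today. The record Range and the helper parseBound are hypothetical. Because nothing may precede this(), the input text is parsed once per argument.

```java
// Hypothetical record; Range and parseBound are illustrative names only.
record Range(int low, int high) {

    // Compact canonical constructor: validates the components.
    Range {
        if (low > high)
            throw new IllegalArgumentException("low > high");
    }

    // Non-canonical constructor: must delegate to another constructor via this().
    Range(String text) {
        this(parseBound(text, 0), parseBound(text, 1));   // text is split twice
    }

    private static int parseBound(String text, int index) {
        String[] parts = text.split("\\.\\.");
        if (parts.length != 2)
            throw new IllegalArgumentException("expected low..high: " + text);
        return Integer.parseInt(parts[index].trim());
    }
}

class RangeDemo {
    public static void main(String[] args) {
        System.out.println(new Range("3..7"));
    }
}
```

With prologue statements permitted before this(), the split could be performed once in the constructor body and both bounds passed on from a local variable.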

Testing

We will test the compiler changes with existing unit tests, unchanged except for those tests that verify changed behavior, plus new positive and negative test cases as appropriate.

We will compile all JDK classes using the previous and new versions of the compiler and verify that the resulting bytecode is identical.

No platform-specific testing should be required.

Risks and Assumptions

Risks associated with any language change can be divided into two categories: behavioral risk and environmental risk.

Behavioral risk includes any changes or unexpected behavior in the actual execution of programs. Because an explicit goal of this work is to not change the behavior of existing programs, the risk to the proper execution of existing code should be low. However, there is also the risk that new code does not behave as expected. Fortunately, pre-construction context semantics are already familiar to developers, even though the term is new to the language specification, because these are the semantics that the compiler has applied to this() and super() argument evaluation since at least Java 8. One natural mental model is to think of statements before super() as an "unrolling" of code that previously had to be shoehorned into this() and super() argument expressions. Apart from being moved and granted additional flexibility (e.g., full statements and control flow instead of just expressions), that code is treated the same way by the compiler.
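The "shoehorned" form that this mental model starts from can be sketched as follows. The classes Shape and Square are hypothetical; the EVENTS list only records the order in which code runs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical classes showing that argument code already runs before super().
class Shape {
    static final List<String> EVENTS = new ArrayList<>();
    final int sides;

    Shape(int sides) {
        EVENTS.add("Shape(" + sides + ")");
        this.sides = sides;
    }
}

class Square extends Shape {
    Square() {
        super(countSides());      // argument is evaluated before Shape's constructor runs
        // super(this.sides);     // rejected: cannot reference 'this' before super()
        EVENTS.add("Square body");
    }

    private static int countSides() {
        EVENTS.add("countSides");
        return 4;
    }
}

class ArgumentOrderDemo {
    public static void main(String[] args) {
        new Square();
        System.out.println(Shape.EVENTS);   // prints "[countSides, Shape(4), Square body]"
    }
}
```

The body of countSides() already executes ahead of superclass construction; statements before super() would let the same logic be written directly in the constructor.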

The more widespread category of risk is environmental risk. Java programmers have understood since time immemorial that any superclass constructor invocation must always appear as the first statement in a constructor, and this requirement is deeply embedded into the code analyzers, style checkers, syntax highlighters, development environments, and other tools in the Java ecosystem. As with any language change, there may be a period of pain as the various environmental tools are updated. Java developers using tools that have not yet been updated may be confused when the compiler tells them one story while the syntax highlighter tells another.

The fact that this language change is backward compatible could actually work against some of these tools: instead of being tripped up during an initial lexical scan by an unfamiliar syntax, they might appear to function normally, while some internal code assumption is no longer valid, leading to bugs that are harder to detect.