JEP draft: Flexible Constructor Bodies (Second Preview)

Author: Archie Cobbs & Gavin Bierman
Owner: Archie Cobbs
Type: Feature
Scope: SE
Status: Submitted
Component: specification / language
Discussion: amber dash dev at openjdk dot org
Reviewed by: Alex Buckley, Brian Goetz
Created: 2024/02/13 22:18
Updated: 2024/04/18 08:16
Issue: 8325803

Summary

In constructors of Java classes, allow statements to appear before an explicit constructor invocation (super(..) or this(..)). The statements cannot use the instance under construction, but can initialize its fields, which makes classes more reliable when methods are overridden. This is a preview language feature.

History

This feature was originally proposed with a different title by JEP 447, delivered in JDK 22. We here propose to preview the feature for a second time, with the following addition:

Goals

Motivation

Creating an instance of a class involves executing a constructor, which is responsible for establishing invariants about the new instance. For example, suppose a Person class has an invariant that its age field is less than 130; a constructor which takes an age-related parameter (e.g., a birth date) must validate it and either assign to age (thereby establishing the invariant) or throw an exception, as appropriate.
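As a minimal sketch of such an invariant-establishing constructor: the class below is hypothetical, and the explicit `today` parameter, the `MAX_AGE` constant, and the choice of `IllegalArgumentException` are assumptions made so that the example is deterministic, not details from this proposal.

```java
import java.time.LocalDate;
import java.time.Period;

// Hypothetical Person class; all names here are illustrative assumptions.
final class Person {
    private static final int MAX_AGE = 130;  // invariant from the text: age < 130
    private final int age;

    // 'today' is passed in (an assumption) so the computed age is reproducible.
    Person(LocalDate birthDate, LocalDate today) {
        if (birthDate.isAfter(today)) {
            throw new IllegalArgumentException("birth date is in the future");
        }
        int years = Period.between(birthDate, today).getYears();
        if (years >= MAX_AGE) {
            // Reject the argument instead of creating an instance that breaks the invariant.
            throw new IllegalArgumentException("age must be less than " + MAX_AGE);
        }
        this.age = years;  // invariant established
    }

    int age() { return age; }
}

class PersonDemo {
    public static void main(String[] args) {
        Person p = new Person(LocalDate.of(1990, 1, 1), LocalDate.of(2024, 4, 18));
        System.out.println(p.age());  // prints 34
    }
}
```

Every path through the constructor either assigns a valid age or throws, so no caller can ever observe a Person that violates the invariant.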

This notion of constructors establishing invariants is enhanced in the presence of inheritance. These invariants are inherited; in other words, the invariants of the superclass should apply to the subclass.

To help ensure the inheritance of invariants, in Java, constructors run from the "top down": A constructor in a superclass must be run -- establishing its own invariants -- before a constructor in a subclass is run.

To guarantee that constructors run from the top down, the Java language requires that in a constructor body, the first statement must be an explicit invocation of another constructor: either super(..) or this(..). If no explicit constructor invocation appears in the constructor body, then the compiler inserts one -- namely, super() -- as the first statement in the constructor body.
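The top-down order can be observed with a small sketch that compiles under the existing rules. The class names and the shared `ORDER` log are hypothetical, chosen only for this illustration.

```java
// Hypothetical three-level hierarchy; each constructor records when its body runs.
class Animal {
    static final StringBuilder ORDER = new StringBuilder();

    Animal() {
        ORDER.append("Animal ");
    }
}

class Mammal extends Animal {
    Mammal() {
        // No explicit constructor invocation here: the compiler inserts super().
        ORDER.append("Mammal ");
    }
}

class Dog extends Mammal {
    Dog() {
        super();  // explicit invocation, written as the first statement
        ORDER.append("Dog");
    }
}

class OrderDemo {
    static String run() {
        Animal.ORDER.setLength(0);  // reset the log so repeated runs are independent
        new Dog();
        return Animal.ORDER.toString();
    }

    public static void main(String[] args) {
        System.out.println(run());  // prints "Animal Mammal Dog"
    }
}
```

Whether super() is written explicitly or inserted by the compiler, each superclass constructor body finishes before the subclass constructor body runs.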

Furthermore, the Java language requires that for any explicit constructor invocation, none of its arguments can use the instance under construction in any way.

These two requirements guarantee some predictability and hygiene in the construction of new instances, but they are heavy-handed because they outlaw certain familiar programming patterns. The following examples illustrate the issues.

Example: Validating superclass constructor arguments

Sometimes we need to validate an argument that is passed to a superclass constructor. We can validate the argument after calling the superclass constructor, but that means potentially doing unnecessary work:

public class PositiveBigInteger extends BigInteger {
    public PositiveBigInteger(long value) {
        super(value);  // Potentially unnecessary work
        if (value <= 0) throw ...
    }
}

It would be better to declare a constructor that fails fast, by validating its argument before it invokes the superclass constructor. Today we can only do that by calling an auxiliary method in-line, as part of the super(..) call:

public class PositiveBigInteger extends BigInteger {
    public PositiveBigInteger(long value) {
        super(verifyPositive(value));
    }

    private static long verifyPositive(long value) {
        if (value <= 0) throw ...
        return value;
    }
}

The code would be more readable if we could write the validation logic in the constructor body:

public class PositiveBigInteger extends BigInteger {
    public PositiveBigInteger(long value) {
        if (value <= 0) throw ...
        super(value);
    }
}

Example: Preparing superclass constructor arguments

Sometimes we must perform non-trivial computation to prepare arguments for a superclass constructor. Again, we must resort to calling an auxiliary method in-line, as part of the super(..) call. For example, suppose a constructor takes a Certificate argument but must convert it to a byte array for a superclass constructor:

public class Sub extends Super {
    public Sub(Certificate certificate) {
        super(prepareByteArray(certificate));
    }

    private static byte[] prepareByteArray(Certificate certificate) {
        var publicKey = certificate.getPublicKey();
        if (publicKey == null) throw ...
        return switch (publicKey) {
            case RSAKey rsaKey -> ...
            case DSAPublicKey dsaKey -> ...
            default -> ...
        };
    }
}

The code would be more readable if we could prepare the arguments in the constructor body:

public Sub(Certificate certificate) {
    var publicKey = certificate.getPublicKey();
    if (publicKey == null) throw ...
    byte[] certBytes = switch (publicKey) {
        case RSAKey rsaKey -> ...
        case DSAPublicKey dsaKey -> ...
        default -> ...
    };
    super(certBytes);
}

Example: Sharing superclass constructor arguments

Sometimes we need to pass the same value to a superclass constructor more than once, in different arguments. The only way to do this is via an auxiliary constructor:

public class Super {
    public Super(C x, C y) { ... }
}

public class Sub extends Super {
    public  Sub(int i) { this(new C(i)); }  // Prepare the argument for Super's constructor
    private Sub(C x)   { super(x, x); }     // Pass the argument twice to Super's constructor
}

The code would be more maintainable if we could perform the "sharing" in the constructor body, obviating the need for an auxiliary constructor:

public class Sub extends Super {
    public Sub(int i) {
        var x = new C(i);
        super(x, x);
    }
}

In all of these examples, the constructor body that we would like to write contains statements before the explicit constructor invocation, and none of those statements use the instance being constructed. Unfortunately, the constructor bodies are rejected by the compiler, even though all of them are safe.

If the Java language could guarantee top-down construction with more flexible rules, then constructor bodies would be easier to read and write. Constructor bodies could more naturally do argument validation, argument preparation, and argument sharing without calling upon clumsy auxiliary methods or constructors. It is time to move beyond the simplistic syntactic requirement, enforced since Java 1.0, that super(..) or this(..) must be the first statement in a constructor body.

Description

The grammar of a constructor body is changed to allow statements before an explicit constructor invocation, that is, from:

ConstructorBody:
    { [ExplicitConstructorInvocation] [BlockStatements] }

to:

ConstructorBody:
    { [BlockStatements] ExplicitConstructorInvocation [BlockStatements] }
    { [BlockStatements] }

Eliding some details, an explicit constructor invocation is either super(...) or this(...).

The statements that appear before an explicit constructor invocation constitute the prologue of the constructor body.

The statements that appear after an explicit constructor invocation constitute the epilogue of the constructor body.

It is permitted to omit an explicit constructor invocation in a constructor body. In this case, the prologue is empty, and all the statements in the constructor body constitute the epilogue.

A return statement is permitted in the epilogue of a constructor body if it does not include an expression. That is, return; is allowed but return e; is not. It is a compile-time error if a return statement appears in the prologue of a constructor body.
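A small sketch of the return rule follows. It is written so that it also compiles under the existing rules: there is no explicit constructor invocation, so the prologue is empty and the whole body is the epilogue. The `Label` class is hypothetical.

```java
// Hypothetical class illustrating 'return;' in the epilogue of a constructor body.
class Label {
    private String text = "";

    Label(String text) {
        // Implicit super() happens here; everything below is the epilogue.
        if (text == null) {
            return;  // OK: a return statement without an expression
            // 'return text;' would be a compile-time error in a constructor.
        }
        this.text = text.trim();
    }

    String text() { return text; }
}

class LabelDemo {
    public static void main(String[] args) {
        System.out.println("[" + new Label(null).text() + "]");     // prints "[]"
        System.out.println("[" + new Label("  hi  ").text() + "]"); // prints "[hi]"
    }
}
```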

Throwing an exception in the prologue or epilogue of a constructor body is permitted. Throwing in the prologue will be typical in fail-fast scenarios.

This is a preview language feature, disabled by default

To try the examples below in JDK 23 you must enable preview features:

Early construction contexts

In the Java language, code that appears in the argument list of an explicit constructor invocation is said to appear in a static context. This means that the arguments to the explicit constructor invocation are treated as if they were code in a static method; in other words, as if no instance is available. The technical restrictions of a static context are stronger than necessary, however, and they prevent code that is useful and safe from appearing as constructor arguments.

Rather than revise the concept of a static context, we introduce the concept of an early construction context that covers both the argument list of an explicit constructor invocation and any statements that appear before it in the constructor body. Code in an early construction context must not use the instance under construction, except to initialize its fields (provided these fields do not have initializers).

This means that any explicit or implicit use of this to refer to the current instance, or to access fields or invoke methods of the current instance, is disallowed in an early construction context:

class A {
    int i;

    A() {
        System.out.print(this);  // Error - refers to the current instance

        var x = this.i;          // Error - explicitly refers to field of the current instance
        this.hashCode();         // Error - explicitly refers to method of the current instance

        var y = i;               // Error - implicitly refers to field of the current instance
        hashCode();              // Error - implicitly refers to method of the current instance

        super();
    }
}

Similarly, any field access, method invocation, or method reference qualified by super is disallowed in an early construction context:

class B {
    int i;
    void m() { ... }
}

class C extends B {
    C() {
        var x = super.i;  // Error
        super.m();        // Error
        super();
    }
}

Use of enclosing instances in an early construction context

When class declarations are nested, the code of an inner class can refer to the instance of an enclosing class. This is because the instance of the enclosing class is created before the instance of the inner class. The code of the inner class -- including constructor bodies -- can access fields and invoke methods of the enclosing instance, using either simple names or qualified this expressions. Accordingly, operations on an enclosing instance are allowed in an early construction context.

In the program below, the declaration of Inner is nested in the declaration of Outer, so every instance of Inner has an enclosing instance of Outer. In the constructor of Inner, code in the early construction context can refer to the enclosing instance and its members, either via simple names or via Outer.this.

class Outer {
    int i;
    void hello() { System.out.println("Hello"); }

    class Inner {
        int j;

        Inner() {
            var x = i;             // OK - implicitly refers to field of enclosing instance
            var y = Outer.this.i;  // OK - explicitly refers to field of enclosing instance
            hello();               // OK - implicitly refers to method of enclosing instance
            Outer.this.hello();    // OK - explicitly refers to method of enclosing instance
            super();
        }
    }
}

By contrast, in the constructor of Outer shown below, code in the early construction context cannot instantiate the Inner class with new Inner(). This expression is really this.new Inner(), meaning that it uses the current instance of Outer as the enclosing instance for the Inner object. Per the earlier rule, any explicit or implicit use of this to refer to the current instance is disallowed in an early construction context.

class Outer {
    class Inner {}

    Outer() {
        var x = new Inner();       // Error - implicitly refers to the current instance of Outer
        var y = this.new Inner();  // Error - explicitly refers to the current instance of Outer
        super();
    }
}

Early assignment to fields

Accessing fields of the current instance is disallowed in an early construction context, but what about assigning to fields of the current instance while it is still under construction?

Allowing such assignments would be useful as a way for a constructor in a subclass to "defend" against a constructor in a superclass seeing uninitialized fields in the subclass. This can occur when a constructor in a superclass invokes a method in the superclass that is overridden by a method in the subclass. Although the Java language allows constructors to invoke overridable methods, it is considered bad practice: Item 19 of Effective Java (Third Edition) states that "Constructors must not invoke overridable methods". To see why it is considered bad practice, consider the following class hierarchy:

class Super {
    Super() { overriddenMethod(); }

    void overriddenMethod() { System.out.println("hello"); }
}

class Sub extends Super {
    final int x;
    Sub(int x) { this.x = x; }

    @Override
    void overriddenMethod() { System.out.println(x); }
}

What does new Sub(42); print? It might be expected to print 42, but it actually prints 0. This is because the Super constructor is implicitly invoked before the field assignment in the Sub constructor body. The Super constructor then invokes overriddenMethod, causing that method in Sub to run before the Sub constructor body has had a chance to assign 42 to the field. As a result, the method in Sub sees the default value of the field, which is 0. This is the source of many bugs and errors.
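The sequence of events just described can be made visible with a tracing variant of the hierarchy. This sketch compiles under the existing rules; the names `TracingSuper`, `TracingSub`, and `TRACE` are hypothetical and exist only for this illustration.

```java
// Hypothetical tracing variant: each step appends to a shared log.
class TracingSuper {
    static final StringBuilder TRACE = new StringBuilder();

    TracingSuper() {
        TRACE.append("super-ctor;");
        overriddenMethod();  // dispatches to the subclass override
    }

    void overriddenMethod() {
        TRACE.append("super-method;");
    }
}

class TracingSub extends TracingSuper {
    final int x;

    TracingSub(int x) {
        // The compiler inserts super() here, so TracingSuper() runs first.
        this.x = x;
        TRACE.append("sub-assigned x=" + this.x + ";");
    }

    @Override
    void overriddenMethod() {
        // Runs during the superclass constructor, before this.x is assigned.
        TRACE.append("sub-method sees x=" + x + ";");
    }
}

class TraceDemo {
    static String run() {
        TracingSuper.TRACE.setLength(0);  // reset the log between runs
        new TracingSub(42);
        return TracingSuper.TRACE.toString();
    }

    public static void main(String[] args) {
        // prints "super-ctor;sub-method sees x=0;sub-assigned x=42;"
        System.out.println(run());
    }
}
```

The trace shows the override running between the start of the superclass constructor and the assignment in the subclass constructor body, which is why it observes the default value.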

Whilst this is considered bad programming practice, it is not uncommon, and it presents a conundrum for subclasses, especially when modifying the superclass is not an option.

Flexible constructor bodies solve the conundrum by allowing the Sub constructor to initialize the field in Sub before the Super constructor is invoked. The example can be rewritten as follows, where only the Sub class is changed:

class Super {
    Super() { overriddenMethod(); }

    void overriddenMethod() { System.out.println("hello"); }
}

class Sub extends Super {
    final int x;
    Sub(int x) {
        this.x = x;  // Initialize the field before the super call
        super();
    }

    @Override
    void overriddenMethod() { System.out.println(x); }
}

Now, new Sub(42); will print 42, because the field in Sub is assigned 42 before overriddenMethod is invoked.

In a constructor body, a simple assignment to a field declared in the same class is allowed in an early construction context, provided the field declaration lacks an initializer.

This means that a constructor body can initialize the class's own fields in an early construction context, but not the fields of a superclass.

As discussed earlier, a constructor body cannot access any of the fields of the current instance -- whether declared in the same class as the constructor, or in a superclass -- until after the explicit constructor invocation.

Records

Constructors of record classes are already subject to more restrictions than constructors of normal classes. In particular:

These restrictions remain, but otherwise record constructors will benefit from the changes described above, primarily because non-canonical record constructors will be able to contain statements before the alternative constructor invocation.
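As a hedged illustration, the sketch below shows a non-canonical record constructor as it must be written under the existing rules, with the argument preparation pushed into a static helper. The record `IntRange`, its `lo..hi` string format, and the helper `bound` are hypothetical. The trailing comment shows how the same constructor could presumably be written with this feature enabled; that form is not compiled here.

```java
// Hypothetical record; the "lo..hi" input format is an assumption for this sketch.
record IntRange(int lo, int hi) {

    // Non-canonical constructor under the existing rules: this(..) must come first,
    // so parsing and validation live in a static helper called from the argument list.
    IntRange(String spec) {
        this(bound(spec, 0), bound(spec, 1));
    }

    private static int bound(String spec, int index) {
        String[] parts = spec.split("\\.\\.");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected lo..hi, got: " + spec);
        }
        return Integer.parseInt(parts[index].trim());
    }

    // With statements allowed before this(..), the helper would presumably be unnecessary:
    //
    //   IntRange(String spec) {
    //       String[] parts = spec.split("\\.\\.");
    //       if (parts.length != 2) {
    //           throw new IllegalArgumentException("expected lo..hi, got: " + spec);
    //       }
    //       this(Integer.parseInt(parts[0].trim()), Integer.parseInt(parts[1].trim()));
    //   }
}

class IntRangeDemo {
    public static void main(String[] args) {
        System.out.println(new IntRange("1..5"));  // prints "IntRange[lo=1, hi=5]"
    }
}
```

Note that the helper-based form splits and validates the string once per argument, whereas the commented form would do so once per construction.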

Enums

Constructors of enum classes can contain alternate constructor invocations but not superclass constructor invocations. Enum classes will benefit from the changes described above, primarily because their constructors will be able to contain statements before the alternate constructor invocation.

Testing

We will test the compiler changes with existing unit tests, unchanged except for those tests that verify changed behavior, plus new positive and negative test cases as appropriate.

We will compile all JDK classes using the previous and new versions of the compiler and verify that the resulting bytecode is identical.

No platform-specific testing should be required.

Risks and Assumptions

The changes we propose above are source- and behavior-compatible. They strictly expand the set of legal Java programs while preserving the meaning of all existing Java programs.

These changes, though modest in themselves, represent a significant change in the long-standing requirement that a constructor invocation, if present, must always appear as the first statement in a constructor body. This requirement is deeply embedded in code analyzers, style checkers, syntax highlighters, development environments, and other tools in the Java ecosystem. As with any language change, there may be a period of pain as tools are updated.

Dependencies

This Java language feature depends on the ability of the JVM to verify and execute arbitrary code that appears before constructor invocations in constructors so long as that code does not reference the instance under construction. Fortunately, the JVM already supports a more flexible treatment of constructor bodies:

These more permissive rules still ensure top-down initialization:

In other words, we do not require any changes to the Java Virtual Machine Specification.

The current mismatch between the JVM and the language is an historical artifact. Originally the JVM was more restrictive, but this led to issues with the initialization of compiler-generated fields for new language features such as inner classes and captured free variables. As a result, the specification was relaxed to accommodate compiler-generated code, but this new flexibility never made its way back up to the language.