JEP draft: Unnamed patterns and variables

Owner: Angelos Bimpoudis
Type: Feature
Scope: SE
Status: Submitted
Component: specification / language
Discussion: amber dash dev at openjdk dot org
Effort: S
Duration: S
Reviewed by: Alex Buckley
Created: 2022/09/26 08:00
Updated: 2022/12/06 14:40
Issue: 8294349

Summary

Enhance the Java language with unnamed patterns, which match a record component without stating the component's name or type, and with unnamed variables, which can be initialized but not used. Both are denoted with an underscore, _. This is a preview language feature.

Goals

Non-Goals

Motivation

Record classes and record patterns work together to streamline data processing in Java. Data is modeled as a record class and its components, while code that receives an instance of a record class uses pattern matching with record patterns to disaggregate the instance into its components. In the following code, one part of a program creates a ColoredPoint instance, while another part of the program uses pattern matching with instanceof to test whether a variable is a ColoredPoint, and extract its two components if so:

record Point(int x, int y) {}
enum Color { RED, GREEN, BLUE }
record ColoredPoint(Point p, Color c) {}

... new ColoredPoint(new Point(3,4), Color.GREEN) ...

if (r instanceof ColoredPoint(Point p, Color c)) {
    ... p.x() ... p.y() ...
}

Record patterns such as ColoredPoint(Point p, Color c) are pleasingly descriptive, but it will be common for programs to need only some of the components for further processing. For example, the code above needs only p in the if block, not c. Developers will find it laborious to write out all the components of a record class every time they perform pattern matching. This is especially evident when record patterns are nested to extract data within components, such as:

if (r instanceof ColoredPoint(Point(int x, int y), Color c)) {
    ... x ... y ...
}

Developers can use var to reduce the visual cost of the unnecessary component Color c, e.g., ColoredPoint(Point(int x, int y), var c), but it would be better to reduce the cost even further by omitting unnecessary components altogether. This would not only simplify the task of writing a record pattern, but would also improve readability by making the code look less cluttered.

As developers gain experience with the data-oriented methodology of record classes and their sister mechanism, sealed classes, we expect that pattern matching over complex data structures will be commonplace. Frequently, the shape of the structure will be just as important as the individual data items. As a highly simplified example, consider the following Box and Ball classes, and a switch that explores the content of a Box:

record Box<T extends Ball>(T content) {}

sealed abstract class Ball permits RedBall, BlueBall, GreenBall {}
final  class RedBall   extends Ball {}
final  class BlueBall  extends Ball {}
final  class GreenBall extends Ball {}

Box<? extends Ball> b = ...
switch (b) {
    case Box(RedBall   red)   -> processBox(b);
    case Box(BlueBall  blue)  -> processBox(b);
    case Box(GreenBall green) -> stopProcessing();
}

Every case deals with a Box based on its content, but the variables red, blue, and green are not used. Since the variables are unused, it would be ideal if the developer could elide their names. (Note that eliding the name red in RedBall red is distinct from omitting the entire RedBall component, as discussed earlier.)

Furthermore, if the switch were refactored to group the first two patterns in one case:

case Box(RedBall red), Box(BlueBall blue) -> processBox(b);

then it would be erroneous to name the components: Neither of the names is usable on the right-hand side because either of the patterns on the left-hand side could have matched. Since the names are unusable it would be ideal to elide them.

Turning to traditional imperative code, many developers have faced the situation of having to declare a variable that they did not intend to use. This typically occurs when the side effect of a statement is more important than its result. For example, the following code uses an enhanced-for statement to step through a collection, calculating total as a side effect, without using the loop variable order assigned by the statement:

int total = 0;
for (Order order : orders) {
    if (total < LIMIT) { 
        ... total++ ...
    }
}

The prominence of order's declaration is unfortunate given that order is not used. The declaration can be shortened to var order, but there is no way to avoid giving a name. The name itself can be shortened, e.g., o, but this syntactic trick does not communicate the semantic intent that the variable will go unused. In addition, static analysis tools typically complain about unused variables, even when the developer intended non-use and may not have a way to silence the warnings.

Here is another example where the side effect of a statement is more important than its result, leading to an unused variable. The following code dequeues data but only needs two out of every three elements:

// Declared as a Deque because pop() is a Deque method, not a Queue method
Deque<Integer> q = ... // x1, y1, z1, x2, y2, z2, ...
while (!q.isEmpty()) {
    int x = q.pop();
    int y = q.pop();
    int z = q.pop();  // z is unused
    ... new Point(x, y) ...
}

The third call to pop() has the desired side effect -- dequeuing an element -- regardless of whether its result is assigned to a variable, so the declaration of z could be elided. However, for maintainability, the developer may wish to consistently denote the result of pop() by declaring a variable, even if one is not used (and leads to static analysis warnings). Unfortunately, in many programs, the choice of variable name will not come so easily as z in the code above.

Unused variables occur frequently in two other kinds of statement that focus on side effects:

try (var acquiredContext = ScopedContext.acquire()) {
    ... acquiredContext not used ...
}

String s = ...;
try { 
    int i = Integer.parseInt(s);
    ... i ...
} catch (NumberFormatException ex) { 
    System.out.println("Bad number: " + s);
}

Even code without side effects is sometimes forced to declare unused variables. For example, the following code generates a map where each key is mapped to the same placeholder value; since the lambda parameter v is not used, its name is irrelevant:

...stream.collect(Collectors.toMap(String::toUpperCase, v -> "NODATA"));

In all these scenarios where variables are unused and their names are irrelevant, it would be ideal if developers could declare variables with no name. This would free the code's maintainers from having to understand irrelevant names, and would avoid false positives on non-use from static analysis tools.

The kinds of variable that make sense being unnamed are those which have no visibility outside a method: local variables, exception parameters, and lambda parameters, as shown above. These kinds of variable can be renamed or made unnamed without external impact. In contrast, fields (even private ones) communicate the state of an object across methods, and unnamed state is neither helpful nor maintainable.

Description

The unnamed pattern is denoted by an underscore _. It allows the type and name of a record component to be elided in pattern matching, e.g., ... instanceof Point(int x, _) or case Point(int x, _).

An unnamed pattern variable is declared when the pattern variable in a type pattern is denoted by an underscore. It allows the identifier which usually follows the type (or var) in a type pattern to be elided, e.g., ... instanceof Point(int x, int _) or case Point(int x, int _).

An unnamed variable is declared when either the local variable in a local variable declaration statement, or an exception parameter in a catch clause, or a lambda parameter in a lambda expression, is denoted by an underscore. It allows the identifier which usually follows the type (or var) in the statement or expression to be elided, e.g., int _ = q.pop(); or catch (NumberFormatException _) or (int x, int _) -> x+x;. In the special case of a single-parameter lambda expression, such as _ -> "NODATA", the unnamed variable should not be confused with the unnamed pattern.

Underscore is the lightest reasonable syntax for signifying "no name". While it was originally a Java identifier, there has been a multi-year process to reclaim it for the purpose of unnamed patterns and variables, starting with warnings from javac in 2014 (Java 8) and continuing with errors in 2017 (Java 9, JEP 213). Many other languages use underscore to declare a variable with no name, such as Scala and Python.

The ability to use underscore in a Java identifier is unchanged; it remains a Java letter and a Java letter-or-digit. For example, identifiers such as _age and __age continue to be legal. The use of underscore as a digit separator is also unchanged. For example, numeric literals such as 123_456_789 and 0b1010_0101 continue to be legal.
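
As a minimal sketch of the point above, the following class uses underscore inside identifiers and as a digit separator. The class name and the values are purely illustrative, and nothing here depends on unnamed patterns or variables.

```java
// Sketch: underscore within identifiers and numeric literals is unaffected.
final class UnderscoreStillLegal {
    // Identifiers may still begin with, or contain, underscores.
    static final int _age = 42;
    static final int __age = _age + 1;

    // Underscores remain valid digit separators in numeric literals.
    static final int DECIMAL = 123_456_789;
    static final int BINARY = 0b1010_0101;

    public static void main(String[] args) {
        System.out.println(_age + " " + __age + " " + DECIMAL + " " + BINARY);
    }
}
```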

The unnamed pattern

The unnamed pattern is an unconditional pattern which binds nothing. It may be used in a nested position in place of a type pattern or a record pattern. For example, ... instanceof Point(_, int y) is legal, but these are not: r instanceof _ and r instanceof _(int x, int y). Consequently, the earlier example can omit the type pattern for the Color component entirely:

if (r instanceof ColoredPoint(Point(int x, int y), _)) { ... x ... y ... }

The following example extracts the Color component while omitting the record pattern for Point component:

if (r instanceof ColoredPoint(_, Color c)) { ... c ... }

In deeply nested positions, using the unnamed pattern improves the readability of complex data extraction. For example, given a Rectangle record class whose first component is the ColoredPoint at its top-left corner (Rectangle is not declared in the examples above), the following code extracts the x coordinate of that corner:

if (r instanceof Rectangle(ColoredPoint(Point(int x, _), _), _)) { ... x ... }
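
The unnamed-pattern examples in this section can be assembled into a runnable sketch, assuming a compiler on which this feature is available (with preview features enabled where required). The class name UnnamedPatternDemo, the helper methods, the -1 and null results for non-matching inputs, and in particular the shape of Rectangle are assumptions made for illustration; the text above does not declare Rectangle.

```java
// Sketch only: helper names and the Rectangle record are hypothetical.
final class UnnamedPatternDemo {
    record Point(int x, int y) {}
    enum Color { RED, GREEN, BLUE }
    record ColoredPoint(Point p, Color c) {}
    // Assumed shape: two colored corners, the first being the top-left one.
    record Rectangle(ColoredPoint upperLeft, ColoredPoint lowerRight) {}

    // Extracts x and y; the Color component is skipped with the unnamed pattern.
    static int sumOfCoordinates(Object r) {
        if (r instanceof ColoredPoint(Point(int x, int y), _)) {
            return x + y;
        }
        return -1; // illustrative sentinel for "no match"
    }

    // Extracts the Color; the Point component is skipped entirely.
    static Color colorOf(Object r) {
        if (r instanceof ColoredPoint(_, Color c)) {
            return c;
        }
        return null; // illustrative result for "no match"
    }

    // Reaches three levels deep while naming only the one value that is needed.
    static int upperLeftX(Object r) {
        if (r instanceof Rectangle(ColoredPoint(Point(int x, _), _), _)) {
            return x;
        }
        return -1; // illustrative sentinel for "no match"
    }
}
```

In each method, every position holding _ matches whatever value is there, including null, and introduces no binding.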

Unnamed pattern variables

An unnamed pattern variable can appear in any type pattern, whether the type pattern appears at the top level or is nested in a record pattern. For example, both these appearances are legal: r instanceof Point _ and r instanceof ColoredPoint(Point(int x, int _), Color _).
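
Both appearances can be tried in a small sketch, again assuming a compiler on which the feature is available (with preview features enabled where required). The class name, method names, and the -1 sentinel are hypothetical; the records mirror the ones used earlier.

```java
// Sketch only: method names and the sentinel value are illustrative.
final class UnnamedPatternVariableDemo {
    record Point(int x, int y) {}
    enum Color { RED, GREEN, BLUE }
    record ColoredPoint(Point p, Color c) {}

    // Top-level type pattern: only the type test matters, so no name is given.
    static boolean isPoint(Object r) {
        return r instanceof Point _;
    }

    // Nested type patterns: the types of y and c are stated, their names are not.
    static int xOf(Object r) {
        if (r instanceof ColoredPoint(Point(int x, int _), Color _)) {
            return x;
        }
        return -1; // illustrative sentinel for "no match"
    }
}
```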

Since unnamed pattern variables allow the user to omit names, they make run-time data exploration based on type patterns visually clearer, especially when used with switch statements and expressions.

Unnamed pattern variables are helpful when a switch needs to execute the same action for multiple cases. For example, the earlier example of Box and Ball can be rewritten as follows:

switch (b) {
    case Box(RedBall _), Box(BlueBall _) -> processBox(b);
    case Box(GreenBall _)                -> stopProcessing();
    case Box(_)                          -> pickAnotherBox();
}

The first two cases use unnamed pattern variables because their right-hand sides do not use the Box's component. The third case, which is new, uses the unnamed pattern in order to match a Box with a null component.
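
The switch above can be made runnable along the following lines, assuming a compiler on which the feature is available (with preview features enabled where required). The enclosing class BoxSwitchDemo, the handle method, and the returned strings are stand-ins for processBox, stopProcessing, and pickAnotherBox, which the text leaves undefined.

```java
// Sketch only: the returned strings stand in for the undefined action methods.
final class BoxSwitchDemo {
    abstract static sealed class Ball permits RedBall, BlueBall, GreenBall {}
    static final class RedBall extends Ball {}
    static final class BlueBall extends Ball {}
    static final class GreenBall extends Ball {}

    record Box<T extends Ball>(T content) {}

    static String handle(Box<? extends Ball> b) {
        return switch (b) {
            // Two patterns share one case; neither introduces a binding.
            case Box(RedBall _), Box(BlueBall _) -> "processBox";
            case Box(GreenBall _)                -> "stopProcessing";
            // The unnamed pattern matches any remaining content, including null.
            case Box(_)                          -> "pickAnotherBox";
        };
    }
}
```

A Box whose content is null is not matched by any of the three type patterns, so it falls through to the final case.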

The unnamed pattern is shorthand for the type pattern var _. Neither the unnamed pattern nor var _ may be used at the top level of a pattern: both ... instanceof _ and ... instanceof var _ are prohibited, as are case _ and case var _.

Unnamed variables

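The following kinds of declaration can introduce either a named variable (denoted by an identifier) or an unnamed variable (denoted by an underscore): a local variable declaration statement in a block, the resource specification of a try-with-resources statement, the header of a basic for statement, the header of an enhanced for statement, an exception parameter of a catch block, and a formal parameter of a lambda expression.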

(The possibility of an unnamed local variable being declared by a pattern, i.e., a pattern variable (JLS 14.30.1), was described earlier in "Unnamed pattern variables".)

When an unnamed variable is declared, no name is placed in scope, so the variable cannot be written or read after it has been initialized. An initializer must be provided for an unnamed variable in each kind of declaration above.

Since an unnamed variable has no name, it never shadows any other variable, so multiple unnamed variables can be declared in the same block.

Here are examples from earlier, modified to use unnamed variables:

int acc = 0;
for (Order _ : orders) {
    if (acc < LIMIT) { 
        ... acc++ ...
    }
}

In addition, the initialization of a basic for loop can declare unnamed local variables:

for (int i = 0, _ = sideEffect(); i < 10; i++) { ... i ... }

// Declared as a Deque because pop() is a Deque method, not a Queue method
Deque<Integer> q = ... // x1, y1, z1, x2, y2, z2, ...
while (!q.isEmpty()) {
    var x = q.pop();
    var y = q.pop();
    var _ = q.pop();  // the z value is discarded
    ... new Point(x, y) ...
}

If the program needed to process only the x1, x2, etc., coordinates, then unnamed variables could be used in multiple local variable declarations:

while (!q.isEmpty()) {
    var x = q.pop();
    var _ = q.pop();
    var _ = q.pop(); 
    ... new Point(x, 0) ...
}

String s = ...
try { 
    int i = Integer.parseInt(s);
    ... i ...
} catch (NumberFormatException _) { 
    System.out.println("Bad number: " + s);
}

Unnamed variables can be used in multiple catch blocks:

try { ... } 
catch (Exception _) { ... } 
catch (Throwable _) { ... }

try (var _ = ScopedContext.acquire()) {
    ... no use of acquired resource ...
}

...stream.collect(Collectors.toMap(String::toUpperCase, _ -> "NODATA"));
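
The fragments above elide their surroundings, so here is one way they might be completed into a runnable sketch, assuming a compiler on which the feature is available (with preview features enabled where required). Everything beyond the fragments themselves is hypothetical: the class and method names, the use of String for orders, the Context class standing in for ScopedContext, and the choice of Deque so that pop() is available.

```java
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch only: names and helper types are illustrative stand-ins.
final class UnnamedVariablesDemo {
    // Enhanced for: counts iterations up to a limit without reading the element.
    static int countUpTo(List<String> orders, int limit) {
        int total = 0;
        for (String _ : orders) {
            if (total < limit) {
                total++;
            }
        }
        return total;
    }

    // Local variables: consumes (x, y, z) triples and discards every z.
    static int sumXY(Deque<Integer> q) {
        int sum = 0;
        while (!q.isEmpty()) {
            var x = q.pop();
            var y = q.pop();
            var _ = q.pop(); // dequeued for its side effect only
            sum += x + y;
        }
        return sum;
    }

    // Exception parameter: the exception object itself is never inspected.
    static int parseOrDefault(String s, int fallback) {
        try {
            return Integer.parseInt(s);
        } catch (NumberFormatException _) {
            return fallback;
        }
    }

    // Hypothetical resource standing in for ScopedContext.
    static final class Context implements AutoCloseable {
        boolean closed;
        @Override public void close() { closed = true; }
    }

    // try-with-resources: the resource is closed even though it has no name.
    static boolean closesUnnamedResource() {
        Context ctx = new Context();
        try (var _ = ctx) {
            // Work that runs within the context without referring to it.
        }
        return ctx.closed;
    }

    // Lambda parameter: every key maps to the same placeholder value.
    static Map<String, String> placeholders(List<String> keys) {
        return keys.stream()
                   .collect(Collectors.toMap(String::toUpperCase, _ -> "NODATA"));
    }
}
```

For instance, calling sumXY on a deque holding 1, 2, 3, 4, 5, 6 adds 1 + 2 and 4 + 5 while discarding 3 and 6.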

Risks and Assumptions

We assume that very little code is using _ as a variable name. Such code was almost certainly written for JDK 7 or earlier, and has not been recompiled on JDK 9 or later. The risk to such code is a compile-time error when reading or writing a variable called _, and when declaring any other kind of entity (class, field, etc.) with the name _. We assume that developers can modify such code to avoid using _ as the name of a variable or any other kind of entity, e.g., by renaming _ to _1.

We expect developers of static analysis tools to realize the new role of _ for unnamed variables, and avoid flagging the non-use of such variables in modern code.

Alternatives

It would be possible to apply the concept of unnamed variables to method parameters. However, this has some interactions with specification (how do you write javadoc for unnamed parameters?) and overriding (what does it mean to override a method with unnamed parameters?), so it will not be pursued in this JEP.

JEP 302 examined the issue of unused lambda parameters, and identified the role of underscore to denote them, but also covered many other issues which were deemed to be handled better in other ways.