JEP draft: Unnamed Variables and Patterns
Owner | Angelos Bimpoudis |
Type | Feature |
Scope | SE |
Status | Submitted |
Component | specification / language |
Discussion | amber dash dev at openjdk dot org |
Effort | S |
Duration | S |
Reviewed by | Brian Goetz |
Created | 2023/07/10 16:17 |
Updated | 2023/09/11 14:56 |
Issue | 8311828 |
Summary
Enhance the Java language with unnamed variables, which can be initialized but
not used, and unnamed patterns, which match a record component without stating
the component's name or type. Both are denoted by an underscore character, _.
Goals
- Capture programmer intent that a given binding or parameter is unused, and enforce that in the code, to clarify programs and reduce opportunity for error.
- Improve the maintainability of all code by identifying variables that must be declared (e.g., in a catch clause) but will not be used.
- Improve the readability of record patterns by eliding unnecessary nested patterns.
Non-Goals
- It is not a goal to allow unnamed fields or method parameters.
- It is not a goal to alter the semantics of local variables, e.g., in definite assignment analysis.
Motivation
For various reasons, Java developers will sometimes declare a variable that they do not intend to use. While such design intent may be known at the time the code is written, without capturing this intent, maintainers of this code may accidentally use the variable and violate its design intent. By making it impossible to accidentally use such variables, code is made more informative and readable, and less error-prone.
Unused variables
In traditional imperative code, most developers have encountered the situation of declaring a variable they did not intend to use, whether for reasons of code style, or because the language requires a variable declaration in certain contexts, especially those whose side-effect is more important than its result. For example:
- The try-with-resources statement is always used for its side effect, namely the automatic closing of resources. In some cases a resource represents a context in which the code of the try block executes; the code does not use the context directly, so the name of the resource variable is irrelevant. For example, assuming a ScopedContext resource that is AutoCloseable, the following code acquires and automatically releases a context:
    try (var acquiredContext = ScopedContext.acquire()) {
        ... acquiredContext not used ...
    }
  The name acquiredContext is merely clutter, so it would be nice to elide it.
- Exceptions are the ultimate side effect, and handling one often gives rise to an unused variable. For example, most Java developers have written catch blocks of this form, where the actual exception is unused in exception handling:
    String s = ...;
    try {
        int i = Integer.parseInt(s);
        ... i ...
    } catch (NumberFormatException ex) {
        System.out.println("Bad number: " + s);
    }
Here is an example where the side effect of q.remove()
is more important than
its result, leading to an unused variable. The following code dequeues data but
only needs two out of every three elements:
Queue<Integer> q = ... // x1, y1, z1, x2, y2, z2 ..
while (q.size() >= 3) {
    int x = q.remove();
    int y = q.remove();
    int z = q.remove(); // z is unused
    ... new Point(x, y) ...
}
The third call to remove() has the desired side effect (dequeuing an element) regardless of whether its result is assigned to a variable, so the declaration of z could be elided. However, for maintainability, the developer may wish to consistently denote the result of remove() by declaring a variable. Authors of this code currently have a choice of bad options: omit the declaration of z, which leads to an asymmetry and possibly a static analysis warning about ignoring the return value, or declare a variable that will not be used and possibly get a static analysis warning about unused variables.
Another example where the side effect of an expression is more important than
its result, leading to an unused variable, comes from the enhanced-for loop
where the local variable in its header is not needed. For example, the following
code calculates total
as the side effect of a loop, without using the loop
variable order
:
static int count(Iterable<Order> orders) {
    int total = 0;
    for (Order order : orders) // order is unused
        ++total;
    return total;
}
The prominence of order's declaration is unfortunate, given that order is not used. The declaration can be shortened to var order, but there is no way to avoid giving this variable a name. The name itself can be shortened to, e.g., o, but this syntactic trick does not communicate the semantic intent that the variable will go unused. In addition, static analysis tools typically complain about unused variables, even when the developer intends non-use and may not have a way to silence the warnings.
Unused lambda parameters
Even code without side effects must sometimes declare unused variables. For example:
...stream.collect(Collectors.toMap(String::toUpperCase,
v -> "NODATA"));
This code generates a map which maps each key to the same placeholder value.
Since the lambda parameter v
is not used, its name is irrelevant.
In all these scenarios where variables are unused and their names are irrelevant, it would be better if we could simply declare variables with no name. This would free maintainers from having to understand irrelevant names, and would avoid false positives on non-use from static analysis tools.
The kinds of variables that can reasonably be declared with no name are those
which have no visibility outside a method: local variables, exception
parameters, and lambda parameters, as shown above. These kinds of variables can
be renamed or made unnamed without external impact. In contrast, fields — even
if they are private
— communicate the state of an object across methods, and
unnamed state is neither helpful nor maintainable.
Unused pattern variables
Type patterns match selector expressions by specifying a type name and a binding
name. For example, consider the following Ball
class, and a switch
that
explores the type of the ball:
sealed abstract class Ball permits RedBall, BlueBall, GreenBall { }
final class RedBall extends Ball { }
final class BlueBall extends Ball { }
final class GreenBall extends Ball { }
Ball ball = ...
switch (ball) {
    case RedBall red -> process(ball);
    case BlueBall blue -> process(ball);
    case GreenBall green -> stopProcessing();
}
Each case examines the type of a Ball, but the binding variables red, blue, and green are not used. Since the variables introduced by the type patterns are not used, this code would be clearer if we could elide their names.
As developers increasingly use records and their companion mechanism, sealed classes (JEP 409), we expect that pattern matching over complex data structures will become commonplace. Frequently, the shape of a structure will be just as important as the individual data items within it. Consider a switch that explores the content of a Box, which can hold any of the previous Ball types, with an additional case for when the content is null:
record Box<T extends Ball>(T content) { }
Box<? extends Ball> box = ...
switch (box) {
    case Box(RedBall red) -> processBox(box);
    case Box(BlueBall blue) -> processBox(box);
    case Box(GreenBall green) -> stopProcessing();
    case Box(var itsNull) -> pickAnotherBox();
}
Similarly, the type patterns in nested positions introduce binding variables that are not used. Since this switch is more involved than the previous one, eliding the names of the unused bindings in nested type patterns would further increase readability.
Multiple unused pattern variables
Even if we could elide the names of unused pattern variables in the previous Ball and Box examples, they would still contain duplicated code on the right-hand side for the corresponding red and blue cases. If the switches were refactored to group the first two patterns in one case label:
case RedBall red, BlueBall blue -> process(ball); // compile error
and
case Box(RedBall red), Box(BlueBall blue) -> processBox(box); // also compile error
then it would be erroneous to name the pattern variables: in each case, neither name is usable on the right-hand side, because either of the patterns on the left-hand side can match. Since the names are unusable, it would be better if we could elide them.
Unused nested patterns
Records (JEP 395) and record patterns (JEP 440) work together to streamline data processing. A record class aggregates the components of a data item into an instance, while code that receives an instance of a record class uses pattern matching to disaggregate the instance into its components. For example:
record Point(int x, int y) { }
enum Color { RED, GREEN, BLUE }
record ColoredPoint(Point p, Color c) { }
... new ColoredPoint(new Point(3,4), Color.GREEN) ...
if (r instanceof ColoredPoint(Point p, Color c)) {
    ... p.x() ... p.y() ...
}
In this code, one part of the program creates a ColoredPoint
instance while
another part uses pattern matching with instanceof
to test whether a variable
is a ColoredPoint
and, if so, extract its two components.
Record patterns such as ColoredPoint(Point p, Color c)
are pleasingly
descriptive, but it is common for programs to need only some of the components
for further processing. For example, the code above needs only p
in the if
block, not c
. It is laborious to write out all the components of a record
class every time we do such pattern matching. Furthermore, it is not visually
clear that the entire Color
component is irrelevant; this makes the condition
in the if
block harder to read, too. This is especially evident when record
patterns are nested to extract data within components, such as:
if (r instanceof ColoredPoint(Point(int x, int y), Color c)) {
    ... x ... y ...
}
We can use var to reduce the visual cost of the unnecessary component Color c, e.g., ColoredPoint(Point(int x, int y), var c), but it would be better to reduce the cost even further by omitting unnecessary components altogether. This would both simplify the task of writing record patterns and improve readability, by removing clutter from the code.
Description
An unnamed variable is declared when the local variable in a local variable declaration statement, an exception parameter in a catch clause, a lambda parameter in a lambda expression, or a pattern variable in a type pattern is denoted by an underscore. The underscore allows the identifier which follows the type or var in the statement or expression to be elided. In the case of type patterns, the unnamed variable is called an unnamed pattern variable.
Underscore is commonly used in other languages, such as Scala and Python, to declare a variable with no name. Since underscore was valid as an identifier in Java 1.0, Java 8 (2014) initiated a long-term process to reclaim it by issuing compile-time warnings; that process was completed in Java 9 (2017, JEP 213) by turning those warnings into errors.
The ability to use underscore in
identifiers
of length two or more is unchanged, since underscore remains a Java letter and a
Java letter-or-digit. For example, identifiers such as _age
and MAX_AGE
and
__
(two underscores) continue to be legal.
The ability to use underscore as a digit separator
is unchanged. For example, numeric literals such as 123_456_789
and 0b1010_0101
continue to be legal.
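The two rules above can be checked with a small sketch. The class and method names here are hypothetical and chosen only for illustration; the identifiers and literals are the ones mentioned in the text.

```java
// A minimal sketch: underscore inside longer identifiers and inside
// numeric literals. UnderscoreUsage and its methods are illustrative names.
final class UnderscoreUsage {
    static final int MAX_AGE = 130;        // underscore within an identifier

    static int identifiers() {
        int _age = 42;                     // leading underscore, length two or more
        int __ = 1;                        // two underscores form an ordinary identifier
        return _age + __;
    }

    static int decimalLiteral() {
        return 123_456_789;                // underscore as a digit separator
    }

    static int binaryLiteral() {
        return 0b1010_0101;                // digit separator in a binary literal
    }

    public static void main(String[] args) {
        System.out.println(identifiers());
        System.out.println(decimalLiteral());
        System.out.println(binaryLiteral());
    }
}
```

Only a single underscore used as a name is affected by this proposal; none of the declarations in this sketch rely on unnamed variables.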
The unnamed pattern is denoted by an underscore character _
(U+005F) and is
equivalent to the type pattern var _
. It allows the type and name of a record
component to be elided in pattern matching.
Unnamed variables
The following kinds of declarations can introduce either a named variable (denoted by an identifier) or an unnamed variable (denoted by an underscore):
- A local variable declaration statement in a block (JLS 14.4.2),
- A resource specification of a try-with-resources statement (JLS 14.20.3),
- The header of a basic for statement (JLS 14.14.1),
- The header of an enhanced for loop (JLS 14.14.2),
- An exception parameter of a catch block (JLS 14.20), and
- A formal parameter of a lambda expression (JLS 15.27.1).
(The possibility of an unnamed local variable being declared by a pattern, i.e., a pattern variable (JLS 14.30.1), was covered above.)
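Declaring an unnamed variable does not place a name in scope, so the variable cannot be written or read after it has been initialized. An initializer must be provided for an unnamed variable wherever the declaration form takes one, namely in a local variable declaration statement and in a resource specification of a try-with-resources statement.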
An unnamed variable never shadows any other variable, since it has no name, so multiple unnamed variables can be declared in the same block.
Here are the examples from the Motivation, modified to use unnamed variables.
- In try-with-resources:
    try (var _ = ScopedContext.acquire()) {
        ... no use of acquired resource ...
    }
- A catch block:
    String s = ...
    try {
        int i = Integer.parseInt(s);
        ... i ...
    } catch (NumberFormatException _) {
        System.out.println("Bad number: " + s);
    }
  Unnamed variables can be used in multiple catch blocks:
    try { ... }
    catch (Exception _) { ... }
    catch (Throwable _) { ... }
- An enhanced for loop with side effects:
    static int count(Iterable<Order> orders) {
        int total = 0;
        for (Order _ : orders)
            ++total;
        return total;
    }
  The initialization of a basic for loop can also declare unnamed local variables:
    for (int i = 0, _ = sideEffect(); i < 10; i++) { ... i ... }
- An assignment statement, where the result of the expression on the right hand side is not needed:
    Queue<Integer> q = ... // x1, y1, z1, x2, y2, z2, ...
    while (q.size() >= 3) {
        var x = q.remove();
        var y = q.remove();
        var _ = q.remove();
        ... new Point(x, y) ...
    }
  If the program needed to process only the x1, x2, etc., coordinates then unnamed variables could be used in multiple assignment statements:
    while (q.size() >= 3) {
        var x = q.remove();
        var _ = q.remove();
        var _ = q.remove();
        ... new Point(x, 0) ...
    }
- A lambda whose parameter is irrelevant:
    ...stream.collect(Collectors.toMap(String::toUpperCase, _ -> "NODATA"))
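The fragments above can be combined into one compilable sketch. It assumes a JDK on which unnamed variables are available (the text targets Java 22; on Java 21 the feature would need --enable-preview). The ScopedContext class below is a hypothetical stand-in for the resource mentioned earlier, and the helper method names are illustrative only.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.stream.Collectors;

final class UnnamedVariablesSketch {
    // Hypothetical AutoCloseable resource; it only counts open contexts.
    static final class ScopedContext implements AutoCloseable {
        static int open = 0;
        static ScopedContext acquire() { open++; return new ScopedContext(); }
        @Override public void close() { open--; }
    }

    // try-with-resources: the resource is held but never referenced.
    static int runInContext() {
        try (var _ = ScopedContext.acquire()) {
            return ScopedContext.open;
        }
    }

    // catch: the exception object itself is not needed.
    static int parseOrDefault(String s, int fallback) {
        try {
            return Integer.parseInt(s);
        } catch (NumberFormatException _) {
            return fallback;
        }
    }

    // Enhanced for: only the number of iterations matters.
    static int count(Iterable<String> items) {
        int total = 0;
        for (String _ : items) {
            ++total;
        }
        return total;
    }

    // Two unnamed locals in the same block; neither shadows the other.
    static int sumFirstOfEveryThree(Queue<Integer> q) {
        int sum = 0;
        while (q.size() >= 3) {
            var x = q.remove();
            var _ = q.remove();
            var _ = q.remove();
            sum += x;
        }
        return sum;
    }

    // Lambda whose parameter is irrelevant.
    static Map<String, String> placeholders(List<String> keys) {
        return keys.stream()
                   .collect(Collectors.toMap(String::toUpperCase, _ -> "NODATA"));
    }

    public static void main(String[] args) {
        System.out.println(runInContext());
        System.out.println(parseOrDefault("oops", -1));
        System.out.println(count(List.of("a", "b", "c")));
        System.out.println(sumFirstOfEveryThree(new ArrayDeque<>(List.of(1, 2, 3, 4, 5, 6))));
        System.out.println(placeholders(List.of("a", "b")));
    }
}
```

Because none of the unnamed declarations places a name in scope, the two consecutive var _ declarations in sumFirstOfEveryThree do not conflict.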
Unnamed pattern variables
An unnamed pattern variable can appear in any type pattern, including "var" type
patterns, whether the type pattern appears at the top level or is nested in a
record pattern. For example the Ball
example can now be written:
switch (ball) {
    case RedBall _ -> process(ball);
    case BlueBall _ -> process(ball);
    case GreenBall _ -> stopProcessing();
}
and the Box
example:
switch (box) {
    case Box(RedBall _) -> processBox(box);
    case Box(BlueBall _) -> processBox(box);
    case Box(GreenBall _) -> stopProcessing();
    case Box(var _) -> pickAnotherBox();
}
By allowing us to elide names, unnamed pattern variables make run-time data
exploration based on type patterns visually clearer, both in switch
statements
and expressions, and in instanceof
.
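As a self-contained illustration of the Ball example, the following sketch compiles on a JDK where unnamed pattern variables are available (assumed: Java 22, or Java 21 with --enable-preview). The handle method and its string results are hypothetical; they stand in for the elided process and stopProcessing calls.

```java
sealed abstract class Ball permits RedBall, BlueBall, GreenBall { }
final class RedBall extends Ball { }
final class BlueBall extends Ball { }
final class GreenBall extends Ball { }

final class BallSwitchSketch {
    // Each case tests only the type of the ball; no binding is introduced.
    static String handle(Ball ball) {
        return switch (ball) {
            case RedBall _ -> "process";
            case BlueBall _ -> "process";
            case GreenBall _ -> "stopProcessing";
        };
    }

    public static void main(String[] args) {
        System.out.println(handle(new RedBall()));
        System.out.println(handle(new GreenBall()));
    }
}
```

The switch needs no default clause because Ball is sealed and all three permitted subclasses are covered.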
Multiple unnamed pattern variables
Unnamed pattern variables are particularly helpful when a switch
executes the
same action for multiple cases. For example, the earlier Ball
code snippet can
be rewritten as:
switch (ball) {
    case RedBall _, BlueBall _ -> process(ball);
    case GreenBall _ -> stopProcessing();
}
The first two cases use top-level unnamed pattern variables because their
right-hand sides do not use the bindings. Similarly, the Box
and Ball
code
snippet can also be rewritten as:
switch (box) {
    case Box(RedBall _), Box(BlueBall _) -> processBox(box);
    case Box(GreenBall _) -> stopProcessing();
    case Box(var _) -> pickAnotherBox();
}
All cases use unnamed pattern variables because their right-hand sides do not use the Box's component.
A case
label with multiple patterns can have a
guard. A guard governs the case
as a whole, rather than the individual patterns. For example, assuming that
there is an int
variable x
, the first case of the previous example could be
further constrained:
case Box(RedBall _), Box(BlueBall _) when x == 42 -> processBox(box);
Pairing a guard with each pattern is not allowed, so this is prohibited:
case Box(RedBall _) when x == 0, Box(BlueBall _) when x == 42 -> processBox(box);
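A runnable sketch of the permitted form, where a single guard governs a case label with several patterns, might look as follows. It assumes a JDK with unnamed pattern variables (Java 22, or Java 21 with --enable-preview); the handle method, its int parameter x, and the returned strings are hypothetical placeholders for the actions named in the text.

```java
sealed abstract class Ball permits RedBall, BlueBall, GreenBall { }
final class RedBall extends Ball { }
final class BlueBall extends Ball { }
final class GreenBall extends Ball { }
record Box<T extends Ball>(T content) { }

final class GuardedBoxSketch {
    static String handle(Box<? extends Ball> box, int x) {
        return switch (box) {
            // One guard applies to the whole label, whichever pattern matched.
            case Box(RedBall _), Box(BlueBall _) when x == 42 -> "processBox";
            // Same patterns without a guard: reached when x != 42.
            case Box(RedBall _), Box(BlueBall _) -> "skip";
            case Box(GreenBall _) -> "stopProcessing";
            // Matches any remaining box, including one whose content is null.
            case Box(_) -> "pickAnotherBox";
        };
    }

    public static void main(String[] args) {
        System.out.println(handle(new Box<>(new RedBall()), 42));
        System.out.println(handle(new Box<>(new BlueBall()), 0));
        System.out.println(handle(new Box<Ball>(null), 42));
    }
}
```

The second case is an addition for illustration only; it shows that the guard on the first label can fail for either pattern and fall through to a later label.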
The unnamed pattern
The unnamed pattern is an unconditional pattern which binds nothing. Like the
var _
type pattern, the unnamed pattern is usable in a nested context of a
record pattern, but not at the top level of an instanceof
or case
.
Consequently, the earlier example can omit the type pattern for the Color
component entirely:
if (r instanceof ColoredPoint(Point(int x, int y), _)) { ... x ... y ... }
Likewise, we can extract the Color
component while eliding the record pattern
for the Point
component:
if (r instanceof ColoredPoint(_, Color c)) { ... c ... }
In deeply nested positions, using the unnamed pattern improves the readability of code that does complex data extraction. For example:
if (r instanceof ColoredPoint(Point(int x, _), _)) { ... x ... }
This code extracts the x
coordinate of the nested Point
while omitting both
the y
and Color
components.
Turning to the previous example with multiple unnamed pattern variables: since var _ in a nested position is equivalent to the unnamed pattern, the switch can use unnamed pattern variables as explained previously, and can also use the unnamed pattern instead of var _ for the last case:
switch (box) {
    case Box(RedBall _), Box(BlueBall _) -> processBox(box);
    case Box(GreenBall _) -> stopProcessing();
    case Box(_) -> pickAnotherBox();
}
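The nested uses of the unnamed pattern described in this section can be exercised with a short sketch. It assumes a JDK where the feature is available (Java 22, or Java 21 with --enable-preview); the xOf and colorOf helpers and their fallback return values are hypothetical.

```java
record Point(int x, int y) { }
enum Color { RED, GREEN, BLUE }
record ColoredPoint(Point p, Color c) { }

final class UnnamedPatternSketch {
    // Extracts x while omitting both the y component and the Color component.
    static int xOf(Object r) {
        if (r instanceof ColoredPoint(Point(int x, _), _)) {
            return x;
        }
        return -1; // illustrative fallback when r does not match
    }

    // Extracts the Color while eliding the record pattern for the Point.
    static String colorOf(Object r) {
        if (r instanceof ColoredPoint(_, Color c)) {
            return String.valueOf(c);
        }
        return "none"; // illustrative fallback when r does not match
    }

    public static void main(String[] args) {
        Object r = new ColoredPoint(new Point(3, 4), Color.GREEN);
        System.out.println(xOf(r));
        System.out.println(colorOf(r));
    }
}
```

Note that the nested record pattern Point(int x, _) does not match a null Point, whereas the unnamed pattern in ColoredPoint(_, Color c) matches any value of the first component, including null.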
Risks and Assumptions
- We assume that very little existing and maintained code uses underscore as a variable name. A Java developer migrating from Java 7 to Java 22, without having gone through the gradual process introduced in Java 8, will face compile-time errors when reading or writing a variable called _ and when declaring any other kind of entity (class, field, etc.) with the name _.
- We expect developers of static analysis tools to realize the new role of underscore for unnamed variables and to avoid flagging the non-use of such variables in modern code.
Alternatives
- It is possible to define an analogous concept of unnamed method parameters. However, this has some interactions with specification (e.g., how do you write JavaDoc for unnamed parameters?) and overriding (e.g., what does it mean to override a method with unnamed parameters?). This may be the subject of a future JEP.
- JEP 302 (Lambda Leftovers) examined the issue of unused lambda parameters and identified the role of underscore to denote them, but also covered many other issues which were handled better in other ways. This JEP addresses the use of unused lambda parameters explored in JEP 302 but does not address the other issues explored there.