JEP draft: Primitive types in patterns, instanceof, and switch (Preview)
Owner | Angelos Bimpoudis |
Type | Feature |
Scope | SE |
Status | Submitted |
Component | specification / language |
Discussion | amber dash dev at openjdk dot org |
Effort | M |
Duration | M |
Reviewed by | Alex Buckley |
Created | 2022/06/15 10:05 |
Updated | 2023/09/20 14:02 |
Issue | 8288476 |
Summary
Enhance pattern matching by allowing primitive type patterns to be used in all pattern contexts, align the semantics of primitive type patterns with instanceof, and extend switch to allow primitive constants as case labels. This is a preview language feature.
Goals
- Enable uniform data exploration by allowing type patterns to match values of any type, whether primitive or reference.
- Align primitive type patterns with safe casting.
- Allow pattern matching to use primitive type patterns in both nested and top-level contexts.
- Provide easy-to-use constructs that eliminate the risk of losing information due to unsafe casts.
- Following the enhancements to switch in Java 5 (enum switch) and Java 7 (string switch), allow switch to process values of any primitive type.
Non-Goals
- It is not a goal to introduce new types of conversions or new conversion contexts.
Motivation
Records and record patterns work together to streamline data processing. Records (JEP 395) make it easy to aggregate components, and record patterns (JEP 440) make it easy to decompose aggregates using pattern matching.
For example, we can model JSON documents with a sealed hierarchy of records:
sealed interface JsonValue {
record JsonString(String s) implements JsonValue { }
record JsonNumber(double d) implements JsonValue { }
record JsonNull() implements JsonValue { }
record JsonBoolean(boolean b) implements JsonValue { }
record JsonArray(List<JsonValue> values) implements JsonValue { }
record JsonObject(Map<String, JsonValue> map) implements JsonValue { }
}
With respect to numbers, JSON does not distinguish integers from non-integers, so in JsonNumber we represent all numbers with double values, as recommended by the specification.
Given a JSON payload of
{ "name" : "John", "age" : 30 }
we can construct a corresponding JsonValue via
var json = new JsonObject(Map.of("name", new JsonString("John"),
                                 "age", new JsonNumber(30)));
For each key in the map, this code instantiates an appropriate record for the corresponding value. For the first, the value "John" has the same type as the record's component, namely String. For the second, however, the Java compiler applies a widening primitive conversion to convert the int value, 30, to a double.
Nested primitive type patterns are limited
We can, of course, use record patterns to disaggregate this json value:
record Customer(String name, int age) { }
...
if (json instanceof JsonObject(var map)
&& map.get("name") instanceof JsonString(String name)
&& map.get("age") instanceof JsonNumber(double age))
{
return new Customer(name, (int)age); // unavoidable cast
}
Here we see that primitive type patterns in nested contexts have a limitation: in this application we expect the age value always to be an int, but from the JsonNumber pattern we can only extract a double and must rely upon a lossy manual cast to convert that to an int. We should return a Customer object only when the age value is representable as an int, which requires additional code:
if (json instanceof JsonObject(var map)
&& map.get("name") instanceof JsonString(String name)
&& map.get("age") instanceof JsonNumber(double age))
{
int age2 = (int)age; // unavoidable cast
if (age2 == age)
return new Customer(name, age2);
}
What we would really like to do is use int directly in the JsonNumber pattern, such that the pattern matches only when the double value inside the JsonNumber object can be converted to an int without loss of information, and when it does match it automatically narrows the double value to an int:
if (json instanceof JsonObject(var map)
&& map.get("name") instanceof JsonString(String name)
&& map.get("age") instanceof JsonNumber(int age))
{
return new Customer(name, age); // no cast!
}
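In current Java, the effect of this pattern can be approximated with an explicit exactness check. The following is a minimal sketch rather than part of the proposal: the class NestedIntSketch and the helpers isExactInt and toCustomer are hypothetical names, only the record types needed for the example are repeated, and rejecting -0.0 and NaN is an assumption about what "without loss of information" means for a double to int conversion.

```java
import java.util.Map;
import java.util.Optional;

final class NestedIntSketch {
    // Reduced copy of the JSON model from the text (hypothetical nesting).
    interface JsonValue { }
    record JsonString(String s) implements JsonValue { }
    record JsonNumber(double d) implements JsonValue { }
    record JsonObject(Map<String, JsonValue> map) implements JsonValue { }
    record Customer(String name, int age) { }

    // Hypothetical helper: true when d survives a round trip through int.
    // Double.compare is used instead of == so that -0.0 and NaN are rejected
    // (an assumption of this sketch).
    static boolean isExactInt(double d) {
        return Double.compare((double) (int) d, d) == 0;
    }

    // Returns a Customer only when "age" holds a value representable as an int.
    static Optional<Customer> toCustomer(JsonValue json) {
        if (json instanceof JsonObject obj
                && obj.map().get("name") instanceof JsonString name
                && obj.map().get("age") instanceof JsonNumber age
                && isExactInt(age.d())) {
            return Optional.of(new Customer(name.s(), (int) age.d()));
        }
        return Optional.empty();
    }
}
```

With this sketch, a payload whose age is 30 yields a Customer, while an age of 30.5 yields an empty Optional. The proposed pattern JsonNumber(int age) would fold the check and the narrowing into the pattern itself.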
This sort of usage is characteristic of pattern matching's ability to reject illegal values automatically. Pattern matching eliminates the need for potentially unsafe casts by raising match failures to control-flow decisions. It already works this way for reference types in patterns; for example:
record Box(Object o) { }
Box b = new Box(new RedBall());
if (b instanceof Box(RedBall r)) { ... }
Here the pattern Box(RedBall r) matches only when b is a Box that holds a RedBall, in which case it binds the local variable r of type RedBall to that object. Unfortunately, primitives are today comparatively limited: if the type of the matched component is T then the type in the primitive type pattern must be T as well. Primitive type patterns should not mean something different from reference type patterns; they should both mean that the value can be cast safely.
Primitive type patterns are not permitted in top-level contexts
The previous examples show that primitive type patterns are invariant in nested contexts. In addition, primitive type patterns cannot be used in top-level contexts at all; only type patterns of reference types are allowed there.
It would be ideal to extend the utility of pattern matching via instanceof to all types. In Java, pattern matching with instanceof means that a block of code is safeguarded against a certain class of errors. In the following example, before o is converted to String via a cast, the developer checks that the object o has the correct run-time type. If instanceof returns true then the subsequent cast conversion is guaranteed to be safe (no ClassCastException or NullPointerException) and a variable s of a sharper type, String, is initialized:
Object o = ...
if (o instanceof String s) { // type pattern
... s.isEmpty() ... // will execute without error
}
Lifting the restrictions on primitive type patterns means that instanceof can safeguard any cast conversion supported by Java (JLS 5.5), at top level too. In the following example, instanceof with the primitive type pattern byte b checks whether i can be safely cast to byte without loss of information. If instanceof returns true, then (byte) i will not lose information about, e.g., magnitude and sign:
int i = ...
if (i instanceof byte b) {
... b ...
}
For example, if the int variable i holds 1000 then the value of (byte) i is -24. Pattern matching, and occasionally Java developers, must safeguard casts by checking, for example, that a 32-bit int can be represented by an 8-bit byte:
int i = 42;
byte b = 0;
if(i >= -128 && i <= 127) {
b = (byte) i;
}
or alternatively use round-trip casts:
int i = 42;
byte b = 0;
if((int)(byte)i == i) {
b = (byte) i;
}
Enhancing pattern matching with primitive type patterns means that safeguarding casts is done automatically by the compiler.
Primitive types are not permitted in type comparisons
Extending instanceof as the pattern matching operator means that we can extend, symmetrically, the semantics of instanceof as the type comparison operator. It would be desirable to generalize the instanceof type-testing operator to work on all types, recognizing the connection between instanceof and casting.
An instanceof involving any pair of types (the type of the operand and the type described by the right-hand side) would succeed if a casting conversion exists and can be performed without loss of magnitude, sign, precision, or range. As a result, the safe-cast testing operator will be able to defend against potentially lossy casts between any types. Following the previous example that pattern matches over byte b, instanceof can now support safe-cast testing over the primitive type byte:
int i = ...
if (i instanceof byte) {
...
}
Primitive type patterns in switch
At present, primitive type patterns are not allowed in a top-level context of a switch either. For example, with a top-level primitive type pattern we could rewrite the switch expression (JEP 361)
switch (x.getStatus()) {
case 0 -> "okay";
case 1 -> "warning";
case 2 -> "error";
default -> "unknown status: " + x.getStatus();
}
more clearly as
switch (x.getStatus()) {
case 0 -> "okay";
case 1 -> "warning";
case 2 -> "error";
case int i -> "unknown status: " + i;
}
Here the case int i label matches any status value not previously matched, making the switch expression exhaustive so that no default label is required.
Permitting top-level primitive type patterns would allow guards to be used to further restrict the values matched by case labels:
switch (x.getYearlyFlights()) {
case 0 -> ...;
case 1 -> ...;
case 5 -> issueDiscount();
case int i when i > 100 -> issueGoldCard();
case int i -> ...;
}
Combining primitive type patterns with record patterns offers further opportunities for case analysis:
switch (x.order()) {
    case NormalOrder(Product(int productCode)) -> ...;
    case BadOrder b -> switch (b.reason()) {
        case MissingProduct q -> switch (q.code()) {
            case 1 -> ...;
            case 2 -> ...;
            case int i -> ...;
        }
    }
}
switch does not support all primitive types
Prior to this JEP, switch expressions and switch statements can switch on some primitive types, but not boolean, float, double, or long. We can switch on a long value only when it fits within an int, so we must handle any remaining cases with if statements:
long v = ...;
if (v == (int)v) {
switch ((int)v) {
case 0x01 -> ...;
case 0x02 -> ...;
case int i -> ... i ...;
}
}
if (v == 10_000_000_000L) { ... }
if (v == 20_000_000_000L) { ... }
If we could use long constant expressions in case labels then we could instead write:
long v = ...;
switch (v) {
case 0x01 -> ...;
case 0x02 -> ...;
case 10_000_000_000L -> ...;
case 20_000_000_000L -> ...;
case long l -> ... l ...;
}
Similarly, consider code that uses if-else chains to test float values:
float f = ...;
if (Float.isNaN(f)) {
...
} else if (Float.isInfinite(f)) {
...
} else {
...
}
With float values in case labels we could declutter this into:
float f = ...;
switch (f) {
case Float.NaN -> ...;
case Float.POSITIVE_INFINITY -> ...;
case Float.NEGATIVE_INFINITY -> ...;
case float g -> ... g ...;
}
Switching on boolean values could be a useful alternative to the ternary conditional operator (?/:). Unlike that operator, a boolean switch expression can contain both expressions and statements in its rules. For example:
startProcessing(OrderStatus.NEW, switch (user.isLoggedIn()) {
case true -> user.id();
case false -> { log("Unrecognized user"); yield -1; }
});
Here the second argument to the startProcessing method uses a boolean switch to encapsulate some business logic.
When switching on a primitive value, a switch expression or statement should automatically convert between the type of that value and the types of its case labels, as long as those conversions do not lose precision or range. For example, when switching on a float value the case labels could be of type float, double, int, or long, as long as the constant value of each label converts sensibly to a float.
float f = ...;
switch (f) {
case 16_777_216 -> ...;
case 16_777_217 -> ...;
default -> ...;
}
This switch accepts a float but its case labels are integral values that convert to the same float value. The cases are indistinguishable at run time, so this code should be rejected at compile time.
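The collision can be observed in current Java by converting both constants by hand. This is a small illustrative sketch; the class and method names are hypothetical, and Float.compare is assumed here as the test for whether two labels denote the same float value.

```java
final class FloatLabelCollision {
    // True when two int constants convert to the same float value, which
    // would make them indistinguishable as labels of a switch on a float.
    static boolean collideAsFloat(int a, int b) {
        return Float.compare((float) a, (float) b) == 0;
    }

    public static void main(String[] args) {
        // 2^24 and 2^24 + 1: the second is not representable as a float
        // and rounds to the first.
        System.out.println(collideAsFloat(16_777_216, 16_777_217));
        // 2^24 + 2 is representable, so it stays distinct from 2^24.
        System.out.println(collideAsFloat(16_777_216, 16_777_218));
    }
}
```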
In summary, primitive types in instanceof, and in type patterns for instanceof and switch, would increase program reliability and enable more uniform data exploration with pattern matching. This JEP removes the following restrictions:
- primitive type patterns could only be used on a match target of the exact same type,
- primitive type patterns were only allowed in a nested context and not at top-level,
- instanceof was restricted to reference types only, and
- switch and constant case labels were restricted to support only a subset of primitive types.
Description
This JEP extends type patterns in the Java language by removing restrictions around primitive types. The semantics of type patterns will be defined in terms of instanceof, the type comparison operator. Henceforth, instanceof will be able not only to test and compare reference types but also to safeguard any casting conversion already supported by the Java language. As a result, instanceof will be able to test whether a value can be safely cast to a target type, where "safe" means guarding against erroneous situations such as a ClassCastException, a NullPointerException, or any loss of information involving primitive types (magnitude, precision, range, or sign).
instanceof as the precondition test for safe casting in general
As of Java 16, the instanceof operator is either a type comparison operator or a pattern match operator, depending on its syntactic form.
When instanceof is a type comparison operator, support for primitive types is realized by removing the restrictions that (1) the type of the left-hand operand must be a reference type, and (2) the right-hand operand must name a reference type. The form of a type comparison operator becomes:
InstanceofExpression:
RelationalExpression instanceof Type
...
Before this JEP, the result of a type comparison operator was false if the value was the null reference, true if the value could be cast to the right-hand operand without raising a ClassCastException, and false otherwise. This JEP generalizes an expression e instanceof T to ask whether a value e of static type S can be converted exactly to the given primitive or reference type T in a casting context (JLS 5.5). This makes instanceof the precondition test for safe casting in general.
Under this generalization, the instanceof type comparison operator is defined to work for all pairs of types that are allowed to be converted in a casting context. Before this JEP, a compile-time error occurred for pairs of reference types with no permitted cast conversion. Under this JEP, type-checking instanceof continues to follow the rules of cast conversions, and a compile-time error occurs for any pair of reference or primitive types with no permitted cast conversion. The examples given earlier rely on conversions allowed in a casting context, so they can be rewritten to use instanceof directly:
int i = 1000;
if (i instanceof byte) { // false
byte b = (byte) i;
... b ...
}
byte b = 42;
if (b instanceof int) { // true
int i = (int) b;
... i ...
}
int i = 16_777_216; // 2^24
if (i instanceof float) { // true
float f = (float) i;
... f ...
}
int i = 16_777_217; // 2^24+1
if (i instanceof float) { // false
float f = (float) i;
... f ...
}
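For comparison, the int to byte and int to float checks can be approximated in current Java without the extended instanceof. The sketch below is illustrative only: the class and method names are hypothetical, and it is not presented as the run-time support the compiler would use.

```java
final class ExactIntSketch {
    // int -> byte: exact when the narrowed value widens back to the same int.
    static boolean isIntToByteExact(int i) {
        return (int) (byte) i == i;
    }

    // int -> float: compare in double, where both the int and the float are
    // exactly representable. A round trip back through int would be misled
    // near Integer.MAX_VALUE, because the float -> int cast saturates.
    static boolean isIntToFloatExact(int i) {
        return (double) (float) i == (double) i;
    }

    public static void main(String[] args) {
        System.out.println(isIntToByteExact(1000));        // mirrors: 1000 instanceof byte
        System.out.println(isIntToFloatExact(16_777_216)); // mirrors: 2^24 instanceof float
        System.out.println(isIntToFloatExact(16_777_217)); // mirrors: 2^24+1 instanceof float
    }
}
```

The design point worth noting is that a naive round trip is not always sufficient: for Integer.MAX_VALUE, (int) (float) i returns i again even though the intermediate float holds 2^31.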
This JEP does not add any conversions to the casting context, nor does it create any new conversion contexts. Whether instanceof is applicable to a given expression and type is determined entirely by whether there is already a conversion allowed by the casting context. The conversions permitted in casting context are as follows:
- identity conversions (JLS 5.1.1)
- widening primitive conversions (JLS 5.1.2)
- narrowing primitive conversions (JLS 5.1.3)
- widening and narrowing primitive conversions (JLS 5.1.4)
- boxing conversions (JLS 5.1.7)
- unboxing conversions (JLS 5.1.8)
and specified combinations of these:
- an identity conversion (JLS 5.1.1)
- a widening reference conversion (JLS 5.1.5)
- a widening reference conversion followed by an unboxing conversion
- a widening reference conversion followed by an unboxing conversion, then followed by a widening primitive conversion
- a narrowing reference conversion (JLS 5.1.6)
- a narrowing reference conversion followed by an unboxing conversion
- an unboxing conversion (JLS 5.1.8)
- an unboxing conversion followed by a widening primitive conversion
Consider the following examples. All of the following are allowed because the left-hand operand of instanceof, an expression e, can be converted to the specified type in a casting context:
int i = ...
i instanceof byte
i instanceof float
boolean b = ...
b instanceof Boolean
Short s = ...
s instanceof int
s instanceof long
long l = ...
l instanceof float
l instanceof double
Long ll = ...
ll instanceof float
ll instanceof double
However, all of the following examples raise a compile-time error, since they do not correspond to a pre-existing casting conversion:
boolean b = ...
b instanceof char // error
Byte bb = ...
bb instanceof char // error
Integer ii = ...
ii instanceof byte // error
ii instanceof short // error
Long ll = ...
ll instanceof int // error
ll instanceof Float // error
ll instanceof Double // error
If e has a reference type and the relational expression evaluates to null, instanceof continues to evaluate to false.
Exactness of Casting Conversions
A conversion is exact if no loss of information occurs. Whether a conversion is exact depends on the pair of types involved and potentially on the input value:
- For some pairs, the conversion from the first type to the second type is guaranteed not to lose information for any value, and requires no action at run time. The conversion is said to be unconditionally exact. Examples include int to int and int to long.
- For other pairs, a run-time test is needed to check whether the value can be converted from the first type to the second type without loss of information. Examples include long to int and int to float; both of these conversions detect loss of precision by relying on the notion of "representation equivalence" in java.lang.Double.
Adopting the notation from JLS (5.5), the following table marks the primitive conversions that are unconditionally exact with the symbol ɛ. For completeness, the other symbols are: - (no conversion allowed), ≈ (identity conversion), ω (widening primitive conversion), η (narrowing primitive conversion), and ωη (widening and narrowing primitive conversion):
To → | byte | short | char | int | long | float | double | boolean |
---|---|---|---|---|---|---|---|---|
From ↓ | ||||||||
byte | ≈ | ɛ | ωη | ɛ | ɛ | ɛ | ɛ | - |
short | η | ≈ | η | ɛ | ɛ | ɛ | ɛ | - |
char | η | η | ≈ | ɛ | ɛ | ɛ | ɛ | - |
int | η | η | η | ≈ | ɛ | ω | ɛ | - |
long | η | η | η | η | ≈ | ω | ω | - |
float | η | η | η | η | η | ≈ | ɛ | - |
double | η | η | η | η | η | η | ≈ | - |
boolean | - | - | - | - | - | - | - | ≈ |
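The table can also be read as a lookup. The following sketch encodes it row by row, treating identity (≈) as unconditionally exact in line with the int to int example given earlier. The class name, the method name, and the string encoding are hypothetical choices made for illustration only.

```java
import java.util.List;

final class UnconditionalExactness {
    // Row and column order of the table:
    // byte, short, char, int, long, float, double, boolean.
    private static final List<Class<?>> TYPES = List.of(
            byte.class, short.class, char.class, int.class,
            long.class, float.class, double.class, boolean.class);

    // 'E' marks an identity or unconditionally exact conversion;
    // '.' marks every other entry (narrowing, lossy widening, or none).
    private static final String[] ROWS = {
            "EE.EEEE.", // from byte
            ".E.EEEE.", // from short
            "..EEEEE.", // from char
            "...EE.E.", // from int
            "....E...", // from long
            ".....EE.", // from float
            "......E.", // from double
            ".......E", // from boolean
    };

    static boolean isUnconditionallyExact(Class<?> from, Class<?> to) {
        int row = TYPES.indexOf(from);
        int col = TYPES.indexOf(to);
        if (row < 0 || col < 0) {
            throw new IllegalArgumentException("primitive types only");
        }
        return ROWS[row].charAt(col) == 'E';
    }
}
```

For instance, isUnconditionallyExact(int.class, double.class) is true, while the same query for int.class and float.class is false because that widening can lose precision.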
Consider the following examples. The unconditionally exact conversions are marked with (ε); they always return true regardless of the value. The remaining results are obtained via a run-time check:
byte b = 42;
b instanceof int; // true (ε)
int i = 1000;
i instanceof byte; // false
int i = 42;
i instanceof byte; // true
int i = 16_777_217; // 2^24+1
i instanceof float; // false
i instanceof double; // true (ε)
i instanceof Integer; // true (ε)
i instanceof Number; // true (ε)
float f = 1000.0f;
f instanceof byte; // false
f instanceof int; // true
f instanceof double; // true (ε)
double d = 1000.0d;
d instanceof byte; // false
d instanceof int; // true
d instanceof float; // true
Integer ii = 1000;
ii instanceof int; // true
ii instanceof float; // true
ii instanceof double; // true
Integer ii = 16_777_217;
ii instanceof float; // false
ii instanceof double; // true
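The floating-point results above can be reproduced approximately in current Java. This is a hedged sketch, not the run-time support behind instanceof: the class and method names are hypothetical, and the handling of -0.0 and NaN (float to int rejects both, double to float accepts NaN) is an assumption that follows from using Double.compare as the equality test.

```java
final class ExactFloatingSketch {
    // float -> int: widen both sides to double before comparing, so that the
    // saturating float -> int cast cannot hide an overflow.
    static boolean isFloatToIntExact(float f) {
        return Double.compare((double) (int) f, (double) f) == 0;
    }

    // float -> byte: same idea with the narrower target type.
    static boolean isFloatToByteExact(float f) {
        return Double.compare((double) (byte) f, (double) f) == 0;
    }

    // double -> float: exact when widening the narrowed value gives back
    // the same datum.
    static boolean isDoubleToFloatExact(double d) {
        return Double.compare((double) (float) d, d) == 0;
    }

    public static void main(String[] args) {
        float f = 1000.0f;
        System.out.println(isFloatToByteExact(f));   // mirrors: f instanceof byte
        System.out.println(isFloatToIntExact(f));    // mirrors: f instanceof int
        System.out.println(isDoubleToFloatExact(1000.0d)); // mirrors: d instanceof float
    }
}
```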
Primitive type patterns
Type patterns currently do not allow primitive types at top level, only when they appear in the nested pattern list of a record pattern. We lift that restriction so that primitive types are allowed at top level as well.
The semantics of primitive type patterns (and reference type patterns on targets of primitive type) are derived from that of casting conversions.
- A type pattern T t is applicable to a target of type U if a U could be cast to T without an unchecked warning.
- A type pattern T t is unconditional on a target of type U if all values of U can be exactly cast to T. This includes widening from one reference type to another, widening from one integral type to another, widening from one floating-point type to another, widening from byte, short, or char to a floating-point type, widening int to double, and boxing.
- A set of patterns containing a type pattern T t is exhaustive on a target of type U if T t is unconditional on U or if there is an unboxing conversion from U to T.
- A type pattern T t dominates a type pattern U u, or a record pattern U(...), if T t would be unconditional on a target of type U.
- A type pattern T t that is not null-matching matches a target u if u instanceof T. The instanceof check examines whether the implied casting conversion would complete without loss of information or error.
Exhaustiveness
A switch expression requires that all statically known, possible values of the selector expression be handled in the switch block; in other words, the switch must be exhaustive. A switch is exhaustive if it contains an unconditional type pattern, but it can also be exhaustive in other situations, deferring any unhandled cases to run time (see Patterns: Exhaustiveness, Unconditionality, and Remainder). If a set of patterns is exhaustive for a type, we call the run-time values that are not matched by any pattern in the set the remainder of the set.
With pattern labels involving record patterns, some patterns are allowed to be exhaustive even when they are not unconditional. For example, the following switch is considered exhaustive on Box<Box<String>>, even though it will not match new Box(null):
Box<Box<String>> bbs = ...
switch (bbs) {
case Box(Box(String s)): ...
}
The pathological value new Box(null) is part of that remainder set, and is handled by a synthetic default clause that throws MatchException.
With the introduction of primitive type patterns, we observe that unboxing follows the same philosophy. As a result, a type pattern int x is considered exhaustive on Integer, so the following switch is considered exhaustive on Box<Integer> even though Box(null) is not covered by Box(int i) (it is left as a remainder of the set):
Box<Integer> bi = ...
switch (bi) {
case Box(int i): ...
}
Constant expressions in case labels
Turning to constant expressions in the case labels of a switch, the primitive types long, float, double, boolean, and their boxes can be associated with a switch block as long as the type of the selector expression (which can be a primitive type or a boxed reference type) is the same as the type of the constant expression.
For example, the constant expression 0f can only be used when the selector expression's type is float or Float:
float f = ...
switch (f) {
case 0f -> 5f + 0f;
case Float fi when fi == 1f -> 6f + fi;
case Float fi -> 7f + fi;
}
Two floating-point values represent the same IEEE 754 datum if, for finite values, the sign, exponent, and significand components of the values are the same. For that reason, representation equivalence defines how switch labels can be selected in the presence of non-integral or boolean values. The same definition is used to signal duplicate-label errors, for example when a developer writes the following switch:
float f = ...
switch (f) {
case 1.0f -> ...
case 0.999999999f -> ...
default -> ...
}
While 1.0f is exactly representable as a float, 0.999999999f is not. The latter is rounded up to 1.0f, so both labels denote the same value, which results in a compile-time error.
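The difference between numerical equality and this stricter notion of sameness can be seen in current Java. The sketch below assumes that Float.compare(a, b) == 0 is an adequate stand-in for representation equivalence of two float values; the class and method names are hypothetical.

```java
final class RepresentationEquivalenceSketch {
    // Assumption of this sketch: two float labels denote the same value
    // exactly when Float.compare reports them as equal.
    static boolean sameLabel(float a, float b) {
        return Float.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        // The literal 0.999999999f rounds to 1.0f, so the two labels coincide.
        System.out.println(sameLabel(1.0f, 0.999999999f));
        // +0.0f and -0.0f are numerically equal (==) but distinguished here.
        System.out.println(0.0f == -0.0f);
        System.out.println(sameLabel(0.0f, -0.0f));
        // NaN is never == to itself, but it is a single label here.
        System.out.println(Float.NaN == Float.NaN);
        System.out.println(sameLabel(Float.NaN, Float.NaN));
    }
}
```

Under this reading, a float switch could carry separate labels for 0.0f and -0.0f, and a case Float.NaN label can match even though NaN never compares equal with ==.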
Since boolean (and its box) consist of only two distinct values, a switch that lists both the true and false cases is considered exhaustive:
boolean b = ...
switch (b) {
case true -> ...
case false -> ...
// Alternatively: case true, false -> ...
}
It is a compile-time error for that switch to include a default clause.
Risks and Assumptions
Outside pattern matching and instanceof, lossy assignment is endemic in Java source code. For example, if a method returns int then its result can be assigned to a float variable without casting:
int getSalary() { ... }
float salary = getSalary();
The risk is that Java developers do not realize the possible loss of precision that can occur at this assignment, because it is silent.
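A small sketch makes the silent loss visible. The salary value is a hypothetical figure chosen because it is not representable as a float, and the class and helper names are illustrative only.

```java
final class SilentLossSketch {
    // Hypothetical salary that needs more than 24 significant bits.
    static int getSalary() {
        return 123_456_789;
    }

    // Manual check of whether an int survives the implicit widening to float.
    static boolean survivesAsFloat(int value) {
        return (double) (float) value == (double) value;
    }

    public static void main(String[] args) {
        float salary = getSalary();           // compiles without a cast or warning
        System.out.println((long) salary);    // no longer 123456789
        System.out.println(survivesAsFloat(getSalary()));
    }
}
```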
We assume that developers of static analysis tools will recognize the new role of instanceof, and will avoid flagging code that uses converted data without a prior manual range check when that code is already safeguarded by the extended instanceof.