JEP draft: Primitive types in patterns, instanceof, and switch (Preview)

Owner: Angelos Bimpoudis
Type: Feature
Scope: SE
Status: Submitted
Component: specification / language
Discussion: amber dash dev at openjdk dot org
Effort: M
Duration: M
Reviewed by: Alex Buckley
Created: 2022/06/15 10:05
Updated: 2023/09/20 14:02
Issue: 8288476

Summary

Enhance pattern matching by allowing primitive type patterns to be used in all pattern contexts, align the semantics of primitive type patterns with instanceof, and extend switch to allow primitive constants as case labels. This is a preview language feature.

Goals

Non-Goals

Motivation

Records and record patterns work together to streamline data processing. Records (JEP 395) make it easy to aggregate components, and record patterns (JEP 440) make it easy to decompose aggregates using pattern matching.

For example, we can model JSON documents with a sealed hierarchy of records:

sealed interface JsonValue {
    record JsonString(String s) implements JsonValue { }
    record JsonNumber(double d) implements JsonValue { }
    record JsonNull() implements JsonValue { }
    record JsonBoolean(boolean b) implements JsonValue { }
    record JsonArray(List<JsonValue> values) implements JsonValue { }
    record JsonObject(Map<String, JsonValue> map) implements JsonValue { }
}

With respect to numbers, JSON does not distinguish integers from non-integers, so in JsonNumber we represent all numbers with double values, as recommended by the specification.

Given a JSON payload of

{ "name" : "John", "age" : 30 }

we can construct a corresponding JsonValue via

var json = new JsonObject(Map.of("name", new JsonString("John"),
                                 "age", new JsonNumber(30)));

For each key in the map, this code instantiates an appropriate record for the corresponding value. For the first, the value "John" has the same type as the record's component, namely String. For the second, however, the Java compiler applies a widening primitive conversion to convert the int value, 30, to a double.

Nested primitive type patterns are limited

We can, of course, use record patterns to disaggregate this json value:

record Customer(String name, int age) { }
...
if (json instanceof JsonObject(var map)
    && map.get("name") instanceof JsonString(String name)
    && map.get("age") instanceof JsonNumber(double age))
{
    return new Customer(name, (int)age);    // unavoidable cast
}

Here we see that primitive type patterns in nested contexts have a limitation: In this application we expect the age value always to be an int, but from the JsonNumber pattern we can only extract a double and must rely upon a lossy manual cast to convert that to an int. We should return a Customer object only when the age value is representable as an int, which requires additional code:

if (json instanceof JsonObject(var map)
    && map.get("name") instanceof JsonString(String name)
    && map.get("age") instanceof JsonNumber(double age))
{
    int age2 = (int)age;                    // unavoidable cast
    if (age2 == age)
        return new Customer(name, age2);
}

What we would really like to do is use int directly in the JsonNumber pattern such that the pattern matches only when the double value inside the JsonNumber object can be converted to an int without loss of information, and when it does match it automatically narrows the double value to an int:

if (json instanceof JsonObject(var map)
    && map.get("name") instanceof JsonString(String name)
    && map.get("age") instanceof JsonNumber(int age))
{
    return new Customer(name, age);         // no cast!
}
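The test that such a pattern would perform can be approximated by hand in today's Java. The following is a minimal sketch, not the implementation specified by this JEP: the class name ExactAge and the method exactInt are hypothetical, and rejecting NaN and negative zero is an assumption of the sketch.

```java
import java.util.OptionalInt;

// Hypothetical helper; the names are illustrative and not part of the JEP.
final class ExactAge {
    // Returns the int value of d only if narrowing loses no information.
    static OptionalInt exactInt(double d) {
        int i = (int) d;   // narrowing primitive conversion, possibly lossy
        // Double.compare never reports NaN or -0.0 as equal to a widened int,
        // so those inputs are rejected (an assumption of this sketch).
        return Double.compare((double) i, d) == 0
                ? OptionalInt.of(i)
                : OptionalInt.empty();
    }
}
```

A caller would construct the Customer only when exactInt(age) holds a value, which mirrors the match-or-fail behaviour described above.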

This sort of usage is characteristic of pattern matching's ability to reject illegal values automatically. Pattern matching eliminates the need for potentially unsafe casts by raising match failures to control-flow decisions. It already works this way for reference types in patterns; for example:

record Box(Object o) { }
Box b = new Box(new RedBall());
if (b instanceof Box(RedBall r)) { ... }

Here the pattern Box(RedBall r) matches only when b is a Box that holds a RedBall, in which case it binds the local variable r of type RedBall to that object. Unfortunately, primitives are today comparatively limited: If the type of the matched component is T then the type in the primitive type pattern must be T as well. Primitive type patterns should not mean something different from reference type patterns; they should both mean that the value can be cast safely.

Primitive type patterns are not permitted in top-level contexts

The previous examples show that primitive type patterns are invariant in nested contexts. In addition, primitive type patterns cannot be used in top-level contexts at all; only type patterns of reference types are allowed there.

Ideally, the utility of pattern matching via instanceof would extend to all types. In Java, pattern matching with instanceof safeguards a block of code against a certain class of errors. In the following example, before o is treated as a String, the developer checks that the object o has the correct run-time type. If instanceof returns true then the corresponding cast conversion is guaranteed to be safe (no ClassCastException or NullPointerException) and a variable s of a sharper type, String, is initialized:

Object o = ...
if (o instanceof String s) { // type pattern
    ... s.isEmpty() ...      // will execute without error
}

Lifting the restrictions on primitive type patterns means that instanceof can safeguard any cast conversion supported by Java (JLS 5.5), at the top level too. In the following example, instanceof with the primitive type pattern byte b checks whether i can be cast to byte without loss of information. If instanceof returns true, then (byte) i loses no information about, for example, magnitude or sign:

int i = ...
if (i instanceof byte b) {
    ... b ...
}

For example, if the int variable i holds 1000 then the value of (byte) i is -24. Today, developers who want to avoid this must safeguard the cast by hand, checking, for example, that a 32-bit int can be represented by an 8-bit byte:

int i = 42;
byte b = 0;
if (i >= -128 && i <= 127) {
    b = (byte) i;
}

or alternatively use round-trip casts:

int i = 42;
byte b = 0;
if ((int)(byte)i == i) {
    b = (byte) i;
}

Enhancing pattern matching with primitive type patterns means that safeguarding casts is done automatically by the compiler.
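The two manual safeguards shown above are interchangeable, which the following sketch makes concrete. The class name ByteGuards and its method names are hypothetical and chosen only for illustration.

```java
// Hypothetical helper comparing the two manual safeguards from the text.
final class ByteGuards {
    // Range check: a byte holds values from -128 to 127 inclusive.
    static boolean fitsByRange(int i) {
        return i >= Byte.MIN_VALUE && i <= Byte.MAX_VALUE;
    }

    // Round-trip check: narrow to byte, widen back to int, and compare.
    static boolean fitsByRoundTrip(int i) {
        return (int) (byte) i == i;
    }

    public static void main(String[] args) {
        System.out.println((byte) 1000);           // prints -24
        System.out.println(fitsByRange(1000));     // prints false
        System.out.println(fitsByRoundTrip(42));   // prints true
    }
}
```

Either check answers the question that i instanceof byte is meant to answer; a primitive type pattern would let the compiler generate the check rather than the developer.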

Primitive types are not permitted in type comparisons

Extending instanceof as the pattern matching operator suggests extending instanceof as the type comparison operator symmetrically. It would be desirable to generalize the instanceof type-testing operator to work on all types, recognizing the connection between instanceof and casting. An instanceof expression involving any pair of types (the type of the operand and the type named on the right-hand side) would succeed if a casting conversion exists and can be performed without loss of magnitude, sign, precision, or range. As a result, this safe-cast testing operator would defend against potentially lossy casts between any types. Following the previous example, which pattern matches with byte b, instanceof could support safe-cast testing with the primitive type byte:

int i = ...
if (i instanceof byte) {
  ...
}

Primitive type patterns in switch

At present, primitive type patterns are not allowed in the top-level context of a switch either. With a top-level primitive type pattern, we could rewrite the switch expression (JEP 361)

switch (x.getStatus()) {
    case 0 -> "okay";
    case 1 -> "warning";
    case 2 -> "error";
    default -> "unknown status: " + x.getStatus();
}

more clearly as

switch (x.getStatus()) {
    case 0 -> "okay";
    case 1 -> "warning";
    case 2 -> "error";
    case int i -> "unknown status: " + i;
}

Here the case int i label matches any status value not previously matched, making the switch expression exhaustive so that no default label is required.

Permitting top-level primitive type patterns would allow guards to be used to further restrict the values matched by case labels:

switch (x.getYearlyFlights()) {
    case 0 -> ...;
    case 1 -> ...;
    case 5 -> issueDiscount();
    case int i when i > 100 -> issueGoldCard();
    case int i -> ...;
}

Combining primitive type patterns with record patterns creates further opportunities for case analysis:

switch (x.order()) {
    case NormalOrder(Product(int productCode)) -> ...;
    case BadOrder b -> switch (b.reason()) {
        case MissingProduct q -> switch (q.code()) {
            case 1     -> ...;
            case 2     -> ...;
            case int i -> ...;
        }
    }
}

switch does not support all primitive types

Prior to this JEP, switch expressions and switch statements can switch on some primitive types, but not on boolean, float, double, or long. We can switch on a long value only when it fits within an int, so we must handle any remaining cases with if statements:

long v = ...;
if (v == (int)v) {
    switch ((int)v) {
        case 0x01  -> ...;
        case 0x02  -> ...;
        case int i -> ... i ...;
    }
}

if (v == 10_000_000_000L) { ... }
if (v == 20_000_000_000L) { ... }

If we could use long constant expressions in case labels then we could instead write:

long v = ...;
switch (v) {
    case 0x01            -> ...;
    case 0x02            -> ...;
    case 10_000_000_000L -> ...;
    case 20_000_000_000L -> ...;
    case long l          -> ... l ...;
}

Similarly, consider code that uses if-else chains to test float values:

float f = ...;
if (Float.isNaN(f)) {
    ...
} else if (Float.isInfinite(f)) {
    ...
} else {
    ...
}

With float values in case labels we could declutter this into:

float f = ...;
switch (f) {
    case Float.NaN    -> ...;
    case Float.POSITIVE_INFINITY -> ...;
    case Float.NEGATIVE_INFINITY -> ...;
    case float g -> ... g ...;
}

Switching on boolean values could be a useful alternative to the ternary conditional operator (?:). Unlike that operator, a boolean switch expression can contain both expressions and statements in its rules. For example:

startProcessing(OrderStatus.NEW, switch (user.isLoggedIn()) {
    case true  -> user.id();
    case false -> { log("Unrecognized user"); yield -1; }
});

Here the second argument to the startProcessing method uses a boolean switch to encapsulate some business logic.

When switching on a primitive value, a switch expression or statement should automatically convert between the type of that value and the types of its case labels, as long as those conversions do not lose precision or range. For example, when switching on a float value the case labels could be of type float, double, int, or long as long as the constant value of each label converts sensibly to a float.

float f = ...;
switch (f) {
    case 16_777_216 -> ...;
    case 16_777_217 -> ...;
    default -> ...;
}

This switch accepts a float but its case labels are integral values that convert to the same float value. The cases are indistinguishable at run time, so this code should be rejected at compile time.
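The collision between the two labels can be checked with ordinary casts. This is a small sketch; the class name FloatLabels and the method collideAsFloat are hypothetical.

```java
// Hypothetical helper showing when two int constants become the same float.
final class FloatLabels {
    // True if a and b convert to the same float value.
    static boolean collideAsFloat(int a, int b) {
        return (float) a == (float) b;
    }

    public static void main(String[] args) {
        // 2^24 + 1 has no float representation and rounds to 2^24.
        System.out.println(collideAsFloat(16_777_216, 16_777_217));   // prints true
    }
}
```

Above 2^24, adjacent float values are more than 1 apart, so distinct int constants can denote a single float label.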

In summary, primitive types in instanceof, and in type patterns for instanceof and switch, would increase program reliability and enable more uniform data exploration with pattern matching. This JEP removes the following restrictions:

Description

This JEP extends type patterns in the Java language by removing restrictions around primitive types. The semantics of type patterns will be defined in terms of instanceof, the type comparison operator. Henceforth, instanceof will not only be able to test and compare reference types but also to safeguard any pre-existing casting conversion already supported by the Java language. As a result, instanceof will be able to test whether a value can be safely cast to a target type, where ‘safe’ means guarding against erroneous situations such as the raising of a ClassCastException, the raising of a NullPointerException, and any loss of information involving primitive types (magnitude, precision, range, or sign).

instanceof as the precondition test for safe casting in general

As of Java 16, the instanceof operator is either a type comparison operator or a pattern match operator, depending on its syntactic form.

When instanceof is a type comparison operator, support for primitive types is realized by removing the restrictions that (1) the type of the left-hand operand must be a reference type, and (2) the right-hand operand must name a reference type. The form of a type comparison operator becomes:

InstanceofExpression:
    RelationalExpression instanceof Type
    ...

Before this JEP, the result of a type comparison operator was false if the value was the null reference, true if the value could be cast to the right-hand operand without raising a ClassCastException, and false otherwise. This JEP generalizes an expression e instanceof T as if asking whether a value e of static type S can be converted exactly to the given primitive or reference type T in a casting context (JLS 5.5). This makes instanceof the precondition test for safe casting in general.

Under this generalization, the instanceof type comparison operator is defined for all pairs of types that can be converted in a casting context. Before this JEP, a compile-time error occurred for pairs of reference types with no such conversion. Under this JEP, type-checking instanceof continues to follow the rules of cast conversion, and a compile-time error occurs for any pair of types, reference or primitive, with no such conversion. The examples given earlier rely on conversions allowed in a casting context, so they can be rewritten to use instanceof directly:

int i = 1000;
if (i instanceof byte) {     // false
  byte b = (byte) i;
  ... b ...
}

byte b = 42;
if (b instanceof int) {      // true
  int i = (int) b;
  ... i ...
}

int i = 16_777_216;          // 2^24
if (i instanceof float) {    // true
  float f = (float) i;
  ... f ...
}

int i = 16_777_217;          // 2^24+1
if (i instanceof float) {    // false
  float f = (float) i;
  ... f ...
}
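The int-to-float results above can be reproduced in today's Java with a hand-written check. This is a sketch of one way to do it, not the implementation specified by this JEP; the class name IntToFloat and the method isExact are hypothetical.

```java
// Hypothetical helper approximating the proposed `i instanceof float`.
final class IntToFloat {
    static boolean isExact(int i) {
        // int -> double and float -> double never lose information,
        // so the comparison is carried out in double.
        return (double) (float) i == (double) i;
    }

    public static void main(String[] args) {
        System.out.println(isExact(16_777_216));   // prints true
        System.out.println(isExact(16_777_217));   // prints false
    }
}
```

The comparison is done in double rather than with the round trip (int)(float) i == i because that round trip misreports Integer.MAX_VALUE: the float value is 2^31, which narrows back to Integer.MAX_VALUE and so appears exact even though the conversion changed the value.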

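This JEP does not add any conversions to the casting context, nor does it create any new conversion contexts. Whether instanceof is applicable to a given expression and type is determined entirely by whether a conversion is already allowed by the casting context. The conversions permitted in a casting context (JLS 5.5) are identity conversions, widening primitive conversions, narrowing primitive conversions, widening and narrowing primitive conversions, widening reference conversions, narrowing reference conversions, boxing conversions, unboxing conversions, and specified combinations of these.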

Consider the following examples. All of the following are allowed because the left-hand operand of instanceof, an expression e, can be converted to the specified type in a casting context:

int i = ...
i instanceof byte
i instanceof float

boolean b = ...
b instanceof Boolean

Short s = ...
s instanceof int
s instanceof long

long l = ...
l instanceof float
l instanceof double

Long ll = ...
ll instanceof float
ll instanceof double

However, all of the following examples raise a compile-time error, since they do not correspond to a pre-existing casting conversion:

boolean b = ...
b instanceof char    // error

Byte bb = ...
bb instanceof char   // error

Integer ii = ...
ii instanceof byte   // error
ii instanceof short  // error

Long ll = ...
ll instanceof int    // error
ll instanceof Float  // error
ll instanceof Double // error

If e has a reference type and the relational expression is null, instanceof continues to evaluate to false.
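The null behaviour matters because an unguarded unboxing cast of null throws NullPointerException. The sketch below uses only the existing reference-type instanceof; the class name NullGuard and the method unboxOrDefault are hypothetical.

```java
// Hypothetical helper showing instanceof as a guard for an unboxing cast.
final class NullGuard {
    static int unboxOrDefault(Integer boxed, int fallback) {
        // For a null operand instanceof evaluates to false, so the cast is skipped.
        if (boxed instanceof Integer) {
            return (int) boxed;   // safe: boxed is known to be non-null here
        }
        return fallback;
    }
}
```

Under this JEP the same guard could be written with a primitive type on the right-hand side, as in the Short and Long examples above.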

Exactness of Casting Conversions

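A conversion is exact if no loss of information occurs. Whether a conversion is exact depends on the pair of types involved and potentially on the input value. For some pairs, such as byte to int, the conversion is unconditionally exact: it loses no information for any value, so no run-time test is needed. For other pairs, such as int to byte, a run-time test is needed to check whether the particular value can be converted without loss of information.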

Adopting the notation of JLS 5.5, the following table marks the primitive conversions that are unconditionally exact with the symbol ε. The other symbols are: - (no conversion allowed), ≈ (identity conversion), ω (widening primitive conversion), η (narrowing primitive conversion), and ωη (widening and narrowing primitive conversion):

From ↓ \ To →   byte   short  char   int    long   float  double boolean
byte            ≈      ε      ωη     ε      ε      ε      ε      -
short           η      ≈      η      ε      ε      ε      ε      -
char            η      η      ≈      ε      ε      ε      ε      -
int             η      η      η      ≈      ε      ω      ε      -
long            η      η      η      η      ≈      ω      ω      -
float           η      η      η      η      η      ≈      ε      -
double          η      η      η      η      η      η      ≈      -
boolean         -      -      -      -      -      -      -      ≈

Consider the following examples. Unconditionally exact conversions are marked with (ε); they always evaluate to true regardless of the value. The remaining results are determined by a run-time check:

byte b = 42;
b instanceof int;         // true (ε)

int i = 1000;
i instanceof byte;        // false

int i = 42;
i instanceof byte;        // true

int i = 16_777_217;       // 2^24+1
i  instanceof float;      // false
i  instanceof double;     // true (ε)
i  instanceof Integer;    // true (ε)
i  instanceof Number;     // true (ε)

float f = 1000.0f;
f instanceof byte;        // false
f instanceof int;         // true
f instanceof double;      // true (ε)

double d = 1000.0d;
d instanceof byte;        // false
d instanceof int;         // true
d instanceof float;       // true

Integer ii = 1000;
ii instanceof int;        // true
ii instanceof float;      // true
ii instanceof double;     // true

Integer ii = 16_777_217;
ii instanceof float;      // false
ii instanceof double;     // true
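The run-time checks behind several of these results can be approximated in today's Java. This is a sketch under stated assumptions, not the JEP's specified implementation: the class name ExactChecks and its method names are hypothetical, and the treatment of NaN and negative zero follows from Double.compare rather than from anything stated in the text.

```java
// Hypothetical helpers approximating some of the checks listed above.
final class ExactChecks {
    // Approximates `f instanceof int`.
    static boolean floatToInt(float f) {
        return Double.compare((double) (int) f, (double) f) == 0;
    }

    // Approximates `d instanceof float`.
    static boolean doubleToFloat(double d) {
        return Double.compare((double) (float) d, d) == 0;
    }

    // Approximates `ii instanceof float`: null never matches;
    // otherwise unbox and test the int value.
    static boolean integerToFloat(Integer ii) {
        return ii != null && (double) (float) (int) ii == (double) (int) ii;
    }
}
```

Each method converts to the target type, widens both values to double, and compares, so a result of true means the conversion preserved the value.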

Primitive type patterns

Type patterns currently do not allow primitive types at the top level, only when they appear in the nested pattern list of a record pattern. We lift that restriction so that primitive types are allowed at the top level as well.

The semantics of primitive type patterns (and reference type patterns on targets of primitive type) are derived from that of casting conversions.

Exhaustiveness

A switch expression requires that all statically known, possible values of the selector expression be handled in the switch block; in other words, the switch must be exhaustive. A switch is exhaustive if it contains an unconditional type pattern, but it can be exhaustive in other situations as well, deferring any possibly unhandled cases to run time (Patterns: Exhaustiveness, Unconditionality, and Remainder). If a set of patterns is exhaustive for a type, we call the run-time values that are not matched by any pattern in the set the remainder of the set.

With pattern labels involving record patterns, some patterns are allowed to be exhaustive even when they are not unconditional. For example, the following switch is considered exhaustive on Box<Box<String>>, even though it will not match new Box(null):

Box<Box<String>> bbs = ...
switch (bbs) {
    case Box(Box(String s)): ...
}

The pathological value new Box(null) is part of that remainder set, and is handled by a synthetic default clause that throws MatchException.

With the introduction of primitive type patterns, we observe that unboxing follows the same philosophy. As a result, the type pattern int x is considered exhaustive on Integer, so the following switch is considered exhaustive on Box<Integer> even though Box(null) is not matched by Box(int i); that value is left in the remainder of the set:

Box<Integer> bi = ...
switch (bi) {
    case Box(int i): ...
}

Constant expressions in case labels

Turning to constant expressions in the case labels of a switch, the primitive types long, float, double, boolean, and their boxes can be associated with a switch block as long as the type of the selector expression (which can be a primitive type or a boxed reference type) is the same as the type of the constant expression.

For example, the constant expression 0f can only be used when the selector expression's type is float or Float:

float f = ...
switch (f) {
    case 0f -> 5f + 0f;
    case Float fi when fi == 1f -> 6f + fi;
    case Float fi -> 7f + fi;
}

Per IEEE 754, two floating-point values are the same if their sign, exponent, and significand components are the same. For that reason, representation equivalence defines how switch labels are selected in the presence of non-integral or boolean values. The same definition is used to signal duplicate-label errors if a developer writes the following switch:

float f = ...
switch (f) {
    case 1.0f -> ...
    case 0.999999999f -> ...
    default -> ...
}

While 1.0f is exactly representable as a float, 0.999999999f is not. The latter is rounded up to 1.0f, so both labels denote the same value, which results in a compile-time error.
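That the two literals denote the same float can be confirmed by comparing bit patterns. The sketch below assumes that representation equivalence can be modeled with Float.floatToIntBits; that modeling choice, the class name LabelEquivalence, and the method sameRepresentation are assumptions made for illustration.

```java
// Hypothetical helper; models label equality by comparing float bit patterns.
final class LabelEquivalence {
    // Float.floatToIntBits maps every NaN to one canonical bit pattern
    // and keeps 0.0f and -0.0f distinct.
    static boolean sameRepresentation(float a, float b) {
        return Float.floatToIntBits(a) == Float.floatToIntBits(b);
    }

    public static void main(String[] args) {
        System.out.println(sameRepresentation(1.0f, 0.999999999f));   // prints true
        System.out.println(sameRepresentation(0.0f, -0.0f));          // prints false
    }
}
```

The second line differs from the == operator, which treats 0.0f and -0.0f as equal and never treats NaN as equal to itself.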

Since boolean (and its box) consist of only two distinct values, a switch that lists both the true and false cases is considered exhaustive:

boolean b = ...
switch (b) {
  case true -> ...
  case false -> ...
  // Alternatively: case true, false -> ...
}

It is a compile-time error for that switch to include a default clause.

Risks and Assumptions

Outside pattern matching and instanceof, lossy assignment is endemic in Java source code. For example, if a method returns int then its result can be assigned to a float variable without casting:

int getSalary() { ... }
float salary = getSalary();

The risk is that Java developers do not realize that precision can be lost at this assignment, because the loss is silent.
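The effect is easy to reproduce. In this sketch the class name SilentLoss and the salary value are hypothetical, chosen only so that the loss is visible.

```java
// Hypothetical example of a silent, lossy int-to-float assignment.
final class SilentLoss {
    static int getSalary() {
        return 123_456_789;   // illustrative value, not from the JEP
    }

    public static void main(String[] args) {
        float salary = getSalary();          // implicit int -> float, no cast required
        System.out.println((long) salary);   // prints 123456792
    }
}
```

The assignment compiles without a cast or a warning, yet the stored value differs from the one returned.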

We assume that developers of static analysis tools will recognize the new role of instanceof, and will avoid flagging code that uses converted data without a manual range check when that code is already safeguarded by the extended instanceof.