JEP draft: Primitive types in patterns, instanceof, and switch

Owner: Angelos Bimpoudis
Type: Feature
Scope: SE
Status: Submitted
Component: specification / language
Discussion: amber dash dev at openjdk dot org
Effort: M
Duration: M
Reviewed by: Alex Buckley
Created: 2022/06/15 10:05
Updated: 2023/03/21 19:28
Issue: 8288476

Summary

Enhance pattern matching by allowing primitive type patterns to be used in all pattern contexts. Align the semantics of primitive type patterns with instanceof. Extend switch to allow primitive constants as case labels. This is a preview language feature.

Goals

Non-Goals

Motivation

Record classes and record patterns work together to streamline data processing in Java, because records make it easy to freely aggregate components, and dually record patterns decompose that aggregate using pattern matching.

For example, a JSON model can be encoded with a sealed hierarchy according to its specification as follows:

sealed interface JsonValue { 
  record JsonString(String s) implements JsonValue { }
  record JsonNumber(double d) implements JsonValue { } 
  record JsonNull() implements JsonValue { }
  record JsonBoolean(boolean b) implements JsonValue { }
  record JsonArray(List<JsonValue> values) implements JsonValue { }
  record JsonObject(Map<String, JsonValue> pairs) implements JsonValue { }
}

It is noteworthy that in JSON the number type represents either integers or floating-point numbers: the double data type is the widest possible primitive type that can represent both. Assuming a JSON payload of { "name":"John", "age":30 }, a Java developer could represent the same information in the JsonValue domain by running the following code:

var json = new JsonObject(Map.of(
  "name", new JsonString("John"),
  "age", new JsonNumber(30)));

The previous code initializes two key-value pairs, instantiating a record class for each value. For the first, the value John has the same type as the String s component. For the second, however, a widening primitive conversion is applied to 30: from int to the type of the record component, double. Java offers record patterns to disaggregate the json value, but primitive type patterns in a nested context already expose a limitation. While it would be desirable to recover the int value provided to the JsonNumber constructor, Java developers can currently only extract the double value and must rely on lossy manual casts:

record Customer(String name, int age) {}
...
if (j instanceof JsonObject(var pairs)
    && pairs.get("name")     instanceof JsonString(String name)
    && pairs.get("age")      instanceof JsonNumber(double age)) { 
    int age2 = (int) age;    // extraneous cast is unavoidable
    if (age2 < 0 || age2 > 125) {  /* ... error handling ... */  }
    ... orderIds ...
    Customer c = new Customer(name, age2); 
}

While developers can inline custom-validation logic, unfortunately the manual conversion is still unavoidable:

if (j instanceof JsonObject(var pairs)
    && pairs.get("name") instanceof JsonString(String name)
    && pairs.get("age") instanceof JsonNumber(double age) && ageValidation((int) age)) { 
    ... orderIds ...
    Customer c = new Customer(name, (int) age); 
}

It would be ideal if developers could use int directly without relying on manual casts or conversions:

if (j instanceof JsonObject(var pairs)
    && pairs.get("name") instanceof JsonString(String name)
    && pairs.get("age") instanceof JsonNumber(int age) && ageValidation(age)) { 
    ... orderIds ...
    Customer c = new Customer(name, age); 
}

Pattern matching already attempts such narrowing conversions automatically for reference types. In the example below, new Box(new RedBall()) widens a RedBall to Object, while pattern matching with Box(RedBall r) ensures that if b is a Box that holds a RedBall, then o can be safely cast to RedBall, which is also the type of r inside the if:

record Box(Object o){}

Box b = new Box(new RedBall());          // automatic widening conversion
if (b instanceof Box(RedBall r)) { ... } // automatic narrowing conversion

Unfortunately, primitives are more limited: if the type of the record component is T, then the type in the primitive type pattern must be T as well (the use-site type is invariant). Being able to vary double to int in a primitive type pattern fits the ability of pattern matching to reject illegal state automatically, as in the previous example. Returning to the JSON example, if the age element were a huge number, then instanceof JsonNumber(int age) would fail and the if-branch would not be taken. Pattern matching means that wherever there would be a potentially unsafe cast, it is made safe by turning match failures into control-flow decisions. Primitive type patterns should not mean something different from reference type patterns; they both mean "can this value be safely cast?"
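
For comparison, the test that a pattern such as JsonNumber(int age) would have to perform can be approximated by hand in today's Java. The following is a minimal sketch, not the implementation proposed by this JEP; the names ExactAge and toExactInt are hypothetical, and treating -0.0 as inexact is an assumption of the sketch.

```java
import java.util.OptionalInt;

final class ExactAge {
    // Returns the int value of d only if converting double -> int loses nothing.
    static OptionalInt toExactInt(double d) {
        int candidate = (int) d;   // narrowing cast: truncates, saturates, maps NaN to 0
        boolean roundTrips = (double) candidate == d;
        // Assumption: -0.0 is treated as inexact because int has no negative zero.
        boolean negativeZero =
            Double.doubleToRawLongBits(d) == Double.doubleToRawLongBits(-0.0d);
        return (roundTrips && !negativeZero)
            ? OptionalInt.of(candidate)
            : OptionalInt.empty();
    }

    public static void main(String[] args) {
        System.out.println(toExactInt(30.0));  // present: 30.0 is a whole number in int range
        System.out.println(toExactInt(30.5));  // empty: the fractional part would be lost
        System.out.println(toExactInt(1e12));  // empty: outside the int range
    }
}
```

A caller would branch on isPresent(), which is the control-flow decision that a failed pattern match would otherwise make automatically.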

Primitive type patterns are useful outside record patterns as well. The ability of a primitive type pattern to label the primitive value that was matched is extremely helpful. For example, in the following switch, the primitive pattern int i is better than a default and avoids the need for one:

switch (x.getStatus()) {
  case 0 -> ...
  case 1 -> ...
  case 2 -> ...
  case int i -> ... 
}

The case int i covers the remainder of the values in the switch above.

Supporting primitive type patterns means that guards can also be used to further restrict the values matched by a case:

switch (x.getYearlyFlights()) {
  case 0 -> ...
  case 1 -> ...
  case 5 -> issueDiscount();
  case int i when i > 100 -> issueGoldCard();
  case int i -> ... 
}

Combining primitive type patterns and record patterns facilitates further opportunities for case analysis even within nested record patterns:

switch (x.order()) {
  case NormalOrder(Product(int productCode)) -> ...
  case BadOrder b -> switch (b.reason()) {
    case MissingProduct q -> switch (q.code()) {
      case 1     -> ...
      case 2     -> ...
      case int i -> ...
    }
  }
}

Before this JEP, code that involved float, double, long, and boolean values also had to rely on manual conversions. For example, in the following code, a long value could be inspected with a switch if and only if it was actually in the int range; switching on larger long values was impossible, so they had to be tested with separate if statements:

long type = ...;
...
if (type >= Integer.MIN_VALUE && type <= Integer.MAX_VALUE) {
  switch ((int) type) {
      case 0x01      -> ...
      case 0x02      -> ...
      case int i     -> ... 
  }
}

if (type == 10_000_000_000L) { ... }
if (type == 20_000_000_000L) { ... }

Consequently, it would make sense for case labels to allow constant expressions of any primitive type, including float, double, long, and boolean. For example:

long type = ...;
switch (type) {
    case 0x01               -> ...
    case 0x02               -> ...
    case 10_000_000_000L    -> ...
    case 20_000_000_000L    -> ...
    case long l             -> ... l ...
}

or current code that uses if-else chains to test floats:

float f = ...;
if (Float.isNaN(f)) { 
  ... 
}
else if (Float.isInfinite(f)) { 
  ... 
}
else { ... }

will be decluttered into:

float f = ...;
switch (f) {
    case Float.NaN    -> ...
    case Float.POSITIVE_INFINITY -> ...
    case Float.NEGATIVE_INFINITY -> ...
    case float g -> ... g ...
}

A boolean switch would be a useful alternative to the conditional operator (?:) when making inline decisions. Unlike the conditional operator, a boolean switch expression can contain both expressions and statements in its true and false arms. For example, in the method call below, the second argument uses a boolean switch to encapsulate some business logic:

startProcessing(OrderStatus.NEW, switch (user.isLoggedIn()) {
    case true  -> user.id();
    case false -> { log("Unrecognized user"); yield -1; }
});

It would be ideal if the primitive-supporting switch could automatically perform reasonable conversions between the type of its expression and the types of its case labels. For example, if the expression is of type float, then the case labels could be of type float, double, int, or long. However, the loss of precision and range that can occur with other automatic conversions is best avoided. In the following example, switch accepts a float, but its case labels are integral values that convert to the same float value (a float cannot represent 16_777_217 exactly, so it rounds to 16_777_216); in other words, the cases are indistinguishable at run time, and the code would be rejected.

float f = ...;
switch (f) {
    case 16_777_216 -> ...
    case 16_777_217 -> ...
    default -> ...
}
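
The collision between these two labels can be observed in today's Java by converting both constants to float explicitly. This is only an illustrative sketch; the class and method names are hypothetical.

```java
final class FloatLabelCollision {
    // True if two int constants denote the same value once converted to float.
    static boolean collideAsFloat(int a, int b) {
        return Float.compare((float) a, (float) b) == 0;
    }

    public static void main(String[] args) {
        // 2^24 + 1 is not representable as a float and rounds to 2^24.
        System.out.println(collideAsFloat(16_777_216, 16_777_217)); // true
        // 2^24 + 2 is representable, so these two stay distinct.
        System.out.println(collideAsFloat(16_777_216, 16_777_218)); // false
    }
}
```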

Turning to the instanceof type comparison operator, its semantics can be naturally derived from pattern matching. Given the type pattern String s, pattern matching ensures that o has the correct run-time type before it casts o to String:

Object o = ...
if (o instanceof String s) { // type pattern
    ... s.isEmpty() ...      // will execute without error
}

Pattern matching with the type pattern String s safeguards the casting conversion to String. If instanceof succeeds, then casting o to the reference type String will succeed, and the resulting object will be non-null:

Object o = ...
if (o instanceof String) {   // type comparison
    String s = (String) o;
    ... s.isEmpty() ...
}

Lifting the restrictions on type patterns means that instanceof must now be able to safeguard any casting conversion supported by Java, since a type pattern byte b would imply a casting conversion from the primitive type int to byte:

int i = ...
if (i instanceof byte) {   
    byte b = (byte) i;
    ... b ...
}

As with casts among reference types, many casting conversions between primitives are unsafe. However, unlike a cast between reference types, a cast between primitives never fails at run time; instead, it accommodates a representation mismatch by silently discarding information. As a result, applying a cast may lose information about, e.g., magnitude and sign, likely causing bugs. For example, if the int variable i holds 1000, then the value of (byte) i is -24. Pattern matching, and occasionally Java developers, must safeguard casts by checking, for example, that a 32-bit int can be represented by an 8-bit byte:

int i = ...;
if (i >= -128 && i <= 127) {
    byte b = (byte) i;
    ... b ...
}
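
The range check above can also be phrased as a round trip: narrow the value, widen it back, and compare. The sketch below shows both formulations side by side, along with the silent wrap-around that an unguarded cast produces. The class and method names are hypothetical.

```java
final class ByteRange {
    // Explicit range check, as in the snippet above.
    static boolean fitsInByte(int i) {
        return i >= Byte.MIN_VALUE && i <= Byte.MAX_VALUE;
    }

    // Equivalent round-trip check: the byte is promoted back to int for the comparison.
    static boolean roundTripsThroughByte(int i) {
        return (byte) i == i;
    }

    public static void main(String[] args) {
        System.out.println((byte) 1000);                 // -24: the cast silently wraps
        System.out.println(fitsInByte(1000));            // false
        System.out.println(roundTripsThroughByte(42));   // true
    }
}
```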

instanceof is in principle about asking whether an upcoming cast of a value to a type would succeed without loss of information or error. When instanceof returns true, the program has gained information: a value can be safely cast and the program knows a sharper type for that value than previously known. It would be ideal to remove the restrictions from instanceof and extend those safeguarding and sharpening semantics to conversions involving primitive types as well. instanceof for a primitive type would succeed if a conversion exists and can be performed without loss of magnitude, sign, precision, or range, thus defending against lossy casts between primitive types.

In summary, primitive types in instanceof, and in type patterns for instanceof and switch, would increase program reliability and enable more uniform data exploration with pattern matching. This JEP removes the following restrictions:

Description

Primitive Type Patterns

Type patterns currently do not allow primitive types at top level, only when they appear in the nested pattern list of a record pattern. We lift that restriction so that primitive types are allowed at top level as well.

The semantics of primitive type patterns (and reference type patterns on targets of primitive type) are derived from casting conversions.

A type pattern T t is applicable to a target of type U if a U could be cast to T without an unchecked warning.

A type pattern T t is unconditional on a target of type U if all values of U can be exactly cast to T. This includes widening from one reference type to another, widening from one integral type to another, widening from one floating-point type to another, widening from byte, short, or char to a floating point type, widening int to double, and boxing.

A set of patterns containing a type pattern T t is exhaustive on a target of type U if T t is unconditional on U or if there is an unboxing conversion from U to T.

A type pattern T t dominates a type pattern U u, or a record pattern U(...), if T t would be unconditional on a target of type U.

A type pattern T t that does not resolve to an any pattern matches a target u if u instanceof T.

With pattern labels involving record patterns, some patterns are allowed to be exhaustive even when they are not unconditional. For example, the following switch is considered exhaustive on Box<Box<String>>, even though it will not match new Box(null):

Box<Box<String>> bbs = ...
switch (bbs) {
    case Box(Box(String s)): ...
}

The pathological value new Box(null) is considered "remainder", and is handled by a synthetic default clause that throws MatchException. Unboxing follows the same philosophy, being allowed even when there are pathological values that cannot be converted (a null boxed value), because it would be burdensome to require a null check every time we want to unbox. Similarly, novel subtypes (those not known at compile time) of sealed types are considered "remainder" at runtime. This accommodation is made because requiring users to specify all possible combinations of pathological values would be tedious and impractical.

Analogously, a type pattern int x is considered exhaustive on Integer, so the following switch is considered exhaustive on Box<Integer> for the same reason:

Box<Integer> bi = ...
switch (bi) {
    case Box(int i): ...
}

Primitive Types in instanceof

As of Java 16, the instanceof operator is either a type comparison operator or a pattern match operator, depending on its syntactic form.

When instanceof is a type comparison operator, support for primitive types is realized by removing the restrictions that (1) the type of the left-hand operand must be a reference type, and (2) the right-hand operand must name a reference type. The form of a type comparison operator becomes:

InstanceofExpression:
    RelationalExpression instanceof Type
    ...

Before this JEP, the result of a type comparison operator was false if the value was the null reference, true if the value could be cast to the right-hand operand without raising a ClassCastException, and false otherwise. This JEP generalizes an expression e instanceof T as if asking whether a value e of static type S can be converted to the given primitive or reference type T in a casting context (JLS 5.5) without error or loss of information. This makes instanceof the precondition test for safe casting in general.

Under this generalization, the instanceof type comparison operator is defined for all pairs of types that are allowed to be converted in a casting context. Before this JEP, a compile-time error occurred for pairs of reference types with no permitted casting conversion. Under this JEP, type checking of instanceof continues to follow the rules of casting conversions, and a compile-time error occurs for any unsupported pair of reference or primitive types. The examples given earlier rely on conversions allowed in a casting context, so they can be rewritten to use instanceof directly:

int i = 1000;
if (i instanceof byte) {     // false
  byte b = (byte) i;
  ... b ...
}

byte b = 42;
if (b instanceof int) {      // true
  int i = (int) b;
  ... i ...
}

int i = 16_777_216;          // 2^24
if (i instanceof float) {    // true
  float f = (float) i;
  ... f ...
}

int i = 16_777_217;          // 2^24+1
if (i instanceof float) {    // false
  float f = (float) i;
  ... f ...
}
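
For the int to float case, one way such a run-time test could be written by hand today is to compare the original and the converted value in double, where both are exactly representable. This is a sketch under that assumption, not the check the JEP specifies; IntToFloatCheck and isExactAsFloat are hypothetical names.

```java
final class IntToFloatCheck {
    // True if i survives conversion to float without losing precision.
    static boolean isExactAsFloat(int i) {
        // Pitfall: (float) i == i would promote i to float as well and always be true.
        // Both int and float values convert to double exactly, so compare there.
        return (double) (float) i == (double) i;
    }

    public static void main(String[] args) {
        System.out.println(isExactAsFloat(16_777_216));        // true:  2^24
        System.out.println(isExactAsFloat(16_777_217));        // false: 2^24 + 1
        System.out.println(isExactAsFloat(Integer.MAX_VALUE)); // false: rounds up to 2^31
    }
}
```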

This JEP does not add any conversions to the casting context, nor does it create any new conversion contexts. Whether instanceof is applicable to a given expression and type is determined entirely by whether the casting context already allows a conversion. The conversions permitted in casting context are as follows:

and specified combinations of these:

The following tables present all the pairs where instanceof is defined. This JEP does not propose any changes to those tables.

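In these tables, ✓ marks a pair for which a casting conversion exists and - marks a pair for which no conversion is allowed.

To →       byte  short  char  int  long  float  double  boolean
From ↓
byte       ✓     ✓      ✓     ✓    ✓     ✓      ✓       -
short      ✓     ✓      ✓     ✓    ✓     ✓      ✓       -
char       ✓     ✓      ✓     ✓    ✓     ✓      ✓       -
int        ✓     ✓      ✓     ✓    ✓     ✓      ✓       -
long       ✓     ✓      ✓     ✓    ✓     ✓      ✓       -
float      ✓     ✓      ✓     ✓    ✓     ✓      ✓       -
double     ✓     ✓      ✓     ✓    ✓     ✓      ✓       -
boolean    -     -      -     -    -     -      -       ✓

To →       byte  short  char  int  long  float  double  boolean
From ↓
Byte       ✓     ✓      -     ✓    ✓     ✓      ✓       -
Short      -     ✓      -     ✓    ✓     ✓      ✓       -
Character  -     -      ✓     ✓    ✓     ✓      ✓       -
Integer    -     -      -     ✓    ✓     ✓      ✓       -
Long       -     -      -     -    ✓     ✓      ✓       -
Float      -     -      -     -    -     ✓      ✓       -
Double     -     -      -     -    -     -      ✓       -
Boolean    -     -      -     -    -     -      -       ✓
Object     ✓     ✓      ✓     ✓    ✓     ✓      ✓       ✓

To →       Byte  Short  Character  Integer  Long  Float  Double  Boolean  Object
From ↓
byte       ✓     -      -          -        -     -      -       -        ✓
short      -     ✓      -          -        -     -      -       -        ✓
...        ...   ...    ...        ...      ...   ...    ...     ...      ...
Byte       ✓     -      -          -        -     -      -       -        ✓
Short      -     ✓      -          -        -     -      -       -        ✓
...        ...   ...    ...        ...      ...   ...    ...     ...      ...
Object     ✓     ✓      ✓          ✓        ✓     ✓      ✓       ✓        ✓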

Consider the following examples. All of the following are allowed because the left-hand operand of instanceof, an expression e, can be converted to the specified type in a casting context:

int i = ...
i instanceof byte
i instanceof float

boolean b = ...
b instanceof Boolean

Short s = ...
s instanceof int
s instanceof long

long l = ...
l instanceof float
l instanceof double

Long ll = ...
ll instanceof float
ll instanceof double

However, all of the following examples raise a compile-time error, since they do not correspond to a pre-existing casting conversion:

boolean b = ...
b instanceof char    // error

Byte bb = ...
bb instanceof char   // error

Integer ii = ...
ii instanceof byte   // error
ii instanceof short  // error

Long ll = ...
ll instanceof int    // error
ll instanceof Float  // error
ll instanceof Double // error

If e has a reference type and the value of the relational expression is null, instanceof continues to evaluate to false.

Exactness of Conversions

A conversion is exact if no loss of information occurs. Whether a conversion is exact depends on the pair of types involved and potentially on the input value:

Adopting the notation of JLS 5.5, the following table marks the primitive conversions that are unconditionally exact with the symbol ɛ. For completeness, the remaining symbols are: - (no conversion allowed), ≈ (identity conversion), ω (widening primitive conversion), η (narrowing primitive conversion), ωη (widening and narrowing primitive conversion):

To →      byte  short  char  int  long  float  double  boolean
From ↓
byte      ≈     ɛ      ωη    ɛ    ɛ     ɛ      ɛ       -
short     η     ≈      η     ɛ    ɛ     ɛ      ɛ       -
char      η     η      ≈     ɛ    ɛ     ɛ      ɛ       -
int       η     η      η     ≈    ɛ     ω      ɛ       -
long      η     η      η     η    ≈     ω      ω       -
float     η     η      η     η    η     ≈      ɛ       -
double    η     η      η     η    η     η      ≈       -
boolean   -     -      -     -    -     -      -       ≈
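
The table can be read as a compile-time lookup: given a source and a target primitive type, is the conversion exact for every value? The sketch below encodes it that way. The names are hypothetical, and counting identity conversions as exact (they trivially lose nothing) is an assumption of the sketch.

```java
final class UnconditionalExactness {
    enum Prim { BYTE, SHORT, CHAR, INT, LONG, FLOAT, DOUBLE, BOOLEAN }

    // Rows are source types, columns are target types, both in Prim declaration order.
    // 'E' marks identity or unconditionally exact conversions; '.' marks everything else
    // (conversions that need a run-time check, or that are not allowed at all).
    private static final String[] TABLE = {
        "EE.EEEE.",  // from byte
        ".E.EEEE.",  // from short
        "..EEEEE.",  // from char
        "...EE.E.",  // from int (int -> float may lose precision)
        "....E...",  // from long (long -> float/double may lose precision)
        ".....EE.",  // from float
        "......E.",  // from double
        ".......E",  // from boolean
    };

    static boolean isUnconditionallyExact(Prim from, Prim to) {
        return TABLE[from.ordinal()].charAt(to.ordinal()) == 'E';
    }

    public static void main(String[] args) {
        System.out.println(isUnconditionallyExact(Prim.INT, Prim.DOUBLE)); // true
        System.out.println(isUnconditionallyExact(Prim.INT, Prim.FLOAT));  // false
    }
}
```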

Consider the following examples. The unconditionally exact conversions are marked with (ε) and always return true regardless of the value; the remaining results are obtained via a run-time check:

byte b = 42;
b instanceof int;         // true (ε)

int i = 1000;
i instanceof byte;        // false

int i = 42;
i instanceof byte;        // true

int i = 16_777_217;       // 2^24+1
i  instanceof float;      // false
i  instanceof double;     // true (ε)
i  instanceof Integer;    // true (ε)
i  instanceof Number;     // true (ε)

float f = 1000.0f;       
f instanceof byte;        // false    
f instanceof int;         // true
f instanceof double;      // true (ε)

double d = 1000.0d;
d instanceof byte;        // false
d instanceof int;         // true
d instanceof float;       // true

Integer ii = 1000;
ii instanceof int;        // true
ii instanceof float;      // true
ii instanceof double;     // true

Integer ii = 16_777_217;
ii instanceof float;      // false
ii instanceof double;     // true
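
For a boxed operand such as Integer, the examples above combine three steps: a null test, an unboxing conversion, and an exactness test on the unboxed value. A hand-written approximation in today's Java might look as follows; it is a sketch with hypothetical names, not the specified behavior.

```java
final class BoxedOperandCheck {
    // Approximates "boxed instanceof float" for an Integer operand.
    static boolean integerIsExactFloat(Integer boxed) {
        if (boxed == null) {
            return false;                        // a null reference never matches
        }
        int i = boxed;                           // unboxing conversion
        // Compare in double, where both the int and the float are exact.
        return (double) (float) i == (double) i;
    }

    public static void main(String[] args) {
        System.out.println(integerIsExactFloat(1000));        // true
        System.out.println(integerIsExactFloat(16_777_217));  // false
        System.out.println(integerIsExactFloat(null));        // false
    }
}
```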

Constant Expressions in case labels

Turning to constant expressions in the case labels of a switch, the primitive types long, float, double, boolean, and their boxes can be associated with a switch block as long as the type of the selector expression (which can be a primitive type or a boxed reference type) is the same as the type of the constant expression.

For example, the constant expression 0f can only be used when the selector expression's type is float or Float:

float f = ...
switch (f) {
    case 0f -> 5f + 0f;
    case Float fi when fi == 1f -> 6f + fi;
    case Float fi -> 7f + fi;
}

Two finite floating-point values are the same per IEEE 754 if their sign, exponent, and significand components are the same. For that reason, representation equivalence defines how case labels are selected in the presence of non-integral or boolean values. The same definition is used to signal duplicate label errors, for example if a developer writes the following switch:

float f = ...
switch (f) {
    case 1.0f -> ...
    case 0.999999999f -> ...
    default -> ...
}

While 1.0f is exactly representable as a float, 0.999999999f is not; it is rounded up to 1.0f. Both labels therefore denote the same value, which results in a compile-time error.
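
The difference between representation equivalence and the == operator can be explored in today's Java. The sketch below assumes that Float.compare(a, b) == 0 is an adequate stand-in for representation equivalence; the class and method names are hypothetical.

```java
final class RepresentationEquivalence {
    // Assumption: Float.compare(a, b) == 0 is used as the test for
    // representation equivalence between two float values.
    static boolean sameLabel(float a, float b) {
        return Float.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        System.out.println(sameLabel(1.0f, 0.999999999f));   // true: the literal rounds to 1.0f
        System.out.println(sameLabel(0.0f, -0.0f));          // false, although 0.0f == -0.0f
        System.out.println(sameLabel(Float.NaN, Float.NaN)); // true, although NaN != NaN
    }
}
```

Under this reading, +0.0f and -0.0f would be distinct labels, while NaN can be matched by a label even though it is never == to itself.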

Since boolean (and its box) consist of only two distinct values, a switch that lists both the true and false cases is considered exhaustive:

boolean b = ...
switch (b) {
  case true -> ...
  case false -> ...
  // Alternatively: case true, false -> ...
}

It is a compile-time error for that switch to include a default clause.

Risks and Assumptions

Outside pattern matching and instanceof, lossy assignment is endemic in Java source code. For example, if a method returns int, then its result can be assigned to a float variable without casting:

int getSalary() { ... }
float salary = getSalary();

The risk is that Java developers do not realize the possible loss of precision that can occur at this assignment, because it is silent.
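
The effect is easy to reproduce. In the sketch below, the salary figure is a made-up value chosen only because it is not representable as a float; the class and method names are hypothetical.

```java
final class SilentPrecisionLoss {
    // Returns the value actually stored after an implicit int -> float assignment.
    static long storedValue(int salary) {
        float asFloat = salary;   // compiles without a cast or a warning
        return (long) asFloat;
    }

    public static void main(String[] args) {
        int salary = 123_456_789;                 // hypothetical value
        System.out.println(storedValue(salary));  // 123456792: off by 3
    }
}
```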

We assume that developers of static analysis tools will recognize the new role of instanceof and avoid flagging code that uses converted data without a prior manual range check when that code is already safeguarded by the extended instanceof.