JEP draft: Frozen Arrays (Preview)

OwnerJohn Rose
TypeFeature
ScopeJDK
StatusDraft
Created2021/02/03 00:59
Updated2021/02/08 09:46
Issue8261007

Summary

Introduce a new variation within the built-in Java array types, which is unmodifiable (shallowly immutable).

Frozen arrays can be safely shared without coordination or risk of unexpected modification. Freezing is a more efficient alternative to defensive copying, in that the copy can frequently be optimized away by the runtime.

This is a preview language and VM feature.

Goals

This JEP proposes minor changes to the Java Programming Language and Java Virtual Machine, which currently provide that all Java arrays are mutable. Users will be able to freeze any input array, yielding a so-called "frozen array" whose type, length, and contents are identical with the input array, but whose elements can never be reassigned. Frozen arrays will support certain optimizations and safety techniques (such as constant folding and defensive copying) which ordinary mutable arrays cannot support.

Non-Goals

None of the following plausible extensions to support immutable arrays are in scope for this JEP, though they could possibly be the subject of future work:

Other followup efforts may enhance existing APIs to take advantage of frozen arrays, or introduce new language features and APIs built on top of frozen arrays.

Motivation

Java programmers enjoy the use of compact and intuitive notations for working with Java arrays, as defined by the Java Language Specification. (We may call these "built-in Java arrays" when other kinds of arrays might be understood, such as off-heap arrays, or object-based array types such has some libraries supply.) Built-in arrays can be read and written using the familiar bracket notation a[i]. Because of their built-in status, Java arrays often offer better performance, in footprint and speed, than other kinds of linear collections (such as ArrayList or ByteBuffer).

Because Java arrays, as originally specified, are always modifiable, they cannot always be used safely. For example:

All of these problems stem from the fact that the mutability of array elements is a permanent feature in all arrays. It can be addressed by providing a way to create arrays whose elements are not modifiable.

In effect programmers are faced with an unpleasant and un-Java-like choice between reliability and performance, where the default action (omitting the defensive copy) leads to unreliability which is often difficult to diagnose.

The problems are exacerbated by the built-in status of Java arrays, and also their age. (Unlike List, built-in arrays have never not been a part of Java.) For those reasons, existing APIs, especially older ones (such as Core Reflection), use array types even when their use entails one or more of the above hazards. The problem cannot be shifted unless either the affected APIs are rewritten or until array behavior changes. This JEP does the latter.

Description

The features described below are preview features, enabled with the --enable-preview compile-time and runtime flags.

What's a frozen array?

Given any array type T[] (including primitive-array types like int[]), instances of that type may be in the frozen state; such instances are frozen arrays. By contrast, before this JEP, all arrays are not frozen; they may be referred to as "modifiable" (or "mutable"). Any non-null reference of array type points to either a modifiable or a frozen array.

Any attempt to write an element of a frozen array elicits an exception of class ArrayStoreException (or a subclass).

Since the type system (of both language and VM) gives no way to distinguish between variables which always refer to mutable arrays, always refer to frozen arrays, or may refer to a mix of arrays, there will in general be a dynamic check (for "frozen-ness") as part of any store to an array (as a[i]=x, or bytecodes like iastore or aastore). This check is similar to the existing checks for null references (potentially raising NullPointerException) or reference array subtypes (potentially raising ArrayStoreException).

Like primitive wrapper types and strings, frozen arrays have identity and may be synchronized upon. As with the wrappers and strings, programmers are encouraged to avert their gaze from these features.

Where do frozen arrays come from?

Because a frozen array is a new configuration of data on the Java heap, at least one special low-level JVM method is required in order to create frozen arrays.

A separate JEP, JDK-8261099, proposes a low-level unsafe JVM method for locking down a previously mutable array "in place", after which it is frozen. Such a method must be used very carefully, but it is possibly the original source of all frozen arrays described in this JEP. In the safe user model proposed by this JEP, all factories of frozen arrays appear to create fresh arrays which have been frozen from the beginning ("frozen from birth").

Since frozen arrays are "frozen from birth", it follows that their factory methods must be passed all elements to be stored in the new frozen array.

Thus, one or more low-level static factory methods will be created which allocate arrays in the frozen states. The input to such a method will be an array (either frozen or mutable) which contains the elements to be stored in the new frozen array, and possibly starting and ending indexes within the input array (cf. Arrays.copyOfRange).

As a straw-man example of the simplest possible API, a method System.arrayfreeze could be created, by analogy with System.arraycopy. Instead of destination arguments, it would return a frozen array which contains exactly the indicated source arguments:

T[] src = ...;
T[] dest = System.arrayfreeze(src, 0, src.length);

This method would internally use an unsafe primitive for freezing an array in place, not available to general users. The method would look something like this:

Object arrayfreeze(Object src, int beg, int end) {
  Object dest = Arrays.newInstance(src.getClass().getComponentType(), end-beg);
  arraycopy(src, beg, dest, 0, dest.length);
  UNSAFE.freeze(dest);  // privileged and unsafe
  return dest;
}

Please Note: In this user model, freezing a mutable array will always return a different array. A single array instance cannot transition between the mutable and frozen conditions. The exception to this rule is code which uses a privileged low-level "freeze-in-place" method described in JDK-8261099.

A mutable array can always be created from a possibly-frozen array by means of the clone operation. That is, clone never returns a frozen array, when applied to an array operand.

All these details may change.

We do not intend to add any new bytecode instructions.

How do I use a frozen array?

Regardless of the low-level details of frozen array creation, a limited number of utility methods will be created to encourage users to adopt them for defensive copies.

Performance model of freezing and cloning

The specification of every frozen array factory will include the provision that the JVM is always free to recycle any previous frozen array to produce the result. That is, in response to a request for a frozen array containing values x[i], if the JVM can identify an already existing frozen array a whose elements are the same (in the same order) as the x[i], then the request may be fulfilled with a reference to a, rather than to a newly-created frozen array.

The recycling optimization is likely to occur (and may even be guaranteed) in the following circumstances:

The JIT may be able to perform additional optimizations of this sort after inlining. In both the interpreter and JIT, a long chain of defensive copies can be collapsed to less expensive operation. The user model can be summarized as:

Trusted code which is performance sensitive may use a restricted unsafe primitive method (of JDK-8261099) to create new arrays and freeze them in place without the extra copying step. Using the unsafe primitive should not be done unless there is a special reason the JIT cannot optimize the code using the more generic techniques outlined in this JEP. Users of the unsafe primitive outside of java.base are likely to have their code break when the internal implementation of frozen arrays changes.

Language changes and translation strategy

The exceptions throwable from an assignment expression, when the target is an array element, must be amended to include the possibility of an ArrayStoreException when the destination array is frozen.

(Note that this is a fundamentally new dynamic check and exception for primitive arrays and Object arrays, which previously could not raise an ArrayStoreException when written to.)

The syntax a.freeze() requires special permission from the Java Language Specification, analogous to clone.

It seems most undesirable to hardwire freeze into the Java VM in the same way clone was hardwired, using ad hoc special pleading to give clone a special access mode and verifier type. Therefore, we will consider injecting the freeze method using a more scalable technique that does not affect the VM specification. For example, a.freeze() may be defined as "sugar" for a static method call, such as java.lang.ArrayMethods.freeze(a). This technique unlocks follow-on opportunities to connect to additional methods of java.util.Arrays, without changing either the languauge or the VM.

Serialization

Serialization will be enhanced to transmit the frozen status of arrays. It seems sad to do this, but it can be done fairly simply on top of the existing code, and doing so would seem to be a way to forestall serialization-related security bugs that could arise if data structurues which are relying on frozen arrays suddenly became internally mutable in the elements of those arrays, due to a deserialization trick. The rejoinder to this point is that properly written readObject methods already do a clone and would be changed to call freeze instead. The rejoinder to this rejoinder is that it will be less surprising to end users if passing an array through a serialization/deserialization cycle preserves every bit of that array's structure, not just the type, length, and elements, but also the frozen bit.

JNI and Unsafe

The JNI operations the allow mutation of arrays should at least warn if they are applied to frozen arrays. (Perhaps it could be a warning plus a no-op.) Code which uses Unsafe to poke new values into arrays will need to check the isFrozen bit, in general.

No type system effects

No type system, either that of the language, the method and field descriptors, the bytecode verifier, the dynamically checkable types (via instanceof), or the reflective Class system, will make a distinction between frozen and mutable arrays. Only the dynamic isFrozen check will be available to apply to instances.

In particular, the system will not enforce deep immutability of nested arrays. (It may create and observe such structures, and arrange its optimizations accordingly, as long as this is done invisibly to users.)

Java memory model

The JMM may be updated to state that the elements of a frozen array have the same effects on safe publication as final fields. In effect a frozen array acts like an all-final object, whose fields are referenced by index (aaload, iaload, etc.) rather than by name (getfield).

The JMM has a "freeze" operation for objects with final fields, which occurs at the end of the constructor invocation; this "freeze" operation will also apply to the creation of a frozen array.

(The privileged low-level "freeze-in-place" method is also a "freeze" operation with respect to the JMM; this internal method does not need to be mentioned in the JMM, unless we figure out a way to make it safely exposed in the public user model.)

Example

class Node {
  private final String label;
  private final Node[] children;  //always frozen
  public Node[] children() { return children; }
  public String label() { return label; }
  public Node(String l, Node... cn) {
    cn = cn.freeze(); // O(1) if already frozen
    label = l;
    children = cn;
  }
}
class NodeUser {
  Node defPair(Node a, Node b) {
    Node[] ab = { a, b };
    ab = ab.freeze();  // JIT might optimize this locally
    return new Node("pair", ab);
  }
  void use(Node node) {
    Node[] frozenNodes = node.children();
    Node[] mutableNodes frozenNodes.clone();
    Arrays.sort(mutableNodes);
    doStuff(mutableNodes);
    doStuff(frozenNodes);  // still safe to use these
  }
}

Alternatives

Unmodifiable lists created by List.of are a recent workaround for a parallel set of problems with mutable lists. Programmers are encouraged to use such lists when possible in preference to built-in Java arrays. However, this is not always possible, since an existing API may require the use of built-in arrays, or programmers may require the speed and footprint characteristics of built-in arrays, or they may simply stubbornly prefer the built-in Java expression notation for working with built-in arrays.

We could wait for a larger virtualization effort ("Arrays 2.0") at which point adding frozen arrays would be as simple as adding List.of was. (Which wasn't actually as simple as one might have thought.) But that would surely put off for many years the specific benefits of frozen arrays in today's language and APIs.

We could wait for primitive classes and then cross-apply the concepts to built an enhanced "frozen identity-free" array, which acts like a primitive object. However, that is not a universally applicable patch to the problems which identity-laden frozen arrays address in the present JEP. In particular, the performance model of "frozen identity-free" arrays would presumably suffer from the surprising requirement that the == operator (and the acmp instructions) must perform O(N) work instead of O(1) work in order to prove that two frozen arrays are distinct.

We could try to build in some sort of "lock" operation which converts part or all of an array in place to a frozen status, either permanently or until a corresponding "unlock" operation, or build some sort of "immutable view" of a mutable backing array, like the Collection APIs. Such mechanisms are more complex and costly than the present proposal, and are thus harder to use correctly, as well as being less appealing as an alternative to defensive copying. Allowing an array to change its mutability state creates a new kind of side channel with new concurrency failure modes, and so seems to be a poor solution to many of the problems mentioned above.

Risks and Assumptions

The extra checks require some careful optimizer work, analogous to null pointer removal. These might cause performance problems, though not relative to plain clone, of course.

If the user model is not simple enough, or there are unpleasant surprises, or the performance is not good enough, or if retrofitted APIs are hard to roll out, adoption may be weak.

Retrofitting array-returning APIs to return frozen arrays will be a delicate operation, because of use cases in the wild (perhaps 0.0001% of them) where somebody has decided it would be clever to mutate a returned array.

This JEP assumes we can afford to add (as a preview) a handful of factories and other API points for arrays, notably the a.freeze() notation, as discussed above. Alternatively, we could stage all of these API points as static methods on jdk.internal.FrozenArrays, for use internally within java.base, and then roll them out as user-visible API points in a separate step.

Dependencies and Future Work

There are no dependencies on existing work.

Some engineering of array store checks for Project Valhalla may cross-apply to the mutability store checks required by this JEP.

If we decide to forbid synchronization on frozen arrays, some work for primitive classes (Valhalla again) can cross-apply.

It seems very desirable to add a way for varargs methods to request frozen input arrays. This would have knock-on benefits for both footprint and security. Inside the method, there would be no need for annotations of the form @SafeVarargs. Outside the method, clients could preferentially pass frozen arrays instead of mutable ones.

If we do frozen varargs, we should consider using a related syntax (a contextual keyword?) to declare initialized arrays and for array creation expressions:

int sumInts(__Frozen int... ints) {
  // bytecode does ints = ints.freeze()
  int sum = 0;
  for (int i : ints)  sum += i;
  return sum;
}
void foo() {
  sum();  // passes frozen zero-length array?
  sum(1,2,3);  // passes frozen length-3 array?
}
void bar() {
  __Frozen int[] stuff = { 1,2,3 };
  assert stuff.isFrozen();
  sum(stuff);
  sum(new __Frozen int[]{ 4,5,6 });
}

We will want to retrofit many APIs in java.base. Examples of API points which could profit from frozen arrays:

In many of these cases, it is likely that backward compatibility with a few existing clients may force us to continue to return mutable arrays. This situation can sometimes be remedied by deprecating the existing API point, in favor of a new one which returns the "real" frozen array. The existing API point can be documented as equivalent to running clone on the result of the new API point, For example:

public interface Collection<T> {
  /** Returns a frozen array containing all of the elements in this collection,
   * as if by {@code this.toArray().freeze()}.
   */
  default Object[] freezeArray() {
    return toArray().freeze();
  }
}

The naming of such new API points will be a delicate matter of balancing conflicting concerns, to say nothing of bikeshed-painting.

See non-goals above for more possible follow-on features.