JEP draft: Frozen Arrays (Preview)

Owner	John Rose
Type	Feature
Scope	JDK
Status	Draft
Created	2021/02/03 00:59
Updated	2021/02/08 09:46
Issue	8261007

Summary

Introduce a new variation within the built-in Java array types, which is unmodifiable (shallowly immutable).

Frozen arrays can be safely shared without coordination or risk of unexpected modification. Freezing is a more efficient alternative to defensive copying, in that the copy can frequently be optimized away by the runtime.

This is a preview language and VM feature.

Goals

This JEP proposes minor changes to the Java Programming Language and Java Virtual Machine, which currently provide that all Java arrays are mutable. Users will be able to freeze any input array, yielding a so-called "frozen array" whose type, length, and contents are identical with the input array, but whose elements can never be reassigned. Frozen arrays will support certain optimizations and safety techniques (such as constant folding and defensive copying) which ordinary mutable arrays cannot support.

Non-Goals

None of the following plausible extensions to support immutable arrays are in scope for this JEP, though they could possibly be the subject of future work:

Language support for directly declaring variables which are frozen arrays, for array creation expressions that directly create frozen arrays, or for variable arity methods which receive their arguments as frozen arrays. (But see notes under "Future Work".)
An identity-free variation of frozen arrays whose instances are not only arrays but in fact primitive objects.
While allowing identity to frozen arrays, we could borrow the idea of a non-synchronizable references from primitive objects. (This would seem to be a well-intentioned irregularity of the sort that just buys trouble.)
Adding more (non-static) utility methods Java arrays.
The ability to forcibly "recycle" a user-created mutable array as a frozen array by telling it to "freeze itself". Although the JVM can sometimes do this under the covers, calling "freeze" on a mutable array will always return a different array.
Transactional and/or confinement and/or slicing array operations, which allow one or more element positions to change between mutable and frozen status, or which provide restricted views on larger and/or mutable and/or larval and/or globally shared arrays.
Retrofitting existing APIs (such as Core Reflection) to make use of frozen arrays where they presently use mutable arrays. (But see notes under "Future Work".)
Direct support for deeply nested immutable array-based data structures.
Direct support in the class file format for creating and initializing frozen arrays. (E.g., mechanisms which condy or indy could use to pull "read only data" directly from the class file, as with .rodata segments in other systems.)
Adding analogous support to List types (beyond List.of, etc.), such as support for list-based initialized declarations, constant or template expressions, or varargs.
New types or implementations of arrays, or virtualized representations of Java arrays, sometimes loosely designated as "Arrays 2.0".
Freezing operations for Java types other than arrays.
Changes to arrays to make them appear more "value-based" or identity hostile, such as list-like equals, hashCode, and toString methods.

Other followup efforts may enhance existing APIs to take advantage of frozen arrays, or introduce new language features and APIs built on top of frozen arrays.

Motivation

Java programmers enjoy the use of compact and intuitive notations for working with Java arrays, as defined by the Java Language Specification. (We may call these "built-in Java arrays" when other kinds of arrays might be understood, such as off-heap arrays, or object-based array types such has some libraries supply.) Built-in arrays can be read and written using the familiar bracket notation a[i]. Because of their built-in status, Java arrays often offer better performance, in footprint and speed, than other kinds of linear collections (such as ArrayList or ByteBuffer).

Because Java arrays, as originally specified, are always modifiable, they cannot always be used safely. For example:

A method that returns multiple results in an array cannot return them in an array which has been cached, since one client may "clobber" the array making it unfit for use by future clients.
A method that receives multiple arguments using an array (say, via the "varargs" feature), cannot cache that array, because the caller may recycle the argument array for a future call with different arguments.
A shared array variable (such as a global static constant or an array stored in a shared object) cannot in general be publicly exposed for sharing, because a malfunctioning client might clobber an element of the array, creating a race condition, and making other threads see an unpredictable mix of the old and new element values.
The workaround for the three previous problems is returning or making a so-called "defensive copy" as needed. But defensive copies are easy to omit accidentally, and their inherent cost tempts programmers to "optimize them away", sometimes erroneously.
APIs which are structured as multiple layers or encapsulations will require in general a defensive copy every time an array crosses into or out of an abstraction boundary. This burdens abstract, well-factored code with the unfair cost of deeply nested copy-chains of array data.
More subtly, a method receiving an array of arguments (whether "varargs" or not) cannot trust that those arguments to remain stable even during its own execution, since there may be side effects impinging on that array concurrently from other threads. This has been known to lead to so-called "TOCTTOU" bugs, where the method trusts arguments that have changed after they were checked.
Side effects to arrays are a potential source of side channel creation, perhaps usable to export information by malware.

All of these problems stem from the fact that the mutability of array elements is a permanent feature in all arrays. It can be addressed by providing a way to create arrays whose elements are not modifiable.

In effect programmers are faced with an unpleasant and un-Java-like choice between reliability and performance, where the default action (omitting the defensive copy) leads to unreliability which is often difficult to diagnose.

The problems are exacerbated by the built-in status of Java arrays, and also their age. (Unlike List, built-in arrays have never not been a part of Java.) For those reasons, existing APIs, especially older ones (such as Core Reflection), use array types even when their use entails one or more of the above hazards. The problem cannot be shifted unless either the affected APIs are rewritten or until array behavior changes. This JEP does the latter.

Description

The features described below are preview features, enabled with the --enable-preview compile-time and runtime flags.

What's a frozen array?

Given any array type T[] (including primitive-array types like int[]), instances of that type may be in the frozen state; such instances are frozen arrays. By contrast, before this JEP, all arrays are not frozen; they may be referred to as "modifiable" (or "mutable"). Any non-null reference of array type points to either a modifiable or a frozen array.

Any attempt to write an element of a frozen array elicits an exception of class ArrayStoreException (or a subclass).

Since the type system (of both language and VM) gives no way to distinguish between variables which always refer to mutable arrays, always refer to frozen arrays, or may refer to a mix of arrays, there will in general be a dynamic check (for "frozen-ness") as part of any store to an array (as a[i]=x, or bytecodes like iastore or aastore). This check is similar to the existing checks for null references (potentially raising NullPointerException) or reference array subtypes (potentially raising ArrayStoreException).

Like primitive wrapper types and strings, frozen arrays have identity and may be synchronized upon. As with the wrappers and strings, programmers are encouraged to avert their gaze from these features.

Where do frozen arrays come from?

Because a frozen array is a new configuration of data on the Java heap, at least one special low-level JVM method is required in order to create frozen arrays.

A separate JEP, JDK-8261099, proposes a low-level unsafe JVM method for locking down a previously mutable array "in place", after which it is frozen. Such a method must be used very carefully, but it is possibly the original source of all frozen arrays described in this JEP. In the safe user model proposed by this JEP, all factories of frozen arrays appear to create fresh arrays which have been frozen from the beginning ("frozen from birth").

Since frozen arrays are "frozen from birth", it follows that their factory methods must be passed all elements to be stored in the new frozen array.

Thus, one or more low-level static factory methods will be created which allocate arrays in the frozen states. The input to such a method will be an array (either frozen or mutable) which contains the elements to be stored in the new frozen array, and possibly starting and ending indexes within the input array (cf. Arrays.copyOfRange).

As a straw-man example of the simplest possible API, a method System.arrayfreeze could be created, by analogy with System.arraycopy. Instead of destination arguments, it would return a frozen array which contains exactly the indicated source arguments:

T[] src = ...;
T[] dest = System.arrayfreeze(src, 0, src.length);

This method would internally use an unsafe primitive for freezing an array in place, not available to general users. The method would look something like this:

Object arrayfreeze(Object src, int beg, int end) {
  Object dest = Arrays.newInstance(src.getClass().getComponentType(), end-beg);
  arraycopy(src, beg, dest, 0, dest.length);
  UNSAFE.freeze(dest);  // privileged and unsafe
  return dest;
}

Please Note: In this user model, freezing a mutable array will always return a different array. A single array instance cannot transition between the mutable and frozen conditions. The exception to this rule is code which uses a privileged low-level "freeze-in-place" method described in JDK-8261099.

A mutable array can always be created from a possibly-frozen array by means of the clone operation. That is, clone never returns a frozen array, when applied to an array operand.

All these details may change.

We do not intend to add any new bytecode instructions.

How do I use a frozen array?

Regardless of the low-level details of frozen array creation, a limited number of utility methods will be created to encourage users to adopt them for defensive copies.

For any array, the syntax a.freeze() will be accepted, with a meaning closely analogous to a.clone(). (The only difference is in the mutability status of the respective return values.)
In the class java.util.Arrays, the methods copyOf and copyOfRange will be accompanied by analogous methods freeze and freezeRange.
The low-level static boolean method System.isFrozenArray (and/or java.util.Arrays.isFrozen) will test whether an array is frozen or modifiable.
A method handle factory MethodHandles.frozenArrayConstructor could be created to access factory methods, by analogy with arrayConstructor. The returned method handle would not take an integer length, but rather a whole array of the contents to freeze. If turned into a varargs method, it could provide access to hidden fast paths for particular lengths, and thus be useful with a translation strategy built on invokedynamic.
The method handle combinator variations of asCollector and asVarargsCollector might be adapted to perform freezing as well.

Performance model of freezing and cloning

The specification of every frozen array factory will include the provision that the JVM is always free to recycle any previous frozen array to produce the result. That is, in response to a request for a frozen array containing values x[i], if the JVM can identify an already existing frozen array a whose elements are the same (in the same order) as the x[i], then the request may be fulfilled with a reference to a, rather than to a newly-created frozen array.

The recycling optimization is likely to occur (and may even be guaranteed) in the following circumstances:

The length of the requested array is zero. (A cached zero-length frozen array of the right element type can be used, perhaps found via a ClassValue table.)
The input array a to the factory method is already frozen, and the requested length is the same as the input array. In this case, a can be returned directly (an O(1) operation). This may be described as the idempotency property of freezing.
Thus, a copy chain of the form a.freeze().freeze() may be evaluated as if it had been a.freeze().
In a copy chain of the form a.clone().freeze(), if the VM can prove that the intermediate operand (to freeze) is not used except in the freezing request, the copy chain may be evaluated as if it had been a.freeze().
A locally confined (non-escaping) array of the form a.clone(), if never modified, can be treated as if created by a.freeze() (and, if a is already frozen, just a). Thus, end-users of defensively-copied arrays might see their code improve in performance if such transforms are done "under the covers".
A locally confined (non-escaping) array of the form new a[N], if modified only before a subsequent freeze operation, can be treated as if created by an arrayfreeze operation used on a scratch buffer; the scratch buffer could be reused later by the same thread. Alternatively, if the JVM supplies an N-ary factory method of the right type, that could be called directly. Thus, end-users of newly created frozen arrays their code perform as if only the final frozen array is created, not the scratch array. This sort of optimization will be easier to do if we add direct notation for creating initialized frozen arrays, and these are translated using invokedynamic, which has a good chance of finding the best factory method in a given runtime.

The JIT may be able to perform additional optimizations of this sort after inlining. In both the interpreter and JIT, a long chain of defensive copies can be collapsed to less expensive operation. The user model can be summarized as:

store and pass your arrays in frozen form, when possible
let someone else do the cloning, if it must happen

Trusted code which is performance sensitive may use a restricted unsafe primitive method (of JDK-8261099) to create new arrays and freeze them in place without the extra copying step. Using the unsafe primitive should not be done unless there is a special reason the JIT cannot optimize the code using the more generic techniques outlined in this JEP. Users of the unsafe primitive outside of java.base are likely to have their code break when the internal implementation of frozen arrays changes.

Language changes and translation strategy

The exceptions throwable from an assignment expression, when the target is an array element, must be amended to include the possibility of an ArrayStoreException when the destination array is frozen.

(Note that this is a fundamentally new dynamic check and exception for primitive arrays and Object arrays, which previously could not raise an ArrayStoreException when written to.)

The syntax a.freeze() requires special permission from the Java Language Specification, analogous to clone.

It seems most undesirable to hardwire freeze into the Java VM in the same way clone was hardwired, using ad hoc special pleading to give clone a special access mode and verifier type. Therefore, we will consider injecting the freeze method using a more scalable technique that does not affect the VM specification. For example, a.freeze() may be defined as "sugar" for a static method call, such as java.lang.ArrayMethods.freeze(a). This technique unlocks follow-on opportunities to connect to additional methods of java.util.Arrays, without changing either the languauge or the VM.

Serialization

Serialization will be enhanced to transmit the frozen status of arrays. It seems sad to do this, but it can be done fairly simply on top of the existing code, and doing so would seem to be a way to forestall serialization-related security bugs that could arise if data structurues which are relying on frozen arrays suddenly became internally mutable in the elements of those arrays, due to a deserialization trick. The rejoinder to this point is that properly written readObject methods already do a clone and would be changed to call freeze instead. The rejoinder to this rejoinder is that it will be less surprising to end users if passing an array through a serialization/deserialization cycle preserves every bit of that array's structure, not just the type, length, and elements, but also the frozen bit.

JNI and Unsafe

The JNI operations the allow mutation of arrays should at least warn if they are applied to frozen arrays. (Perhaps it could be a warning plus a no-op.) Code which uses Unsafe to poke new values into arrays will need to check the isFrozen bit, in general.

No type system effects

No type system, either that of the language, the method and field descriptors, the bytecode verifier, the dynamically checkable types (via instanceof), or the reflective Class system, will make a distinction between frozen and mutable arrays. Only the dynamic isFrozen check will be available to apply to instances.

In particular, the system will not enforce deep immutability of nested arrays. (It may create and observe such structures, and arrange its optimizations accordingly, as long as this is done invisibly to users.)

Java memory model

The JMM may be updated to state that the elements of a frozen array have the same effects on safe publication as final fields. In effect a frozen array acts like an all-final object, whose fields are referenced by index (aaload, iaload, etc.) rather than by name (getfield).

The JMM has a "freeze" operation for objects with final fields, which occurs at the end of the constructor invocation; this "freeze" operation will also apply to the creation of a frozen array.

(The privileged low-level "freeze-in-place" method is also a "freeze" operation with respect to the JMM; this internal method does not need to be mentioned in the JMM, unless we figure out a way to make it safely exposed in the public user model.)

Example

class Node {
  private final String label;
  private final Node[] children;  //always frozen
  public Node[] children() { return children; }
  public String label() { return label; }
  public Node(String l, Node... cn) {
    cn = cn.freeze(); // O(1) if already frozen
    label = l;
    children = cn;
  }
}
class NodeUser {
  Node defPair(Node a, Node b) {
    Node[] ab = { a, b };
    ab = ab.freeze();  // JIT might optimize this locally
    return new Node("pair", ab);
  }
  void use(Node node) {
    Node[] frozenNodes = node.children();
    Node[] mutableNodes frozenNodes.clone();
    Arrays.sort(mutableNodes);
    doStuff(mutableNodes);
    doStuff(frozenNodes);  // still safe to use these
  }
}

Alternatives

Unmodifiable lists created by List.of are a recent workaround for a parallel set of problems with mutable lists. Programmers are encouraged to use such lists when possible in preference to built-in Java arrays. However, this is not always possible, since an existing API may require the use of built-in arrays, or programmers may require the speed and footprint characteristics of built-in arrays, or they may simply stubbornly prefer the built-in Java expression notation for working with built-in arrays.

We could wait for a larger virtualization effort ("Arrays 2.0") at which point adding frozen arrays would be as simple as adding List.of was. (Which wasn't actually as simple as one might have thought.) But that would surely put off for many years the specific benefits of frozen arrays in today's language and APIs.

We could wait for primitive classes and then cross-apply the concepts to built an enhanced "frozen identity-free" array, which acts like a primitive object. However, that is not a universally applicable patch to the problems which identity-laden frozen arrays address in the present JEP. In particular, the performance model of "frozen identity-free" arrays would presumably suffer from the surprising requirement that the == operator (and the acmp instructions) must perform O(N) work instead of O(1) work in order to prove that two frozen arrays are distinct.

We could try to build in some sort of "lock" operation which converts part or all of an array in place to a frozen status, either permanently or until a corresponding "unlock" operation, or build some sort of "immutable view" of a mutable backing array, like the Collection APIs. Such mechanisms are more complex and costly than the present proposal, and are thus harder to use correctly, as well as being less appealing as an alternative to defensive copying. Allowing an array to change its mutability state creates a new kind of side channel with new concurrency failure modes, and so seems to be a poor solution to many of the problems mentioned above.

Risks and Assumptions

The extra checks require some careful optimizer work, analogous to null pointer removal. These might cause performance problems, though not relative to plain clone, of course.

If the user model is not simple enough, or there are unpleasant surprises, or the performance is not good enough, or if retrofitted APIs are hard to roll out, adoption may be weak.

Retrofitting array-returning APIs to return frozen arrays will be a delicate operation, because of use cases in the wild (perhaps 0.0001% of them) where somebody has decided it would be clever to mutate a returned array.

This JEP assumes we can afford to add (as a preview) a handful of factories and other API points for arrays, notably the a.freeze() notation, as discussed above. Alternatively, we could stage all of these API points as static methods on jdk.internal.FrozenArrays, for use internally within java.base, and then roll them out as user-visible API points in a separate step.

Dependencies and Future Work

There are no dependencies on existing work.

Some engineering of array store checks for Project Valhalla may cross-apply to the mutability store checks required by this JEP.

If we decide to forbid synchronization on frozen arrays, some work for primitive classes (Valhalla again) can cross-apply.

It seems very desirable to add a way for varargs methods to request frozen input arrays. This would have knock-on benefits for both footprint and security. Inside the method, there would be no need for annotations of the form @SafeVarargs. Outside the method, clients could preferentially pass frozen arrays instead of mutable ones.

If we do frozen varargs, we should consider using a related syntax (a contextual keyword?) to declare initialized arrays and for array creation expressions:

int sumInts(__Frozen int... ints) {
  // bytecode does ints = ints.freeze()
  int sum = 0;
  for (int i : ints)  sum += i;
  return sum;
}
void foo() {
  sum();  // passes frozen zero-length array?
  sum(1,2,3);  // passes frozen length-3 array?
}
void bar() {
  __Frozen int[] stuff = { 1,2,3 };
  assert stuff.isFrozen();
  sum(stuff);
  sum(new __Frozen int[]{ 4,5,6 });
}

We will want to retrofit many APIs in java.base. Examples of API points which could profit from frozen arrays:

reflective array-returning methods in Class, jlr.Executable, etc.
Enum.values
Throwable.getStackTrace
MethodType.parameterArray, DynamicCallSiteDesc.bootstrapArgs, etc.
methods that return byte or char arrays as a "result" or "payload"
Collection.toArray method and various related
methods which accept array arguments, such as Runtime.exec

In many of these cases, it is likely that backward compatibility with a few existing clients may force us to continue to return mutable arrays. This situation can sometimes be remedied by deprecating the existing API point, in favor of a new one which returns the "real" frozen array. The existing API point can be documented as equivalent to running clone on the result of the new API point, For example:

public interface Collection<T> {
  /** Returns a frozen array containing all of the elements in this collection,
   * as if by {@code this.toArray().freeze()}.
   */
  default Object[] freezeArray() {
    return toArray().freeze();
  }
}

The naming of such new API points will be a delicate matter of balancing conflicting concerns, to say nothing of bikeshed-painting.

See non-goals above for more possible follow-on features.