JEP draft: Frozen Arrays (Preview)
Owner | John Rose |
Type | Feature |
Scope | JDK |
Status | Draft |
Created | 2021/02/03 00:59 |
Updated | 2021/02/08 09:46 |
Issue | 8261007 |
Summary
Introduce a new variation within the built-in Java array types, which is unmodifiable (shallowly immutable).
Frozen arrays can be safely shared without coordination or risk of unexpected modification. Freezing is a more efficient alternative to defensive copying, in that the copy can frequently be optimized away by the runtime.
This is a preview language and VM feature.
Goals
This JEP proposes minor changes to the Java Programming Language and Java Virtual Machine, which currently provide that all Java arrays are mutable. Users will be able to freeze any input array, yielding a so-called "frozen array" whose type, length, and contents are identical with the input array, but whose elements can never be reassigned. Frozen arrays will support certain optimizations and safety techniques (such as constant folding and defensive copying) which ordinary mutable arrays cannot support.
Non-Goals
None of the following plausible extensions to support immutable arrays are in scope for this JEP, though they could possibly be the subject of future work:
-
Language support for directly declaring variables which are frozen arrays, for array creation expressions that directly create frozen arrays, or for variable arity methods which receive their arguments as frozen arrays. (But see notes under "Future Work".)
-
An identity-free variation of frozen arrays whose instances are not only arrays but in fact primitive objects.
-
While allowing identity to frozen arrays, we could borrow the idea of a non-synchronizable references from primitive objects. (This would seem to be a well-intentioned irregularity of the sort that just buys trouble.)
-
Adding more (non-static) utility methods Java arrays.
-
The ability to forcibly "recycle" a user-created mutable array as a frozen array by telling it to "freeze itself". Although the JVM can sometimes do this under the covers, calling "freeze" on a mutable array will always return a different array.
-
Transactional and/or confinement and/or slicing array operations, which allow one or more element positions to change between mutable and frozen status, or which provide restricted views on larger and/or mutable and/or larval and/or globally shared arrays.
-
Retrofitting existing APIs (such as Core Reflection) to make use of frozen arrays where they presently use mutable arrays. (But see notes under "Future Work".)
-
Direct support for deeply nested immutable array-based data structures.
-
Direct support in the class file format for creating and initializing frozen arrays. (E.g., mechanisms which condy or indy could use to pull "read only data" directly from the class file, as with
.rodata
segments in other systems.) -
Adding analogous support to
List
types (beyondList.of
, etc.), such as support for list-based initialized declarations, constant or template expressions, or varargs. -
New types or implementations of arrays, or virtualized representations of Java arrays, sometimes loosely designated as "Arrays 2.0".
-
Freezing operations for Java types other than arrays.
-
Changes to arrays to make them appear more "value-based" or identity hostile, such as list-like
equals
,hashCode
, andtoString
methods.
Other followup efforts may enhance existing APIs to take advantage of frozen arrays, or introduce new language features and APIs built on top of frozen arrays.
Motivation
Java programmers enjoy the use of compact and intuitive notations for
working with Java arrays, as defined by the Java Language
Specification. (We may call these "built-in Java arrays" when other
kinds of arrays might be understood, such as off-heap arrays, or
object-based array types such has some libraries supply.) Built-in
arrays can be read and written using the familiar bracket notation
a[i]
. Because of their built-in status, Java arrays often offer
better performance, in footprint and speed, than other kinds of linear
collections (such as ArrayList
or ByteBuffer
).
Because Java arrays, as originally specified, are always modifiable, they cannot always be used safely. For example:
-
A method that returns multiple results in an array cannot return them in an array which has been cached, since one client may "clobber" the array making it unfit for use by future clients.
-
A method that receives multiple arguments using an array (say, via the "varargs" feature), cannot cache that array, because the caller may recycle the argument array for a future call with different arguments.
-
A shared array variable (such as a global
static
constant or an array stored in a shared object) cannot in general be publicly exposed for sharing, because a malfunctioning client might clobber an element of the array, creating a race condition, and making other threads see an unpredictable mix of the old and new element values. -
The workaround for the three previous problems is returning or making a so-called "defensive copy" as needed. But defensive copies are easy to omit accidentally, and their inherent cost tempts programmers to "optimize them away", sometimes erroneously.
-
APIs which are structured as multiple layers or encapsulations will require in general a defensive copy every time an array crosses into or out of an abstraction boundary. This burdens abstract, well-factored code with the unfair cost of deeply nested copy-chains of array data.
-
More subtly, a method receiving an array of arguments (whether "varargs" or not) cannot trust that those arguments to remain stable even during its own execution, since there may be side effects impinging on that array concurrently from other threads. This has been known to lead to so-called "TOCTTOU" bugs, where the method trusts arguments that have changed after they were checked.
-
Side effects to arrays are a potential source of side channel creation, perhaps usable to export information by malware.
All of these problems stem from the fact that the mutability of array elements is a permanent feature in all arrays. It can be addressed by providing a way to create arrays whose elements are not modifiable.
In effect programmers are faced with an unpleasant and un-Java-like choice between reliability and performance, where the default action (omitting the defensive copy) leads to unreliability which is often difficult to diagnose.
The problems are exacerbated by the built-in status of Java arrays,
and also their age. (Unlike List
, built-in arrays have never not
been a part of Java.) For those reasons, existing APIs, especially
older ones (such as Core Reflection), use array types even when their
use entails one or more of the above hazards. The problem cannot be
shifted unless either the affected APIs are rewritten or until array
behavior changes. This JEP does the latter.
Description
The features described below are preview features, enabled with the
--enable-preview
compile-time and runtime flags.
What's a frozen array?
Given any array type T[]
(including primitive-array types like
int[]
), instances of that type may be in the frozen state; such
instances are frozen arrays. By contrast, before this JEP, all
arrays are not frozen; they may be referred to as "modifiable" (or
"mutable"). Any non-null reference of array type points to either a
modifiable or a frozen array.
Any attempt to write an element of a frozen array elicits an exception
of class ArrayStoreException
(or a subclass).
Since the type system (of both language and VM) gives no way to
distinguish between variables which always refer to mutable arrays,
always refer to frozen arrays, or may refer to a mix of arrays, there
will in general be a dynamic check (for "frozen-ness") as part of any
store to an array (as a[i]=x
, or bytecodes like iastore
or
aastore
). This check is similar to the existing checks for null
references (potentially raising NullPointerException
) or reference
array subtypes (potentially raising ArrayStoreException
).
Like primitive wrapper types and strings, frozen arrays have identity and may be synchronized upon. As with the wrappers and strings, programmers are encouraged to avert their gaze from these features.
Where do frozen arrays come from?
Because a frozen array is a new configuration of data on the Java heap, at least one special low-level JVM method is required in order to create frozen arrays.
A separate JEP, JDK-8261099, proposes a low-level unsafe JVM method for locking down a previously mutable array "in place", after which it is frozen. Such a method must be used very carefully, but it is possibly the original source of all frozen arrays described in this JEP. In the safe user model proposed by this JEP, all factories of frozen arrays appear to create fresh arrays which have been frozen from the beginning ("frozen from birth").
Since frozen arrays are "frozen from birth", it follows that their factory methods must be passed all elements to be stored in the new frozen array.
Thus, one or more low-level static factory methods will be created
which allocate arrays in the frozen states. The input to such a
method will be an array (either frozen or mutable) which contains the
elements to be stored in the new frozen array, and possibly starting
and ending indexes within the input array (cf. Arrays.copyOfRange
).
As a straw-man example of the simplest possible API, a method
System.arrayfreeze
could be created, by analogy with
System.arraycopy
. Instead of destination arguments, it would return
a frozen array which contains exactly the indicated source arguments:
T[] src = ...;
T[] dest = System.arrayfreeze(src, 0, src.length);
This method would internally use an unsafe primitive for freezing an array in place, not available to general users. The method would look something like this:
Object arrayfreeze(Object src, int beg, int end) {
Object dest = Arrays.newInstance(src.getClass().getComponentType(), end-beg);
arraycopy(src, beg, dest, 0, dest.length);
UNSAFE.freeze(dest); // privileged and unsafe
return dest;
}
Please Note: In this user model, freezing a mutable array will always return a different array. A single array instance cannot transition between the mutable and frozen conditions. The exception to this rule is code which uses a privileged low-level "freeze-in-place" method described in JDK-8261099.
A mutable array can always be created from a possibly-frozen array
by means of the clone
operation. That is, clone
never returns
a frozen array, when applied to an array operand.
All these details may change.
We do not intend to add any new bytecode instructions.
How do I use a frozen array?
Regardless of the low-level details of frozen array creation, a limited number of utility methods will be created to encourage users to adopt them for defensive copies.
-
For any array, the syntax
a.freeze()
will be accepted, with a meaning closely analogous toa.clone()
. (The only difference is in the mutability status of the respective return values.) -
In the class
java.util.Arrays
, the methodscopyOf
andcopyOfRange
will be accompanied by analogous methodsfreeze
andfreezeRange
. -
The low-level static boolean method
System.isFrozenArray
(and/orjava.util.Arrays.isFrozen
) will test whether an array is frozen or modifiable. -
A method handle factory
MethodHandles.frozenArrayConstructor
could be created to access factory methods, by analogy witharrayConstructor
. The returned method handle would not take an integer length, but rather a whole array of the contents to freeze. If turned into a varargs method, it could provide access to hidden fast paths for particular lengths, and thus be useful with a translation strategy built oninvokedynamic
. -
The method handle combinator variations of
asCollector
andasVarargsCollector
might be adapted to perform freezing as well.
Performance model of freezing and cloning
The specification of every frozen array factory will include the
provision that the JVM is always free to recycle any previous frozen
array to produce the result. That is, in response to a request for a
frozen array containing values x[i]
, if the JVM can identify an
already existing frozen array a
whose elements are the same (in the
same order) as the x[i]
, then the request may be fulfilled with a
reference to a
, rather than to a newly-created frozen array.
The recycling optimization is likely to occur (and may even be guaranteed) in the following circumstances:
-
The length of the requested array is zero. (A cached zero-length frozen array of the right element type can be used, perhaps found via a
ClassValue
table.) -
The input array
a
to the factory method is already frozen, and the requested length is the same as the input array. In this case,a
can be returned directly (an O(1) operation). This may be described as the idempotency property of freezing. -
Thus, a copy chain of the form
a.freeze().freeze()
may be evaluated as if it had beena.freeze()
. -
In a copy chain of the form
a.clone().freeze()
, if the VM can prove that the intermediate operand (tofreeze
) is not used except in the freezing request, the copy chain may be evaluated as if it had beena.freeze()
. -
A locally confined (non-escaping) array of the form
a.clone()
, if never modified, can be treated as if created bya.freeze()
(and, ifa
is already frozen, justa
). Thus, end-users of defensively-copied arrays might see their code improve in performance if such transforms are done "under the covers". -
A locally confined (non-escaping) array of the form
new a[N]
, if modified only before a subsequentfreeze
operation, can be treated as if created by anarrayfreeze
operation used on a scratch buffer; the scratch buffer could be reused later by the same thread. Alternatively, if the JVM supplies an N-ary factory method of the right type, that could be called directly. Thus, end-users of newly created frozen arrays their code perform as if only the final frozen array is created, not the scratch array. This sort of optimization will be easier to do if we add direct notation for creating initialized frozen arrays, and these are translated usinginvokedynamic
, which has a good chance of finding the best factory method in a given runtime.
The JIT may be able to perform additional optimizations of this sort after inlining. In both the interpreter and JIT, a long chain of defensive copies can be collapsed to less expensive operation. The user model can be summarized as:
-
store and pass your arrays in frozen form, when possible
-
let someone else do the cloning, if it must happen
Trusted code which is performance sensitive may use a restricted
unsafe primitive method (of JDK-8261099) to create new
arrays and freeze them in place without the extra copying step. Using
the unsafe primitive should not be done unless there is a special
reason the JIT cannot optimize the code using the more generic
techniques outlined in this JEP. Users of the unsafe primitive
outside of java.base
are likely to have their code break when the
internal implementation of frozen arrays changes.
Language changes and translation strategy
The exceptions throwable from an assignment expression, when the
target is an array element, must be amended to include the possibility
of an ArrayStoreException
when the destination array is frozen.
(Note that this is a fundamentally new dynamic check and exception for
primitive arrays and Object
arrays, which previously could not raise
an ArrayStoreException
when written to.)
The syntax a.freeze()
requires special permission from the Java
Language Specification, analogous to clone
.
It seems most undesirable to hardwire freeze
into the Java VM in the
same way clone
was hardwired, using ad hoc special pleading to give
clone
a special access mode and verifier type. Therefore, we will
consider injecting the freeze
method using a more scalable technique
that does not affect the VM specification. For example, a.freeze()
may be defined as "sugar" for a static method call, such as
java.lang.ArrayMethods.freeze(a)
. This technique unlocks follow-on
opportunities to connect to additional methods of java.util.Arrays
,
without changing either the languauge or the VM.
Serialization
Serialization will be enhanced to transmit the frozen status of
arrays. It seems sad to do this, but it can be done fairly simply on
top of the existing code, and doing so would seem to be a way to
forestall serialization-related security bugs that could arise if data
structurues which are relying on frozen arrays suddenly became
internally mutable in the elements of those arrays, due to a
deserialization trick. The rejoinder to this point is that
properly written readObject
methods already do a clone
and
would be changed to call freeze
instead. The rejoinder to
this rejoinder is that it will be less surprising to end users
if passing an array through a serialization/deserialization
cycle preserves every bit of that array's structure, not
just the type, length, and elements, but also the frozen bit.
JNI and Unsafe
The JNI operations the allow mutation of arrays should at least
warn if they are applied to frozen arrays. (Perhaps it could
be a warning plus a no-op.) Code which uses Unsafe to poke
new values into arrays will need to check the isFrozen
bit,
in general.
No type system effects
No type system, either that of the language, the method and field
descriptors, the bytecode verifier, the dynamically checkable types
(via instanceof
), or the reflective Class
system, will make a
distinction between frozen and mutable arrays. Only the dynamic
isFrozen
check will be available to apply to instances.
In particular, the system will not enforce deep immutability of nested arrays. (It may create and observe such structures, and arrange its optimizations accordingly, as long as this is done invisibly to users.)
Java memory model
The JMM may be updated to state that the elements of a frozen array
have the same effects on safe publication as final
fields. In
effect a frozen array acts like an all-final
object, whose fields
are referenced by index (aaload
, iaload
, etc.) rather than by name
(getfield
).
The JMM has a "freeze" operation for objects with final fields, which occurs at the end of the constructor invocation; this "freeze" operation will also apply to the creation of a frozen array.
(The privileged low-level "freeze-in-place" method is also a "freeze" operation with respect to the JMM; this internal method does not need to be mentioned in the JMM, unless we figure out a way to make it safely exposed in the public user model.)
Example
class Node {
private final String label;
private final Node[] children; //always frozen
public Node[] children() { return children; }
public String label() { return label; }
public Node(String l, Node... cn) {
cn = cn.freeze(); // O(1) if already frozen
label = l;
children = cn;
}
}
class NodeUser {
Node defPair(Node a, Node b) {
Node[] ab = { a, b };
ab = ab.freeze(); // JIT might optimize this locally
return new Node("pair", ab);
}
void use(Node node) {
Node[] frozenNodes = node.children();
Node[] mutableNodes frozenNodes.clone();
Arrays.sort(mutableNodes);
doStuff(mutableNodes);
doStuff(frozenNodes); // still safe to use these
}
}
Alternatives
Unmodifiable lists created by List.of
are a recent workaround for a
parallel set of problems with mutable lists. Programmers are
encouraged to use such lists when possible in preference to built-in
Java arrays. However, this is not always possible, since an existing
API may require the use of built-in arrays, or programmers may require
the speed and footprint characteristics of built-in arrays, or they
may simply stubbornly prefer the built-in Java expression notation for
working with built-in arrays.
We could wait for a larger virtualization effort ("Arrays 2.0") at
which point adding frozen arrays would be as simple as adding
List.of
was. (Which wasn't actually as simple as one might have
thought.) But that would surely put off for many years the specific
benefits of frozen arrays in today's language and APIs.
We could wait for primitive classes and then cross-apply the concepts
to built an enhanced "frozen identity-free" array, which acts like a
primitive object. However, that is not a universally applicable patch
to the problems which identity-laden frozen arrays address in the
present JEP. In particular, the performance model of "frozen
identity-free" arrays would presumably suffer from the surprising
requirement that the ==
operator (and the acmp
instructions) must
perform O(N) work instead of O(1) work in order to prove that two
frozen arrays are distinct.
We could try to build in some sort of "lock" operation which converts part or all of an array in place to a frozen status, either permanently or until a corresponding "unlock" operation, or build some sort of "immutable view" of a mutable backing array, like the Collection APIs. Such mechanisms are more complex and costly than the present proposal, and are thus harder to use correctly, as well as being less appealing as an alternative to defensive copying. Allowing an array to change its mutability state creates a new kind of side channel with new concurrency failure modes, and so seems to be a poor solution to many of the problems mentioned above.
Risks and Assumptions
The extra checks require some careful optimizer work, analogous to
null pointer removal. These might cause performance problems, though
not relative to plain clone
, of course.
If the user model is not simple enough, or there are unpleasant surprises, or the performance is not good enough, or if retrofitted APIs are hard to roll out, adoption may be weak.
Retrofitting array-returning APIs to return frozen arrays will be a delicate operation, because of use cases in the wild (perhaps 0.0001% of them) where somebody has decided it would be clever to mutate a returned array.
This JEP assumes we can afford to add (as a preview) a handful of
factories and other API points for arrays, notably the a.freeze()
notation, as discussed above. Alternatively, we could stage all of
these API points as static methods on jdk.internal.FrozenArrays
,
for use internally within java.base
, and then roll them out as
user-visible API points in a separate step.
Dependencies and Future Work
There are no dependencies on existing work.
Some engineering of array store checks for Project Valhalla may cross-apply to the mutability store checks required by this JEP.
If we decide to forbid synchronization on frozen arrays, some work for primitive classes (Valhalla again) can cross-apply.
It seems very desirable to add a way for varargs methods to request
frozen input arrays. This would have knock-on benefits for both
footprint and security. Inside the method, there would be no need for
annotations of the form @SafeVarargs
. Outside the method, clients
could preferentially pass frozen arrays instead of mutable ones.
If we do frozen varargs, we should consider using a related syntax (a contextual keyword?) to declare initialized arrays and for array creation expressions:
int sumInts(__Frozen int... ints) {
// bytecode does ints = ints.freeze()
int sum = 0;
for (int i : ints) sum += i;
return sum;
}
void foo() {
sum(); // passes frozen zero-length array?
sum(1,2,3); // passes frozen length-3 array?
}
void bar() {
__Frozen int[] stuff = { 1,2,3 };
assert stuff.isFrozen();
sum(stuff);
sum(new __Frozen int[]{ 4,5,6 });
}
We will want to retrofit many APIs in java.base
. Examples of API
points which could profit from frozen arrays:
- reflective array-returning methods in
Class
,jlr.Executable
, etc. Enum.values
Throwable.getStackTrace
MethodType.parameterArray
,DynamicCallSiteDesc.bootstrapArgs
, etc.- methods that return byte or char arrays as a "result" or "payload"
Collection.toArray
method and various related- methods which accept array arguments, such as
Runtime.exec
In many of these cases, it is likely that backward compatibility with
a few existing clients may force us to continue to return mutable
arrays. This situation can sometimes be remedied by deprecating
the existing API point, in favor of a new one which returns the
"real" frozen array. The existing API point can be documented as
equivalent to running clone
on the result of the new API point,
For example:
public interface Collection<T> {
/** Returns a frozen array containing all of the elements in this collection,
* as if by {@code this.toArray().freeze()}.
*/
default Object[] freezeArray() {
return toArray().freeze();
}
}
The naming of such new API points will be a delicate matter of balancing conflicting concerns, to say nothing of bikeshed-painting.
See non-goals above for more possible follow-on features.