JEP 269: Convenience Factory Methods for Collections

Owner	Stuart Marks
Type	Feature
Scope	SE
Status	Closed / Delivered
Release	9
Component	core-libs / java.util:collections
Discussion	core dash libs dash dev at openjdk dot java dot net
Effort	S
Duration	S
Relates to	JEP 186: Collection Literals
Reviewed by	Brian Goetz, Chris Hegarty, Paul Sandoz
Endorsed by	Brian Goetz
Created	2014/06/26 23:05
Updated	2017/06/26 21:31
Issue	8048330

Summary

Define library APIs to make it convenient to create instances of collections and maps with small numbers of elements, so as to ease the pain of not having collection literals in the Java programming language.

Goals

Provide static factory methods on the collection interfaces that will create compact, unmodifiable collection instances. The API is deliberately kept minimal.

Non-Goals

It is not a goal to provide a fully-general "collection builder" facility that, for example, lets the user control the collection implementation or various characteristics such as mutability, expected size, loading factor, concurrency level, and so forth.

It is not a goal to support high-performance, scalable collections with arbitrary numbers of elements. The focus is on small collections.

It is not a goal to provide unmodifiable collection types. That is, this proposal does not expose the characteristic of unmodifiability in the type system, even though the proposed implementations are actually unmodifiable.

It is not a goal to provide "immutable persistent" or "functional" collections.

Motivation

Java is often criticized for its verbosity. Creating a small, unmodifiable collection (say, a set) involves constructing it, storing it in a local variable, and invoking add() on it several times, and then wrapping it. For example,

Set<String> set = new HashSet<>();
set.add("a");
set.add("b");
set.add("c");
set = Collections.unmodifiableSet(set);

This is quite verbose, and because it cannot be expressed in a single expression, static collections must be populated in static initializer blocks rather than via a more convenient field initializer. Alternatively, one can populate a collection using a copy constructor from another collection:

Set<String> set = Collections.unmodifiableSet(new HashSet<>(Arrays.asList("a", "b", "c")));

This is still somewhat verbose and also less obvious, since one has to create a List before creating the Set. Another alternative is to use the so-called "double brace" technique:

Set<String> set = Collections.unmodifiableSet(new HashSet<String>() {{
    add("a"); add("b"); add("c");
}});

This uses the instance-initializer construct in an anonymous inner class, which is a bit prettier. However, it is quite obscure, and it costs an extra class at each usage. It also holds hidden references to the enclosing instance and to any captured objects. This may cause memory leaks or problems with serialization. For these reasons, this technique is best avoided.

The Java 8 Stream API can be used to construct small collections, by combining stream factory methods and collectors. For example,

Set<String> set = Collections.unmodifiableSet(Stream.of("a", "b", "c").collect(toSet()));

(The streams collectors make no guarantees about the mutability of collections they return. In Java 8, the returned collections are ordinary, mutable collections such as ArrayList, HashSet, and HashMap, but this might change in future JDK releases.)

This is somewhat roundabout, and while not obscure, it's also not very obvious. It also involves a certain amount of unnecessary object creation and computation. As is typical, Map is the outlier. Streams cannot be used this way to construct a Map, unless the value can be computed from the key, or if the stream elements contain both the key and the value.

In the past, there have been a few proposals floated to change the Java programming language to support collection literals. However, as is often the case with language features, no feature is as simple or as clean as one might first imagine, and so collection literals will not be appearing in the next version of Java.

Much of the benefit of collection literals can be gained by providing library APIs for creating small collection instances, at significantly reduced cost and risk compared to changing the language. For example, the code to create a small Set instance might look like this:

Set<String> set = Set.of("a", "b", "c");

There are existing factories in the Collections class to support the creation of empty Lists, Sets, and Maps. There are also factories to produce singleton Lists, Sets, and Maps that have exactly one element or key-value pair. EnumSet contains several overloaded of(...) methods that take fixed or variable numbers of arguments, for conveniently creating an EnumSet with the specified elements. However, there is no good general-purpose way of creating Lists, Sets, and Maps containing objects of arbitrary types.

There are combinator methods in the Collections class for the creation of unmodifiable Lists, Sets, and Maps. These don't create inherently unmodifiable collections. Instead, they take another collection and wrap it in a class that rejects modification requests, creating an unmodifiable view of the original collection. Possession of a reference to the underlying collection still allows modification. Each wrapper is an additional object, requiring another level of indirection and consuming more memory than the original collection. Finally, the wrapped collection still bears the expense of supporting mutation even if it is never intended to be modified.

Description

Provide static factory methods on the List, Set, and Map interfaces for creating unmodifiable instances of those collections. (Note that unlike static methods on classes, static methods on interfaces are not inherited, so it will not be possible to invoke them via an implementing class, nor via an instance of the interface type.)

For List and Set, these factory methods would work as follows:

List.of(a, b, c);
Set.of(d, e, f, g);

These will include varargs overloads, so that there is no fixed limit on the collection size. However, the collection instances so created may be tuned for smaller sizes. Special-case APIs (fixed-argument overloads) for up to ten of elements will be provided. While this introduces some clutter in the API, it avoids array allocation, initialization, and garbage collection overhead that is incurred by varargs calls. Significantly, the source code of the call site is the same regardless of whether a fixed-arg or varargs overload is called.

For Maps, a set of fixed-argument methods will be provided:

Map.of()
Map.of(k1, v1)
Map.of(k1, v1, k2, v2)
Map.of(k1, v1, k2, v2, k3, v3)
...

We expect that supporting small maps of up to ten key-value pairs will be sufficient to cover the majority of use cases. For larger numbers of entries, an API will be provided that will create a Map instance given an arbitrary number of key-value pairs:

Map.ofEntries(Map.Entry<K,V>...)

While this approach is analogous to the equivalent varargs APIs for List and Set, it unfortunately requires that each key-value pair be boxed. A method for boxing keys and values, suitable for static import, will make this more convenient:

Map.Entry<K,V> entry(K k, V v)

Using these methods, it will be possible to create a map with an arbitrary number of entries:

Map.ofEntries(
    entry(k1, v1),
    entry(k2, v2),
    entry(k3, v3),
    // ...
    entry(kn, vn));

(It might be possible to mitigate the expense of boxing by using value types in a future version of the JDK. The entry() convenience method will actually return a newly introduced concrete type that implements Map.Entry, in order to facilitate potential future migration to a value type.)

Providing APIs for creating small, unmodifiable collections satisfies a large set of use cases, and it helps keep the specification and implementations simple. Unmodifiable collections avoid the need to make defensive copies, and they are more amenable to parallel processing.

The runtime space occupied by small collections is also a strong consideration. A straightforward creation of an unmodifiable HashSet with two elements, using the wrapper APIs, would be comprised of six objects: the wrapper, the HashSet, which contains a HashMap, its table of buckets (an array), and one Node instance per element. This imposes a tremendous amount of overhead compared to the amount of data stored, and access to the data unavoidably requires multiple method calls and pointer dereferences. Implementations designed for small, fixed-sized collections can avoid most of this overhead, using a compact field-based or array-based layout. Not needing to support mutation (and knowing the collection size at creation time) also contributes to space savings.

The concrete classes returned by these factories will not be exposed as public APIs. No guarantees will be made about the runtime type or identity of the returned collection. This will allow the implementations to change over time without breaking compatibility. The only thing the caller should rely on is that the reference returned is an implementation of its interface type.

The resulting objects will be serializable. A serialization proxy object will be used as the common serialized form for the implementation classes. This will prevent information about the concrete implementations from leaking into the serialized form, thus preserving flexibility for future maintenance, and allowing the concrete implementations to change from release to release without affecting serialization compatibility.

Null elements, keys, and values will be disallowed. (No recently introduced collections have supported nulls.) In addition, prohibiting nulls offers opportunities for a more compact internal representation, faster access, and fewer special cases.

The List implementations are expected to provide fast element access by index, so they will implement the RandomAccess marker interface.

Elements stored in these collections must support typical collections contracts, including proper support for hashCode() and equals(). If an element of a Set or a key of a Map is mutated in a way that affects its hashCode() or equals() methods, the behavior of the collection could become unspecified.

Once constructed and safely published, these collection instances will be safe for concurrent access by multiple threads.

The JDK will be searched for potential sites where these new APIs can be used. These sites will be updated to use the new APIs as time and schedule permit.

Alternatives

Language changes have been considered several times, and rejected:

Project Coin Proposal, 29 March 2009
Project Coin Proposal, 30 March 2009
JEP 186 discussion on lambda-dev, January-March 2014

The language proposals were set aside in preference to a library-based proposal as summarized in this message.

The Google Guava libraries have a rich set of utilities for creating immutable collections, including a builder pattern, and for creating a wide variety of mutable collections. The Guava libraries are very useful and general but are perhaps overkill for inclusion into the Java SE Platform. This proposal is similar to a proposal from Stephen Colebourne lambda-dev, 19 Feb 2014 and includes some ideas from the Guava immutable collection factory methods.

The Map.fromEntries() approach for initializing a Map with an arbitrary number of entries is not ideal, but it seems to be the least bad of the alternatives. Its advantages are that it is type-safe, it has keys and values adjacent in the syntax, the number of entries is known at compile time, and it is suitable for use as a field initializer. However, it involves boxing, and it is rather verbose. Several alternatives were considered, and they all introduced tradeoffs that seemed to make them worse than the current proposal.

Static factory methods on concrete collection classes (e.g., ArrayList, HashSet) have been removed from this proposal. They seem like they are useful, but in practice they tend distract developers from using the factory methods for the immutable collections. There is a small set of use cases for initializing a mutable collection instance with a predefined set of values. It's usually preferable to have those predefined values be in an immutable collection, and then to initialize the mutable collection via a copy constructor.

There is another wrinkle, which is that static methods on classes are inherited by subclasses. Suppose a static factory method HashMap.of() were to be added. Since LinkedHashMap is a subclass of HashMap, it would be possible for application code to call LinkedHashMap.of(). This would end up calling HashMap.of(), not at all what one would expect! One way to mitigate this is to ensure that all concrete collection implementations have the same set of factory methods, so that inheritance doesn't occur. Inheritance would still be an issue for user-defined subclasses of the concrete collections.

Testing

There will be the usual set of unit tests in the JDK regression test suite, and JCK tests for the public APIs. The serialized forms may also be covered by the JCK.

A set of size and performance tests will be developed. In contrast to typical goals of comparing to baseline measurements, these tests will compare the new collection implementations to existing ones. The expectation is that the new collections will consume less heap space, both in terms of fixed overhead and on a per-element basis. In some cases the new collections may be slower, however, because of the different internal representation as compared to the existing collections. Any such slowdown should be reasonable. Although there are no specific performance goals, being 10x slower is unacceptable. In addition, the new collections should maintain consistent performance as the number of elements is increased. Finally, the measurements will establish baseline performance figures against which future changes should be compared.