JEP draft: Sequenced Collections

OwnerStuart Marks
TypeFeature
ScopeSE
StatusSubmitted
Componentcore-libs / java.util:collections
Reviewed byBrian Goetz
Created2022/01/27 22:13
Updated2022/09/21 03:22
Issue8280836

Summary

Introduce a new Collection interface representing a collection with a defined encounter order. Such a collection has a well-defined first element, second element, and so forth, up to the last element.

"Life can only be understood backwards; but it must be lived forwards."
— Kierkegaard

Motivation

The Collections Framework lacks a collection type that represents a sequence of elements with a defined encounter order, and it also lacks a uniform set of operations that apply across such collections. These gaps have been a repeated source of complaints about the Collections Framework.

For example, List and Deque both define an encounter order, but their common supertype is Collection, which does not. Similarly, Set does not define an encounter order, and implementations such as HashSet do not provide one, but some subtypes such as SortedSet and LinkedHashSet do. Support for encounter order is thus spread across the type hierarchy, making it difficult to express certain useful concepts in APIs. Neither Collection nor List is very good for describing a parameter or return value that has an encounter order. Collection is too weak, relegating such constraints to the prose specification, possibly leading to hard-to-debug errors. List is too strong, excluding SortedSet and LinkedHashSet.

A related problem is that view collections are often forced to downgrade to weaker semantics. Wrapping a LinkedHashSet with Collections::unmodifiableSet yields a Set, discarding the information about encounter order.

Without an interface to define them, operations on collections with encounter order are inconsistent or missing. While many implementations support getting the first or last element, each collection defines its own way, and some are not obvious or are missing entirely:

.               First Element                     Last Element

List            list.get(0)                       list.get(list.size() - 1)
Deque           deque.getFirst()                  deque.getLast()
SortedSet       sortedSet.first()                 sortedSet.last()
LinkedHashSet   linkedHashSet.iterator().next()   [missing]

Some of these are unnecessarily cumbersome, such as getting the last element of a List. Some are not even possible without heroics: the only way to get the last element of a LinkedHashSet is to iterate the entire set.

Similarly, iterating the elements of a collection from first to last is straightforward and consistent, whereas iterating in reverse order is neither. All of these collections can be iterated forward with an Iterator, the enhanced for loop, a stream(), or toArray. Iterating in reverse is different in every case. NavigableSet provides the descendingSet view for reverse iteration:

for (var e : navSet.descendingSet())
    process(e);

Deque does so with a reverse Iterator:

for (var it = deque.descendingIterator(); it.hasNext();) {
    var e = it.next();
    process(e);
}

List does so but with ListIterator:

for (var it = list.listIterator(list.size()); it.hasPrevious();) {
    var e = it.previous();
    process(e);
}

LinkedHashSet provides no support for reverse iteration. The only practical way to process the elements of a LinkedHashSet in reverse order is to copy its elements into another collection.

Similarly, processing a collection's elements using streams is a powerful and effective alternative to processing elements using loops, but obtaining a stream in reverse order can be difficult. Of the various collections that provide encounter order, the only one that supports this conveniently is NavigableSet:

navSet.descendingSet().stream()

The others require either copying the elements to another collection or creating a stream from a customized Spliterator that reverses iteration.

This is an unfortunate state of affairs, as there is no fundamental reason all of these collection types could not easily implement reverse iteration. The lack of API points in the Collections Framework forces programmers to write slower and more complicated code than necessary. The concept of a collection with defined encounter order exists in multiple places in the Collections Framework, but there is no single type that represents it. As a result, the operations on such collections are inconsistent or missing, and processing elements in reverse order ranges from inconvenient to impossible. These gaps in the Collections Framework should be filled.

Description

A sequenced collection is a collection whose elements have a defined encounter order. (The word "sequenced" as used here is the past participle of the verb to sequence, meaning "to arrange elements in a particular order.") A sequenced collection has first and last elements, and the elements between them have successors and predecessors. A sequenced collection supports common operations at either end, and it supports processing of the elements from first to last and from last to first (forward and reverse). A sequenced set is a Set; it is a sequenced collection that contains no duplicate elements. A sequenced map is a Map whose entries have a defined encounter order.

We add three new interfaces embodying these concepts into the collection type hierarchy. They are highlighted in the diagram below:

The declarations for these interfaces are as follows:

interface SequencedCollection<E> extends Collection<E> {
    // new method
    SequencedCollection<E> reversed();
    // methods promoted from Deque
    void addFirst(E);
    void addLast(E);
    E getFirst();
    E getLast();
    E removeFirst();
    E removeLast();
}

interface SequencedSet<E> extends Set<E>, SequencedCollection<E> {
    // covariant override
    SequencedSet<E> reversed();
}

interface SequencedMap<K,V> extends Map<K,V> {
    // new methods
    SequencedMap<K,V> reversed();
    SequencedSet<K> sequencedKeySet();
    SequencedCollection<V> sequencedValues();
    SequencedSet<Entry<K,V>> sequencedEntrySet();
    V putFirst(K, V);
    V putLast(K, V);
    // methods promoted from SortedMap and NavigableMap
    Entry<K, V> firstEntry();
    Entry<K, V> lastEntry();
    K firstKey();
    K lastKey();
    Entry<K, V> pollFirstEntry();
    Entry<K, V> pollLastEntry();
}

Existing interfaces in the collection type hierarchy have the following adjustments:

All of the new methods have default implementations. There are no definitions of equals and hashCode in SequencedCollection, as the sub-interfaces have conflicting definitions. Several covariant overrides for reversed are in the appropriate places. For example, List::reversed is overridden with a return type of List.

We also add new methods to the Collections utility class to create unmodifiable wrappers for the respective types:

The reversed method provides a reverse-ordered view of the original collection. Any modifications to the original collection are visible in the view. If permitted, modifications to the view write through to the original collection.

The reverse-ordered view enables all the different sequenced types to process elements in both directions, using all the iteration mechanisms:

For example, obtaining a reverse-ordered stream from a LinkedHashSet was previously quite difficult; now it is simply

linkedHashSet.reversed().stream()

The reversed method is essentially a renamed NavigableSet::descendingSet, promoted to SequencedCollection.

The following methods of SequencedCollection are promoted from Deque. They support adding, getting, and removing elements at both ends:

The add and remove methods are optional, primarily to support the case of unmodifiable collections. The get and remove methods throw NoSuchElementException if the collection is empty.

Collections such as SortedSet, which position elements by relative comparison, cannot support explicit-positioning operations such as the addFirst and addLast methods. Thus, these methods throw UnsupportedOperationException.

The addFirst and addLast methods of SequencedSet have special case semantics for collections such as LinkedHashSet. If the element is already present in the set, it is also moved into the appropriate position. This remedies a long-standing deficiency in LinkedHashSet, the inability to reposition elements.

The following methods of SequencedMap are promoted from SortedMap and NavigableMap. They support getting and removing entries at both ends:

The following new methods are added to SequencedMap:

The put methods have special semantics similar to the corresponding add methods of SequencedSet. For maps such as LinkedHashMap, they have the additional effect of repositioning the mapping if it is already present in the map. For maps such as SortedMap, the methods throw UnsupportedOperationException.

Alternatives

Types

An alternative to adding new types would be to repurpose the List interface as a general sequenced collection type. Indeed List is sequenced, but it also supports element access by integer index. Many sequenced data structures don't naturally support indexing and would be required to support it iteratively. This would result in indexed access having O(n) performance instead of the expected O(1), perpetuating the mistake of LinkedList.

Deque seems promising as a general sequence type, as it already supports the right set of operations. However, it is cluttered with other operations, including a family of null-returning operations (offer, peek, and poll), stack operations (push and pop), and operations inherited from Queue. These operations are sensible for a queue but less so for other collections. If Deque were repurposed as a general sequence type, then List would also be a Queue and would support stack operations, resulting in a cluttered and confusing API.

Naming

The term sequence implies elements that are arranged in order. It's commonly used across various platforms to represent collections with similar semantics to those described here.

The term ordered is not quite specific enough. We require iteration in both directions and operations at both ends. An ordered collection such as a Queue is a notable outlier: it is ordered, but it is also decidedly asymmetric.

The term reversible (used in an earlier version of this proposal) doesn't immediately evoke the concept of having two ends. Perhaps a bigger issue is that the Map variant would have been named ReversibleMap, which misleadingly implies that it supports lookup by key and by value (sometimes called a "BiMap" or "BidiMap").

Add, Put, and UnsupportedOperationException

As described above, explicit-positioning APIs such as SortedSet::addFirst and SortedMap::putLast throw UnsupportedOperationException, because the sequencing is determined by relative comparisons among the elements. This introduces an asymmetry in that some of the collections do not implement all of the SequencedCollection operations. This arrangement is valuable, despite the asymmetry, because it brings SortedSet and SortedMap into the sequenced collection family, allowing them to be used more broadly than otherwise.

Alternatively, the addition operations could be kept separate by rearranging the interfaces along structural lines. This would result in new interface types with very thin semantics ("AddableCollection") that aren't useful in practice and that clutter up the type hierarchy.

This is consistent with prior design decisions in the Collections Framework. For example, the Map::keySet method returns a Set, even though the implementation returned doesn't support addition.

History

This proposal is an incremental evolution of the ReversibleCollections proposal posted to core-libs-dev on 2021-04-16. The major changes from that proposal are renaming, the addition of a Map interface, and the addition of unmodifiable wrapper methods.

The ReversibleCollection proposal was in turn based on an earlier OrderedMap/OrderedSet proposal posted by Tagir Valeev to core-libs-dev on 2020-04-25. Several fundamental concepts from the proposal are still present, although there are many differences in detail.

Over the years there have been a variety of requests and proposals filed in the vein of combining a List with a Set or Map. The recurring themes are a List that contains unique elements, or a Set or Map that maintains ordering. Some of the related requests in the JDK Bug System include the following:

JDK-4152834, JDK-4245809, JDK-4264420, JDK-4268146, JDK-6447049, and JDK-8037382.

Some of the requests were partially addressed with the introduction of LinkedHashSet and LinkedHashMap in JDK 1.4. While these classes do satisfy some use cases, their introduction left gaps in the abstractions and operations provided by the Collections Framework, as described here.

Testing

A comprehensive set of tests are added to the JDK Regression Test suite.

Risks and Assumptions

Introduction of new methods fairly high in the inheritance hierarchy runs the risk of method name clashes over obvious names such as "reversed" and "getFirst".

Of particular concern are the covariant overrides of the reversed method on List and Deque. This is source and binary incompatible with collections that implement both List and Deque. There are two examples of this in the JDK: LinkedList and an internal class sun.awt.util.IdentityLinkedList. This incompatibility must be handled by adding an overriding reversed method.

The introduction of covariant overrides on the keySet and entrySet methods of the SequencedMap interface could also result in incompatibilities with subclasses that override these methods.

If necessary, these incompatibilities can be mitigated by avoiding covariant overrides and by providing methods that differ in name and return type, for example, reversedList or sequencedKeySet. This approach was taken with the introduction of the navigableKeySet method in Java SE 6 instead of a covariant override of keySet.

Analysis of a large corpus of Java code will be performed in order to assess these risks.