JEP 218: Generics over Primitive Types

Owner	Brian Goetz
Type	Feature
Scope	SE
Status	Candidate
Component	specification / language
Discussion	valhalla dash dev at openjdk dot java dot net
Effort	XL
Duration	XL
Relates to	JEP 300: Augment Use-Site Variance with Declaration-Site Defaults
Reviewed by	Maurizio Cimadamore
Created	2014/06/06 21:55
Updated	2017/10/17 17:37
Issue	8046267

Summary

Extend generic types to support the specialization of generic classes and interfaces over primitive types.

Goals

Generic type arguments are constrained to extend Object, meaning that they are not compatible with primitive instantiations unless boxing is used, undermining performance. With the possible addition of value types to Java (subject of a separate JEP), this restriction becomes even more burdensome. We propose to remedy this by supporting specialization of generic classes and interfaces when instantiated with primitive type arguments.

Non-Goals

It is not a goal of this effort to produce fully reified generics.

Motivation

Using boxed types (e.g., Integer) to simulate generics over primitives ranges from irritating to costly; boxing requires more memory, more more indirection, allocation, and more garbage collection. Attempting to avoid the overhead of boxing causes another problem: the proliferation of pseudo-specialized types such as IntStream, ToIntFunction, etc. With the eight primitive types being the only ones hostile to generics, this is tolerable but annoying; with the advent of value types, this restriction would be far more painful.

Other languages with generics (e.g., C++, C#, Scala) provide varying support for specialized generics over primitives or structs.

Description

Parametric polymorphism always entails a tradeoff between code footprint and specificity, and different languages have chosen different tradeoffs. At one end of the spectrum, C++ creates a specialized class for each instantiation of a template, and at the other end, we have Java's current erased implementation which produces one class for all reference instantiations and no support for primitive instantiations. C# has generics over both reference and struct types; they have taken the approach of unifying the two in the bytecode, and generating one set of native code for all reference types, and a specialized representation for each instantiated struct type.

A separate tradeoff is the timing of specialization, which includes both the choice of ahead-of-time (as Scala does) or on-demand (as C# does), and for delayed specialization, whether the shared artifact produced by the compiler is generic (and therefore requires specialization for all cases, as C# does) or whether it is biased towards a specific instantiation pattern.

Example: a simple `Box` class

Suppose we want to specialize the following class with T=int:

class Box<T> {
    private final T t;

    public Box(T t) { this.t = t; }

    public T get() { return t; }
}

Compiling this class today yields the following bytecode:

class Box extends java.lang.Object{
private final java.lang.Object t;

public Box(java.lang.Object);
  Code:
   0:    aload_0
   1:    invokespecial    #1; //Method java/lang/Object."<init>":()V
   4:    aload_0
   5:    aload_1
   6:    putfield    #2; //Field t:Ljava/lang/Object;
   9:    return

public java.lang.Object get();
  Code:
   0:    aload_0
   1:    getfield    #2; //Field t:Ljava/lang/Object;
   4:    areturn
}

In this bytecode, some occurrences of Object really mean Object, and some mean the erasure of some type variable. If we were to specialize this class for T=int, we would expect the signature of get() to return int. Similarly, some of the a* bytecodes would have to become i* bytecodes.

There are numerous approaches we could take to representing the needed generic information in the bytecode; these range from a fully generic representation at the bytecode level (as .NET does) to a more modest tagging of types and bytecodes to indicate whether that type or bytecode is directly related to a type that was present in the source file, or the erasure of some type variable.

In order for on-demand specialization at runtime to be practical, specialization should be as simple and mechanical as possible; we would prefer to not do any additional dataflow analysis or typechecking at runtime beyond existing verification. Similarly, the result of specialization should be verifiable using existing verification rules.

Open questions

There are many questions to be answered before a feature proposal can be made.

Subtyping. What is the typing interaction between specialized generics (e.g., List<int>) and their corresponding raw type (List), if any?
Type representation. How should a specialized type be represented in the bytecode when used as a method parameter, field type, supertype, etc?
Mechanics. How is a generic class specialized? In response to what triggers? By what platform component?
Reflection. How should specialized classes appear when viewed reflectively?
Opt-in or opt-out. A generic type variable T, in the absence of a bound, is assumed to extend Object. How would we denote that we wish to generify over both reference and primitive/value types?
Generic methods. Just as classes can be specialized, so must generic methods. How can we inject arbitrarily many new methods into existing classes (ideally, without perturbing the layout of their vtables)?
Arrays. Classes like ArrayList frequently cast an Object[] to a T[], which is problematic if T might be int. We would likely have to assign a viable semantics to new T[] so that classes like ArrayList could be specialized.
Reference-primitive overloadings. Some overloadings that are valid today would become problematic under specialization. For example, a List-like class that overloaded remove(int) with remove(T) is reasonable when T is restricted to reference types, but problematic if T can be int.
Null. Null is a valid value of every reference type, and is often used to represent "nothing is there." Primitive and value types, however, have no analogue of null. This is a challenge for methods like Map.get, which are defined to return null if the specified key cannot be found in the map.
Hand-written replacements. The mechanical translation of a generic class into a specialized one is straightforward, but may be too limiting. We may wish to support a higher degree of user control. For example, an optimized version of ArrayList<boolean> could be written that uses a BitSet as its backing store instead of the boolean[] that specialization would generate.
Refinements. An alternate form of user control would be to enhance the automated specialization by overriding specific methods (or adding new methods) for specific type instantiations. For example, List<int> could have a sum() method, or an optimized version of existing methods could be hand-written for specific type instantiations.
Incomplete generification. Some classes were incompletely generified; for example, the argument type of Collection.remove is Object, not T. There would need to be a mechanism to allow classes like these to be properly specialized.
Type inference. In a generic method like <Z> m(Z a, Z b) when invoked as m(1,2), should we infer int or Integer for Z? Inferring int will raise type-system issues (how do LUB and GLB behave in these cases?) and may also raise source compatibility issues.