JEP 218: Generics over Primitive Types
Owner | Brian Goetz |
Type | Feature |
Scope | SE |
Status | Candidate |
Component | specification / language |
Discussion | valhalla dash dev at openjdk dot java dot net |
Effort | XL |
Duration | XL |
Relates to | JEP 300: Augment Use-Site Variance with Declaration-Site Defaults |
Reviewed by | Maurizio Cimadamore |
Created | 2014/06/06 21:55 |
Updated | 2017/10/17 17:37 |
Issue | 8046267 |
Summary
Extend generic types to support the specialization of generic classes and interfaces over primitive types.
Goals
Generic type arguments are constrained to extend Object, meaning that they are not compatible with primitive instantiations unless boxing is used, undermining performance. With the possible addition of value types to Java (subject of a separate JEP), this restriction becomes even more burdensome. We propose to remedy this by supporting specialization of generic classes and interfaces when instantiated with primitive type arguments.
Non-Goals
It is not a goal of this effort to produce fully reified generics.
Motivation
Using boxed types (e.g., Integer) to simulate generics over primitives ranges from irritating to costly; boxing requires more memory, more indirection, more allocation, and more garbage collection. Attempting to avoid the overhead of boxing causes another problem: the proliferation of pseudo-specialized types such as IntStream, ToIntFunction, etc. With the eight primitive types being the only ones hostile to generics, this is tolerable but annoying; with the advent of value types, this restriction would be far more painful.
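To make the cost concrete, the following minimal sketch contrasts the boxed generic route with a pseudo-specialized type. The class and method names (BoxingDemo, sumBoxed, sumPrimitive) are hypothetical and chosen for illustration only; List, Arrays, and IntStream are the existing JDK types.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.IntStream;

// Hypothetical demo class, not part of the proposal.
class BoxingDemo {
    // Generic route: each element is an Integer object on the heap,
    // and every read unboxes it back to an int.
    static int sumBoxed(List<Integer> values) {
        int total = 0;
        for (Integer v : values) {
            total += v; // implicit unboxing via Integer.intValue()
        }
        return total;
    }

    // Pseudo-specialized route: IntStream works on int directly,
    // at the price of a second, parallel API next to Stream<T>.
    static int sumPrimitive(int[] values) {
        return IntStream.of(values).sum();
    }

    public static void main(String[] args) {
        System.out.println(sumBoxed(Arrays.asList(1, 2, 3)));
        System.out.println(sumPrimitive(new int[] {1, 2, 3}));
    }
}
```

Both methods compute the same sum; the difference lies in the representation of the elements and in the need for a separate API to avoid boxing.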
Other languages with generics (e.g., C++, C#, Scala) provide varying support for specialized generics over primitives or structs.
Description
Parametric polymorphism always entails a tradeoff between code footprint and specificity, and different languages have chosen different tradeoffs. At one end of the spectrum, C++ creates a specialized class for each instantiation of a template; at the other end, Java's current erased implementation produces one class for all reference instantiations and provides no support for primitive instantiations. C# has generics over both reference and struct types; it unifies the two in the bytecode, generating one set of native code for all reference types and a specialized representation for each instantiated struct type.
A separate tradeoff is the timing of specialization. This includes the choice between ahead-of-time specialization (as Scala does) and on-demand specialization (as C# does) and, for delayed specialization, whether the shared artifact produced by the compiler is generic (and therefore requires specialization for all cases, as C# does) or is biased towards a specific instantiation pattern.
Example: a simple Box class
Suppose we want to specialize the following class with T=int:
class Box<T> {
    private final T t;
    public Box(T t) { this.t = t; }
    public T get() { return t; }
}
Compiling this class today yields the following bytecode:
class Box extends java.lang.Object{
private final java.lang.Object t;

public Box(java.lang.Object);
  Code:
   0: aload_0
   1: invokespecial #1; //Method java/lang/Object."<init>":()V
   4: aload_0
   5: aload_1
   6: putfield #2; //Field t:Ljava/lang/Object;
   9: return

public java.lang.Object get();
  Code:
   0: aload_0
   1: getfield #2; //Field t:Ljava/lang/Object;
   4: areturn
}
In this bytecode, some occurrences of Object really mean Object, and some mean the erasure of some type variable. If we were to specialize this class for T=int, we would expect the signature of get() to return int. Similarly, some of the a* bytecodes would have to become i* bytecodes.
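As a rough illustration of the intended result, the sketch below places the generic Box next to a hand-written class that plays the role of Box specialized with T=int. The name IntBox is hypothetical, and this JEP does not specify that specialization would produce such a class; it is only meant to show which types would change.

```java
// Hypothetical demo: compares the erased generic class with a
// hand-written stand-in for its T=int specialization.
class SpecializationSketch {
    public static void main(String[] args) {
        Box<Integer> boxed = new Box<>(42);   // 42 is boxed to an Integer
        IntBox specialized = new IntBox(42);  // 42 is stored as an int

        int a = boxed.get();        // cast to Integer, then unboxed
        int b = specialized.get();  // no boxing involved
        System.out.println(a == b);
    }
}

// The generic class from the example; T is erased to Object.
class Box<T> {
    private final T t;
    public Box(T t) { this.t = t; }
    public T get() { return t; }
}

// Assumed shape of the T=int case, written by hand for illustration.
class IntBox {
    private final int t;                 // field type is int, not Object
    public IntBox(int t) { this.t = t; } // parameter is loaded with iload_1
    public int get() { return t; }       // returns with ireturn, not areturn
}
```

Every occurrence of Object that stood for the erasure of T becomes int in IntBox, while the superclass reference in the constructor call to Object."<init>" stays as it is.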
There are numerous approaches we could take to representing the needed generic information in the bytecode; these range from a fully generic representation at the bytecode level (as .NET does) to a more modest tagging of types and bytecodes to indicate whether that type or bytecode is directly related to a type that was present in the source file, or the erasure of some type variable.
In order for on-demand specialization at runtime to be practical, specialization should be as simple and mechanical as possible; we would prefer to not do any additional dataflow analysis or typechecking at runtime beyond existing verification. Similarly, the result of specialization should be verifiable using existing verification rules.
Open questions
There are many questions to be answered before a feature proposal can be made.
- Subtyping. What is the typing interaction between specialized generics (e.g., List<int>) and their corresponding raw type (List), if any?
- Type representation. How should a specialized type be represented in the bytecode when used as a method parameter, field type, supertype, etc.?
- Mechanics. How is a generic class specialized? In response to what triggers? By what platform component?
- Reflection. How should specialized classes appear when viewed reflectively?
- Opt-in or opt-out. A generic type variable T, in the absence of a bound, is assumed to extend Object. How would we denote that we wish to generify over both reference and primitive/value types?
- Generic methods. Just as classes can be specialized, so must generic methods. How can we inject arbitrarily many new methods into existing classes (ideally, without perturbing the layout of their vtables)?
- Arrays. Classes like ArrayList frequently cast an Object[] to a T[], which is problematic if T might be int. We would likely have to assign a viable semantics to new T[] so that classes like ArrayList could be specialized.
- Reference-primitive overloadings. Some overloadings that are valid today would become problematic under specialization. For example, a List-like class that overloaded remove(int) with remove(T) is reasonable when T is restricted to reference types, but problematic if T can be int.
- Null. Null is a valid value of every reference type, and is often used to represent "nothing is there." Primitive and value types, however, have no analogue of null. This is a challenge for methods like Map.get, which are defined to return null if the specified key cannot be found in the map.
- Hand-written replacements. The mechanical translation of a generic class into a specialized one is straightforward, but may be too limiting. We may wish to support a higher degree of user control. For example, an optimized version of ArrayList<boolean> could be written that uses a BitSet as its backing store instead of the boolean[] that specialization would generate.
- Refinements. An alternate form of user control would be to enhance the automated specialization by overriding specific methods (or adding new methods) for specific type instantiations. For example, List<int> could have a sum() method, or an optimized version of existing methods could be hand-written for specific type instantiations.
- Incomplete generification. Some classes were incompletely generified; for example, the argument type of Collection.remove is Object, not T. There would need to be a mechanism to allow classes like these to be properly specialized.
- Type inference. In a generic method like <Z> m(Z a, Z b) when invoked as m(1,2), should we infer int or Integer for Z? Inferring int will raise type-system issues (how do LUB and GLB behave in these cases?) and may also raise source compatibility issues.
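Three of the questions above (reference-primitive overloadings, null, and arrays) can already be observed in today's Java with boxed types. The sketch below uses only existing JDK APIs; the class and method names (OpenQuestionsDemo and its helpers) are hypothetical. It shows current behavior only and does not suggest how specialization would resolve these cases.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class illustrating current behavior.
class OpenQuestionsDemo {
    // Overloading: an int argument selects remove(int index).
    static List<Integer> removeAtIndexOne() {
        List<Integer> list = new ArrayList<>(Arrays.asList(1, 5, 7));
        list.remove(1);                  // removes the element at index 1 (the value 5)
        return list;
    }

    // Overloading: an Integer argument selects remove(Object).
    static List<Integer> removeValueOne() {
        List<Integer> list = new ArrayList<>(Arrays.asList(1, 5, 7));
        list.remove(Integer.valueOf(1)); // removes the element equal to 1
        return list;
    }

    // Null: Map.get reports a missing key as null, which only a
    // reference type such as Integer can represent.
    static Integer lookupMissing() {
        Map<String, Integer> counts = new HashMap<>();
        return counts.get("missing");
    }

    // Arrays: an int[] is not an Object[], so the common
    // (T[]) new Object[n] idiom has no direct analogue for T=int.
    static boolean intArrayIsObjectArray() {
        Object array = new int[1];
        return array instanceof Object[];
    }

    public static void main(String[] args) {
        System.out.println(removeAtIndexOne());      // [1, 7]
        System.out.println(removeValueOne());        // [5, 7]
        System.out.println(lookupMissing());         // null
        System.out.println(intArrayIsObjectArray()); // false
    }
}
```

With a List<int>, the two remove calls would no longer be distinguishable by argument type, and a Map<String, int> would have no null to return for a missing key.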