JEP draft: special notation for the receiver helper pattern
Owner | John Rose |
Type | Feature |
Scope | JDK |
Status | Draft |
Component | specification / language |
Created | 2017/08/18 23:26 |
Updated | 2020/10/26 19:08 |
Issue | 8186473 |
Executive summary: Allow a new member reference syntax, akin to
this.m and super.m, to refer to the receiver of the innermost
enclosing method call. Thus, r.foo(__Receiver.bar()) is sugar for
r.foo(r.bar()). Various alternatives are discussed.
The new notation supports fluent APIs.
Fluent APIs are an increasingly important design pattern in Java,
allowing simple representation of domain-specific languages. They are
characterized by very long method call chains of the form
head.a(*).b(*).c(*)... where each method call builds on the results
of the previous calls. StringBuilder is an early example, and Stream
is a more recent one.
This notation depends, for its clarity, on being able to be spread
across the page as a series of statements or requests expressed as a
chain of method calls (.a().b().c()...). The arguments of each
method call are specific to that call, and the result of the method
call is (often) a completely new receiver which determines the meaning
of the next call.
This notation has a weak spot: You have to express every request as a
single method call. This means that the API designer is constrained
to make the method call the unit of request. This is not the case for
non-fluent APIs, where a request may consist of several method calls
to the same receiver. For example, extracting match groups from a
MatchResult works like this, and so cannot be written fluently.
This limitation forces the designer of a fluent API to think very hard
about making each method call carry exactly the right amount of
information. Thinking very hard is a good thing. What is not so good
is the sharp cliff waiting to one side, whenever a fluent method might
need to ask a quick question of the receiver before making the next
request in the chain. Suppose I am in the middle of a chain a().b()
and I need to ask the next receiver for some help before I can form the
next call c(). I want to write something like this:
head.a().b().c(?.help()).d()
But then I have to refactor it to introduce a temporary variable, so I can refer to the next receiver:
var btem = head.a().b();
btem.c(btem.help()).d()
This clearly defeats the readability of the fluent API. It is all the more true if the API supports chains nested inside of other chains. (An example of this would be an ASM-like fluent bytecode API, currently under development, where a class is a fluent chain, containing sub-chains for each method, containing sub-chains for smaller constructs like complex constants.) Requiring a temporary variable has a non-incremental effect, breaking all containing chains into separate statements.
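The refactoring with a temporary can be tried in today's Java. The following is a minimal sketch, assuming a hypothetical Chain class whose methods a, b, c, d, and help mirror the names used above; for simplicity every call returns the same receiver, whereas a real fluent API may return a new receiver at each step.

```java
// Hypothetical fluent receiver; only the method names come from the text above.
final class Chain {
    private final StringBuilder log = new StringBuilder();

    Chain a() { log.append("a;"); return this; }
    Chain b() { log.append("b;"); return this; }

    // A side question put to the receiver; it is not itself a request in the chain.
    int help() { return log.length(); }

    Chain c(int hint) { log.append("c(").append(hint).append(");"); return this; }
    Chain d() { log.append("d;"); return this; }

    String trace() { return log.toString(); }
}

final class TempVariableDemo {
    static String run() {
        Chain head = new Chain();
        // The chain must be cut here so that the receiver of c() has a name.
        var btem = head.a().b();
        return btem.c(btem.help()).d().trace();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The single expression has become two statements, and any chain that contained it would have to be split in the same way.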
For example, a fluent API for collecting constant pool constants (into
a series of such constants) may need to convert a user-oriented type
like String to an internal type like int (for an index), and the
user might need to acquire such an int in the course of making a more
complex request to the constant pool, such as "add this method type".
The conversion of a component of the method type is a side-question
which must interrupt the fluent chain of requests:
pool.startGroup()
.addMT(pool.addT(void.class), pool.addT(int.class))
.endGroup();
In this case, the addMT command is asking some receiver (either the
pool itself or a temporary group builder) to add a method-type with
two components, but must first ask the pool (or the builder, in some
APIs) to accept the types void (for the return type) and int (for
the sole parameter type). Note that in order to complete the request
for a method type, there are two side "helper" requests, one for each
component. The problem this proposal addresses is finding the proper
recipient for those helper requests, without assuming that there is
always an ambient variable (pool in this case) that holds the
desired receiver.
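The dependence on an ambient variable can be seen in a small runnable sketch in today's Java. The Pool class below is hypothetical: only the method names startGroup, addT, addMT, and endGroup come from the example above, the entry strings are invented for illustration, and the sketch assumes that the pool acts as its own group builder.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical constant pool builder; not an existing JDK API.
final class Pool {
    private final List<String> entries = new ArrayList<>();

    // In this sketch the pool is its own group builder.
    Pool startGroup() { return this; }
    Pool endGroup() { return this; }

    // Helper request: intern a type and return its index.
    int addT(Class<?> type) { return intern("T:" + type.getName()); }

    // Main request: add a method type built from two already interned indexes.
    Pool addMT(int returnTypeIndex, int paramTypeIndex) {
        intern("MT:(" + paramTypeIndex + ")" + returnTypeIndex);
        return this;
    }

    private int intern(String entry) {
        int index = entries.indexOf(entry);
        if (index < 0) {
            entries.add(entry);
            index = entries.size() - 1;
        }
        return index;
    }

    List<String> entries() { return entries; }
}

final class PoolDemo {
    static List<String> run() {
        Pool pool = new Pool();
        // The helper requests only work because the variable pool is in scope.
        pool.startGroup()
            .addMT(pool.addT(void.class), pool.addT(int.class))
            .endGroup();
        return pool.entries();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

If startGroup returned a separate group builder instead, the two addT calls would go to the wrong object unless that builder were first stored in a temporary.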
From a standpoint of avoiding temporaries, but keeping the equivalent efficient bytecode, the simplest fix is to allow a special syntax to access the next receiver in the chain, from within the method call being composed to that receiver.
Therefore, we should consider what such a syntax would look like.
(And also consider the alternatives, of course.) To avoid premature
concreteness, suppose there is a new token __Receiver which can be
used in any expression nested anywhere inside a method call
expression. Then the problematic chains above could be rewritten as
follows:
head.a().b().c(__Receiver.help()).d()
pool.startGroup()
.addMT(__Receiver.addT(void.class), __Receiver.addT(int.class))
.endGroup();
Or, writing a bare leading dot for __Receiver (one of the options listed under "Syntax bikeshed"), a fragment of a bytecode emitter chain could look like this:
.getfield(.classInfo(Math.class), "PI", .descriptor(double.class));
The receiver helper pattern shows up heavily when thinking about
rendering array notations from languages like R, Octave, APL, Julia,
or even Python or Ruby. In those notations, the expression inside the
array indexing brackets is interpreted relative to the array being
indexed. Indexing the last element as something like [*-1] must
translate into [L-1] where L is an expression that evaluates
to the length of the array being indexed. For multi-dimensional
arrays, the whole tuple of index expressions must be interpreted
relative to the array itself. There are ways to avoid committing
to the array before the actual indexing operation, but (as with
other fluent APIs) they make the design of those operations much
more complex and constrained.
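In today's Java, an index such as [*-1] can only be approximated. One possibility, sketched here under the assumption of a hypothetical Vec wrapper (the names Vec, at, and length are illustrative, not an existing API), is to pass the index as a function of the array length, so that the indexing operation itself supplies L.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical vector wrapper used only for illustration.
final class Vec {
    private final double[] data;

    Vec(double... data) { this.data = data.clone(); }

    int length() { return data.length; }

    // Plain absolute indexing.
    double at(int index) { return data[index]; }

    // Relative indexing: the operator receives the length L of this array
    // and returns the absolute index, so [*-1] is written at(L -> L - 1).
    double at(IntUnaryOperator indexFromLength) {
        return data[indexFromLength.applyAsInt(data.length)];
    }
}

final class RelativeIndexDemo {
    static double last(Vec v) {
        return v.at(L -> L - 1);
    }

    public static void main(String[] args) {
        System.out.println(last(new Vec(1.0, 2.0, 3.0)));
    }
}
```

The index expression is interpreted relative to the array being indexed, at the cost of a lambda and an extra overload for every indexing operation.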
The receiver-helper pattern will probably also show up with some kinds
of programmable constant notations, if we design those. In such
cases, a sequence of expressions may be implicitly translated into a
builder expression, and the same need will arise, for occasional
invocation of helpers. As a simple example, a packed array constant
with gaps might need occasional helper method calls to reset the
packing location, as with the "designated initializer" syntax in some
versions of C. For example, in int a[3] = { [1] = 5 }; the
designator [1] could be rendered as a helper method call in a fluent
Java API.
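A minimal sketch of such a rendering follows, assuming a hypothetical PackedInts builder; the names PackedInts, at, put, and build are invented for illustration. The designator becomes a helper call that resets the packing location, and each value becomes an ordinary request in the chain.

```java
// Hypothetical builder for a packed int array; not an existing API.
final class PackedInts {
    private final int[] data;
    private int next = 0;   // current packing location

    PackedInts(int length) { data = new int[length]; }

    // Helper request: reset the packing location, like the C designator [index].
    PackedInts at(int index) { next = index; return this; }

    // Main request: store a value at the packing location and advance it.
    PackedInts put(int value) { data[next++] = value; return this; }

    int[] build() { return data.clone(); }
}

final class DesignatorDemo {
    // Counterpart of the C declaration: int a[3] = { [1] = 5 };
    static int[] run() {
        return new PackedInts(3).at(1).put(5).build();
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(run()));
    }
}
```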
Viewed another way, a complicated array index expression is sort of like a part of a programmable constant expression. In both cases you have a bit of code that only makes sense in a containing context, and you have a natural need to ask questions of that context. It's another form of 'this'.
Syntax bikeshed
There are many syntax options for such a thing:
- A contextual keyword receiver or that, suppressed if there is a prior binding.
- A unary dot operator, so that .foo is really __Receiver.foo, limiting the feature to member references.
- A named reference to the enclosing method, __ReceiverOf.addMT.
- A name assigned by the receiver API itself, limiting the feature to fluent APIs.
An API-assigned name could look like this:
interface Builder<T> {
Builder<T> (Builder<T> b = this) add(T item);
T methodType(Class<?> rt, Class<?> pt);
}
// ...add(b.methodType(p,q))...
Workaround: Pattern variables
Fluent APIs live or die by what may be called tail-recursive syntax, the continual chaining of new method calls on the end of old ones. If there were a tail-recursive syntax for binding temporaries, the chain could be continued by placing the binding inside the chain:
head.a().b()<<var btem>>.c(btem.help()).d()
This may be possible as a side-effect of pattern-match expressions.
Workaround: Helper lambdas
The designer of a fluent API can usually predict the need for helper requests, and provide for them in an ad hoc way for particular API points, by introducing a lambda-typed parameter whose lambda takes the appropriate receiver as its argument:
head.a().b().c(x->x.help()).d()
interface Builder<T> {
Builder<T> add(T item);
default <X> Builder<T> add(X item, BiFunction<Builder<T>,X,T> fn) {
return add(fn.apply(this, item));
}
T methodType(MethodType mt);
T methodType(Class<?> rt, Class<?> pt);
}
// ...add(b->b.methodType(p,q))...
// ...add(MethodType.methodType(p,q), Builder::methodType)...
We can call this the "lambdafied" version of the receiver helper pattern. It is good because it works in today's language. It is bad because the lambdas inside it compile to more bytecode than the otherwise simple requests for help to available receivers.
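The lambdafied pattern can be exercised end to end in today's Java. The sketch below fills in the Builder interface with two assumptions that are not in the original: a one-argument add overload taking a Function, so that the receiver can hand itself to the lambda, and a hypothetical DescriptorBuilder implementation that renders method types as JVM descriptor strings.

```java
import java.lang.invoke.MethodType;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;

interface Builder<T> {
    Builder<T> add(T item);

    // Assumed overload: the receiver passes itself to fn, which asks it for help.
    default Builder<T> add(Function<Builder<T>, T> fn) {
        return add(fn.apply(this));
    }

    // The receiver passes itself and a user-oriented item to fn.
    default <X> Builder<T> add(X item, BiFunction<Builder<T>, X, T> fn) {
        return add(fn.apply(this, item));
    }

    T methodType(MethodType mt);
    T methodType(Class<?> rt, Class<?> pt);
}

// Hypothetical implementation whose internal type T is a descriptor string.
final class DescriptorBuilder implements Builder<String> {
    private final List<String> items = new ArrayList<>();

    @Override public Builder<String> add(String item) { items.add(item); return this; }
    @Override public String methodType(MethodType mt) { return mt.toMethodDescriptorString(); }
    @Override public String methodType(Class<?> rt, Class<?> pt) {
        return methodType(MethodType.methodType(rt, pt));
    }

    List<String> items() { return items; }
}

final class LambdaHelperDemo {
    static List<String> run() {
        DescriptorBuilder builder = new DescriptorBuilder();
        builder.add(b -> b.methodType(void.class, int.class))
               .add(MethodType.methodType(int.class, long.class), Builder::methodType);
        return builder.items();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Each helper request here costs a lambda or method reference at the call site, which is the overhead that a dedicated receiver syntax would avoid.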
This workaround is strictly more powerful than the receiver binding
pattern, because the x value can be anything available to the
receiver, not just the receiver itself. On the other hand, it is for
that reason less clear what the role of x is in that code, and the
documentation for the c method has to describe what value is passed
in. The receiver-helper pattern is more concrete and specific; in the
end it is approximately as powerful as the workaround, since the
helper method (help above) is able to pull up any other x value
needed, without troubling the user to understand it. The main problem
with the receiver helper pattern is that it requires the helper methods
to be available mixed into the receiver API itself, which isn't always
desirable.
In the end, this proposal may be seen as sugar for a common case of
the pattern r.foo(r->r.bar()), a pattern which can be code-generated
much more efficiently (without a lambda) if there is special syntax
support to dig out the r value from context.
Workaround: constexpr methods and lambdas
The lambdafied version of the receiver helper pattern can be improved
by evaluating the lambda and the method call at compile time, as if
expanding a macro (but preserving all types, scopes, etc.). In that
case, the required temporary would appear inside the expanded method
body, and be used in the expanded lambda body. Presumably the
abstraction of the API would be preserved by further operations inside
the method expansion, such as calling a "master method" that used the
results of the lambda call, passing them through an invokeinterface
instruction.
interface Builder<T> {
Builder<T> add(T item);
__Inline <X> Builder<T> add(X item,
__Inline BiFunction<Builder<T>,X,T> fn) {
return add(fn.apply(this, item));
}
T methodType(MethodType mt);
T methodType(Class<?> rt, Class<?> pt);
}
// ...add(b->b.methodType(p,q))...
// ...add(MethodType.methodType(p,q), Builder::methodType)...
Doing this trick requires some sort of "constant expression" qualifier
like a cross between the C++ constexpr and inline features. It
would then allow us to perform the required complex constant folding
at compile time. There are two places where this feature needs to
take effect. First, the interface method must be statically folded;
this means that it must resolve like a static method and not perform
further method selection at runtime. Second, the lambda argument
itself must be marked and compiled as a constant, so that it too can
be expanded.