JEP draft: New Invoke Bindings

Owner	Erik Österlund
Type	Feature
Scope	Implementation
Status	Draft
Component	hotspot / compiler
Discussion	hotspot dash dev at openjdk dot java dot net
Effort	S
Duration	S
Reviewed by	Vladimir Kozlov
Created	2019/04/02 10:30
Updated	2022/06/13 13:36
Issue	8221828

Summary

Replace the machine code bindings for invoke bytecodes to not use type speculating inline caches and/or racy machine code patching.

Goals

The primary goal of this JEP is to improve long-term maintainability of the code. The secondary goal is to make megamorphic calls fast.

Non-Goals

It is not a goal to improve the overall performance of Java applications with this new dispatch mechanism. For some applications with hot megamorphic calls, that might be a bonus, but it is not an explicit goal.

Success Metrics

Today's dynamically monomorphic calls should not regress noticeably, and the new mechanism has to remove more code complexity than it introduces.

Motivation

Our baseline megamorphic calls are inefficient. To hide the overheads of megamorphic calls, type speculation machinery called inline caches hope for dynamically monomorphic callsites. When we are lucky and type speculation is successful, a more efficient monomorphic call can be made. But dynamically megamorphic calls are still slow, causing unpredictable performance characteristics, as the difference between monomorphic and megamorphic calls is significantly different. This type speculation logic is also very complicated and has caused many bugs over the years. The complicated life cycle of Just-In-Time (JIT) compiled machine code is a consequence of inline caches. It is also a significant complication for class unloading to deal with the various races that inline caches bring.

Description

To replace type speculating inline caches, the baseline megamorphic calls are to be optimized to be nearly as fast as speculatively monomorphic calls. That way, the speculation machinery (~10000 lines of racy low level code, involving racy machine code patching) can be removed.

The proposal for invokevirtual calls is to flatten the vtable to have direct code pointers at a negative offset from the class pointer. This allows dispatching with a single indirect branch (compared to two direct branches for dynamically monomorphic calls today).

The proposal for invokeinterface calls is to give each interface method a unique number called "selector", and create a cuckoo-style hash table mapping selectors to code pointers for the concrete implementations of the interface method. The table is embedded straight into the native class. This allows dispatching the majority of the time with one indirect branch and one direct branch (compared to two direct branches for dynamically monomorphic calls today).

As for direct calls, today they emit a call straight to the target compiled code, when the target method is compiled, but go through a stub that fills in a method register when going into interpreted mode. This elides the cost of setting a register when calling compiled code, but causes many races and introduces a lot of code complexity, involving racy machine code patching. The cost of setting this register unconditionally and remove the stub code is considered minimal, compared to spilling and stack banging that we usually perform when calling a method.

One tricky thing this JEP has to deal with is invokeinterface bytecodes. In particular, today the JVM can not be certain that the receiver implements the given interface, and must instead solve this with a dynamic type check, that can throw IncompatibleClassChangeError when the receiver does not implement the interface. In particular, the JVMS states:

"For assignments, interfaces are treated like Object." - JVMS14 4.10.1.2.

This relaxation of the verifier type system means that we don't know anything about variables with interface types. They could be anything. The impact on invokeinterface is:

"Otherwise, if the class of objectref does not implement the resolved interface, invokeinterface throws an IncompatibleClassChangeError." - JVMS 14 6.5

While bytecodes violating interface type safety are not generated by javac or any other reasonable compiler, it is possible for hand written bytecodes to violate interface type integrity..

This is problematic for multiple reasons.

Problem 1: Performance.

The current design relies on the monomorphic inline caches implicitly also performing this type check. The megamorphic implementation is slow. Therefore, since the idea with this JEP is to optimize megamorphic calls, such that monomorphic calls through the megamorphic code path, is on par with a speculative inline cache, special care needs to be taken around what to do with this dynamic type check. The naive solution of performing the REFC type check for every invokeinterface results in inappropriate performance characteristics; there are noticeable regressions.

Problem 2: Complexity.

An invokeinterface bytecode can results in an itable dispatch, a vtable dispatch (for default methods) or direct calls (when provable through CHA, etc). Therefore, all these types of calls need to perform the REFC check if the raw bytecode is invokeinterface. This causes some unfortunate complexities.

Problem 3: Type integrity.

Allowing bytecodes to pass around non-Foo instances into Foo typed variables, is quite nasty. Other than causing bugs in the JVM because we cannot trust our own type system regarding interfaces, and explicitly opt-out from optimizations because we do not trust the type system, it is also problematic for users of Java. Since Java programmers might not have read the JVMS, they might not be aware that an invokeinterface bytecode might throw ICCE. From a Java language perspective, this seems like an impossible event. But from the JVM perspective, it is fully possible. This mismatch of expectations around type integrity, can hide security bugs in Java code, where a malicious person can alter the control flow in ways that most Java programmers have no idea is possible, and arguably should not be possible.

The current solution under this JEP has been to avoid the REFC check at any cost. Initially, this has taken the form of having the verifier have two modes: sane and insane. In the sane mode, the verifier checks that interface type safety is ensured. As long as the verifier is in the sane mode (valid for pretty much all reasonable programs, except corner case JVMS test cases), the REFC checks are elided from all JIT-compiled invokeinterface bytecodes. In the unlikely event that a class is linked with insane bytecodes that violate basic type safety, a deoptimization event is triggered. This is a big hammer event that deoptimizes all JIT-compiled methods in the process, and enters insane mode. This will cause new compilations to re-instate the dynamic REFC check, so that we can follow the JVMS.

However, the JVMS was specified this way partially so that we can avoid loading interfaces during verification; an optimization. But loading interfaces that is precisely what I do, in order to elide the REFC check from invokeinterface; also an optimization, yet a much more important one. In other words, the JVMS has looser type checking so we can optimize. But if we really want to optimize what matters, we can not utilize that verifier optimization. The result is supporting two modes: a sane mode and an additional insane mode, used only to comply with the JVMS. When you look at it this way, there is little reason left for the JVMS to enforce the weaker type system on the verifier, and disallow us from doing the intuitive thing. Therefore, I propose to change the JVMS to at the very least allow a JVM to implement proper type checks for interfaces in the verifier. This has the following benefits:

Benefit 1: Performance

Now invokeinterface never needs to perform the REFC check, because the verifier has already proven that insane bytecodes do not exist. This allows the megamorphic calls to be very fast with my proposed itable dispatch mechanism.

Benefit 2: Maintenance

No need for compilers to explicitly distrust their own type system and opt out from various optimizations, or run into bugs due to the type system being broken. Code such as DirectMethodHandles (in Java code) can also remove such checks.

Benefit 3: Type integrity

User code can suddenly trust the Java code they write to do something closer to what they think it will do. The more strictly enforced type system used by the verifier, leaves no opportunities for malicious programmers to exploit insane control flows that should not be possible from a Java programmer's perspective.

The major risk with this proposal to change the JVMS is backward compatibility. While unlikely, it is theoretically possible for some user to have custom written bytecodes (not from javac), that intentionally does things that violate basic type safety, and even intentionally throws ICCE (resulting in deoptimizations). I consider this risk small, compared to the possibly larger risk of having more realistic Java code be susceptible to strange runtime conditions that should never happen.

Conclusively, while it is not needed to change the JVMS, and instead maintain both a sane and insane mode of the verifier for JVMS compatibility, I believe it is the right thing to do, to change the JVMS to allow only the sane mode to be used, and completely opting out from the insane mode. It seems more likely that such bytecodes is a serious bug in the program and should be fixed, than it being intentionally used in that way.

Alternatives

An alternative invokeinterface dispatch mechanism performs graph colouring of interface methods to optimize the call further. Currently not considering that, because the primary goal of this JEP is to simplify the code and make it more maintainable.

Testing

The usual jtreg tests must pass, especially the ones concerning invoke bytecodes. Fortunately, there is already a lot of testing for invoke bytecodes, that can be reused in this effort to change its machine code bindings.

Risks and Assumptions

An assumption with this work is that dynamically monomorphic call sites using a well-predicted indirect branch should be as performant as a direct branch, due to branch target buffering in hardware. In other words, the software type speculation trick is already done in the hardware as well nowadays. But it might regress on machines without well performant branch prediction. The assumption is that such hardware is to a large extent out of fashion today.