JEP 171: Fence Intrinsics

Owner	Doug Lea
Type	Feature
Scope	JDK
Status	Closed / Delivered
Release	8
Component	hotspot / runtime
Discussion	hotspot dash dev at openjdk dot java dot net
Endorsed by	Mark Reinhold
Created	2012/11/27 20:00
Updated	2017/06/14 19:19
Issue	8046161

Summary

Add three memory-ordering intrinsics to the sun.misc.Unsafe class.

Motivation

JVMs offer no advertised mechanism for obtaining memory orderings that were not envisioned originally or in the JSR 133 memory model specification (but that are present in, for example, the recent C11/C++11 specifications). These orderings include several constructions used in java.util.concurrent and other low-level libraries, which currently rely on undocumented (and possibly transient) properties of existing intrinsics.

Adding these methods at the VM level permits their use by JDK libraries in support of JDK 8 features, while also opening up the possibility of later exporting the base functionality via new java.util.concurrent APIs. Such APIs may become essential for developers of non-JDK low-level libraries if upcoming modularity support makes these methods inaccessible to them.

Description

The three methods provide the three different kinds of memory fences that some compilers and processors need to ensure that particular accesses (loads and stores) are not reordered. In practice, they are identical in effect to the existing getXXXVolatile, putXXXVolatile, and putOrderedXXX methods, except that they do not actually perform an access; they just ensure the ordering. However, they differ conceptually in one way: according to the current JMM, some language-level uses of volatile may be reordered with some uses of non-volatile variables, but such reordering would not be allowed here. (It is not allowed in the current intrinsics either, but this is an undocumented difference between intrinsics-based and language-based volatile access.)

The three methods are:

/**
 * Ensures lack of reordering of loads before the fence
 * with loads or stores after the fence.
 */
void loadFence();

/**
 * Ensures lack of reordering of stores before the fence
 * with loads or stores after the fence.
 */
void storeFence();

/**
 * Ensures lack of reordering of loads or stores before the fence
 * with loads or stores after the fence.
 */
void fullFence();
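
As a rough illustration of how these methods might be used, the sketch below publishes a value through plain (non-volatile) fields, relying only on storeFence and loadFence for ordering. It is not taken from the JDK: the class FenceDemo and its helpers (loadUnsafe, publish, consume, runOnce) are hypothetical names, and it assumes that the Unsafe singleton can be obtained reflectively from the private theUnsafe field, which is an unsupported technique. The loadFence call is placed inside the polling loop as a conservative choice, so that the flag is re-read on every iteration.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class FenceDemo {
    // Assumption: reflective access to the private "theUnsafe" field is permitted.
    static final Unsafe U = loadUnsafe();

    static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not accessible", e);
        }
    }

    // Deliberately plain fields: ordering comes only from the fences.
    static int payload;
    static boolean ready;

    // Writer side: the payload store must not be reordered with the flag store.
    static void publish(int value) {
        payload = value;
        U.storeFence();
        ready = true;
    }

    // Reader side: the flag load must not be reordered with the payload load.
    static int consume() {
        while (true) {
            boolean seen = ready;
            U.loadFence();
            if (seen) {
                return payload;
            }
            Thread.yield();
        }
    }

    // Hypothetical driver: runs one reader thread against one publish call.
    static int runOnce(int value) {
        payload = 0;
        ready = false;
        int[] result = new int[1];
        Thread reader = new Thread(() -> result[0] = consume());
        reader.start();
        publish(value);
        try {
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(runOnce(42));
    }
}
```

In this sketch the fences stand in for what a volatile flag would otherwise provide; application code would normally just declare the flag volatile.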

While there is nothing at all "unsafe" about the three methods, Unsafe by convention currently houses related methods for per-use volatile and atomic operations, so it seems to be the best home for them.

The javadoc could be made more rigorous, but it cannot be specified in terms of the JMM, because the JMM does not cover "per-use volatileness". Leaving the methods documented in this simple form therefore probably best conveys intent to target users. Also, it is probably impossible (because of likely existing usage dependencies) to simultaneously make explicit and slightly weaken the minimally required reordering properties of the existing volatile access intrinsics. However, the presence of fence intrinsics allows users to obtain these effects unambiguously when needed.

Hotspot Implementation

The three methods can be implemented in terms of the existing acquire, release, and volatile fence methods available across c1, c2, and the interpreter/runtime, plus, when applicable, suppressing internal compiler reorderings and value reuse. This requires no new underlying capabilities, but still requires adding to and adapting the code strewn across Hotspot that is necessary for new intrinsics. Omitting these details, here is an implementation sketch:

For c2, implementation amounts to methods that omit all the access code in methods like inline_unsafe_access, leaving only generation of the memory-based fences, plus an internal CPUOrder fence that disables reorderings during optimization. (In the case of fullFence, also cautiously including an acquire fence along with the full volatile fence, to cover the possibility that existing c2 code relies on the presence of MemBarAcquire to detect volatile-ness with respect to loads.)

loadFence: {
  insert_mem_bar(Op_MemBarCPUOrder);
  insert_mem_bar(Op_MemBarAcquire);
}

storeFence: {
  insert_mem_bar(Op_MemBarCPUOrder);
  insert_mem_bar(Op_MemBarRelease);
}

fullFence: {
  insert_mem_bar(Op_MemBarCPUOrder);
  insert_mem_bar(Op_MemBarAcquire);
  insert_mem_bar(Op_MemBarVolatile);
}

For c1, new nodes can be defined with LIRGenerator actions of

loadFence:  { if (os::is_MP()) __ membar_acquire(); }
storeFence: { if (os::is_MP()) __ membar_release(); }
fullFence:  { if (os::is_MP()) __ membar(); }

Plus, for all three, disabling GVN in ValueMap:

xxxxxFence: { kill_memory(); }

And for the C++ runtime versions (in prims/unsafe.cpp), implementing via the existing OrderAccess methods:

loadFence:  { OrderAccess::acquire(); }
storeFence: { OrderAccess::release(); }
fullFence:  { OrderAccess::fence(); }

Testing

The test infrastructure recently set up by Aleksey Shipilev for torture-testing volatiles and atomics is simple to adapt to obtain the same coverage for fence-separated access versus volatiles.
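
To give a flavor of what a fence-separated test case could look like, here is a minimal store-buffering litmus sketch. It is not the infrastructure mentioned above and is far less thorough: the class FenceLitmus and its methods are hypothetical, Unsafe is again assumed to be reachable through the private theUnsafe field, and creating fresh threads on every iteration means the two bodies overlap only occasionally. With a fullFence between the store and the load in each thread, the outcome r1 == 0 and r2 == 0 should never be observed.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class FenceLitmus {
    // Assumption: reflective access to the private "theUnsafe" field is permitted.
    static final Unsafe U = loadUnsafe();

    static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not accessible", e);
        }
    }

    // Plain shared variables and per-thread results.
    static int x, y, r1, r2;

    // Counts iterations in which both threads read 0, which full fences forbid.
    static int countForbidden(int iterations) {
        int forbidden = 0;
        for (int i = 0; i < iterations; i++) {
            x = 0; y = 0; r1 = -1; r2 = -1;
            Thread a = new Thread(() -> { x = 1; U.fullFence(); r1 = y; });
            Thread b = new Thread(() -> { y = 1; U.fullFence(); r2 = x; });
            a.start();
            b.start();
            try {
                a.join();
                b.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            if (r1 == 0 && r2 == 0) {
                forbidden++;
            }
        }
        return forbidden;
    }

    public static void main(String[] args) {
        System.out.println("forbidden outcomes: " + countForbidden(1000));
    }
}
```

A realistic harness would reuse long-running threads and vary timing to provoke far more interleavings per second; this sketch only shows the shape of the check.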

Risks and Assumptions

We assume that Oracle engineers will continue to assist with integration into JDK 8.