JEP draft: Concurrent Monitor Deflation
Authors | Carsten Varming <varming@gmail.com>, Roman Kennke <rkennke@redhat.com>, daniel.daugherty@oracle.com |
Owner | Daniel Daugherty |
Type | Feature |
Scope | Implementation |
Status | Closed / Withdrawn |
Component | hotspot / runtime |
Discussion | hotspot dash runtime dash dev at openjdk dot java dot net |
Effort | M |
Duration | L |
Reviewed by | David Holmes, Mikael Vidstedt |
Created | 2017/07/06 02:49 |
Updated | 2020/05/29 00:04 |
Issue | 8183909 |
Summary
Deflate idle ObjectMonitors concurrently with Java thread execution in order to reduce safepoint pause times.
Non-Goals
-
Removing the Type Stable Memory (TSM) attribute of ObjectMonitors is not a goal of this project.
-
Removing the safepoint-based ObjectMonitor deflation mechanism is not a goal of this project; the safepoint-based mechanism will be left in place as a possible fallback for at least one release cycle.
Motivation
Reducing safepoint pause times reduces system latency, which benefits latency-sensitive projects like ZGC.
Success Metrics
-
Decrease safepoint pause times caused by the deflation of idle ObjectMonitors (reduce time spent in "cleanup" safepoints).
-
Throughput must not be negatively affected in a statistically significant way as measured by heavily contended ObjectMonitor benchmarks like SPECjbb2015.
-
Startup must not be negatively affected in a statistically significant way as measured by Oracle's fast startup benchmarks.
Description
ObjectMonitors are associated with Java objects on an as-needed basis; this is called "inflation". Operations like Object.wait() or contended synchronization cause inflation. When an ObjectMonitor becomes idle, it is eligible for "deflation". Inflation and deflation of ObjectMonitors are invisible to a Java program.
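The two triggers named above can be exercised from plain Java. The following is a minimal sketch; the class and method names are hypothetical and chosen only for illustration. Because inflation is invisible to a Java program, the code cannot observe the ObjectMonitor itself. It only performs the operations that the text says cause inflation.

```java
// Illustrative only: InflationTriggers and its methods are hypothetical names.
final class InflationTriggers {
    private static final Object LOCK = new Object();
    private static int counter = 0;

    // Several threads compete for the same lock. Per the text, contended
    // synchronization is one of the operations that causes inflation.
    static int contendedIncrements(int threads, int perThread) {
        synchronized (LOCK) {
            counter = 0;
        }
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    synchronized (LOCK) {
                        counter++;
                    }
                }
            });
        }
        for (Thread t : workers) {
            t.start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while joining", e);
            }
        }
        synchronized (LOCK) {
            return counter;
        }
    }

    // Object.wait() is the other trigger named in the text. A positive
    // timeout is required here because wait(0) would wait indefinitely.
    static void timedWait(long millis) {
        if (millis <= 0) {
            throw new IllegalArgumentException("millis must be positive");
        }
        synchronized (LOCK) {
            try {
                LOCK.wait(millis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("total = " + contendedIncrements(4, 1000));
        timedWait(10);
        System.out.println("timed wait returned");
    }
}
```

Once all threads have finished and nobody waits on LOCK any more, the associated ObjectMonitor is idle and, as described above, becomes a candidate for deflation.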
ObjectMonitors are generally managed on a Java thread's in-use or free list; they can also be managed on a global in-use or free list. The process of deflation moves an ObjectMonitor from an in-use list to the global free list.
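As a rough mental model of the list handling, the sketch below tracks monitors on one in-use list and one global free list. It is a deliberately simplified, single-threaded illustration: the names MonitorListModel, Monitor, inflate, markIdle and deflateIdle are hypothetical, the per-thread lists are collapsed into a single in-use list, and none of this mirrors the HotSpot data structures.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

// Illustrative model only; all names here are hypothetical.
final class MonitorListModel {
    // Stand-in for an ObjectMonitor.
    static final class Monitor {
        Object object;   // the Java object this monitor is associated with
        boolean busy;    // true while the monitor is owned or waited on
    }

    final Deque<Monitor> inUse = new ArrayDeque<>();
    final Deque<Monitor> globalFree = new ArrayDeque<>();

    // Inflation: take a monitor from the free list (or create one) and
    // put it on the in-use list, associated with the given object.
    Monitor inflate(Object obj) {
        Monitor m = globalFree.isEmpty() ? new Monitor() : globalFree.pop();
        m.object = obj;
        m.busy = true;
        inUse.add(m);
        return m;
    }

    void markIdle(Monitor m) {
        m.busy = false;
    }

    // Deflation: move every idle monitor from the in-use list to the
    // global free list. Monitors are recycled here, never discarded.
    int deflateIdle() {
        int deflated = 0;
        for (Iterator<Monitor> it = inUse.iterator(); it.hasNext();) {
            Monitor m = it.next();
            if (!m.busy) {
                it.remove();
                m.object = null;
                globalFree.push(m);
                deflated++;
            }
        }
        return deflated;
    }

    public static void main(String[] args) {
        MonitorListModel model = new MonitorListModel();
        Monitor a = model.inflate(new Object());
        model.inflate(new Object());
        model.markIdle(a);
        System.out.println("deflated = " + model.deflateIdle());
        System.out.println("in use = " + model.inUse.size()
                + ", free = " + model.globalFree.size());
    }
}
```

The deflation pass only visits the in-use list, so monitors that are already free are never examined.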
In the current system, idle ObjectMonitors are deflated at a safepoint and the time that it takes to do that work contributes to a safepoint's pause time. Over the years work has been done to reduce the total time it takes to deflate ObjectMonitors:
-
Adding in-use lists (and free lists) reduced the time needed to find idle ObjectMonitors, since ObjectMonitors that were already free no longer had to be visited.
-
Using worker threads to process the per-thread in-use lists in parallel reduced the wall-clock time needed to do the work.
The next step in the evolution of the ObjectMonitor subsystem is to deflate idle ObjectMonitors concurrently with Java thread execution instead of doing that work at a safepoint. Deflating an idle ObjectMonitor outside of a safepoint changes some of the invariants in the ObjectMonitor subsystem, so we have to make some adjustments. It also introduces race conditions that did not previously exist, and we have to account for those as well.
A project that changes a core subsystem like ObjectMonitors involves a lot of details, so an OpenJDK wiki page has been created that covers them:
https://wiki.openjdk.java.net/display/HotSpot/Async+Monitor+Deflation
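To make the kind of race mentioned above concrete: at a safepoint no Java thread can be entering a monitor while it is being deflated, but a concurrent deflater and an entering thread can both try to claim the same idle monitor at once. The sketch below shows one conceivable way to arbitrate that race with a single compare-and-set on an owner field. It is an assumption made purely for illustration, not a description of the protocol this project uses; the wiki linked above is the authoritative source. The names DeflationRaceSketch, DEFLATER, tryEnter, exit and tryDeflate are hypothetical.

```java
import java.util.concurrent.atomic.AtomicReference;

// Illustrative only; not the HotSpot protocol. All names are hypothetical.
final class DeflationRaceSketch {
    // Marker value meaning "claimed for deflation" (an assumption of this sketch).
    static final Object DEFLATER = new Object();

    // null means the monitor is idle: no owner and no deflation in progress.
    private final AtomicReference<Object> owner = new AtomicReference<>(null);

    // An entering thread claims the monitor only if it is idle.
    boolean tryEnter(Object thread) {
        return owner.compareAndSet(null, thread);
    }

    // The owning thread releases the monitor, making it idle again.
    void exit(Object thread) {
        if (!owner.compareAndSet(thread, null)) {
            throw new IllegalMonitorStateException("not the owner");
        }
    }

    // The deflater claims the monitor only if it is idle. Because both
    // sides compare-and-set from null, at most one of them can win.
    boolean tryDeflate() {
        return owner.compareAndSet(null, DEFLATER);
    }

    public static void main(String[] args) {
        DeflationRaceSketch monitor = new DeflationRaceSketch();
        Object t1 = new Object();
        System.out.println("enter: " + monitor.tryEnter(t1));
        System.out.println("deflate while owned: " + monitor.tryDeflate());
        monitor.exit(t1);
        System.out.println("deflate while idle: " + monitor.tryDeflate());
    }
}
```

In this sketch a thread that loses the race to the deflater would have to retry, for example by inflating a fresh monitor for the object. That retry path is omitted here.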
Testing
-
Tests for stressing the ObjectMonitor inflation and deflation mechanisms have been developed and used during this project.
-
SPECjbb2015, DaCapo-h2 and Volano runs in Oracle's Performance lab have been used to verify that throughput and performance have stayed (approximately) the same.
-
Oracle's fast startup testing has been used to verify that startup times have stayed (approximately) the same.
-
SPECjbb2015 runs with logging enabled have been used to verify that safepoint pause times due to deflation have been reduced.
-
The ObjectMonitor subsystem is core to the execution of non-trivial Java programs so just running most of the test suites will exercise this subsystem. We regularly executed Mach5 Tier1-Tier8 during the development of this project.
Risks and Assumptions
The ObjectMonitor subsystem is a critical part of the Java platform. If we break it, then the system can come to a grinding halt, hang or cause data corruption due to broken synchronization. Sounds scary!
Fortunately, as with other key subsystems in the Java platform, "just" running the tests reduces the risk of breaking something. We have been running Mach5 Tier1-Tier8 tests for most of the year-plus duration of this project. We have requested special test runs from other projects like ZGC. We have asked for test runs from folks outside of Oracle. Due diligence has been a core theme for this project.