JEP draft: Java Thread Sanitizer

Author: jcbeyler
Owner: Jean Christophe Beyler
Type: Feature
Scope: JDK
Status: Draft
Release: tbd
Component: hotspot
Created: 2018/07/30 16:26
Updated: 2019/03/08 21:40
Issue: 8208520

Note: This description is still a work in progress :)

Summary

Provide a dynamic data race detector implementation for Java and native JNI code utilizing ThreadSanitizer in OpenJDK.

Goals

Provide a means to detect data races in Java programs and their associated native JNI code:

Non-Goals

Motivation

A data race in a Java program makes the program incorrectly synchronized, leading to erroneous, nondeterministic, and unexpected behavior that typically occurs only rarely. Because Java provides JNI to allow interaction between native code and Java code, and C/C++ programs with data races have undefined behavior, a data race in native code can provoke undefined behavior in a Java program, violating the memory and type safety guarantees Java would otherwise provide. As programs become more complex, data races are a major obstacle to the robustness, safety, and reliability of Java programs, yet it is extremely difficult to detect, reproduce, fix, or eliminate all of them.

Thus, it is important to provide a stable and maintained data race detection tool for both Java and native JNI code in OpenJDK. Such a tool is a crucial step forward in helping developers debug and fix data races in programs large and small, reducing the number of data races and improving reliability. Ideally the tool should be dynamically complete, so that it detects every data race that manifests at runtime, and precise, so that its reports are useful to developers, since reasoning about a data race is already difficult.

Instead of relying on research tools that instrument Java bytecode (such as FastTrack2 in RoadRunner), the implementation fully integrates ThreadSanitizer (TSan) into OpenJDK. In this way the tool is able to detect data races in both Java and native JNI code.

Description of TSan for the LLVM case

TSan uses happens-before analysis to dynamically detect data races in C/C++ programs both completely and precisely. The implementation works by using the LLVM compiler to embed callbacks in the user program at events of interest, such as mutex acquires and releases and memory reads and writes. The TSan runtime is linked into the executable. When a thread executes a TSan callback, the TSan runtime updates some metadata (stored at a fixed offset from the mutex or variable address, for efficiency) and checks the metadata to determine whether a data race exists. If so, a report is issued via stderr. Considerable work has been done to ensure both a low performance overhead (typically ~4x) and a virtually zero false-positive rate: it is considered a bug in TSan if it reports a data race when no race manifested.
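To make the happens-before idea concrete, here is a minimal sketch in Java of a vector-clock detector. It is an illustration only and not the TSan runtime's algorithm or data layout: the class HbDetector, its method names, and the use of string names for locks and variables are all hypothetical, and only write/write races are tracked to keep the sketch short.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified happens-before race detector based on vector clocks.
// Illustrates the analysis only; this is not how the TSan runtime is implemented.
final class HbDetector {
    private final int nThreads;
    private final int[][] threadClock;                              // one vector clock per thread
    private final Map<String, int[]> lockClock = new HashMap<>();   // clock published at last release
    private final Map<String, int[]> lastWrite = new HashMap<>();   // per variable: {tid, clock}

    HbDetector(int nThreads) {
        this.nThreads = nThreads;
        this.threadClock = new int[nThreads][nThreads];
        for (int t = 0; t < nThreads; t++) {
            threadClock[t][t] = 1;                                  // each thread starts at time 1
        }
    }

    // Acquire: the thread learns everything the last releaser of this lock knew.
    void acquire(int tid, String lock) {
        int[] lc = lockClock.get(lock);
        if (lc == null) {
            return;                                                 // lock was never released
        }
        for (int i = 0; i < nThreads; i++) {
            threadClock[tid][i] = Math.max(threadClock[tid][i], lc[i]);
        }
    }

    // Release: publish the thread's clock on the lock, then advance local time.
    void release(int tid, String lock) {
        lockClock.put(lock, threadClock[tid].clone());
        threadClock[tid][tid]++;
    }

    // Records a write and returns true if it is unordered with respect to the
    // previous write to the same variable by a different thread.
    boolean write(int tid, String var) {
        int[] prev = lastWrite.get(var);
        boolean race = prev != null
                && prev[0] != tid
                && prev[1] > threadClock[tid][prev[0]];             // writer's time not yet observed
        lastWrite.put(var, new int[] {tid, threadClock[tid][tid]});
        return race;
    }
}
```

In this sketch, two writes separated by a release and a later acquire of the same lock are ordered and produce no report, while two writes with no synchronization between them are flagged.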

For the Java case, the end goal of TSan/Java is to provide comprehensive cross-language data race detection for user code. Implementing TSan/Java includes three broad aspects: (1) adding TSan callbacks into the JVM, (2) integrating the TSan runtime to run side-by-side with the JVM, and (3) providing symbolization services when a warning is issued by TSan.

TSan callbacks in the JVM

The TSan runtime must track all Java synchronization and Java field accesses in order to detect data races in Java code. There are three sources of Java-level synchronization (language-level monitors, volatile accesses, java.util.concurrent libraries) and two sources of VM-exported synchronization (raw monitors and JVMTI raw monitors). Java field accesses include loads and stores to scalar and static fields and arrays, and reflective field accesses. The TSan/Java implementation must insert the right callbacks to the TSan runtime at all of the above sources of Java synchronization and field accesses.
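As an illustration of the three Java-level synchronization sources listed above, the following sketch uses each of them to protect or publish a plain field. The class SyncSources and its members are hypothetical names chosen for this example; the point is that a detector that did not model any one of these sources would wrongly report the corresponding field accesses as races.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical example: each group of members relies on a different Java-level
// synchronization source that a race detector has to understand.
final class SyncSources {
    private final ReentrantLock lock = new ReentrantLock();
    private int monitorCount;          // guarded by the monitor of 'this'
    private int lockCount;             // guarded by 'lock' (java.util.concurrent)
    private int payload;               // published through the volatile 'ready'
    private volatile boolean ready;

    // 1. Language-level monitor.
    synchronized void incWithMonitor() { monitorCount++; }

    // 2. java.util.concurrent lock.
    void incWithLock() {
        lock.lock();
        try { lockCount++; } finally { lock.unlock(); }
    }

    // 3. Volatile access: the volatile write orders the earlier plain write.
    void publish(int v) { payload = v; ready = true; }

    int consume() {
        while (!ready) Thread.onSpinWait();   // volatile read
        return payload;
    }

    // Runs all three patterns on two threads; returns {monitorCount, lockCount, consumed}.
    static int[] run() {
        SyncSources s = new SyncSources();
        int[] consumed = new int[1];
        Thread a = new Thread(() -> {
            for (int i = 0; i < 1000; i++) { s.incWithMonitor(); s.incWithLock(); }
            s.publish(42);
        });
        Thread b = new Thread(() -> {
            for (int i = 0; i < 1000; i++) { s.incWithMonitor(); s.incWithLock(); }
            consumed[0] = s.consume();
        });
        a.start();
        b.start();
        try {
            a.join();
            b.join();                          // join orders the reads below after both threads
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return new int[] {s.monitorCount, s.lockCount, consumed[0]};
    }
}
```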

Because TSan stores metadata based on pointer offsets, TSan needs accurate knowledge of where Java objects are placed in the heap. Since TSan cannot know the lifecycle of Java objects, the JVM instrumentation must tell TSan when a Java object has moved or has been collected. TSan provides a callback to handle moved objects, and the TSan/Java implementation inserts this callback at appropriate places, such as in the garbage collectors.
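The bookkeeping this requires can be modeled with a small sketch. The class ShadowTable below is hypothetical and uses a sorted map keyed by address in place of TSan's fixed-offset shadow memory; the method names onObjectMoved and onObjectFreed are invented for this example and are not the names of the TSan callbacks.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of address-keyed metadata that must follow objects
// when a garbage collector moves or reclaims them.
final class ShadowTable {
    private final TreeMap<Long, String> shadow = new TreeMap<>();

    void record(long addr, String meta) { shadow.put(addr, meta); }

    String lookup(long addr) { return shadow.get(addr); }

    // The collector moved an object of 'size' bytes from 'src' to 'dst':
    // carry its metadata along and drop any stale state at the destination.
    void onObjectMoved(long src, long dst, long size) {
        Map<Long, String> moved = new TreeMap<>(shadow.subMap(src, src + size));
        shadow.subMap(src, src + size).clear();
        shadow.subMap(dst, dst + size).clear();
        for (Map.Entry<Long, String> e : moved.entrySet()) {
            shadow.put(e.getKey() - src + dst, e.getValue());
        }
    }

    // The collector reclaimed the object: its metadata must not leak onto
    // whatever is allocated at the same address later.
    void onObjectFreed(long addr, long size) {
        shadow.subMap(addr, addr + size).clear();
    }
}
```

Without the move notification, accesses to the object at its new address would be checked against unrelated metadata, which could cause both missed races and false reports.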

Running the TSan runtime in a JVM process

The TSan runtime was designed to maintain correct memory and synchronization state for a C++ program. Because the JVM is itself a complex C++ program, the two require some careful coordination to co-exist peacefully. The main problem is the extensive use of libc interception by TSan; TSan/Java gets around this by having TSan perform stack introspection and maintain per-thread “called from the JVM” bool state. Most libc functions, when called from the JVM, will be redirected to the underlying libc implementation. One notable exception is the set of memory allocation functions, which are redirected to use TSan’s custom allocator.
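The per-thread flag can be modeled as follows. This is a loose Java analogy of a mechanism that lives in C++ inside the TSan runtime: the class InterceptorModel, its methods, and the counter are hypothetical, and the stack introspection that decides when to set the flag is replaced here by an explicit wrapper call.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical model of an interceptor that skips its bookkeeping when the
// calling thread is currently executing inside the JVM itself.
final class InterceptorModel {
    private final ThreadLocal<Boolean> calledFromJvm = ThreadLocal.withInitial(() -> false);
    private final AtomicInteger instrumentedCalls = new AtomicInteger();

    // Marks the current thread as "inside the JVM" for the duration of 'body'.
    <T> T callFromJvm(Supplier<T> body) {
        boolean previous = calledFromJvm.get();
        calledFromJvm.set(true);
        try {
            return body.get();
        } finally {
            calledFromJvm.set(previous);      // restore, so nested calls behave correctly
        }
    }

    // Stand-in for an intercepted library function.
    int interceptedLength(String s) {
        if (!calledFromJvm.get()) {
            instrumentedCalls.incrementAndGet();   // user code: do the detector's bookkeeping
        }
        return s.length();                          // always reach the underlying implementation
    }

    int instrumentedCalls() { return instrumentedCalls.get(); }
}
```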

Symbolization services for TSan reports

In order to provide useful stack traces to users, TSan maintains a shadow stack for each thread that captures program counters. For TSan/Java, providing the raw “program counter” is not useful, since a C++ symbolizer cannot decode it, and because the code blob may be collected at a later time. Instead, we make use of the GTrace mechanism to pass GTrace method id tokens to the TSan runtime. The tokens are stored and later must be decoded by TSan. The JVM provides a public C++ function (TsanJavaSymbol) that allows a C++ or Java thread to call into the JVM and retrieve the method string associated with that token.
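The token scheme can be sketched as a table that hands out stable ids for methods and decodes them on demand. The classes SymbolTable and ShadowStack and all of their methods are hypothetical names for this illustration; in the design described above the decoding is done through the JVM's C++ function, not through a Java class.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical token table: methods are identified by stable ids instead of
// raw program counters, and the ids are decoded only when a report is printed.
final class SymbolTable {
    private final Map<Long, String> byToken = new HashMap<>();
    private final Map<String, Long> byName = new HashMap<>();
    private long next = 1;

    long tokenFor(String method) {
        Long token = byName.get(method);
        if (token == null) {
            token = next++;
            byName.put(method, token);
            byToken.put(token, method);
        }
        return token;
    }

    String symbolize(long token) {
        return byToken.getOrDefault(token, "(unknown)");
    }
}

// Hypothetical per-thread shadow stack that stores tokens only.
final class ShadowStack {
    private final Deque<Long> frames = new ArrayDeque<>();

    void enter(long token) { frames.push(token); }

    void exit() { frames.pop(); }

    // Renders the stack innermost frame first, in the style of a TSan report.
    List<String> render(SymbolTable table) {
        List<String> lines = new ArrayList<>();
        int depth = 0;
        for (long token : frames) {
            lines.add("#" + depth++ + " " + table.symbolize(token));
        }
        return lines;
    }
}
```

Because the stack holds tokens and not code addresses, a recorded frame stays decodable for as long as the table entry is kept, independent of what happens to the compiled code.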

Use-case example

The developer would configure and build the JDK with the config flag --use-tsan. Then, when invoking java, the developer would pass the -XX:+TSan flag to enable TSan tracking. If a race is found, TSan/Java would generate a report such as the following at program exit:

WARNING: ThreadSanitizer: data race (pid=7616)
  Read of size 4 at 0x7fd80270b328 by thread T26:
    #0 C.Get()I (C.java:7)  
    #1 A.Get()I (A.java:6)  
    #2 (Generated Stub)  
    #3 GetValue (Native.cc:18) (Test_native+0x10e330f)
    #4 Java_N_GetNative (Native.cc:23) (Test_native+0x10e330f)
    ...
  Previous write of size 4 at 0x7fd80270b328 by thread T5:
    #0 C.Set(I)V (C.java:4)  
    #1 A.Set(I)V (A.java:3)  
    #2 (Generated Stub)  
    #3 SetValue (Native.cc:7) (Test_native+0x10e31d8)
    #4 Java_N_SetNative (Native.cc:12) (Test_native+0x10e31d8)
    ...
  SUMMARY: ThreadSanitizer: data race  in C.Get()I (C.java:7)
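For orientation, a pure-Java program with the same kind of defect might look like the sketch below. It is not the program that produced the report above: the classes SharedCell and RacyDemo are hypothetical, the native frames are left out, and whether and how a given run is reported depends on the eventual implementation.

```java
// Hypothetical program with a data race: one thread writes a plain field
// while another reads it, with no synchronization between the two accesses.
final class SharedCell {
    private int value;                 // neither volatile nor guarded by a lock

    int get() { return value; }

    void set(int v) { value = v; }
}

final class RacyDemo {
    // Starts a racing writer and reader, waits for both, and returns the final value.
    static int run() {
        SharedCell cell = new SharedCell();
        Thread writer = new Thread(() -> cell.set(42));
        Thread reader = new Thread(() -> cell.get());   // races with the write above
        writer.start();
        reader.start();
        try {
            writer.join();
            reader.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        // Thread.join orders this read after the write, so it is not part of the race.
        return cell.get();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Declaring the field volatile, or making get and set synchronized, would order the two accesses and remove the race.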

Config flag and New flag

There should be a config flag --use-tsan to compile and link with TSan specifics. There also should be a -XX:+TSan JVM flag to enable TSan at runtime.

Other requirements include:

Implementation details

The current implementation details are:

Alternatives

FastTrack and FastTrack2 are the state-of-the-art dynamic data race detection algorithms that are both complete and precise. Their prototypes were implemented in the RoadRunner framework using bytecode instrumentation. As such they can detect data races in Java programs, but not in native JNI code.

Static data race detection algorithms could detect all data races in a Java program’s source code, but they typically also report a large number of false data races. Static algorithms also have trouble scaling to large amounts of Java source code and handling dynamic classloading, reflection, JNI code, and synchronization from volatile fields and java.util.concurrent libraries.

Dynamic data race detectors using the lockset algorithm or its variants could detect more true data races in the source code, but they cannot handle all types of synchronization such as volatile field accesses and java.util.concurrent libraries. As such they could report false data races.

Testing

To check that enabling TSan does not break existing functionality, the existing JTREG tests can be run with the new flag. In addition, a few intentionally racy programs will be added to test race detection explicitly and to demonstrate the functionality and usefulness of the TSan system.

For example, the following unit-tests should pass:

Risks and Assumptions

There are no performance penalties or risks with the feature disabled. A user who does not enable the system will not perceive a performance difference.

However, there is an expected large overhead when enabling TSan. It is a complex system adding logic to almost all memory reads and writes; it is assumed TSan will not be enabled in production jobs.