JEP draft: Integrity by Default

Authors: Ron Pressler, Alex Buckley
Owner: Ron Pressler
Type: Informational
Scope: SE
Status: Submitted
Relates to: JEP 261: Module System
JEP 260: Encapsulate Most Internal APIs
JEP 396: Strongly Encapsulate JDK Internals by Default
JEP 403: Strongly Encapsulate JDK Internals
JEP 451: Prepare to Disallow the Dynamic Loading of Agents
Reviewed by: Alan Bateman, Mark Reinhold
Created: 2023/04/13 16:06
Updated: 2023/10/22 07:12
Issue: 8305968

Summary

Java developers expect that their code and data are protected against use that is unwanted or unwise. However, the Java Platform contains unsafe APIs that can undermine this expectation, damaging the correctness, security, and performance of applications. As Java continues to move forward, it is appropriate to restrict unsafe APIs so that libraries cannot use them unless the application as a whole allows it.

Motivation

What is integrity?

Integrity is a guarantee by the Java Platform that a property, once established, applies throughout the program. For example, the statement int[] x = new int[10]; establishes a property about the newly created array: it shall only be accessed at x[0] through x[9]. This property has integrity because the JVM guarantees that it applies throughout the program: accessing x[10] will trigger an exception. Developers can rely on this property without having to analyze every line of code on the class path to confirm that it applies. In contrast, a similar statement in a C program establishes the same property but with no guarantee by the C runtime that it applies throughout the program; it has no integrity. To know that it applies throughout the program, the C developer will have to analyze every line of code in the program.
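
As a minimal sketch of the array example above (the class and method names are illustrative, not part of this JEP), the following program writes to the last valid index and then attempts an access one past the end. The JVM rejects the second access with an exception instead of touching adjacent memory:

```java
class ArrayBoundsDemo {
    // Returns true if the JVM rejects an access one past the end of the array.
    static boolean outOfBoundsAccessIsRejected() {
        int[] x = new int[10];
        x[9] = 1;          // last valid index: allowed
        try {
            x[10] = 1;     // one past the end: never reaches memory
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("rejected: " + outOfBoundsAccessIsRejected());
    }
}
```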

Integrity, therefore, enables local reasoning about a program's correctness. The local reasoning applies globally, whether the program is 100 lines or 100,000 lines.

We call a property that applies globally an invariant, whether it is guaranteed by the Java Platform, proven by the programmer, or simply holds by chance. We call an invariant an integrity invariant if it is guaranteed by the Java Platform. The property that an array is never accessed beyond its bounds is an integrity invariant; here are some others:

These are all safety properties that prevent "bad things" from happening. As such, we only notice how much we depend on them when things go wrong. The local reasoning enabled by integrity is important not only for developers, but also for the Java Platform's own operation.

Integrity from encapsulation

Java developers routinely wish to establish invariants for their own classes and objects, and have those invariants apply throughout the program. We call such an invariant a domain invariant. The key to having a domain invariant apply throughout the program is encapsulation.

For example, suppose a developer wants to implement counters that are always even, never odd. Imagine if Java had no encapsulation, so that all fields and methods could be accessed from anywhere, as if everything were public. A Counter class might look like this:

/*public*/ final class Counter {
    /*public*/ int x = 0;
    /*public*/ int value() { return x; }
    /*public*/ void incrementByTwo() { x += 2; }
    /*public*/ void decrementByTwo() { x -= 2; }
}

Within the class, the developer can prove their domain invariant: x has an even value initially, and is only modified in a way that preserves evenness, so every Counter is even. Unfortunately, the domain invariant that "every Counter is even" does not apply globally, because without encapsulation, x could be set to an odd value from outside the class. The developer would have to analyze every line in the program to know whether every Counter is even.
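
To make the problem concrete, here is a small sketch in which a client writes to the field directly. OpenCounter is a hypothetical stand-in for the unencapsulated Counter above, and OddCounterDemo is an illustrative name; both are assumed to be compiled together in one package:

```java
class OddCounterDemo {
    // Breaks the "every counter is even" invariant from outside the class.
    static int breakInvariant() {
        OpenCounter c = new OpenCounter();
        c.incrementByTwo();   // x == 2, still even
        c.x = 3;              // direct write: nothing stops this
        return c.value();
    }

    public static void main(String[] args) {
        System.out.println("counter value: " + breakInvariant());
    }
}

// Hypothetical stand-in for Counter with no private members.
final class OpenCounter {
    int x = 0;
    int value() { return x; }
    void incrementByTwo() { x += 2; }
    void decrementByTwo() { x -= 2; }
}
```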

Encapsulation is what makes the domain invariant apply globally. The developer uses the private keyword to protect data from intentional or unintentional modification by other parts of the program:

public final class Counter {
    private int x = 0;
    public int value() { return x; }
    public void incrementByTwo() { x += 2; }
    public void decrementByTwo() { x -= 2; }
}

The property that a private field can only be modified by code in the same class is an integrity invariant, guaranteed by the Java compiler and the JVM to apply throughout the program. By declaring x as private, the developer makes the Java Platform guarantee that code outside Counter cannot modify x; this imbues the domain invariant with integrity, as if the Java Platform were built to ensure that every Counter is even. Local reasoning about the domain invariant now applies globally: the developer can trust the evenness of every Counter without having to analyze every line in the program.

In effect, encapsulation upgrades a domain invariant to an integrity invariant, backed by the Java Platform. This provides tremendous value:

Undermining integrity

Encapsulation underpins correctness, maintainability, scalability, security, and performance, but there are four APIs in the JDK that can circumvent encapsulation:

We refer to these APIs as "unsafe" because they can break the integrity invariant that a private field can only be modified by code in the same class. Because of these APIs, encapsulation is not actually strong enough to provide integrity for an application's domain invariants. The private field in Counter could be modified from outside the class via deep reflection, native code, or sun.misc.Unsafe, resulting in something that should never happen: an odd Counter. The public methods could be redefined by an agent to increment the private field by one instead of two, again resulting in an odd Counter.
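
The deep reflection case can be sketched as follows. This assumes that Counter and the client are both on the class path (and so in the same unnamed module), where setAccessible succeeds; DeepReflectionDemo is an illustrative name, and Counter is declared without the public modifier only so that the sketch fits in one source file:

```java
import java.lang.reflect.Field;

class DeepReflectionDemo {
    // Uses deep reflection to store an odd value in Counter's private field.
    static int forceOdd() {
        try {
            Counter c = new Counter();
            c.incrementByTwo();                        // x == 2
            Field x = Counter.class.getDeclaredField("x");
            x.setAccessible(true);                     // suppress access checks
            x.setInt(c, 3);                            // bypasses the public API
            return c.value();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("counter value: " + forceOdd());
    }
}

// Same shape as the encapsulated Counter in the text.
final class Counter {
    private int x = 0;
    public int value() { return x; }
    public void incrementByTwo() { x += 2; }
    public void decrementByTwo() { x -= 2; }
}
```

The private keyword is intact and the compiler accepts the program, yet the Counter ends up odd.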

The lack of integrity destroys the ability to reason locally about the program's correctness. No domain invariant can be trusted. To prove that a domain invariant still applies globally, developers would have to analyze every JAR on the class path to rule out the use of deep reflection, native code, agents, and sun.misc.Unsafe. Since this analysis is not practical, any code that relies on the evenness of Counter objects for its own domain invariants may behave incorrectly, and any client of that code may behave incorrectly, and so on.

Even if a library uses an unsafe API with good intentions, it may harm an application that directly or indirectly uses the library. This happens when circumventing encapsulation breaks the application's domain invariants. For example, a library used to serialize and deserialize objects to and from JSON could deserialize Counter objects by bypassing Counter's public API and using deep reflection to set the value of its private field; if the JSON input contains an odd number, there will be an odd Counter in the program. This problem is especially serious when the code that relies on a domain invariant is a security mechanism: An accidental vulnerability in a library that uses an unsafe API jeopardizes any security mechanism anywhere in the application, and could allow a remote attacker to manipulate input to the program in a way that undermines the security mechanism. The lack of integrity, therefore, allows a library to have a global yet hidden effect on the application.

Unsafe APIs can break other integrity invariants too. Memory safety is a class of integrity invariants that includes two mentioned earlier: an array is never accessed beyond its bounds, and a program never suffers from "use after free". Java developers have relied on memory safety for decades, but it can be violated by unsafe APIs, leading to undefined behavior and even JVM crashes.

To ensure correctness, maintainability, scalability, security, and performance, the Java Platform must prevent encapsulation from being circumvented and memory safety from being violated. How can this be squared with the presence of unsafe APIs which are designed to be "superpowers" for library developers? The answer is that the Java Platform can provide integrity by default.

Integrity by default

Integrity by default is a vision that the Java Platform has been progressing toward since JDK 9. It means selectively degrading the ability of unsafe APIs to undermine integrity, so that application developers can trust their invariants and be free of undefined behavior.

Integrity by default has three strands:

  1. Strong encapsulation of code in modules. By default, deep reflection cannot circumvent strong encapsulation.
  2. Unsafe APIs that are standard in the Java Platform are restricted. By default, Java code cannot violate memory safety by using JNI or FFM to call native code, nor circumvent encapsulation by using Java Instrumentation to redefine methods.
  3. Unsafe APIs that are non-standard are removed when standard replacement APIs become available. The replacement APIs are designed so that, by default, they cannot undermine integrity.

Strong encapsulation: the antidote to deep reflection

JDK 9 introduced modules to the Java language. A module is a set of packages designed to work together. Some of the packages are exported, which means their public elements can be used outside the module. The remaining packages are unexported, which means their public elements can only be used inside the module.
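
As an illustration, a module declaration might look like the sketch below. The module name and package names are hypothetical:

```java
// module-info.java (hypothetical module and package names)
module com.example.counters {
    // Exported: public elements of this package can be used by other modules.
    exports com.example.counters.api;

    // com.example.counters.internal is not listed, so it is unexported:
    // its public elements can be used only inside this module.
}
```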

A module provides strong encapsulation, which means that deep reflection by code outside the module can access only the public elements of exported packages. That is, the setAccessible method respects module boundaries. For example, if the public Counter class lives in an exported package, then its private field cannot be modified by deep reflection initiated from code in a different module.
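
The following sketch shows setAccessible respecting a module boundary. It assumes JDK 17 or later, code running on the class path, and no --add-opens option for java.base; under those assumptions the call is rejected with InaccessibleObjectException. The target, ClassLoader's protected defineClass method, is chosen only as a well-known non-public member of the java.base module, and StrongEncapsulationDemo is an illustrative name:

```java
import java.lang.reflect.InaccessibleObjectException;
import java.lang.reflect.Method;

class StrongEncapsulationDemo {
    // Returns true if deep reflection into java.base is blocked.
    static boolean deepReflectionIsBlocked() {
        try {
            Method defineClass = ClassLoader.class.getDeclaredMethod(
                    "defineClass", String.class, byte[].class, int.class, int.class);
            defineClass.setAccessible(true);   // crosses a module boundary
            return false;                      // reached only if java.lang is open to us
        } catch (InaccessibleObjectException e) {
            return true;                       // strong encapsulation held
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("blocked: " + deepReflectionIsBlocked());
    }
}
```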

Integrity by default embraces the idea that the unit of local reasoning is the module. Every module is in charge of maintaining its own domain invariants, and integrity by default means that those domain invariants have integrity.

Restrictions on standard unsafe APIs

Most unsafe APIs – setAccessible, JNI, FFM, and Java Instrumentation – continue to be supported by the Java Platform. While they are rarely used by application code directly, they are essential for a relatively small number of libraries whose core functionality cannot be implemented any other way. Examples include:

Using an unsafe API introduces the possibility that encapsulation will be circumvented or memory safety will be violated, losing integrity. Because integrity supports local reasoning, and because a loss of integrity has a global effect on the application, integrity by default enshrines the idea that a library developer cannot unilaterally decide to use unsafe APIs. In order for code to use unsafe APIs, the Java runtime as a whole must be configured to allow it. If the Java runtime is not suitably configured, then use of an unsafe API will cause an exception.

Configuration of the Java runtime is the responsibility of application developers, for two reasons. First, application developers, not library developers, answer to end users for the behavior of the application's code and that of its dependencies. Second, application developers already expect to configure the Java runtime with options that control global effects, such as thresholds for garbage collection and values for system properties.

There are three options that configure the Java runtime to allow use of unsafe APIs:

Application developers can express these options in several forms:

All these forms have the advantage of configuring the Java runtime when it starts, enabling the JVM to infer which integrity invariants might be undermined and which optimizations should be enabled or disabled. In addition, these forms make it easy for application developers to audit the use of unsafe APIs and understand the risks posed to maintainability, security, and performance. If none of the forms are used, the application developer can be certain that neither the application's code nor that of its dependencies will circumvent encapsulation or violate memory safety.

Removal of non-standard unsafe APIs

The sun.misc.Unsafe class has methods that perform a variety of low-level operations without any safety checks. Since JDK 9, the JDK has added standard APIs that offer safe replacements for sun.misc.Unsafe's functionality. For example, low-level manipulation of objects in the JVM's heap can be done safely through the VarHandle class, and manipulation of data in off-heap memory can be done safely through the MemorySegment class. Accordingly, in a future JDK release, sun.misc.Unsafe will be deprecated for removal.
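
As a rough sketch of what "safely" means for VarHandle (class and method names here are illustrative), the example below shows two checks that sun.misc.Unsafe does not perform: an array element VarHandle is bounds-checked, and an ordinary lookup refuses to produce a VarHandle for another class's private field. Counter is declared without the public modifier only so that the sketch fits in one source file:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

class VarHandleDemo {
    // VarHandle for the elements of an int[]; every access is bounds-checked.
    static final VarHandle INT_ARRAY =
            MethodHandles.arrayElementVarHandle(int[].class);

    static boolean outOfBoundsIsRejected() {
        int[] data = new int[10];
        INT_ARRAY.setVolatile(data, 9, 42);        // in bounds: allowed
        try {
            INT_ARRAY.setVolatile(data, 10, 42);   // one past the end
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    static boolean privateFieldIsRejected() {
        try {
            // This lookup has VarHandleDemo's access rights, not Counter's.
            MethodHandles.lookup().findVarHandle(Counter.class, "x", int.class);
            return false;
        } catch (IllegalAccessException e) {
            return true;                           // access control is enforced
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("out of bounds rejected: " + outOfBoundsIsRejected());
        System.out.println("private field rejected: " + privateFieldIsRejected());
    }
}

// Same shape as the encapsulated Counter in the text.
final class Counter {
    private int x = 0;
    public int value() { return x; }
    public void incrementByTwo() { x += 2; }
    public void decrementByTwo() { x -= 2; }
}
```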

Embracing integrity by default

Integrity by default means that integrity may be lost, but only with the consent of application developers. When application developers configure the Java runtime to allow use of unsafe APIs, they acquiesce to a possible loss of integrity when consuming test frameworks, serialization libraries, native wrappers, etc. This is usually acceptable because the libraries' functionality is indispensable. However, application developers could be relieved of having to configure the Java runtime if library developers embraced integrity by default.

Beyond integrity in the Java Platform

If code reaches outside the Java runtime, it could smash the integrity of a class's domain invariants without circumventing encapsulation or violating memory safety. For example, it could alter the contents of a class file in the file system before the class is loaded. However, a good principle in matters of integrity is that the integrity of components is best enforced by the infrastructure that provides them. The integrity of file system invariants, including access control, is the responsibility of the OS. Appropriate configuration of mechanisms in the OS, container, etc., should always be used to protect the integrity of the Java runtime (files, memory, etc.) regardless of the measures taken by the Java Platform to protect the integrity of Java code.

Why now?

An obvious question: Why is the Java Platform adopting integrity by default, which adds overhead for some application developers, when applications and libraries managed fine without strong encapsulation or restrictions on unsafe APIs for two decades? The answer is that in recent years, both the JDK and the environment in which Java applications run have changed:

In short: The evolution of the JDK caused serious migration issues, there was no practical mechanism that enabled robust security in the current landscape, and new requirements could not be met. Despite the convenience that unsafe APIs with "superpowers" have offered to libraries, the lack of integrity is untenable. Strong encapsulation and the restriction of unsafe APIs are the solution.