JEP draft: Integrity by Default

Authors: Ron Pressler, Alex Buckley, & Mark Reinhold
Owner: Ron Pressler
Type: Informational
Scope: SE
Status: Draft
Relates to:
    JEP 261: Module System
    JEP 260: Encapsulate Most Internal APIs
    JEP 396: Strongly Encapsulate JDK Internals by Default
    JEP 403: Strongly Encapsulate JDK Internals
    JEP 451: Prepare to Disallow the Dynamic Loading of Agents
Created: 2023/04/13 16:06
Updated: 2024/04/23 13:09
Issue: 8305968

Summary

Developers expect that their code and data are protected against use that is unwanted or unwise. The Java Platform, however, contains unsafe APIs that can undermine this expectation, thereby damaging the correctness, maintainability, scalability, security, and performance of applications. Going forward, we will restrict the unsafe APIs so that, by default, libraries, frameworks, and tools cannot use them. Application authors will have the ability to override this default.

What is integrity?

The Oxford English Dictionary defines “integrity” as “the state of being whole and undivided; the condition of being sound in construction.”

In the context of a computer program, integrity means that the constructs from which we build the program — and ultimately the program itself — are both whole and sound. Such constructs, whether they are low-level language facilities such as for loops or higher-level components such as classes or modules, have both specifications and implementations. Integrity thus requires two things of a computing construct:

- Its specification says all that needs to be said in order to reason effectively about its use, and
- Its implementation behaves according to its specification.

In more familiar terms, we say that a computing construct has integrity if, and only if, its specification is complete and its implementation is correct with respect to the specification.

For example, the specification of Java arrays says that an array can only be accessed within the bounds set for it upon creation. This constraint is guaranteed by the JVM, which raises an exception if it is violated.

The specification of Java arrays contains many other statements; e.g., that the length of an array never changes, that the first element of an array always has the index zero, and that accessing an array element after setting that element to some value returns exactly the same value (modulo concurrency). The JVM guarantees all of these statements — hence arrays are correct. Taken together, moreover, these statements capture all that we need to know in order to reason about any particular use of arrays — hence arrays are complete. We do not need to wonder, e.g., whether an array might silently increment all its elements at midnight on alternate Wednesdays, because its specification says nothing about midnight, or Wednesdays, and in fact its specification implies that this absurd situation cannot happen. Thus we can say that Java arrays have integrity.
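
The array guarantees described above can be observed directly. The following is a minimal sketch, not part of the JEP; the class name ArrayIntegrityDemo and the helper accessRejected are hypothetical names chosen for illustration.

```java
final class ArrayIntegrityDemo {
    /** Returns true if reading index i of array a is rejected by the JVM. */
    static boolean accessRejected(int[] a, int i) {
        try {
            int unused = a[i];   // bounds are checked on every access
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int[] a = new int[3];                        // the length is fixed at creation
        a[0] = 42;                                   // the first element has index zero
        System.out.println(a[0]);                    // prints 42, the value just written
        System.out.println(a.length);                // prints 3
        System.out.println(accessRejected(a, 3));    // prints true
        System.out.println(accessRejected(a, -1));   // prints true
    }
}
```

Because the JVM enforces these checks everywhere, code that receives an array can rely on them without inspecting the rest of the program.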

(Integrity has practical limits, of course; the JVM cannot prevent native code or external debuggers or cosmic rays from modifying array content. When we speak of integrity here, we mean integrity within the context of the Java Platform.)

The Java Platform contains not just arrays but many other useful constructs, in both the language and its built-in libraries. All of these constructs have both specifications and implementations, which taken together give the Platform itself a specification and an implementation. We intend, naturally, that the overall Platform have integrity: Its specification says all that needs to be said in order to reason effectively about its use (completeness), and its implementation behaves according to its specification (correctness). The integrity of the Platform enables us to reason about the correctness of our own code, starting from the specifications of the Platform's constructs.

Benefits of integrity

The Java Platform's integrity underpins many of its key benefits.

Without integrity, we cannot rely upon any of these valuable properties.

Integrity via encapsulation

The Java language provides built-in constructs which enable us to build our own constructs at higher and higher levels of abstraction, by hiding unnecessary detail. We compose statements into methods, methods and fields into classes, classes into packages, packages into modules, and finally modules into entire programs.

Abstraction enables us to control program complexity: We can show that the implementation of a higher-level construct meets its specification by reasoning solely from the specifications of the lower-level constructs upon which it is built; there is no need to consider the implementation details of the lower-level constructs, nor the specifications of any other constructs. Likewise, users of the higher-level construct need only refer to the specifications of that construct, and of any other constructs that they use, when reasoning about their own code; there is no need to consider the implementation details of the higher-level construct, nor the specifications of any other constructs. Ultimately, we can, in principle, show that an entire program meets its specification via such reasoning.

For all of this to work requires that our higher-level constructs themselves have integrity: They must be complete and correct. A key tool for achieving that is encapsulation.

For example, suppose we want to build a counter abstraction that is always even, never odd. Imagine that the Java language had no encapsulation, so that all fields and methods could be accessed from anywhere, as if everything were public. We might declare an EvenCounter class, like so:

/**
 * Specification:
 *   - value() initially returns zero
 *   - incrementByTwo() increments value() by two
 *   - decrementByTwo() decrements value() by two
 *   - value() is always even, never odd
 */
/*public*/ final class EvenCounter {
    /*public*/ int x = 0;
    /*public*/ int value() { return x; }
    /*public*/ void incrementByTwo() { x += 2; }
    /*public*/ void decrementByTwo() { x -= 2; }
}

We can easily show that the EvenCounter class, in isolation, meets its specification, thus it is correct. Its specification, however, is not complete: It does not say everything that needs to be said in order to make effective use of the class. That is because code external to the class can set the x field to an odd number at any time, thereby causing the value() method to violate the class's specification. To show that a use of the class is correct we must analyze every line of the entire program to ensure that no code external to the class modifies this field. Rather than simple local reasoning about each such use, complex global reasoning is required. It is as if the specification of the EvenCounter class includes the additional requirement that

*   - No code external to this class modifies the x field

With the actual Java language, of course, there is no need for this complexity since the language provides encapsulation constructs — the private and public keywords — which allow us to protect data from intentional or unintentional modification.

public final class EvenCounter {
    private int x = 0;
    public int value() { return x; }
    public void incrementByTwo() { x += 2; }
    public void decrementByTwo() { x -= 2; }
}

Here we use the private keyword to protect the x field from external access. The private keyword has integrity: Its specification says that a private field can be modified only by code in the same class, and the Java compiler and the JVM guarantee this specification throughout the program. Making the x field private thus obviates the need to analyze the entire program when reasoning about the correctness of any use of the EvenCounter class. In other words, local reasoning about each such use is sufficient. The class's original specification, above, is thus complete; we already know that the class is correct with respect to that specification, thus the class has integrity.

Abstraction enables us to create higher-level computing constructs; encapsulation enables us to imbue those constructs, and ultimately entire programs, with integrity. This provides tremendous value.

Undermining integrity

Encapsulation is a key tool for establishing integrity. It underpins correctness, maintainability, scalability, security, and performance. There are, however, four APIs in the JDK which can circumvent it.

We refer to these APIs as unsafe because they violate the integrity of the Java language's encapsulation constructs, thereby violating the integrity not only of the Platform itself but of every component and program built on top of it. The private field in an EvenCounter object could, e.g., be modified from outside the class via deep reflection, sun.misc.Unsafe, or native code, resulting in an odd value, violating the class's specification. The public methods of EvenCounter could be redefined by an agent to increment the private field by one instead of two, again resulting in an odd value.
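
As an illustration of the deep-reflection case, the sketch below forces an odd value into an EvenCounter. It assumes that all classes are on the class path (that is, in the unnamed module), where setAccessible is permitted. DeepReflectionDemo and forceValue are hypothetical names, and EvenCounter is declared without the public modifier only so that the example fits in one source file.

```java
import java.lang.reflect.Field;

final class EvenCounter {
    private int x = 0;
    public int value() { return x; }
    public void incrementByTwo() { x += 2; }
    public void decrementByTwo() { x -= 2; }
}

final class DeepReflectionDemo {
    /** Overwrites the private field x of the counter, bypassing its public API. */
    static void forceValue(EvenCounter counter, int newValue) {
        try {
            Field x = EvenCounter.class.getDeclaredField("x");
            x.setAccessible(true);        // suppresses the check implied by 'private'
            x.setInt(counter, newValue);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        EvenCounter c = new EvenCounter();
        c.incrementByTwo();
        forceValue(c, 3);
        System.out.println(c.value());    // prints 3: the counter is now odd
    }
}
```

Nothing in the source of EvenCounter reveals that this can happen, which is why local reasoning about the class is no longer sufficient.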

The fact that the language's encapsulation constructs lack integrity destroys the ability to reason locally about a program's correctness. To show that a use of an encapsulated component is correct we must analyze every class on the class path, on the module path, or loaded dynamically, and either rule out the use of unsafe APIs or else ensure that their use does not violate the component's specification. This analysis is not practical, thus any code that relies on the evenness of EvenCounter objects for its own correctness may behave incorrectly, and any client of that code may behave incorrectly, and so on.

Even if a library uses an unsafe API with good intentions, and does not explicitly violate any other component's specification, it could still enable specification violations in an application that uses it. A JSON serialization library could, e.g., deserialize an EvenCounter object by using deep reflection to set the value of the object's private field, bypassing EvenCounter's public API. This, in itself, does not violate the specification of the EvenCounter class. If the application does not, however, take care to explicitly validate that its JSON input does not contain an odd number, then reading such input will result in an odd EvenCounter. The serialization library does not explicitly violate EvenCounter's specification, but by circumventing EvenCounter's defense mechanism — its encapsulation — it makes it vulnerable to indirect specification violations.
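
The serialization scenario can be sketched with a toy stand-in for such a library. ToyDeserializer and its fromFields method are hypothetical and do not correspond to any real JSON library; a Map takes the place of parsed JSON input. As before, the sketch assumes that everything runs on the class path, and EvenCounter is non-public only to keep the example in one file.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.util.Map;

final class EvenCounter {
    private int x = 0;
    public int value() { return x; }
    public void incrementByTwo() { x += 2; }
    public void decrementByTwo() { x -= 2; }
}

/** Hypothetical stand-in for a serialization library; not a real API. */
final class ToyDeserializer {
    /** Creates an instance of type and writes each map entry into the field of that name. */
    static <T> T fromFields(Class<T> type, Map<String, Integer> fields) {
        try {
            Constructor<T> ctor = type.getDeclaredConstructor();
            ctor.setAccessible(true);
            T obj = ctor.newInstance();
            for (Map.Entry<String, Integer> entry : fields.entrySet()) {
                Field f = type.getDeclaredField(entry.getKey());
                f.setAccessible(true);            // bypasses the class's public API
                f.setInt(obj, entry.getValue());  // no validation of the input value
            }
            return obj;
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot deserialize " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        // Unvalidated input carrying an odd number
        EvenCounter c = fromFields(EvenCounter.class, Map.of("x", 3));
        System.out.println(c.value());   // prints 3
    }
}
```

The library code contains no intent to break EvenCounter, yet any caller that forwards unvalidated input ends up with an object that violates the class's specification.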

This problem is especially serious with security-sensitive components. A vulnerability in a library that uses an unsafe API jeopardizes the integrity of every component of the application, and could allow an adversary to manipulate input to the application in a way that undermines security.

The unsafe APIs in the JDK also violate the integrity of language constructs other than those related to encapsulation. Constructs that access arrays and objects, in particular, are specified so as to ensure memory safety: An array cannot be accessed beyond its bounds, and an object cannot be accessed after its storage is reclaimed. We have relied on the memory safety of the Java Platform for decades, but it can be violated by the unsafe APIs, leading to undefined behavior and even JVM crashes.

The integrity of the Java Platform — and hence the correctness, maintainability, scalability, security, and performance of our programs — requires that we prevent encapsulation from being circumvented and memory safety from being violated. How can we square this with the presence of the unsafe APIs, which are designed to offer library, framework, and tool developers special superpowers for use in rare situations in which there is no other way to solve a problem? The answer is that we must adopt integrity by default.

Integrity by default

Integrity by default means that every construct of the Java Platform has integrity, unless overridden explicitly at the highest level of the program. That is, the developer of an application can choose to give up selected kinds of integrity within the scope of that application; the developer of a library, framework, or tool, however, cannot. An application developer can, e.g., choose to configure the Java runtime to allow a serialization library to use unsafe APIs, knowingly acquiescing to a loss of integrity because the library's functionality is indispensable. Without such explicit permission, however, that library cannot, on its own, violate any aspect of Platform or application integrity.

We have gradually been moving the Java Platform toward integrity by default since JDK 9. We have done so by selectively degrading or gating the ability of the unsafe APIs to undermine integrity. This effort has three strands.

Strong encapsulation: The antidote to deep reflection

JDK 9 introduced modules to the Java language. A module is a set of packages designed to work together and intended for re-use. If a package is exported then its public elements can be used outside the module; if a package is not exported then its public elements can be used only inside the module.

Modules provide strong encapsulation, which means that reflection by code outside of a module cannot access the private elements of any class within the module. That is, the setAccessible method respects module boundaries. If the public EvenCounter class, e.g., is declared in an explicit module, then its private field x cannot be modified by deep reflection initiated by code outside the module.
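
The effect of strong encapsulation can be observed against the JDK's own modules. The sketch below assumes a recent JDK (17 or later) started without any --add-opens options, and it assumes that java.lang.String has a private field named value, which is an implementation detail rather than a specified one. StrongEncapsulationDemo is a hypothetical name.

```java
import java.lang.reflect.Field;
import java.lang.reflect.InaccessibleObjectException;

final class StrongEncapsulationDemo {
    /** Returns true if deep reflection into java.lang.String is refused. */
    static boolean stringInternalsAreEncapsulated() {
        try {
            // 'value' is a private field of String in current JDKs (an assumption)
            Field value = String.class.getDeclaredField("value");
            value.setAccessible(true);   // code outside java.base asks for deep access
            return false;
        } catch (InaccessibleObjectException e) {
            return true;                 // java.base does not open java.lang to this code
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(stringInternalsAreEncapsulated());
    }
}
```

Under the stated assumptions the method should return true. A class such as EvenCounter gains the same protection when it is placed in an explicit module whose package is not opened to the reflecting code.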

Restrictions on standard unsafe APIs

Most of the unsafe APIs — setAccessible, JNI, FFM, and Instrumentation — continue to be supported in the Java Platform. While they are rarely used by application code directly, they are essential for a relatively small number of libraries, frameworks, and tools whose core functionality cannot be implemented any other way. Examples include:

A component that uses an unsafe API violates the integrity of the Java Platform: It introduces the possibility that encapsulation will be circumvented or memory safety will be violated, thereby rendering the specification of the Platform incomplete. If the Platform has no integrity then components built on top of it have no integrity, and applications themselves have no integrity. The policy of integrity by default enshrines the idea that the developer of a library, framework, or tool cannot unilaterally decide to violate integrity by using an unsafe API. That power — and the corresponding responsibility — belongs solely to the application's developer (or perhaps deployer, on the advice of the developer). The application's developer answers to end users for the behavior of the application; developers of libraries, frameworks, and tools, by contrast, do not.

We cannot treat the mere inclusion of an unsafe-using library or framework in an application as consent by the application's developer to violate integrity. The developer might not be aware that the component uses an unsafe API. The developer might not even be aware that the component is present, since the component could be an indirect dependency several layers removed from the application itself. The application developer must therefore explicitly configure the Java runtime to allow selected components to use unsafe APIs. If the runtime is not suitably configured then using an unsafe API causes an exception to be thrown.

Various command-line options configure the Java runtime to allow the use of unsafe APIs:

Application developers can specify these options in multiple ways:

No matter how they are specified, these options configure the Java runtime when it starts, enabling the JVM to determine how integrity will be undermined and which optimizations should be enabled or disabled. These options also make it easy for application developers to audit the use of unsafe APIs and understand the risks posed to correctness, maintainability, scalability, security, and performance. If none of these options is used then the application developer can be certain that neither the application nor its dependencies violate the integrity of the Platform, the application's dependencies, or the application itself.

Removing non-standard unsafe APIs

The sun.misc.Unsafe class includes methods that perform a variety of low-level operations without any safety checks. Since JDK 9 we have been adding standard APIs that offer safer replacements for this functionality. The low-level manipulation of objects in the JVM's heap, e.g., can now be done more safely via the VarHandle API, and manipulation of data in off-heap memory can now be done more safely via FFM's MemorySegment API.
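
As a rough illustration of the on-heap case, the following sketch uses a VarHandle to update a field atomically, a task for which libraries have historically reached for sun.misc.Unsafe. The class AtomicEvenCounter is hypothetical. The VarHandle is obtained through the class's own lookup, so no encapsulation is bypassed.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

final class AtomicEvenCounter {
    private volatile int x = 0;

    // Handle to the private field x, created with this class's own access rights
    private static final VarHandle X;
    static {
        try {
            X = MethodHandles.lookup().findVarHandle(AtomicEvenCounter.class, "x", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    public int value() { return x; }

    // Atomic read-modify-write operations; the type of each argument is checked
    public void incrementByTwo() { X.getAndAdd(this, 2); }
    public void decrementByTwo() { X.getAndAdd(this, -2); }

    public static void main(String[] args) {
        AtomicEvenCounter c = new AtomicEvenCounter();
        c.incrementByTwo();
        c.incrementByTwo();
        c.decrementByTwo();
        System.out.println(c.value());   // prints 2
    }
}
```

Unlike a raw memory offset, a VarHandle is bound to one field of one class, and access to it is checked when the handle is created.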

We have already deprecated for removal and, later, removed some elements of sun.misc.Unsafe which now have standard API replacements. We will continue to do so in future releases. Ultimately, we will deprecate sun.misc.Unsafe itself for removal, and then remove it.

Embracing integrity by default

Libraries, frameworks, and tools can relieve application developers of some of the effort of configuring the Java runtime in many situations.
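
One technique that the standard API supports is for application code to hand a library a MethodHandles.Lookup object, which the library then uses in place of deep reflection. Whether this is among the techniques the authors have in mind is an assumption. In the sketch below, Settings, grantAccess, TinyInspector, and readIntField are all hypothetical names.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

/** Application class with private state. */
final class Settings {
    private int retries = 3;

    /** The application deliberately hands out its own lookup, granting access. */
    static MethodHandles.Lookup grantAccess() {
        return MethodHandles.lookup();
    }
}

/** Hypothetical library code; it can read private fields only via a granted lookup. */
final class TinyInspector {
    static int readIntField(Object target, String fieldName, MethodHandles.Lookup granted) {
        try {
            Class<?> type = target.getClass();
            // Succeeds only if the granted lookup is entitled to private access to 'type'
            MethodHandles.Lookup privateLookup = MethodHandles.privateLookupIn(type, granted);
            VarHandle field = privateLookup.findVarHandle(type, fieldName, int.class);
            return (int) field.get(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("no readable int field " + fieldName, e);
        }
    }

    public static void main(String[] args) {
        int retries = readIntField(new Settings(), "retries", Settings.grantAccess());
        System.out.println(retries);   // prints 3
    }
}
```

The grant is explicit and visible in the application's own source, so the decision to expose private state stays with the application rather than the library.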

More elaborate frameworks and applications that wish to control the initialization and bootstrapping of the runtime and/or of components can programmatically grant code permission to use unsafe APIs:

Integrity beyond the Java Platform

Java code can use standard facilities of the Platform to reach outside the Java runtime and violate integrity. Java code can, e.g., alter the content of a class file in the file system before the class is loaded. However, a good principle in matters of integrity is that

The integrity of components is best enforced by the infrastructure that provides them.

The integrity of the file system and its content is the responsibility of the operating system, not the Java runtime. The OS or, if appropriate, an OS-level container, should always be configured so as to protect the integrity of the Java runtime's files and memory, and the integrity of the application’s files, regardless of the measures taken by the Java runtime to protect its own integrity and that of the application it is running.

Why now?

The Java ecosystem has managed just fine without strong encapsulation or restrictions on unsafe APIs for nearly three decades. Why are we now adopting integrity by default, which adds overhead for some library, framework, tool, and application developers?

The answer is that, in recent years, both the JDK and the environment in which Java applications run have changed.

In short: The use of JDK-internal APIs caused serious migration issues, there was no practical mechanism that enabled robust security in the current landscape, and new requirements could not be met. Despite the value that the unsafe APIs offer to libraries, frameworks, and tools, the ongoing lack of integrity is untenable. Strong encapsulation and the restriction of the unsafe APIs — by default — are the solution.