JEP draft: Integrity and Strong Encapsulation

Authors: Ron Pressler, Alex Buckley
Owner: Ron Pressler
Type: Informational
Scope: SE
Status: Submitted
Relates to: JEP 261: Module System
            JEP 260: Encapsulate Most Internal APIs
            JEP 396: Strongly Encapsulate JDK Internals by Default
            JEP 403: Strongly Encapsulate JDK Internals
            JEP 451: Prepare to Disallow the Dynamic Loading of Agents
Reviewed by: Alan Bateman, Mark Reinhold
Created: 2023/04/13 16:06
Updated: 2023/05/22 15:55
Issue: 8305968

Summary

The Java Platform assures the integrity of code and data with a variety of features that are on by default. Strong encapsulation is one such feature, but it can be circumvented by some APIs, which undermines both maintainability and performance. As Java continues to move forward, it is appropriate to restrict all APIs so that they cannot break strong encapsulation without explicit user permission, while still accommodating use cases that need to operate beyond encapsulation boundaries.

Goals

Non-Goals

Motivation

Integrity in the Java Platform

Over the past few years, the Java Platform has been inching toward a vision of greater integrity. Integrity is the guarantee that a property established at one point in the program applies at all points in the program. For example, the array creation new int[10] establishes a property about the array — it is never read or written past its tenth element — and this property has integrity because the JVM guarantees the array bound is respected. Developers can rely on this property (which we call an integrity invariant) without having to analyze every line of code to confirm that it applies, as they would for a C program with a similar array creation. Integrity, therefore, enables local reasoning about a program's correctness.

Here are some other integrity invariants offered by the Java Platform:

Integrity invariants are safety properties that prevent "bad things" from happening. As such, we only notice how much we depend on integrity when things go wrong. The local reasoning enabled by integrity is important not only for developers, but also for the JVM as it analyzes and optimizes running code (to be discussed later). Integrity, therefore, is essential for the Java Platform's own operation.

Encapsulation as the Foundation of Integrity

The integrity invariants listed above are established by the Java Platform; they ensure that Java, unlike C, does not have undefined behavior. Java developers also want to establish integrity invariants: properties of their own code, and their own data, which are guaranteed to apply throughout the program. To do this, developers use access control modifiers – public, private, protected, and the default "package" access – to protect code and data declared in one part of the program from other parts. For example, suppose a developer wants to establish the invariant that the state of a counter-like object is always even, never odd. The class could be written as follows:

public final class Even {
    private int x = 0;
    public int value() { return x; }
    public void incrementByTwo() { x += 2; }
    public void decrementByTwo() { x -= 2; }
}

By declaring x as private and having all the public methods preserve the parity of x, the developer has used encapsulation to establish the invariant that every Even object in the program has even state. The integrity of this domain-specific invariant (the state is always even) relies on the integrity of encapsulation (when x is private, absolutely no code outside Even can touch it).

Encapsulation is a cornerstone of programming in the large because it allows a program to be constructed from independently-developed components that interact only through their public APIs, so that each of them can be reasoned about in isolation. It is this ability that allows both individual Java programs and the entire Java ecosystem to scale as collections of independent, interoperating components.

From Encapsulation To Strong Encapsulation

Unfortunately, the parity invariant above does not have the integrity that the developer might hope for. This is because any code on the class path could employ deep reflection to override access control (the private modifier on x) and assign an odd value to x directly. Deep reflection has existed since JDK 1.2, when the method java.lang.reflect.AccessibleObject.setAccessible was introduced.

Given the possibility of some code calling this method, it would take a global analysis of the codebase to ensure that Even's parity is, indeed, an invariant. The developer might assume that no code would intentionally break the invariant, but it could be broken unintentionally. For example, another developer in the organization could decide to serialize and deserialize instances of Even to and from JSON using a library. When deserializing JSON input, the library would bypass Even's public API and use deep reflection to set the value of x. If the JSON input contains an odd number, the invariant will be broken.
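The scenario above can be sketched as follows. This is a minimal illustration, assuming the code runs on the class path (the unnamed module), where deep reflection between classes is permitted. The class DeepReflectionDemo and the helper forceState are hypothetical names, and Even is repeated inside the demo only so that the file compiles on its own.

```java
import java.lang.reflect.Field;

// Hypothetical demo class, not part of any library.
final class DeepReflectionDemo {

    // Copy of the Even class from the text.
    static final class Even {
        private int x = 0;
        public int value() { return x; }
        public void incrementByTwo() { x += 2; }
        public void decrementByTwo() { x -= 2; }
    }

    // Mimics what a reflective deserializer might do: write the private
    // field directly instead of going through Even's public methods.
    static void forceState(Even target, int newValue) {
        try {
            Field x = Even.class.getDeclaredField("x");
            x.setAccessible(true);      // deep reflection: suppress the access check
            x.setInt(target, newValue); // no parity check happens here
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("reflective access failed", e);
        }
    }

    public static void main(String[] args) {
        Even e = new Even();
        e.incrementByTwo();
        forceState(e, 3);
        System.out.println(e.value()); // prints 3: the "always even" invariant is broken
    }
}
```

Nothing in the source of Even hints that this can happen, which is why confirming the invariant would require inspecting every class that might reach the object.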

As a result of deep reflection – and other mechanisms that disregard or bypass encapsulation, to be discussed later – the meaning of Java code is provisional, and the encapsulation merely advisory. A method or field is private, unless other code really wants to access it. A final field is assigned once, unless other code wants to assign it again later. The meaning of a method is defined by a block of code, unless other code decides to redefine the method later (this involves an agent, which is a class with access to a special API that allows it to change other Java code). This provisionality is not hypothetical: Some libraries change the meaning of code outside them in arbitrary ways, so that neither a person reading the code nor the Java Platform itself can believe that the code does what it says or that its meaning does not change over time as the program runs.
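The remark about final fields can be illustrated in the same way. The sketch below assumes a JDK release on which java.lang.reflect.Field still permits writing to a non-static final field of an ordinary class after setAccessible(true) has succeeded; newer releases may warn about or refuse such writes. The names FinalFieldDemo, Account, and overwriteLimit are hypothetical.

```java
import java.lang.reflect.Field;

// Hypothetical demo class, not part of any library.
final class FinalFieldDemo {

    static final class Account {
        private final int limit;            // intended to be assigned exactly once
        Account(int limit) { this.limit = limit; }
        int limit() { return limit; }
    }

    // Reassigns the final field through deep reflection.
    static void overwriteLimit(Account account, int newLimit) {
        try {
            Field f = Account.class.getDeclaredField("limit");
            f.setAccessible(true);          // also lifts the final restriction for this Field object
            f.setInt(account, newLimit);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("final field could not be rewritten", e);
        }
    }

    public static void main(String[] args) {
        Account a = new Account(100);
        overwriteLimit(a, 1_000_000);
        System.out.println(a.limit());      // no longer the value passed to the constructor
    }
}
```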

To allow developers to use encapsulation to truly establish integrity invariants, JDK 9 introduced modules. A module is a set of packages, some of which are designed to be used outside the module (they are exported), while others are designed to be used only inside the module (they are unexported). Everything in an unexported package is strongly encapsulated – deep reflection cannot break in. Similarly, the non-public elements of exported packages are also strongly encapsulated. Since x is a private field, strong encapsulation allows the parity invariant that is established locally in the Even class to be trusted globally.
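The distinction between exported and unexported packages can be observed at run time through java.lang.Module. The sketch below is only an illustration: the class ModuleBoundaryDemo and its two helpers are hypothetical, it assumes the program runs from the class path with no --add-exports or --add-opens options, and jdk.internal.misc is used merely as an example of a package that java.base does not export to application code.

```java
// Hypothetical helper class; only the java.lang.Module calls are real API.
final class ModuleBoundaryDemo {

    // True if public types in package pkg of module target are accessible to caller.
    static boolean isExportedTo(Module target, String pkg, Module caller) {
        return target.isExported(pkg, caller);
    }

    // True if caller may use deep reflection on all members of package pkg in target.
    static boolean isOpenTo(Module target, String pkg, Module caller) {
        return target.isOpen(pkg, caller);
    }

    public static void main(String[] args) {
        Module base = Object.class.getModule();             // the java.base module
        Module self = ModuleBoundaryDemo.class.getModule(); // unnamed module when run from the class path

        System.out.println(isExportedTo(base, "java.lang", self));         // public API of an exported package
        System.out.println(isOpenTo(base, "java.lang", self));             // deep reflection into that package
        System.out.println(isExportedTo(base, "jdk.internal.misc", self)); // an internal package
    }
}
```

Under the stated assumptions one would expect the three lines to read true, false, false: java.lang is exported but not open, and the internal package is not exported at all.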

Strong encapsulation gives integrity to encapsulation – it guarantees no one outside the class can assign x – and in so doing it gives integrity to the invariant that x is always even. Strong encapsulation offers a solid foundation to build on. Without it, code is a castle in the sand.

Other than making it easier to establish business-logic invariants important for a program's correctness, strong encapsulation is beneficial for three general reasons:

Strong Encapsulation by Default

Because Java had allowed encapsulation to be broken via deep reflection since JDK 1.2, a number of libraries intended for use in production came to depend on the ability to break it. The reasons for breaking encapsulation were varied:

Libraries that broke encapsulation supplied needed and useful functionality to the Java ecosystem. Application developers benefited from the functionality of libraries deep in their dependency tree that broke encapsulation, and at the same time enjoyed the encapsulation in their own code. But that encapsulation was illusory. If even a single library used by an application could arbitrarily bypass encapsulation, none of the integrity invariants established through encapsulation in the entire application could be relied upon. These benefits are, unfortunately, contradictory, and applications must be allowed to choose between them.

Because deep dependency trees are common, the chances are high that an application would unknowingly depend on an encapsulation-breaking library. Consequently, if applications had to opt into strong encapsulation, few would be able to do so. The platform must, therefore, exert pressure on the ecosystem to minimize the proliferation of libraries that bypass strong encapsulation by making strong encapsulation opt out rather than opt in.

Strong encapsulation must be the default. This is the goal the Java Platform is approaching.

JDK 9 accommodated those libraries that broke encapsulation by only enforcing strong encapsulation at compile time; meanwhile, at run time, deep reflection was permitted, with "illegal reflective access" warnings to encourage maintainers to prepare libraries for strong encapsulation. Official replacements for the internal JDK classes above were added to the JDK, massively reducing the need to break encapsulation on modern JDKs. The VarHandle API and the ongoing work on Foreign Function & Memory API make uses of sun.misc.Unsafe obsolete. Legacy bugs have been fixed so it is exceptionally rare to need to break encapsulation to work around them. Library developers wishing to target both new and old JDKs can easily do so using a Multi-Release JAR.
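As an illustration of such a replacement, the sketch below performs an atomic update on a private field with a VarHandle, a task for which libraries once commonly reached for sun.misc.Unsafe. It is a minimal example rather than a migration recipe: the class VarHandleCounter is hypothetical, and the handle is obtained through the class's own MethodHandles.lookup(), so no encapsulation boundary is crossed.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical class illustrating a supported alternative to Unsafe-based field updates.
final class VarHandleCounter {

    private int x = 0;

    // A handle to the private field x, created with this class's own lookup.
    private static final VarHandle X;
    static {
        try {
            X = MethodHandles.lookup().findVarHandle(VarHandleCounter.class, "x", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Reads x with volatile semantics.
    int value() {
        return (int) X.getVolatile(this);
    }

    // Atomically adds 2, so concurrent callers never lose an update.
    void incrementByTwo() {
        X.getAndAdd(this, 2);
    }

    public static void main(String[] args) {
        VarHandleCounter c = new VarHandleCounter();
        c.incrementByTwo();
        c.incrementByTwo();
        System.out.println(c.value()); // prints 4
    }
}
```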

In 2021, JDK 16 began enforcing strong encapsulation at run time, turning the warnings into errors. Applications that encounter access errors due to encapsulation-breaking libraries must update them to versions that don't access JDK internals.
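What such an access error looks like can be sketched with a small probe. The example assumes JDK 16 or later, launched without any --add-opens option, and it relies on java.lang.String having a private field named value, which is an implementation detail that may differ between releases. The class JdkInternalsProbe is hypothetical.

```java
import java.lang.reflect.Field;
import java.lang.reflect.InaccessibleObjectException;

// Hypothetical probe class, not part of any library.
final class JdkInternalsProbe {

    // Returns true if deep reflection into String's private state is permitted.
    static boolean canOpenStringInternals() {
        try {
            Field value = String.class.getDeclaredField("value");
            value.setAccessible(true); // rejected when java.base does not open java.lang to the caller
            return true;
        } catch (InaccessibleObjectException e) {
            return false;              // strong encapsulation is being enforced at run time
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException("field name is an implementation detail and may differ", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canOpenStringInternals());
    }
}
```

Under these assumptions the probe should report false; a library written this way would have to fall back to a supported API instead.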

Disabling Strong Encapsulation

As a practical matter, some libraries have not been updated for JDK 16 and later, yet applications still need to run them on those releases. The circumstances for breaking encapsulation, unfortunately, persist.

In addition, there are some tools and libraries whose functionality fundamentally operates beyond encapsulation boundaries. Here are a few examples:

To balance the need for integrity with both the circumstantial, convenience uses of JDK internals and the essential uses, Java gives the user – the application's owner (typically its author, maintainer, or deployer) – the final say on which strong encapsulation boundaries are in place and which should be ignored. This freedom is offered under the guiding principle that the ability of one component to encroach on the boundaries of another must be explicitly granted by the application. Libraries cannot choose to obtain encapsulation-busting "superpowers" without the knowledge and consent of the application's owner.

Integrity by default, therefore, means that integrity may be broken – but only with the user's consent.

This consent can be granted as follows:

The command line serves as an auditable map of the codebase and its internal encapsulation boundaries, which the application draws as it wishes. Integrity requires that libraries not encroach on other components without the application's consent; otherwise, the boundaries on that map – and so the attack surface area of the application, its maintenance risk, and the optimizations that can be performed – would be unknowable. When only the application is permitted to explicitly grant "superpower" privileges, the application's authors are able to better judge what risks affect them and to better control the attack surface area of the application.
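One way to read the command line as such a map is to list the options that relax encapsulation. The sketch below filters JVM arguments for the --add-opens and --add-exports options of the module system. It is an assumption-laden illustration: the class EncapsulationFlagAudit and its method are hypothetical, the filter covers only these two options, and it assumes the JVM reports them through RuntimeMXBean.getInputArguments().

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical audit helper, not part of the JDK.
final class EncapsulationFlagAudit {

    // Keeps only the arguments that open or export packages across module boundaries.
    static List<String> encapsulationOverrides(List<String> jvmArgs) {
        return jvmArgs.stream()
                .filter(a -> a.startsWith("--add-opens") || a.startsWith("--add-exports"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself, not to main.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        encapsulationOverrides(jvmArgs).forEach(System.out::println);
    }
}
```

An empty result would suggest that the application has granted no such permissions on this launch.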

Disabling strong encapsulation imposes risks:

Overall, the burden of responsibility imposed on application maintainers who find themselves having to maintain encapsulation-disabling permissions is nowhere near as high as the cost that lacking integrity by default places on the platform and the ecosystem. A palpable demonstration of that cost was the difficulty many applications experienced when migrating from JDK 8 to later versions, which was predominantly caused by non-portable libraries.

The experience of the past few years has shown that the ecosystem is able to adapt to strong encapsulation – at least of the JDK itself. Most Java code, which resides in applications, has never had much need to directly access JDK internals; high-level libraries and frameworks have similarly rarely reached into the innards of the JDK. Code that breaks encapsulation is usually found in low-level libraries that would normally be transitive dependencies of applications, and many libraries that had previously depended on JDK internals have stopped doing so. The impact on the ecosystem has mostly been that applications were required to upgrade their dependencies. Simultaneously, the burden placed on applications to grant libraries "superpower" privileges has put pressure on libraries to reduce their reliance on deep reflection and similar capabilities.

Beyond Deep Reflection

Integrity by default has not yet been achieved because strong encapsulation is not yet universal in the Java Platform. Some APIs allow any library to surreptitiously claim integrity-violating superpowers for itself, without the application's explicit consent, and use these superpowers to break encapsulation. Any library can:

It is worth mentioning that sun.misc.Unsafe is able to break not only strong encapsulation but even Java's most foundational integrity mechanisms, mentioned earlier. For example, a library using Unsafe can access arrays without bounds checking, and can access an object that has been deallocated by the garbage collector; accordingly, a program utilizing Unsafe may have undefined behavior. Much the same applies to programs that use native code via JNI or the "Linker" component of the Foreign Function & Memory API, although that undefined behavior is caused not by Java code but by native code.

These APIs mean that Java does not yet provide integrity by default. Invariants can be relied upon neither by people nor by the platform itself. In particular, security can only be achieved with a difficult, often infeasible, global analysis of the application and its dependencies, as a vulnerability in any direct or transitive dependency could potentially be exploited and turned into a gadget that circumvents any authorization check in the application. Additionally, application authors are unable to know whether one of their dependencies relies on internal implementation details of the JDK, which can make it difficult to upgrade the application to a newer JDK version.

To attain our goal of integrity by default, we will gradually restrict these APIs and close all loopholes in a series of upcoming JEPs, ensuring that no library can assume superpowers without the application's consent. Libraries that rely on these APIs should spend the time remaining until they are restricted to prepare their users for any necessary changes.

Why Now?

An obvious question: Why has the Java Platform been progressing toward integrity by default over the past few years, putting obstacles in the path of some clever, occasionally-useful tricks, when applications managed fine without strong encapsulation for two decades?

The answer is that Java must adapt to changing circumstances and requirements:

In short: The evolution of the JDK caused serious migration issues, there was no practical mechanism that enabled robust security in the current landscape, and new requirements could not be met.

Despite the convenience that lack of integrity has offered to "superpowered" libraries, the situation is untenable. Strong encapsulation is the linchpin of the solution. The effort to add strong encapsulation to Java began in the 2010s, and its importance is becoming clearer with every passing year, so the effort continues.

Conclusion