JEP draft: Integrity by Default
Authors | Ron Pressler, Alex Buckley |
Owner | Ron Pressler |
Type | Informational |
Scope | SE |
Status | Submitted |
Relates to | JEP 261: Module System |
 | JEP 260: Encapsulate Most Internal APIs |
 | JEP 396: Strongly Encapsulate JDK Internals by Default |
 | JEP 403: Strongly Encapsulate JDK Internals |
 | JEP 451: Prepare to Disallow the Dynamic Loading of Agents |
Reviewed by | Alan Bateman, Mark Reinhold |
Created | 2023/04/13 16:06 |
Updated | 2023/10/22 07:12 |
Issue | 8305968 |
Summary
Java developers expect that their code and data are protected against use that is unwanted or unwise. However, the Java Platform contains unsafe APIs that can undermine this expectation, damaging the correctness, security, and performance of applications. As Java continues to move forward, it is appropriate to restrict unsafe APIs so that libraries cannot use them unless the application as a whole allows it.
Motivation
What is integrity?
Integrity is a guarantee by the Java Platform that a property, once established, applies throughout the program. For example, the statement int[] x = new int[10]; establishes a property about the newly created array: it shall only be accessed at x[0] through x[9]. This property has integrity because the JVM guarantees that it applies throughout the program: accessing x[10] will trigger an exception. Developers can rely on this property without having to analyze every line of code on the class path to confirm that it applies. In contrast, a similar statement in a C program establishes the same property but with no guarantee by the C runtime that it applies throughout the program; it has no integrity. To know that it applies throughout the program, the C developer will have to analyze every line of code in the program.
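The array example above can be sketched in a few lines of Java. The class and method names (ArrayBoundsDemo, isRejected) are illustrative only; the point is that the out-of-bounds access is rejected by the JVM itself, wherever in the program it happens.

```java
final class ArrayBoundsDemo {
    // Returns true if reading x[index] is rejected by the JVM's bounds check.
    static boolean isRejected(int[] x, int index) {
        try {
            int ignored = x[index];
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int[] x = new int[10];
        System.out.println(isRejected(x, 9));  // prints "false": x[9] is in bounds
        System.out.println(isRejected(x, 10)); // prints "true": x[10] is out of bounds
    }
}
```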
Integrity, therefore, enables local reasoning about a program's correctness. The local reasoning applies globally, whether the program is 100 lines or 100,000 lines.
We call a property that applies globally an invariant, whether it is guaranteed by the Java Platform, proven by the programmer, or simply holds by chance. We call an invariant an integrity invariant if it is guaranteed by the Java Platform. The property that an array is never accessed beyond its bounds is an integrity invariant; here are some others:
- A program's initial state is well defined, because the JVM guarantees that variables and arrays are initialized before use.
- A program never suffers from "use after free", because the JVM provides automatic memory management.
- A program cannot perform invalid operations on data, such as treating a String as a Socket, because Java programs are type-safe.
- A program that works with File objects never sees a file's path change suddenly, because the java.io API does not offer a chdir method.
- Since JDK 20, a multi-threaded program never sees damaged objects, because Thread.stop() can no longer cause threads to stop so suddenly that their data becomes inconsistent.
These are all safety properties that prevent "bad things" from happening. As such, we only notice how much we depend on them when things go wrong. The local reasoning enabled by integrity is important not only for developers, but also for the Java Platform's own operation.
Integrity from encapsulation
Java developers routinely wish to establish invariants for their own classes and objects, and have those invariants apply throughout the program. We call such an invariant a domain invariant. The key to having a domain invariant apply throughout the program is encapsulation.
For example, suppose a developer wants to implement counters that are always even, never odd. Imagine if Java had no encapsulation, so that all fields and methods could be accessed from anywhere, as if everything was public. A Counter class might look like this:
    /*public*/ final class Counter {
        /*public*/ int x = 0;
        /*public*/ int value() { return x; }
        /*public*/ void incrementByTwo() { x += 2; }
        /*public*/ void decrementByTwo() { x -= 2; }
    }
Within the class, the developer can prove their domain invariant: x has an even value initially, and is only modified in a way that preserves evenness, so every Counter is even. Unfortunately, the domain invariant that "every Counter is even" does not apply globally, because without encapsulation, x could be set to an odd value from outside the class. The developer would have to analyze every line in the program to know whether every Counter is even.
Encapsulation is what makes the domain invariant apply globally. The developer uses the private keyword to protect data from intentional or unintentional modification by other parts of the program:
    public final class Counter {
        private int x = 0;
        public int value() { return x; }
        public void incrementByTwo() { x += 2; }
        public void decrementByTwo() { x -= 2; }
    }
The property that a private field can only be modified by code in the same class is an integrity invariant, guaranteed by the Java compiler and the JVM to apply throughout the program. By declaring x as private, the developer makes the Java Platform guarantee that code outside Counter cannot modify x; this imbues the domain invariant with integrity, as if the Java Platform was built to ensure that every Counter is even. Local reasoning about the domain invariant now applies globally: the developer can trust the evenness of every Counter without having to analyze every line in the program.
In effect, encapsulation upgrades a domain invariant to an integrity invariant, backed by the Java Platform. This provides tremendous value:
- Correctness: The correctness of the program relies on the domain invariant that a Counter is always even. For example, an application may be tracking business activity where every purchase needs to match a sale, resulting in an even number of transactions. Using encapsulation to imbue the domain invariant with integrity means that correctness cannot be undermined.
- Maintainability: Encapsulation protects code as it evolves. Developers assume that private fields and methods are implementation details, able to be safely changed without breaking clients. For example, developers assume that changing the signatures of private methods, or removing private fields, does not impact clients.
- Scalability: Encapsulation is a cornerstone of programming in the large because it provides the integrity that enables local reasoning about the behavior of classes and objects. Programs can then be constructed from independently-developed components that interact only through their public APIs. It is this ability that allows both individual Java programs and the entire Java ecosystem to scale as collections of independent, interoperating components.
- Security: Encapsulation is essential for constructing any kind of robust security. For example, suppose that a class in the JDK restricts a sensitive operation as follows:
      if (isAuthorized()) doSensitiveOperation();
  The restriction is robust only if we can guarantee that doSensitiveOperation is only ever invoked after a successful isAuthorized check. This invariant could be established by the enclosing class declaring doSensitiveOperation as private. Because no other class can directly invoke doSensitiveOperation, reviewers need only to ensure that all calls to it within the declaring class are preceded by an isAuthorized check, and can ignore all other code in the program.
- Performance: In the Java runtime, numerous optimizations assume that the conditions which hold at the time the optimization is made will hold forever. For example, the JVM can perform constant-folding optimizations when it knows, from inspection of a class, that the value of a private field will never change. Furthermore, a tool like jlink could remove unused private methods at link time to reduce image size and class loading time. The guarantee that domain invariants will not change over time even opens the door to ahead-of-time compilation (AOT).
Undermining integrity
Encapsulation underpins correctness, maintainability, scalability, security, and performance, but there are four APIs in the JDK that can circumvent encapsulation:
- The setAccessible method performs "deep reflection", which is reflection over fields and methods without regard for encapsulation. It was introduced in JDK 1.2 to allow the JDK to serialize and deserialize objects, but in practice any code on the class path can use it to invoke the private methods of any class, read and write the private fields of any object, and even modify final fields.
- The Java Native Interface (JNI) allows native code to interact with Java objects without regard for encapsulation. Similar to deep reflection, native code can assign to private and even final fields.
- The Java Instrumentation API allows classes called agents to modify the bytecode of any method in any class at any time.
- The sun.misc.Unsafe class has methods that can read and write the private fields of any object, similar to deep reflection.
We refer to these APIs as "unsafe" because they can break the integrity invariant that a private field can only be modified by code in the same class. Because of these APIs, encapsulation is not actually strong enough to provide integrity for an application's domain invariants. The private field in Counter could be modified from outside the class via deep reflection, native code, or sun.misc.Unsafe, resulting in something that should never happen: an odd Counter. The public methods could be redefined by an agent to increment the private field by one instead of two, again resulting in an odd Counter.
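As a minimal sketch of the deep-reflection case, the following code produces an odd Counter without going through its public methods. It assumes both classes are on the class path (that is, in the same unnamed module), where setAccessible is not blocked; Counter is declared without the public modifier here only so that the sketch fits in one source file, and OddCounterDemo is a hypothetical name.

```java
import java.lang.reflect.Field;

final class Counter {
    private int x = 0;
    public int value() { return x; }
    public void incrementByTwo() { x += 2; }
    public void decrementByTwo() { x -= 2; }
}

final class OddCounterDemo {
    // Bypasses Counter's public API and writes directly to its private field.
    static Counter makeOddCounter() {
        Counter c = new Counter();
        try {
            Field f = Counter.class.getDeclaredField("x");
            f.setAccessible(true); // deep reflection: suppress the access check
            f.setInt(c, 1);        // the domain invariant "every Counter is even" is now broken
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        return c;
    }

    public static void main(String[] args) {
        System.out.println(makeOddCounter().value());
    }
}
```

Nothing in Counter's own source reveals that this can happen, which is why the invariant can no longer be established by local reasoning.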
The lack of integrity destroys the ability to reason locally about the program's correctness. No domain invariant can be trusted. To prove that a domain invariant still applies globally, developers would have to analyze every JAR on the class path to rule out the use of deep reflection, native code, agents, and sun.misc.Unsafe. Since this analysis is not practical, any code that relies on the evenness of Counter objects for its own domain invariants may behave incorrectly, and any client of that code may behave incorrectly, and so on.
Even if a library uses an unsafe API with good intentions, it may harm an application that directly or indirectly uses the library. This happens when circumventing encapsulation breaks the application's domain invariants. For example, a library used to serialize and deserialize objects to and from JSON could deserialize Counter objects by bypassing Counter's public API and using deep reflection to set the value of its private field; if the JSON input contains an odd number, there will be an odd Counter in the program. This problem is especially serious when the code that relies on a domain invariant is a security mechanism: An accidental vulnerability in a library that uses an unsafe API jeopardizes any security mechanism anywhere in the application, and could allow a remote attacker to manipulate input to the program in a way that undermines the security mechanism. The lack of integrity, therefore, allows a library to have a global yet hidden effect on the application.
Unsafe APIs can break other integrity invariants too. Memory safety is a class of integrity invariants that includes two mentioned earlier: an array is never accessed beyond its bounds, and a program never suffers from "use after free". Java developers have relied on memory safety for decades, but it can be violated by unsafe APIs, leading to undefined behavior and even JVM crashes.
- JNI allows the execution of native code that may violate memory safety. Native code can also produce a byte buffer that wraps arbitrary memory locations, which means any Java code that accesses the buffer is liable to cause undefined behavior.
- The Foreign Function & Memory API (FFM) allows the execution of native code that may violate memory safety. The FFM API also allows Java code to produce a memory segment that wraps arbitrary memory locations, which means any Java code that accesses the segment is liable to cause undefined behavior.
- The sun.misc.Unsafe class has methods that can read and write arbitrary memory locations, both on and off the JVM's heap. This means that arrays can be accessed beyond their bounds, and that memory used to store objects may be accessed long after the objects are deallocated by the garbage collector (that is, a use-after-free).
To ensure correctness, maintainability, scalability, security, and performance, the Java Platform must prevent encapsulation from being circumvented and memory safety from being violated. How can this be squared with the presence of unsafe APIs which are designed to be "superpowers" for library developers? The answer is that the Java Platform can provide integrity by default.
Integrity By Default
Integrity by default is a vision that the Java Platform has been progressing toward since JDK 9. It means selectively degrading the ability of unsafe APIs to undermine integrity, so that application developers can trust their invariants and be free of undefined behavior.
Integrity by default has three strands:
- Strong encapsulation of code in modules. By default, deep reflection cannot circumvent strong encapsulation.
- Unsafe APIs that are standard in the Java Platform are restricted. By default, Java code cannot violate memory safety by using JNI or FFM to call native code, nor circumvent encapsulation by using Java Instrumentation to redefine methods.
- Unsafe APIs that are non-standard are removed when standard replacement APIs become available. The replacement APIs are designed so that, by default, they cannot undermine integrity.
Strong encapsulation: the antidote to deep reflection
JDK 9 introduced modules to the Java language. A module is a set of packages designed to work together. Some of the packages are exported, which means their public elements can be used outside the module. The remaining packages are unexported, which means their public elements can only be used inside the module.
A module provides strong encapsulation, which means that deep reflection by code outside the module can access only the public elements of exported packages. That is, the setAccessible method respects module boundaries. For example, if the public Counter class lives in an exported package, then its private field cannot be modified by deep reflection initiated from code in a different module.
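The effect of strong encapsulation can be observed against the JDK's own modules. The sketch below attempts deep reflection on the private field named value in java.lang.String, which lives in the java.base module; the field name is an implementation detail of current JDKs and is used here only as an assumption for illustration, as is the class name StrongEncapsulationDemo. On JDK 17 and later, started without any --add-opens option, the attempt is expected to fail with InaccessibleObjectException.

```java
import java.lang.reflect.Field;
import java.lang.reflect.InaccessibleObjectException;

final class StrongEncapsulationDemo {
    // Looks up a private field of a class in another module (java.base).
    private static Field privateFieldOfString() {
        try {
            return String.class.getDeclaredField("value");
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException("field name is a JDK implementation detail", e);
        }
    }

    // True if setAccessible refuses to cross the module boundary.
    static boolean deepReflectionBlocked() {
        try {
            privateFieldOfString().setAccessible(true);
            return false;
        } catch (InaccessibleObjectException e) {
            return true;
        }
    }

    // The non-throwing variant: returns false instead of throwing.
    static boolean canBeMadeAccessible() {
        return privateFieldOfString().trySetAccessible();
    }

    public static void main(String[] args) {
        System.out.println("blocked: " + deepReflectionBlocked());
    }
}
```

A library that must work with and without such access can call trySetAccessible and fall back gracefully rather than relying on the exception.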
Integrity by default embraces the idea that the unit of local reasoning is the module. Every module is in charge of maintaining its own domain invariants, and integrity by default means that those domain invariants have integrity.
Restrictions on standard unsafe APIs
Most unsafe APIs (setAccessible, JNI, FFM, and Java Instrumentation) continue to be supported by the Java Platform. While they are rarely used by application code directly, they are essential for a relatively small number of libraries whose core functionality cannot be implemented any other way. Examples include:
- Frameworks for unit testing and dependency injection (DI), which use deep reflection to access private fields and methods of application classes.
- Serialization libraries, which use deep reflection to access private fields of application classes, and the private fields of those fields, and so on.
- Mocking libraries, which use Java Instrumentation to redefine methods of application classes.
- Native wrapper libraries, which use JNI to call native methods or FFM to invoke downcall method handles.
- Application Performance Monitoring (APM) tools, which use agents to inject logging and performance counters into application code.
Using an unsafe API introduces the possibility that encapsulation will be circumvented or memory safety will be violated, losing integrity. Because integrity supports local reasoning, and because a loss of integrity has a global effect on the application, integrity by default enshrines the idea that a library developer cannot unilaterally decide to use unsafe APIs. In order for code to use unsafe APIs, the Java runtime as a whole must be configured to allow it. If the Java runtime is not suitably configured, then use of an unsafe API will cause an exception.
Configuration of the Java runtime is the responsibility of application developers, for two reasons. First, application developers, not library developers, answer to end users for the behavior of the application's code and that of its dependencies. Second, application developers already expect to configure the Java runtime with options that control global effects, such as thresholds for garbage collection and values for system properties.
There are three options that configure the Java runtime to allow use of unsafe APIs:
- --add-opens allows code in specified modules to use setAccessible on the private elements of other modules. A related option, --add-exports, allows code in specified modules to access public elements of otherwise unexported packages.
- --enable-native-access allows code in specified modules to use JNI and FFM to find and invoke native code.
- -javaagent / -agentlib allows an agent to use Java Instrumentation. A related option, -XX:+EnableDynamicAgentLoading, allows tools to load an agent dynamically.
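One way to see what the first of these options changes is to query the java.lang.Module API from inside the program. The sketch below is only a probe, with hypothetical names (RuntimeConfigProbe, isExportedToCaller, isOpenToCaller); it reports whether java.base exports or opens a package to the caller's module. Starting the program with an option such as --add-opens java.base/java.lang=ALL-UNNAMED is expected to change the "open" answer for java.lang when the probe runs from the class path.

```java
final class RuntimeConfigProbe {
    private static final Module JAVA_BASE = Object.class.getModule();
    private static final Module CALLER = RuntimeConfigProbe.class.getModule();

    // True if public elements of the package are accessible to the caller's module.
    static boolean isExportedToCaller(String pkg) {
        return JAVA_BASE.isExported(pkg, CALLER);
    }

    // True if deep reflection on the package is permitted for the caller's module.
    static boolean isOpenToCaller(String pkg) {
        return JAVA_BASE.isOpen(pkg, CALLER);
    }

    public static void main(String[] args) {
        for (String pkg : new String[] {"java.lang", "jdk.internal.misc"}) {
            System.out.println(pkg
                    + " exported=" + isExportedToCaller(pkg)
                    + " open=" + isOpenToCaller(pkg));
        }
    }
}
```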
Application developers can express these options in several forms:
- Pass options directly to the java launcher in the script that starts the application.
- Pass options indirectly to the java launcher by setting the JDK_JAVA_OPTIONS environment variable.
- Put options in an argument file that is passed to the java launcher (java @myapp) by a script or an end user.
- Put entries that correspond to options in the manifest of the application's executable JAR.
- Pass options to jlink when creating a runtime image that contains the application.
- Pass options via the JNI Invocation API, if the application is started from native code.
All these forms have the advantage of configuring the Java runtime when it starts, enabling the JVM to infer which integrity invariants might be undermined and which optimizations should be enabled or disabled. In addition, these forms make it easy for application developers to audit the use of unsafe APIs and understand the risks posed to maintainability, security, and performance. If none of the forms are used, the application developer can be certain that neither the application's code nor that of its dependencies will circumvent encapsulation or violate memory safety.
Removal of non-standard unsafe APIs
The sun.misc.Unsafe class has methods that perform a variety of low-level operations without any safety checks. Since JDK 9, the JDK has added standard APIs that offer safe replacements for sun.misc.Unsafe's functionality. For example, low-level manipulation of objects in the JVM's heap can be done safely through the VarHandle class, and manipulation of data in off-heap memory can be done safely through the MemorySegment class. Accordingly, in a future JDK release, sun.misc.Unsafe will be deprecated for removal.
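As an illustration of the on-heap case, the following sketch performs an atomic compare-and-set on a private field through a VarHandle, a task for which libraries have historically used methods such as compareAndSwapInt in sun.misc.Unsafe. The class SpinFlag and its methods are hypothetical. The VarHandle is obtained from a lookup created inside the class itself, so no encapsulation is circumvented.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

final class SpinFlag {
    private volatile int state = 0; // 0 = free, 1 = held

    private static final VarHandle STATE;
    static {
        try {
            // The lookup is created by SpinFlag, so it may access SpinFlag's private field.
            STATE = MethodHandles.lookup().findVarHandle(SpinFlag.class, "state", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Atomically moves the flag from free to held; returns false if already held.
    boolean tryAcquire() {
        return STATE.compareAndSet(this, 0, 1);
    }

    // Marks the flag as free again, with release memory ordering.
    void release() {
        STATE.setRelease(this, 0);
    }
}
```

Unlike the Unsafe equivalent, the VarHandle is bound to one field of one class and checks the types of its arguments, so it cannot be pointed at arbitrary memory.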
Embracing integrity by default
Integrity by default means that integrity may be lost, but only with the consent of application developers. When application developers configure the Java runtime to allow use of unsafe APIs, they acquiesce to a possible loss of integrity when consuming test frameworks, serialization libraries, native wrappers, etc. This is usually acceptable because the libraries' functionality is indispensable. However, application developers could be relieved of having to configure the Java runtime if library developers embraced integrity by default.
- Dependency injection framework developers should ask application developers to grant access to the application's private fields and methods. One approach is to have application developers open the packages of their modules to the framework module, e.g.,
      opens com.example.app to framework;
  Deep reflection can access every element in an open package, even private elements; in addition, a framework can transfer its access rights via Module.addOpens. Another approach is to have application developers create lookup objects and pass them to the framework, e.g.,
      framework.grantAccess(MethodHandles.lookup());
  A lookup object grants access into the private elements of whichever code created it; the framework can use the lookup object to perform deep reflection on application code without any packages being open.
- Unit testing frameworks and mocking libraries should integrate with build tools such as Maven and Gradle to configure the Java runtime automatically. This would involve starting a test run or a mocking session with options that circumvent encapsulation (--add-opens, --add-exports), patch modules (--patch-module), and install agents (-javaagent).
- Serialization libraries have caused many security vulnerabilities by using deep reflection to access private fields of application classes. In general, it is a mistake for libraries to serialize and deserialize objects without cooperation from their classes. Objects such as strings, records, enums, and Java Collections are easy to serialize and deserialize because their classes provide public accessors and constructors. For other objects, libraries should specify protocols by which classes can expose their state during serialization and accept new state during deserialization. (Application developers may need to grant access to classes in non-exported packages, by opening packages or passing lookup objects.) Some classes already take responsibility for their serialization and deserialization by implementing java.io.Serializable; libraries can take advantage by invoking writeObject and readObject on the classes via sun.reflect.ReflectionFactory, which is supported for this purpose. In the long term, the Java Platform will offer Better Serialization.
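The lookup-object approach mentioned in the first item can be sketched as follows. Account and Framework are hypothetical stand-ins for an application class and a framework, and the method names are assumptions; the sketch shows that a lookup created by the application class carries that class's private access, while a lookup created by the framework does not.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical application class with private state.
final class Account {
    private int balance = 42;

    // Created inside Account, so this lookup carries Account's private access.
    static MethodHandles.Lookup grantAccess() {
        return MethodHandles.lookup();
    }
}

// Hypothetical framework; it never calls setAccessible.
final class Framework {
    // A lookup created here has only Framework's own access rights.
    static MethodHandles.Lookup ownLookup() {
        return MethodHandles.lookup();
    }

    // Reads an int field using whatever access the supplied lookup grants.
    static int readInt(MethodHandles.Lookup lookup, Object target, String field) {
        try {
            VarHandle vh = lookup.findVarHandle(target.getClass(), field, int.class);
            return (int) vh.get(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("access not granted: " + field, e);
        }
    }
}
```

With the lookup from Account.grantAccess(), readInt returns the private balance; with Framework.ownLookup(), the same call fails because the framework was never given access. The decision therefore stays with the code that owns the state.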
Beyond integrity in the Java Platform
If code reaches outside the Java runtime, it could smash the integrity of a class's domain invariants without circumventing encapsulation or violating memory safety. For example, it could alter the contents of a class file in the file system before the class is loaded. However, a good principle in matters of integrity is that the integrity of components is best enforced by the infrastructure that provides them. The integrity of file system invariants, including access control, is the responsibility of the OS. Appropriate configuration of mechanisms in the OS, container, etc., should always be used to protect the integrity of the Java runtime (files, memory, etc.) regardless of the measures taken by the Java Platform to protect the integrity of Java code.
Why now?
An obvious question: Why is the Java Platform adopting integrity by default, which adds overhead for some application developers, when applications and libraries managed fine without strong encapsulation or restrictions on unsafe APIs for two decades? The answer is that in recent years, both the JDK and the environment in which Java applications run have changed:
- Correctness: The Java Platform is able to provide integrity invariants for Java programs because they are enforced by native code inside the JVM, beyond the reach of unsafe APIs. However, more and more of the Java runtime is being written or rewritten in Java, such as the scheduler for virtual threads. This system-level code has domain invariants, just as application code does, and the integrity of its domain invariants – that is, their "upgrade" to integrity invariants – relies on encapsulation.
- Maintainability: In order to be able to add new features without drowning in maintenance, we need to remove obsolete packages from the JDK and refactor its implementation at will. Unfortunately, over time, the JDK had become a Big Ball of Mud and libraries had come to depend on clumps of its implementation that they assumed were stable. As a result, the ecosystem ossified around JDK 8 and faced serious difficulties migrating from JDK 8 to later releases. Continuing to evolve the JDK, let alone at a faster pace, would have created such difficulties with every release, forever. The choice was between stopping the evolution of Java, versus inflicting migration pain just once more by encapsulating JDK internals.
- Security: Java's primary security threats have shifted from untrusted code running in the client to remote attacks on servers, which made the Security Manager an ill-suited solution. The JDK needs a mechanism to allow the construction of robust security in layers above the JDK.
- Performance: There is a growing demand for optimization of startup time and image size, which are important for deploying Java applications in modern environments. Such optimizations require that code does not change its meaning between build time and run time.
In short: The evolution of the JDK caused serious migration issues, there was no practical mechanism that enabled robust security in the current landscape, and new requirements could not be met. Despite the convenience that unsafe APIs with "superpowers" have offered to libraries, the lack of integrity is untenable. Strong encapsulation and the restriction of unsafe APIs are the solution.