JEP 201: Modular Source Code

Author         Mark Reinhold
Owner          Alan Bateman
Type           Feature
Scope          Implementation
Status         Closed / Delivered
Release        9
Discussion     jigsaw dash dev at openjdk dot java dot net
Effort         L
Duration       L
Blocks         JEP 200: The Modular JDK
Relates to     JEP 220: Modular Run-Time Images
Reviewed by    Alan Bateman, Alex Buckley, Mandy Chung, Paul Sandoz
Endorsed by    Brian Goetz
Created        2014/07/22 14:08
Updated        2020/12/07 14:29
Issue          8051619

Summary

Reorganize the JDK source code into modules, enhance the build system to compile modules, and enforce module boundaries at build time.

Non-Goals

This JEP does not change the structure of the JRE and JDK binary images, nor does it introduce a module system. That work is covered by the related JEPs 220 and 261.

This JEP defines a new source-code layout for the JDK. This layout may be used outside of the JDK, but it is not a goal of this JEP to design a broadly-accepted universal modular source-code layout.

Motivation

Project Jigsaw aims to design and implement a standard module system for the Java SE Platform and to apply that system to the Platform itself, and to the JDK. Its primary goals are to make implementations of the Platform more easily scalable down to small devices, improve security and maintainability, enable improved application performance, and provide developers with better tools for programming in the large.

The motivations to reorganize the source code include:

  1. Give JDK developers the opportunity to become familiar with the modular structure of the system;

  2. Preserve that structure going forward by enforcing module boundaries in the build, even prior to the introduction of a module system; and

  3. Enable development of Project Jigsaw to proceed without always having to "shuffle" the present non-modular source code into modular form.

Description

Current scheme

Most of the JDK source code is today organized, roughly, in a scheme that dates back to 1997. In abbreviated form:

src/{share,$OS}/{classes,native}/$PACKAGE/*.{java,c,h,cpp,hpp}

where:

To take a simple example, the source code for the java.lang.Object class in the jdk repository resides in two files, one in Java and the other in C:

src/share/classes/java/lang/Object.java
          native/java/lang/Object.c

For a less trivial example, the source code for the package-private java.lang.ProcessImpl and ProcessEnvironment classes is operating-system-specific; for Unix-like systems it resides in three files:

src/solaris/classes/java/lang/ProcessImpl.java
                              ProcessEnvironment.java
            native/java/lang/ProcessEnvironment_md.c

(Yes, the second-level directory is named solaris even though this code is relevant to all Unix derivatives; more on this below.)

There are a handful of directories under src/{share,$OS} that don't match the current structure, including:

Directory                     Content
--------------------------    --------------------------
src/{share,$OS}/back          JDWP back end
                bin           Java launcher
                instrument    Instrumentation support
                javavm        Exported JVM include files
                lib           Files for $JAVA_HOME/lib
                transport     JDWP transports

New scheme

The modularization of the JDK presents a rare opportunity to completely restructure the source code in order to make it easier to maintain. We implement the following scheme in every repository in the JDK forest except for hotspot. In abbreviated form:

src/$MODULE/{share,$OS}/classes/$PACKAGE/*.java
                        native/include/*.{h,hpp}
                               $LIBRARY/*.{c,cpp}
                        conf/*
                        legal/*

where:

To recast the previous examples, the source code for the java.lang.Object class is laid out as follows:

src/java.base/share/classes/java/lang/Object.java
                    native/libjava/Object.c

The source code for the package-private java.lang.ProcessImpl and ProcessEnvironment classes is laid out this way:

src/java.base/unix/classes/java/lang/ProcessImpl.java
                                     ProcessEnvironment.java
                   native/libjava/ProcessEnvironment_md.c

(We took the opportunity here, finally, to rename the solaris directory to unix.)
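As a rough illustration of how a tool might locate Java sources under this layout, the following sketch computes the shared and OS-specific classes roots for a module and resolves a class name against one of them. The class and method names (ModuleSourceRoots, classRoots, sourceFile) are hypothetical and are not part of the JDK build; only the directory pattern comes from the scheme above.

```java
import java.nio.file.Path;
import java.util.List;

public class ModuleSourceRoots {

    // Returns the Java source roots of a module under the new scheme:
    // src/$MODULE/share/classes first, then src/$MODULE/$OS/classes.
    static List<Path> classRoots(Path src, String module, String os) {
        Path moduleDir = src.resolve(module);
        return List.of(
            moduleDir.resolve("share").resolve("classes"),
            moduleDir.resolve(os).resolve("classes"));
    }

    // Maps a fully qualified class name to its source file under a classes root.
    static Path sourceFile(Path classesRoot, String className) {
        return classesRoot.resolve(className.replace('.', '/') + ".java");
    }

    public static void main(String[] args) {
        List<Path> roots = classRoots(Path.of("src"), "java.base", "unix");
        // Shared code lives under share/classes, OS-specific code under unix/classes.
        System.out.println(sourceFile(roots.get(0), "java.lang.Object"));
        System.out.println(sourceFile(roots.get(1), "java.lang.ProcessImpl"));
    }
}
```

On a Unix-like system the two printed paths correspond to the Object.java and ProcessImpl.java locations shown in the examples above.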

The content of the directories currently under src/{share,$OS} that don't match the current structure is now in appropriate modules:

Directory                     Module
--------------------------    --------------------------
src/{share,$OS}/back          jdk.jdwp.agent
                bin           java.base
                instrument    java.instrument
                javavm        java.base
                lib           $MODULE/{share,$OS}/conf
                transport     jdk.jdwp.agent

Files in the current lib directory that are not intended to be edited by end users are now resource files.

Build-system changes

The build system now compiles one module at a time rather than one repository at a time, and it compiles modules according to a reverse topological sort of the module graph. Modules that do not depend on each other, directly or indirectly, are compiled concurrently when possible.
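The ordering described above can be sketched as follows. This is a minimal illustration, not the JDK build logic (which is implemented in the makefiles): it groups modules into "waves" such that every module appears after all of the modules it requires, and the members of one wave do not depend on each other, so they could be compiled concurrently. The class name ModuleBuildOrder, the waves method, and the module example.tool are hypothetical; the small dependency graph is only a sample.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class ModuleBuildOrder {

    // requires maps each module to the modules it depends on directly.
    // Returns waves in build order: dependencies always come in an earlier wave.
    static List<List<String>> waves(Map<String, Set<String>> requires) {
        // Sorted copies keep the output deterministic.
        Map<String, Set<String>> remaining = new TreeMap<>();
        requires.forEach((m, deps) -> remaining.put(m, new TreeSet<>(deps)));

        List<List<String>> result = new ArrayList<>();
        Set<String> done = new HashSet<>();
        while (!remaining.isEmpty()) {
            // A module is ready once everything it requires has been built.
            List<String> ready = new ArrayList<>();
            for (Map.Entry<String, Set<String>> e : remaining.entrySet()) {
                if (done.containsAll(e.getValue())) {
                    ready.add(e.getKey());
                }
            }
            if (ready.isEmpty()) {
                throw new IllegalStateException(
                    "cycle or undeclared module among " + remaining.keySet());
            }
            remaining.keySet().removeAll(ready);
            done.addAll(ready);
            result.add(ready);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> requires = new TreeMap<>();
        requires.put("java.base", Set.of());
        requires.put("java.instrument", Set.of("java.base"));
        requires.put("jdk.jdwp.agent", Set.of("java.base"));
        requires.put("example.tool", Set.of("java.instrument")); // hypothetical module
        // Each printed line is one wave; modules on the same line are independent.
        waves(requires).forEach(System.out::println);
    }
}
```

With this sample graph, java.base forms the first wave, java.instrument and jdk.jdwp.agent share the second, and example.tool comes last.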

A side benefit of compiling modules rather than repositories is that code in the corba, jaxp, and jaxws repositories can make use of new Java language features and APIs. This was previously forbidden, since those repositories were compiled before the jdk repository.

The compiled classes in an intermediate (i.e., non-image) build are divided into modules. Where today we have:

jdk/classes/*.class

the revised build system produces:

jdk/modules/$MODULE/*.class

The structure of image builds, as noted, does not change; there are only very minor differences in their content.

Module boundaries are enforced at build time, insofar as possible, by the build system. If a module boundary is violated then the build will fail.
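This JEP does not spell out how the build system performs the check, so the following is only a conceptual sketch of what a boundary violation means: a module refers to a package owned by a module that it does not depend upon. The names BoundaryCheck and violations, the module example.app, and the simplification to direct dependencies only are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class BoundaryCheck {

    // Reports every referenced package that lives in a module which the
    // given module neither is nor directly requires (assumed simplification).
    static List<String> violations(String module,
                                   Set<String> referencedPackages,
                                   Map<String, String> packageToModule,
                                   Map<String, Set<String>> requires) {
        Set<String> readable = new HashSet<>(requires.getOrDefault(module, Set.of()));
        readable.add(module); // a module can always use its own packages

        List<String> out = new ArrayList<>();
        for (String pkg : new TreeSet<>(referencedPackages)) { // sorted for stable output
            String owner = packageToModule.get(pkg);
            if (owner == null) {
                out.add(pkg + " (unknown package)");
            } else if (!readable.contains(owner)) {
                out.add(pkg + " (in " + owner + ")");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> packageToModule = Map.of(
            "java.lang", "java.base",
            "java.lang.instrument", "java.instrument");
        // example.app is a hypothetical module that requires only java.base.
        Map<String, Set<String>> requires = Map.of("example.app", Set.of("java.base"));

        List<String> found = violations("example.app",
            Set.of("java.lang", "java.lang.instrument"), packageToModule, requires);
        if (!found.isEmpty()) {
            // A real build would fail at this point.
            System.out.println("module boundary violated: " + found);
        }
    }
}
```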

Alternatives

There are numerous other possible source-layout schemes, including:

  1. Keep {share,$OS} at the top, with a modules directory to contain module class files:

    src/{share,$OS}/modules/$MODULE/$PACKAGE/*.java
                    native/include/*.{h,hpp}
                           $LIBRARY/*.{c,cpp}
                    conf/*

  2. Put everything under the appropriate $MODULE directory, but keep {share,$OS} at the top:

    src/{share,$OS}/$MODULE/classes/$PACKAGE/*.java
                            native/include/*.{h,hpp}
                                   $LIBRARY/*.{c,cpp}
                            conf/*

  3. Push {share,$OS} down into the $MODULE directories, as in the present proposal, but remove the intermediate classes directory and prefix the names of the native and conf directories with an underscore, all so as to simplify the common case of pure Java modules:

    src/$MODULE/{share,$OS}/$PACKAGE/*.java
                            _native/include/*.{h,hpp}
                                    $LIBRARY/*.{c,cpp}
                            _conf/*

  4. A variant of scheme 3, but with {share,$OS} at the top:

    src/{share,$OS}/$MODULE/$PACKAGE/*.java
                            _native/include/*.{h,hpp}
                                    $LIBRARY/*.{c,cpp}
                            _conf/*

  5. Another variant of scheme 3, pushing {share,$OS} deeper down so as to further simplify the case of pure Java modules with no $OS-specific code:

    src/$MODULE/$PACKAGE/*.java
                _native/include/*.{h,hpp}
                        $LIBRARY/*.{c,cpp}
                _conf/*
                _$OS/$PACKAGE/*.java
                    _native/include/*.{h,hpp}
                            $LIBRARY/*.{c,cpp}
                    _conf/*

We rejected the schemes involving underscores (3–5) as too unfamiliar and difficult to navigate. We prefer the present proposal over schemes 1 and 2 because it entails the least change from the current scheme while placing all of the source code for a module under a single directory. Tools and scripts that depend upon the current scheme must be revised, but at least for Java source code the structure underneath each $MODULE directory is the same as before.

Additional issues which we considered:

Testing

As stated, this JEP does not change the structure of the JRE and JDK binary images, and makes only minor changes to their content. We therefore validated this change by comparing images built with it against images built without it, and by running tests to validate the actual minor changes.

Risks and Assumptions

We assumed that Mercurial would be able to handle the massive number of file-rename operations that would be necessary to implement this change, and to preserve all historical information in the process. Early testing showed Mercurial to be capable of this, but there is still a minor risk that the relationships between the new and old locations of some files were not properly recorded. In that case the history of the file in its old location will still be in the repository; it will just be more difficult to find.

It is impossible to apply a patch created against a repository using the old scheme directly to a repository using the new scheme, and vice versa. To mitigate this we developed a script to translate the file names in a patch from their old locations to their new locations.
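The translation script itself is not shown in this JEP. The sketch below merely illustrates the idea for Java source files, under several assumptions: patches use git-style "--- a/" and "+++ b/" header lines, the package-to-module table is a tiny hand-written sample, and native files (which would also need a library name) are left untouched. The class PatchPathTranslator and its methods are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class PatchPathTranslator {

    // Illustrative, incomplete table: package directory -> owning module.
    static final Map<String, String> MODULE_OF_PACKAGE = Map.of(
        "java/lang", "java.base",
        "java/lang/instrument", "java.instrument");

    // Translates src/{share,$OS}/classes/$PACKAGE/X.java to
    // src/$MODULE/{share,$OS}/classes/$PACKAGE/X.java, renaming solaris to unix.
    static Optional<String> translate(String oldPath) {
        String[] parts = oldPath.split("/", 4); // src, share|$OS, classes, rest
        if (parts.length < 4 || !parts[0].equals("src") || !parts[2].equals("classes")) {
            return Optional.empty(); // not a Java source path in the old scheme
        }
        String rest = parts[3];
        int slash = rest.lastIndexOf('/');
        if (slash < 0) {
            return Optional.empty(); // no package directory
        }
        String module = MODULE_OF_PACKAGE.get(rest.substring(0, slash));
        if (module == null) {
            return Optional.empty(); // package not in the sample table
        }
        String os = parts[1].equals("solaris") ? "unix" : parts[1];
        return Optional.of("src/" + module + "/" + os + "/classes/" + rest);
    }

    // Rewrites a patch header line; any other line is returned unchanged.
    static String translateLine(String line) {
        for (String marker : List.of("--- a/", "+++ b/")) {
            if (line.startsWith(marker)) {
                String path = line.substring(marker.length());
                return translate(path).map(p -> marker + p).orElse(line);
            }
        }
        return line;
    }

    public static void main(String[] args) {
        System.out.println(translateLine("--- a/src/share/classes/java/lang/Object.java"));
        System.out.println(translateLine("+++ b/src/solaris/classes/java/lang/ProcessImpl.java"));
    }
}
```

A complete tool would also have to handle native sources, the special directories listed earlier, and translation in the reverse direction.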

Dependences

This JEP is the second of several JEPs for Project Jigsaw. It incorporates the definition of the modular structure of the JDK from JEP 200, but it does not explicitly depend upon that JEP.