Compilation Overview

The process of compiling a set of source files into a corresponding set of class files is not a simple one, but can be generally divided into three stages. Different parts of source files may proceed through the process at different rates, on an "as needed" basis.

[Figure: javac flow]

This process is handled by the JavaCompiler class.

  1. All the source files specified on the command line are read, parsed into syntax trees, and then all externally visible definitions are entered into the compiler's symbol tables.
  2. All appropriate annotation processors are called. If any annotation processors generate any new source or class files, the compilation is restarted, until no new files are created.
  3. Finally, the syntax trees created by the parser are analyzed and translated into class files. During the course of the analysis, references to additional classes may be found. The compiler will check the source and class path for these classes; if they are found on the source path, those files will be compiled as well, although they will not be subject to annotation processing.
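
The stages above can be observed from the outside through the standard javax.tools API. The following is a minimal sketch, not a description of javac's internals: the class name CompileDemo and its helper methods are hypothetical, and javax.tools.JavaCompiler is the public tool interface rather than the internal JavaCompiler class discussed in this document. It compiles one in-memory source file into a temporary directory, with annotation processing disabled, and reads back the resulting class file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

class CompileDemo {
    /** Wraps a string as a source file (hypothetical helper for this sketch). */
    static JavaFileObject source(String className, String code) {
        return new SimpleJavaFileObject(
                URI.create("string:///" + className + ".java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
    }

    /** Compiles one class into a temporary directory and returns the class file bytes. */
    static byte[] compile(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("no system compiler available; a JDK is required");
        }
        try {
            Path out = Files.createTempDirectory("javac-demo");
            // -proc:none skips stage 2, so only parse/enter and analyse/generate run.
            List<String> options = List.of("-d", out.toString(), "-proc:none");
            boolean ok = compiler.getTask(null, null, null, options, null,
                    List.of(source(className, code))).call();
            if (!ok) {
                throw new IllegalStateException("compilation failed");
            }
            return Files.readAllBytes(out.resolve(className + ".class"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = compile("Hello", "class Hello { int answer() { return 42; } }");
        // Every class file starts with the magic number CAFEBABE.
        System.out.printf("%d bytes, magic %02X%02X%02X%02X%n", bytes.length,
                bytes[0] & 0xFF, bytes[1] & 0xFF, bytes[2] & 0xFF, bytes[3] & 0xFF);
    }
}
```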

Parse and Enter

Source files are processed for Unicode escapes and converted into a stream of tokens by the Scanner.

The token stream is read by the Parser, to create syntax trees, using a TreeMaker. Syntax trees are built from subtypes of JCTree which implement com.sun.source.Tree and its subtypes.
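
The com.sun.source API mentioned above can be used to look at the trees the parser produces. The sketch below assumes the public JavacTask.parse() entry point and a TreeScanner; the class name ParseDemo and its method are hypothetical. It stops after parsing, so nothing has been entered into the symbol tables yet, and lists the class and method declarations found in the tree.

```java
import com.sun.source.tree.ClassTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.MethodTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

class ParseDemo {
    /** Parses the given source and returns the declarations seen, in tree order. */
    static List<String> declarations(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        JavaFileObject unit = new SimpleJavaFileObject(
                URI.create("string:///" + className + ".java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, List.of("-proc:none"), null, List.of(unit));
        List<String> found = new ArrayList<>();
        try {
            // parse() runs only the scanner and parser; no symbols are entered.
            for (CompilationUnitTree tree : task.parse()) {
                new TreeScanner<Void, Void>() {
                    @Override
                    public Void visitClass(ClassTree node, Void p) {
                        found.add("class " + node.getSimpleName());
                        return super.visitClass(node, p); // descend into members
                    }

                    @Override
                    public Void visitMethod(MethodTree node, Void p) {
                        found.add("method " + node.getName());
                        return super.visitMethod(node, p);
                    }
                }.scan(tree, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(declarations("Outer",
                "class Outer { void f() {} class Inner { int g() { return 1; } } }"));
    }
}
```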

Each tree is passed to Enter, which enters symbols for all the definitions encountered into the symbol table. This has to be done before analysis of trees which might reference those symbols. The output from this phase is a To Do list, containing trees that need to be analyzed and have class files generated.

Enter consists of phases; classes migrate from one phase to the next via queues.

class enter -> Enter.uncompleted -> MemberEnter (1)
            -> MemberEnter.halfcompleted -> MemberEnter (2)
            -> To Do -> (Attribute and Generate)

  1. In the first phase, all class symbols are entered into their enclosing scope, descending recursively down the tree for classes which are members of other classes. The class symbols are given a MemberEnter object as completer.

    In addition, if any package-info.java files are found, containing package annotations, then the top level tree node for the file is put on the To Do list as well.

  2. In the second phase, classes are completed using MemberEnter.complete(). Completion might occur on demand, but any classes that are not completed that way will eventually be completed by processing the uncompleted queue. Completion entails:

    • (1) determination of a class's type parameters, supertype and interfaces.
    • (2) entering all symbols defined in the class into its scope, with the exception of class symbols which have been entered in phase 1.
    (2) depends on (1) having been completed for a class and all its superclasses and enclosing classes. That's why, after doing (1), we put classes in a halfcompleted queue. Only when we have performed (1) for a class and all its superclasses and enclosing classes do we proceed to (2).
  3. After all symbols have been entered, any annotations that were encountered on those symbols will be analyzed and validated.

Whereas the first phase is organized as a sweep through all compiled syntax trees, the second phase is on demand. Members of a class are entered when the contents of a class are first accessed. This is accomplished by installing completer objects in class symbols for compiled classes which invoke the MemberEnter phase for the corresponding class tree.
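
The on-demand completion described above can be illustrated with a small model. This is a simplified sketch, not javac's code: the types CompleterSketch, ClassSym and Completer are hypothetical stand-ins, and the halfcompleted queue is left out. Each symbol carries a completer that fills in its members the first time they are accessed, and a queue ensures that symbols nobody asked for are still completed in the end.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class CompleterSketch {
    /** Fills in the members of a class symbol when first needed. */
    interface Completer {
        void complete(ClassSym sym);
    }

    static final class ClassSym {
        final String name;
        final List<String> members = new ArrayList<>();
        private Completer completer; // non-null until the symbol has been completed

        ClassSym(String name, Completer completer) {
            this.name = name;
            this.completer = completer;
        }

        boolean isCompleted() {
            return completer == null;
        }

        void complete() {
            if (completer != null) {
                Completer c = completer;
                completer = null; // clear first so re-entrant access cannot loop
                c.complete(this);
            }
        }

        /** Accessing the members triggers completion on demand. */
        List<String> members() {
            complete();
            return members;
        }
    }

    final Queue<ClassSym> uncompleted = new ArrayDeque<>();
    final List<String> log = new ArrayList<>();

    /** Phase 1: create the class symbol, install a completer, queue it. */
    ClassSym enter(String name, List<String> declaredMembers) {
        ClassSym sym = new ClassSym(name, s -> {
            log.add("complete " + s.name);
            s.members.addAll(declaredMembers);
        });
        uncompleted.add(sym);
        return sym;
    }

    /** Phase 2 fallback: complete whatever was not completed on demand. */
    void completeRemaining() {
        ClassSym sym;
        while ((sym = uncompleted.poll()) != null) {
            sym.complete(); // no-op for symbols that are already complete
        }
    }

    public static void main(String[] args) {
        CompleterSketch enter = new CompleterSketch();
        enter.enter("A", List.of("f", "g"));
        ClassSym b = enter.enter("B", List.of("h"));
        System.out.println(b.members()); // prints [h]; B is completed on demand
        enter.completeRemaining();       // completes A; B is left alone
        System.out.println(enter.log);   // prints [complete B, complete A]
    }
}
```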

Annotation Processing

This part of the process is handled by JavacProcessingEnvironment.

Conceptually, annotation processing is a preliminary step before compilation. This preliminary step consists of a series of rounds, each to parse and enter source files, and then to determine and invoke any appropriate annotation processors. After an initial round, subsequent rounds will be performed if any of the annotation processors that are called generate any new source files or class files that need to be part of the eventual compilation. Finally, when all necessary rounds have been completed, the actual compilation is performed.

[Figure: Conceptual process for annotation processing]

In practice, the need to call any annotation processors may not be known until after the files to be compiled have been parsed and the declarations they contain have been determined. Therefore, to avoid parsing and entering the source files unnecessarily in the case where no annotation processing is performed, JavacProcessingEnvironment executes somewhat "out of phase" with the conceptual model, while still fulfilling the conceptual requirement that annotation processing as a whole happens before the actual compilation.

[Figure: Annotation processing with JavacProcessingEnvironment]

JavacProcessingEnvironment is invoked after the files to be compiled have already been parsed and entered. It determines whether any annotation processors need to be loaded and called for any of the files being compiled. Normally, if any errors occur during the overall compilation process, the process is stopped at the next convenient point. However, an exception is made if any missing symbols were detected during the Enter phase, because definitions for these symbols may be generated as a result of calling annotation processors.

If annotation processors are to be run, they are loaded and run in a separate class loader.

When the annotation processors have been run, JavacProcessingEnvironment determines if another round of annotation processing is required. If so, it creates a new JavaCompiler object, reads any newly generated source files that need to be parsed, and reuses any previously parsed syntax trees. All these trees are entered into the symbol tables for this new compiler instance, and annotation processors are called as necessary. This step is repeated until no more rounds of annotation processing are required.

Finally, JavacProcessingEnvironment returns the JavaCompiler object to be used for the remainder of the compilation. This will either be the original instance used to parse and enter the initial set of files, or it will be the latest instance created by JavacProcessingEnvironment used to start the final round of compilation.
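
The rounds described above can be made visible with a small annotation processor written against the standard javax.annotation.processing API. This is a sketch under stated assumptions: RoundsDemo and RoundLogger are hypothetical names, the processor claims all annotation types so that it is called in every round, and -proc:only is used so that only annotation processing runs. The processor generates one new source file in its first round, which should trigger at least one further round before the final one.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.RoundEnvironment;
import javax.annotation.processing.SupportedAnnotationTypes;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.TypeElement;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

class RoundsDemo {
    /** Records the root elements of every round and generates one file in round 1. */
    @SupportedAnnotationTypes("*")
    static class RoundLogger extends AbstractProcessor {
        final List<String> rounds = new ArrayList<>();
        private boolean generated;

        @Override
        public SourceVersion getSupportedSourceVersion() {
            return SourceVersion.latestSupported();
        }

        @Override
        public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment env) {
            rounds.add(env.getRootElements() + (env.processingOver() ? " (last round)" : ""));
            if (!generated) {
                generated = true;
                // A newly generated source file makes another round necessary.
                try (Writer w = processingEnv.getFiler().createSourceFile("Generated").openWriter()) {
                    w.write("class Generated {}");
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
            return false; // do not claim any annotations
        }
    }

    /** Runs annotation processing only and returns one log entry per round. */
    static List<String> run() {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        JavaFileObject unit = new SimpleJavaFileObject(
                URI.create("string:///Hello.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return "class Hello {}";
            }
        };
        try {
            Path out = Files.createTempDirectory("rounds-demo");
            JavaCompiler.CompilationTask task = compiler.getTask(null, null, null,
                    List.of("-d", out.toString(), "-proc:only"), null, List.of(unit));
            RoundLogger logger = new RoundLogger();
            task.setProcessors(List.of(logger));
            task.call();
            return logger.rounds;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

The generated file appears as a root element of a later round, and the last entry corresponds to the round in which processingOver() returns true.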

Analyse and Generate

Once all the files specified on the command line have been parsed and entered into the compiler's symbol tables, and after any annotation processing has occurred, JavaCompiler can proceed to analyse the syntax trees that were parsed with a view to generating the corresponding class files.

While analysing the tree, references may be found to classes which are required for successful compilation, but which were not explicitly specified for compilation. Depending on the compilation options, the source path and class path will be searched for such class definitions. If the definition is found in a class file, the class file will be read to determine the definitions in that class; if the definition is found in a source file, the source file will be automatically parsed, entered and put on the To Do list. This is done by registering JavaCompiler as an implementation of Attr.SourceCompleter.
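
The effect of the source path search can be seen with two small files. In this sketch, the class name ImplicitDemo is hypothetical and the options -sourcepath, -implicit:class and -proc:none are standard javac options. Only A.java is passed to the compiler. B is found on the source path because A refers to it, and with -implicit:class a class file is written for it as well.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.StandardJavaFileManager;
import javax.tools.ToolProvider;

class ImplicitDemo {
    /** Compiles only A.java and reports whether B.class was produced too. */
    static boolean compilesReferencedSource() {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        try {
            Path src = Files.createTempDirectory("implicit-src");
            Path out = Files.createTempDirectory("implicit-out");
            Files.writeString(src.resolve("A.java"), "class A { B b = new B(); }");
            Files.writeString(src.resolve("B.java"), "class B { }");

            try (StandardJavaFileManager fm = compiler.getStandardFileManager(null, null, null)) {
                // Only A.java is specified explicitly for compilation.
                Iterable<? extends JavaFileObject> units =
                        fm.getJavaFileObjects(src.resolve("A.java").toFile());
                List<String> options = List.of(
                        "-sourcepath", src.toString(),
                        "-d", out.toString(),
                        "-implicit:class",
                        "-proc:none");
                boolean ok = compiler.getTask(null, fm, null, options, null, units).call();
                return ok
                        && Files.exists(out.resolve("A.class"))
                        && Files.exists(out.resolve("B.class"));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("B compiled implicitly: " + compilesReferencedSource());
    }
}
```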

The work to analyse the tree and generate class files is performed by a series of visitors that process the entries on the compiler's To Do list. There is no requirement that these visitors should be applied in step for all source files, and indeed, memory issues would make that extremely undesirable. The only requirement is that each entry on the To Do list should eventually be processed by each of these visitors, unless the compilation is terminated early because of errors.

Attr

The top level classes are "attributed", using Attr, meaning that names, expressions and other elements within the syntax tree are resolved and associated with the corresponding types and symbols. Many semantic errors may be detected here, either by Attr, or by Check.

Flow

If there are no errors so far, flow analysis will be done for the class, using Flow. Flow analysis is used to check for definite assignment to variables, and unreachable statements, which may result in additional errors.
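
The two kinds of error described above can be provoked with small snippets and collected through the standard DiagnosticCollector. This is an illustrative sketch: DiagnosticsDemo is a hypothetical name, and the diagnostic codes printed are whatever the compiler in use reports. The first snippet contains a type error of the kind found during attribution; the second is well typed but reads a variable that is not definitely assigned.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

class DiagnosticsDemo {
    /** Compiles the snippet and returns the codes of all reported errors. */
    static List<String> errors(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        JavaFileObject unit = new SimpleJavaFileObject(
                URI.create("string:///" + className + ".java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
        DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
        compiler.getTask(null, null, diagnostics, List.of("-proc:none"), null, List.of(unit)).call();

        List<String> result = new ArrayList<>();
        for (Diagnostic<? extends JavaFileObject> d : diagnostics.getDiagnostics()) {
            if (d.getKind() == Diagnostic.Kind.ERROR) {
                result.add(d.getCode());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Type error: a String cannot be assigned to an int.
        System.out.println(errors("TypeError",
                "class TypeError { int f() { int x = \"text\"; return x; } }"));
        // Flow error: x is read before it is definitely assigned.
        System.out.println(errors("FlowError",
                "class FlowError { int f() { int x; return x; } }"));
    }
}
```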

TransTypes

Code involving generic types is translated to code without generic types, using TransTypes.
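
The result of this translation is visible through reflection on compiled classes. The sketch below uses only java.lang.reflect; the class names ErasureDemo and Box are hypothetical. It shows that a generic method's return type is erased to Object, and that a class implementing Comparable<Box> ends up with an additional compiler-generated bridge method taking Object.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ErasureDemo {
    static class Box implements Comparable<Box> {
        final int value;

        Box(int value) {
            this.value = value;
        }

        @Override
        public int compareTo(Box other) {
            return Integer.compare(value, other.value);
        }
    }

    /** A generic method; after erasure its return type is Object. */
    static <T> T first(List<T> items) {
        return items.get(0);
    }

    /** Returns the erased return type of first(), as seen in the class file. */
    static Class<?> erasedReturnTypeOfFirst() {
        try {
            return ErasureDemo.class.getDeclaredMethod("first", List.class).getReturnType();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Lists the parameter types of all compareTo methods declared in Box. */
    static List<String> compareToSignatures() {
        List<String> result = new ArrayList<>();
        for (Method m : Box.class.getDeclaredMethods()) {
            if (m.getName().equals("compareTo")) {
                result.add(m.getParameterTypes()[0].getSimpleName()
                        + (m.isBridge() ? " (bridge)" : ""));
            }
        }
        Collections.sort(result); // reflection order is unspecified
        return result;
    }

    public static void main(String[] args) {
        System.out.println(erasedReturnTypeOfFirst()); // prints class java.lang.Object
        System.out.println(compareToSignatures());     // prints [Box, Object (bridge)]
    }
}
```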

Lower

"Syntactic sugar" is processed, using Lower, which rewrites syntax trees to eliminate particular types of subtree by substituting equivalent, simpler trees. This takes care of nested and inner classes, class literals, assertions, foreach loops, and so on. For each class that is processed, Lower returns a list of trees for the translated class and all its translated nested and inner classes.

Although Lower normally processes top level classes, it will also process top level trees for package-info.java. For such trees, Lower will create a synthetic class to contain any annotations for the package.
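
As an illustration of the kind of rewriting involved, the sketch below writes out by hand the simpler loops that an enhanced for statement is equivalent to, following the equivalence given in the Java Language Specification. It is not the output of Lower: LoweringSketch and its method names are hypothetical, and the temporaries the compiler introduces are synthetic rather than named as shown.

```java
import java.util.Iterator;
import java.util.List;

class LoweringSketch {
    /** The sugared form: an enhanced for loop over an Iterable. */
    static int sumForeach(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    /** Hand-written equivalent using an explicit Iterator. */
    static int sumLowered(List<Integer> values) {
        int sum = 0;
        for (Iterator<Integer> it = values.iterator(); it.hasNext(); ) {
            int v = it.next().intValue(); // unboxing made explicit
            sum += v;
        }
        return sum;
    }

    /** The sugared form over an array. */
    static int sumArrayForeach(int[] values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    /** Hand-written equivalent using a copy of the array reference and an index. */
    static int sumArrayLowered(int[] values) {
        int sum = 0;
        int[] arr = values;
        int len = arr.length;
        for (int i = 0; i < len; i++) {
            int v = arr[i];
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> list = List.of(1, 2, 3, 4);
        int[] array = {1, 2, 3, 4};
        System.out.println(sumForeach(list) + " " + sumLowered(list));            // prints 10 10
        System.out.println(sumArrayForeach(array) + " " + sumArrayLowered(array)); // prints 10 10
    }
}
```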

Gen

Code for methods is generated by Gen, which creates the Code attributes containing the bytecodes needed by a JVM to execute the method. If that step is successful, the class is written out by ClassWriter.

Once a class has been written out as a class file, much of its syntax tree and the bytecodes that were generated will no longer be required. To save memory, references to these parts of the tree and symbols will be nulled out, to allow the memory to be recovered by the garbage collector.