Analyzing Documentation Comments

Jonathan Gibbons

The JDK Java compiler, javac, supports a variety of Java SE and JDK APIs that together can be used to analyze Java programs, including API documentation comments:

The Java Compiler API, Language Model API, and Annotation Processing API, all defined in the java.compiler module.
The Compiler Tree API, defined in the jdk.compiler module.

This note uses a series of simple examples to demonstrate how these APIs go together.

See Also: Processing Code

Introduction

The examples that follow demonstrate how to use the various APIs supported by javac to analyze source code.

The examples follow a common theme of scanning documentation comments to find instances of URLs in the href attribute of HTML a elements, so that you could (for example) check the links for validity. The theme is deliberately simple and somewhat artificial, but the examples suffice to illustrate the underlying techniques for analyzing code.

To scan documentation comments, the work can generally be split into three parts:

Configure javac by specifying the files to be read and any other options that may be required, and arrange to be notified at an appropriate point in the compilation pipeline.
Scan the declarations in the source code, looking for those that might have documentation comments of interest.
Scan the documentation comments.

The examples can all be compiled and run with JDK 17.

Example 1: Analyzing Syntax Trees

The first example shows how you can perform some simple analysis by just examining the basic syntax trees created by javac. To do this, you only need to pass the source files to javac, typically without needing any additional options that you might otherwise need to compile those files. The limitation is that you cannot perform any analysis that involves knowing what any of the names in the source code actually refer to.

Configuring javac

This example uses an instance of javax.tools.JavaCompiler obtained by calling ToolProvider.getSystemJavaCompiler.

The primary arguments for the compiler are any options that may be required and the files to be read. In this example, no options are required, and an empty list is passed for the appropriate argument. To set up the files to be read, the example uses the command-line arguments for the example itself, converts them to Path objects, and uses Files.walkFileTree to find the source files in any directories that may be given. Finally, it converts the resulting series of Path objects into the file objects required by the compiler.

The result of calling JavaCompiler.getTask is a CompilationTask, which can be used to perform a complete compilation. In this example, however, the result is cast down to the JDK-specific class JavacTask, to access some additional JDK-specific functionality. First, the code registers a TaskListener to be invoked at various points in the compilation lifecycle. In this example, the task listener calls the locally defined processFile method when the compiler has finished parsing each source file. Then, after obtaining some utility objects from the JavacTask object, the code instructs the compiler to parse the source files. Subsequent activity happens within the callbacks to the task listener, until all the files have been parsed and the parse method returns.
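The configuration steps described above might be sketched as follows. This is a minimal illustration and not the code of the example itself: the class name ConfigureSketch and the methods findSourceFiles, parseAll, and demo are hypothetical, and where the example would call its processFile method, the sketch merely records the name of each parsed file.

```java
import com.sun.source.util.JavacTask;
import com.sun.source.util.TaskEvent;
import com.sun.source.util.TaskListener;

import javax.tools.JavaCompiler;
import javax.tools.StandardJavaFileManager;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

public class ConfigureSketch {
    // Collects every .java file found at or below the given paths.
    static List<Path> findSourceFiles(List<Path> roots) throws IOException {
        List<Path> found = new ArrayList<>();
        for (Path root : roots) {
            Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                    if (file.getFileName().toString().endsWith(".java")) {
                        found.add(file);
                    }
                    return FileVisitResult.CONTINUE;
                }
            });
        }
        return found;
    }

    // Parses the files and returns the simple names of those reported to the listener.
    static List<String> parseAll(List<Path> roots) throws IOException {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        List<String> parsed = new ArrayList<>();
        try (StandardJavaFileManager fm = compiler.getStandardFileManager(null, null, null)) {
            // Arguments: output writer, file manager, diagnostic listener,
            // options (none needed here), class names, compilation units.
            JavacTask task = (JavacTask) compiler.getTask(null, fm, null, List.of(), null,
                    fm.getJavaFileObjectsFromPaths(findSourceFiles(roots)));
            task.addTaskListener(new TaskListener() {
                @Override
                public void finished(TaskEvent e) {
                    if (e.getKind() == TaskEvent.Kind.PARSE) {
                        // A real analysis would process e.getCompilationUnit() here.
                        parsed.add(Path.of(e.getSourceFile().toUri()).getFileName().toString());
                    }
                }
            });
            task.parse();
        }
        return parsed;
    }

    // Hypothetical demo: writes one source file to a temporary directory and parses it.
    static List<String> demo() {
        try {
            Path dir = Files.createTempDirectory("demo-files");
            Files.writeString(dir.resolve("A.java"), "/** Type A. */ public class A { }");
            return parseAll(List.of(dir));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println(demo());
        } else {
            List<Path> roots = new ArrayList<>();
            for (String arg : args) {
                roots.add(Path.of(arg));
            }
            System.out.println(parseAll(roots));
        }
    }
}
```

Run with no arguments, the sketch parses a single generated file; given files or directories on the command line, it parses whatever source files it finds there.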

Analyzing Declarations

As a result of the task listener registered with the compiler, and the subsequent call to parse the source files provided on the command line, the processFile method will be called for each file that has been parsed by the compiler. This method uses a custom TreeScanner, called DeclScanner, to scan the declarations in the source file that was just read, passing in a subsidiary scanner that can be used to scan any documentation comments that may be found in the file. When the scanners have completed their work, processFile prints out the URLs that were found in the documentation comments for the declarations in the source file.

DeclScanner overrides the visit... methods for declarations to call one of the forms of the processCurrentPath method, depending on whether the declaration may contain modifiers. In this example, if the declaration may have modifiers, the code checks whether either a public or protected keyword is present, and ignores the declaration if neither of those keywords is found. For field declarations (visitVariable) and method and constructor declarations (visitMethod), the code suppresses the normal action to scan the initializer or method body by not calling the appropriate super method. This is because it is uncommon to use documentation comments inside field initializers and method bodies, and even if they were present, they would not be included in any API documentation generated by the javadoc tool.

In processCurrentPath(), the code uses the instance of the DocTrees utility class obtained from the original JavacTask object, to see if the declaration has a corresponding documentation comment. If it does, it is scanned with the LinkScanner object created for this file.
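A declaration scanner along these lines might look like the following sketch. It is an approximation under stated assumptions rather than the code of the example: it extends TreePathScanner (a subtype of TreeScanner) so that getCurrentPath is available, it records a label for each documented declaration instead of invoking a LinkScanner, it assumes that the members of an ignored type are skipped as well, and it reads its input from an in-memory source string. The names DeclScannerSketch, documentedDeclarations, and demo are hypothetical.

```java
import com.sun.source.tree.ClassTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.MethodTree;
import com.sun.source.tree.ModifiersTree;
import com.sun.source.tree.VariableTree;
import com.sun.source.util.DocTrees;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreePathScanner;

import javax.lang.model.element.Modifier;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class DeclScannerSketch {
    static class DeclScanner extends TreePathScanner<Void, Void> {
        private final DocTrees docTrees;
        final List<String> documented = new ArrayList<>();

        DeclScanner(DocTrees docTrees) {
            this.docTrees = docTrees;
        }

        @Override
        public Void visitClass(ClassTree tree, Void p) {
            // Assumption: only descend into the members of an accessible type.
            if (processCurrentPath(tree.getModifiers(), "class " + tree.getSimpleName())) {
                super.visitClass(tree, p);
            }
            return null;
        }

        @Override
        public Void visitMethod(MethodTree tree, Void p) {
            // No call to super.visitMethod, so the method body is not scanned.
            processCurrentPath(tree.getModifiers(), "method " + tree.getName());
            return null;
        }

        @Override
        public Void visitVariable(VariableTree tree, Void p) {
            // No call to super.visitVariable, so the initializer is not scanned.
            processCurrentPath(tree.getModifiers(), "field " + tree.getName());
            return null;
        }

        // Returns true if the declaration is public or protected; records it if documented.
        private boolean processCurrentPath(ModifiersTree mods, String label) {
            Set<Modifier> flags = mods.getFlags();
            if (!flags.contains(Modifier.PUBLIC) && !flags.contains(Modifier.PROTECTED)) {
                return false;
            }
            if (docTrees.getDocCommentTree(getCurrentPath()) != null) {
                documented.add(label);
            }
            return true;
        }
    }

    // Parses an in-memory source file and lists its documented public or protected declarations.
    static List<String> documentedDeclarations(String fileName, String code) throws IOException {
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///" + fileName), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
        JavacTask task = (JavacTask) ToolProvider.getSystemJavaCompiler()
                .getTask(null, null, null, null, null, List.of(file));
        DeclScanner scanner = new DeclScanner(DocTrees.instance(task));
        for (CompilationUnitTree unit : task.parse()) {
            scanner.scan(unit, null);
        }
        return scanner.documented;
    }

    static List<String> demo() {
        String code = String.join("\n",
                "/** Class A. */",
                "public class A {",
                "    /** Documented method. */",
                "    public void m() { int local = 0; }",
                "    /** Documented but private. */",
                "    private int hidden;",
                "    public int undocumented;",
                "}");
        try {
            return documentedDeclarations("A.java", code);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In the demo source, the private field is skipped even though it has a documentation comment, and the public field is skipped because it has none.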

Analyzing Documentation Comments

When the declaration scanner finds a declaration of interest that contains a documentation comment, it scans the comment with a custom DocTreeScanner called LinkScanner. Primarily, this scanner is looking to record URLs found in href attributes in HTML a elements, and since href can be an attribute in other HTML elements (like link), the code is careful to filter the elements that are searched.

Any URLs that are found are saved in a list, and reported by the processFile method when the scanners have finished analyzing each file.
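A documentation comment scanner of this kind might be sketched as follows. The details are assumptions made for illustration: the sketch inspects the attributes of each a start element directly, treats element and attribute names as case-insensitive, and only looks at the comments on top-level type declarations of an in-memory source file. The names LinkScannerSketch, findLinks, and demo are hypothetical.

```java
import com.sun.source.doctree.AttributeTree;
import com.sun.source.doctree.DocCommentTree;
import com.sun.source.doctree.DocTree;
import com.sun.source.doctree.StartElementTree;
import com.sun.source.doctree.TextTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.Tree;
import com.sun.source.util.DocTreeScanner;
import com.sun.source.util.DocTrees;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreePath;

import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class LinkScannerSketch {
    static class LinkScanner extends DocTreeScanner<Void, Void> {
        final List<String> urls = new ArrayList<>();

        @Override
        public Void visitStartElement(StartElementTree tree, Void p) {
            // href may appear on other elements (such as link), so only look inside <a>.
            if (tree.getName().toString().equalsIgnoreCase("a")) {
                for (DocTree attr : tree.getAttributes()) {
                    if (attr.getKind() == DocTree.Kind.ATTRIBUTE) {
                        record((AttributeTree) attr);
                    }
                }
            }
            return super.visitStartElement(tree, p);
        }

        private void record(AttributeTree attr) {
            // getValue() is null for an attribute that has no value.
            if (!attr.getName().toString().equalsIgnoreCase("href") || attr.getValue() == null) {
                return;
            }
            StringBuilder url = new StringBuilder();
            for (DocTree part : attr.getValue()) {
                url.append(part instanceof TextTree ? ((TextTree) part).getBody() : part.toString());
            }
            urls.add(url.toString());
        }
    }

    // Parses an in-memory source file and scans the comments on its top-level types.
    static List<String> findLinks(String fileName, String code) throws IOException {
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///" + fileName), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
        JavacTask task = (JavacTask) ToolProvider.getSystemJavaCompiler()
                .getTask(null, null, null, null, null, List.of(file));
        DocTrees docTrees = DocTrees.instance(task);
        LinkScanner scanner = new LinkScanner();
        for (CompilationUnitTree unit : task.parse()) {
            for (Tree decl : unit.getTypeDecls()) {
                DocCommentTree comment = docTrees.getDocCommentTree(TreePath.getPath(unit, decl));
                if (comment != null) {
                    scanner.scan(comment, null);
                }
            }
        }
        return scanner.urls;
    }

    static List<String> demo() {
        String code = String.join("\n",
                "/**",
                " * Class A.",
                " * <a href=\"http://www.example.com/class/A\">class A</a>",
                " * <link href=\"http://www.example.com/ignored\">",
                " */",
                "public class A { }");
        try {
            return findLinks("A.java", code);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The href on the link element in the demo comment is deliberately not reported, which illustrates the filtering described above.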

Demonstration

The example can be run by passing any set of files on the command line, such as the following collection of trivial files, which provide documentation comments in various syntactic positions.

Demonstration Files

module-info.java

/**
 * Module m.
 * <a href="http://www.example.com/module/m">module m</a>
 */
module m {
    exports p1;
}

package-info.java

/**
 * Package p1.
 * <a href="http://www.example.com/package/p1">package p1</a>
 */
package p1;

P1C.java

package p1;

/**
 * Class P1C.
 * <a href="http://www.example.com/class/P1C">class P1C</a>
 */
public class P1C {
    /**
     * Constructor P1C.
     * <a href="http://www.example.com/class/P1C/constructor">class P1C constructor</a>
     */
    public P1C() { }
}

package-info.java

/**
 * Package p2.
 * <a href="http://www.example.com/package/p2">package p2</a>
 */
package p2;

P2C.java

package p2;

/**
 * Class P2C.
 * <a href="http://www.example.com/class/P2C">class P2C</a>
 */
public class P2C {
    /**
     * Constructor P2C.
     * <a href="http://www.example.com/class/P2C/constructor">class P2C constructor</a>
     */
    public P2C() { }
}

The example is a single class and so can be run directly by the Java launcher, with a command such as the following:

$ java examples/p/Example1.java demo-files

The output from that command is as follows, showing the URLs found in the documentation comments in each file:

*** file demo-files/m/module-info.java
   http://www.example.com/module/m
*** file demo-files/m/p1/P1C.java
   http://www.example.com/class/P1C
   http://www.example.com/class/P1C/constructor
*** file demo-files/m/p1/package-info.java
   http://www.example.com/package/p1
*** file demo-files/m/p2/P2C.java
   http://www.example.com/class/P2C
   http://www.example.com/class/P2C/constructor
*** file demo-files/m/p2/package-info.java
   http://www.example.com/package/p2

Example 2: Using the Language Model API

In the previous example, all documentation comments in all public API were scanned, including the comments in both packages p1 and p2. However, it did not filter out packages that were not exported from the enclosing module. There is no way to do that directly from the parsed syntax trees without duplicating the semantic analysis that is done by javac. So, to filter out non-exported packages, it is better to adapt the example to leverage the information available from the Language Model API.

This example shows how you can modify the previous example to leverage the Language Model API. There are three noteworthy differences:

the call to parse is replaced by a call to analyze, which instructs the compiler to analyze the source code, after (implicitly) parsing it,
a reference to the Elements utility object is obtained, which provides various utility support methods, and
the task listener is updated to filter out all source files containing a package declaration that references a package that is in a module but not exported from it.

The rest of the example is the same as before.
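The filtering step might be sketched as follows. This is one possible approach under stated assumptions, not necessarily how the example does it: instead of filtering inside the task listener, the sketch calls parse and then analyze, and afterwards checks each compilation unit by looking up its package by name with the Elements utility. The names ExportFilterSketch, isExported, exportedFiles, and demo are hypothetical, and the demo writes a cut-down copy of the demonstration files to a temporary directory.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.util.JavacTask;

import javax.lang.model.element.ModuleElement;
import javax.lang.model.element.ModuleElement.ExportsDirective;
import javax.lang.model.element.PackageElement;
import javax.lang.model.util.ElementFilter;
import javax.lang.model.util.Elements;
import javax.tools.JavaCompiler;
import javax.tools.StandardJavaFileManager;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class ExportFilterSketch {
    // True unless the file declares a package that is in a named module but not exported.
    static boolean isExported(Elements elements, CompilationUnitTree unit) {
        if (unit.getPackageName() == null) {
            return true; // no package declaration, for example module-info.java
        }
        PackageElement pkg = elements.getPackageElement(unit.getPackageName().toString());
        ModuleElement module = (pkg == null) ? null : elements.getModuleOf(pkg);
        if (module == null || module.isUnnamed()) {
            return true;
        }
        for (ExportsDirective d : ElementFilter.exportsIn(module.getDirectives())) {
            // A null target list means the package is exported to all modules.
            if (d.getTargetModules() == null
                    && d.getPackage().getQualifiedName().contentEquals(pkg.getQualifiedName())) {
                return true;
            }
        }
        return false;
    }

    // Analyzes the files and returns the simple names of those that pass the filter.
    static List<String> exportedFiles(List<Path> files) throws IOException {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        List<String> kept = new ArrayList<>();
        try (StandardJavaFileManager fm = compiler.getStandardFileManager(null, null, null)) {
            JavacTask task = (JavacTask) compiler.getTask(null, fm, null, List.of(), null,
                    fm.getJavaFileObjectsFromPaths(files));
            Iterable<? extends CompilationUnitTree> units = task.parse();
            task.analyze(); // semantic analysis, so that packages and modules are resolved
            Elements elements = task.getElements();
            for (CompilationUnitTree unit : units) {
                if (isExported(elements, unit)) {
                    kept.add(Path.of(unit.getSourceFile().toUri()).getFileName().toString());
                }
            }
        }
        return kept;
    }

    // Hypothetical demo: module m exports p1 but not p2.
    static List<String> demo() {
        try {
            Path m = Files.createDirectories(Files.createTempDirectory("demo-files").resolve("m"));
            Path moduleInfo = Files.writeString(m.resolve("module-info.java"),
                    "module m { exports p1; }");
            Path p1c = Files.writeString(Files.createDirectories(m.resolve("p1")).resolve("P1C.java"),
                    "package p1; public class P1C { }");
            Path p2c = Files.writeString(Files.createDirectories(m.resolve("p2")).resolve("P2C.java"),
                    "package p2; public class P2C { }");
            return exportedFiles(List.of(moduleInfo, p1c, p2c));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Only unqualified exports are treated as exported here; whether a qualified export (exports ... to ...) should count is a design decision left to the analysis.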

The example can again be run directly by the source launcher. The output is the same as before, except that this time there is no output for any source files in the non-exported package p2.

*** file demo-files/m/module-info.java
   http://www.example.com/module/m
*** file demo-files/m/p1/P1C.java
   http://www.example.com/class/P1C
   http://www.example.com/class/P1C/constructor
*** file demo-files/m/p1/package-info.java
   http://www.example.com/package/p1

Example 3: Using the Annotation Processing API

The preceding two examples used a task listener to be invoked at specific times during a compilation. There is another way for user code to be invoked during a compilation, and that is by using an annotation processor. Despite the name, annotation processors can be used for more than just annotation processing: they provide a way to be invoked at a specific point in the compilation, after the source files have been parsed and the primary declarations have been recognized, but before any analysis of the contents of any method bodies.

An annotation processor can also be specified on the main javac command line, if so desired, without having to use the Compiler API.

This example shows how you can use an annotation processor instead of a task listener.

The noteworthy changes are:

An annotation processor is registered instead of a task listener. As in the preceding example, we filter out the contents of non-exported packages. The processor will be invoked with root elements corresponding to the source files specified on the command line.
The DeclScanner class is rewritten to use the Language Model API, and to scan declarations in the form of elements instead of syntax trees, as was previously the case. In the context of this example, it is best to just subtype one of the "simple element visitor" series of classes, instead of one of the "element scanner" series of classes, because we only want to scan the enclosed elements for a type element and not for module and package elements. The primary work, to scan any documentation comments, is performed in the defaultAction method, used by default for all elements except type elements, which are handled by visitType. That method first calls defaultAction and then visits all the enclosed elements, which in this case will be all the members of the class or interface.
The -proc:only compiler option is specified to instruct the compiler to stop after annotation processing has been completed.
As of JDK 17, it is not convenient to determine the source file for any element, which includes the root elements passed to a processor. So, for simplicity in this example, the file names provided in the output of the prior examples are replaced here with the names of the root elements. (JDK 18 provides Elements.getFileObjectOf to obtain the file object for an element, which would make it easier for the example to revert to specifying the name of the file.)

The rest of the example is the same as before.
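The changes listed above might be sketched as follows. This is a reduced illustration under stated assumptions, not the code of the example: it omits the filtering of non-exported packages, records the kind and simple name of each documented element instead of scanning the comment for links, extends SimpleElementVisitor14 as one member of the "simple element visitor" series, and reads its input from an in-memory source string. The names ProcessorSketch, DocProcessor, and demo are hypothetical.

```java
import com.sun.source.util.DocTrees;
import com.sun.source.util.JavacTask;

import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.RoundEnvironment;
import javax.annotation.processing.SupportedAnnotationTypes;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.Element;
import javax.lang.model.element.TypeElement;
import javax.lang.model.util.SimpleElementVisitor14;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class ProcessorSketch {
    // "*" means the processor runs whether or not any annotations are present.
    @SupportedAnnotationTypes("*")
    static class DocProcessor extends AbstractProcessor {
        final List<String> documented = new ArrayList<>();

        @Override
        public SourceVersion getSupportedSourceVersion() {
            return SourceVersion.latest();
        }

        @Override
        public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
            DeclScanner scanner = new DeclScanner(DocTrees.instance(processingEnv), documented);
            for (Element root : roundEnv.getRootElements()) {
                scanner.visit(root);
            }
            return false; // do not claim any annotations
        }
    }

    static class DeclScanner extends SimpleElementVisitor14<Void, Void> {
        private final DocTrees docTrees;
        private final List<String> documented;

        DeclScanner(DocTrees docTrees, List<String> documented) {
            this.docTrees = docTrees;
            this.documented = documented;
        }

        @Override
        public Void visitType(TypeElement e, Void p) {
            // Handle the type itself, then each of its members.
            defaultAction(e, p);
            for (Element member : e.getEnclosedElements()) {
                visit(member, p);
            }
            return null;
        }

        @Override
        protected Void defaultAction(Element e, Void p) {
            if (docTrees.getDocCommentTree(e) != null) {
                documented.add(e.getKind() + " " + e.getSimpleName());
            }
            return null;
        }
    }

    static List<String> demo() {
        String code = String.join("\n",
                "package p1;",
                "/** Class P1C. */",
                "public class P1C {",
                "    /** Constructor P1C. */",
                "    public P1C() { }",
                "    public void undocumented() { }",
                "}");
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///p1/P1C.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
        DocProcessor processor = new DocProcessor();
        // -proc:only stops the compiler once annotation processing is complete.
        JavacTask task = (JavacTask) ToolProvider.getSystemJavaCompiler()
                .getTask(null, null, null, List.of("-proc:only"), null, List.of(file));
        task.setProcessors(List.of(processor));
        task.call();
        return processor.documented;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because visitType is the only method that recurses, module and package elements are handled by defaultAction alone, which matches the reason given above for preferring a simple visitor over an element scanner.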

The example can again be run directly by the source launcher. The output is essentially the same as before, except for the order in which the files were examined, and the use of the names of the root elements instead of the file names.

*** root element p1.P1C
   http://www.example.com/class/P1C
   http://www.example.com/class/P1C/constructor
*** root element p1
   http://www.example.com/package/p1
*** root element m
   http://www.example.com/module/m
