Analyzing Documentation Comments

Jonathan Gibbons

The JDK Java compiler, javac, supports a variety of Java SE and JDK APIs that together can be used to analyze Java programs, including API documentation comments.

This note uses a series of simple examples to demonstrate how these APIs go together.

See Also: Processing Code


The examples that follow demonstrate how to use the various APIs supported by javac to analyze source code.

The examples share a common theme: scanning documentation comments to find URLs in the href attribute of HTML a elements, so that you could (for example) check the links for validity. The theme is deliberately simple and somewhat artificial, but the examples are sufficient to illustrate the underlying techniques for analyzing code.

To scan documentation comments, the work can generally be split into three parts:

  1. Configure javac, specifying the files to be read and any other options that may be required, and arrange to be notified at an appropriate point in the compilation pipeline.

  2. Scan the declarations in the source code, looking for those that might have documentation comments of interest.

  3. Scan the documentation comments.

The examples can all be compiled and run with JDK 17.

Example 1: Analyzing Syntax Trees

The first example shows how you can perform some simple analysis by just examining the basic syntax trees created by javac. To do this, you only need to pass the source files to javac, typically without any of the additional options that you would otherwise need to compile those files. The limitation is that you cannot perform any analysis that depends on knowing what the names in the source code actually refer to.

Configuring javac

This example uses an instance of JavaCompiler obtained by calling ToolProvider.getSystemJavaCompiler.

The primary arguments for the compiler are any options that may be required and the files to be read. In this example, no options are required, and an empty list is passed for the appropriate argument. To set up the files to be read, the example uses the command-line arguments for the example itself, converts them to Path objects, and uses Files.walkFileTree to find the source files in any directories that may be given. Finally, it converts the resulting series of Path objects into the file objects required by the compiler.
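The file-collection step described above can be sketched as follows. This is a minimal illustration, not the code of the original example: the class name SourceFileFinder and the method findSourceFiles are hypothetical, and the sketch assumes that source files are recognized simply by a .java suffix.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.StandardJavaFileManager;
import javax.tools.ToolProvider;

public class SourceFileFinder {
    /** Returns every .java file found at or below the given paths (hypothetical helper). */
    public static List<Path> findSourceFiles(List<Path> roots) throws IOException {
        List<Path> found = new ArrayList<>();
        for (Path root : roots) {
            // walkFileTree also accepts a plain file, in which case only that file is visited.
            Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                    if (file.getFileName().toString().endsWith(".java")) {
                        found.add(file);
                    }
                    return FileVisitResult.CONTINUE;
                }
            });
        }
        return found;
    }

    public static void main(String... args) throws IOException {
        List<Path> roots = new ArrayList<>();
        for (String arg : args) {
            roots.add(Path.of(arg));
        }
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        try (StandardJavaFileManager fm = compiler.getStandardFileManager(null, null, null)) {
            // Convert the paths into the file objects that JavaCompiler.getTask expects.
            Iterable<? extends JavaFileObject> files =
                    fm.getJavaFileObjectsFromPaths(findSourceFiles(roots));
            for (JavaFileObject file : files) {
                System.out.println(file.getName());
            }
        }
    }
}
```

The file objects produced at the end of main are what would be passed as the compilation units in the call to JavaCompiler.getTask.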

The result of calling JavaCompiler.getTask is a CompilationTask, which can be used to perform a complete compilation. In this example, however, the result is cast down to the JDK-specific class JavacTask, to access some additional JDK-specific functionality. First, the code registers a TaskListener to be invoked at various points in the compilation lifecycle. In this example, the task listener calls the locally defined processFile method when the compiler has finished parsing each source file. Then, after obtaining some utility objects from the JavacTask object, the compiler is instructed to parse the source files. Subsequent activity happens within the callbacks to the task listener, until all the files have been parsed and the parse method returns.
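The sequence just described (get a task, cast it to JavacTask, register a listener, parse) might look roughly like the following sketch. The class ParseListenerSketch and its methods are hypothetical names. To stay self-contained, the sketch reads in-memory sources rather than files on disk, and it records file names where the original example would call processFile.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TaskEvent;
import com.sun.source.util.TaskListener;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class ParseListenerSketch {
    /** Creates an in-memory source file (an assumption made to keep the sketch self-contained). */
    public static JavaFileObject source(String fileName, String code) {
        return new SimpleJavaFileObject(URI.create("string:///" + fileName),
                JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
    }

    /** Parses the files and returns the names reported by the "finished parsing" events. */
    public static List<String> parsedFileNames(List<? extends JavaFileObject> files) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        // No options are needed; the cast gives access to the JDK-specific JavacTask API.
        JavacTask task = (JavacTask) compiler.getTask(null, null, null, List.of(), null, files);

        List<String> names = new ArrayList<>();
        task.addTaskListener(new TaskListener() {
            @Override
            public void finished(TaskEvent e) {
                if (e.getKind() == TaskEvent.Kind.PARSE) {
                    // The original example would call processFile at this point.
                    CompilationUnitTree unit = e.getCompilationUnit();
                    names.add(unit.getSourceFile().getName());
                }
            }
        });

        try {
            task.parse();   // the listener is called back before this method returns
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String... args) {
        System.out.println(parsedFileNames(List.of(source("A.java", "class A { }"))));
    }
}
```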

Analyzing Declarations

As a result of the task listener registered with the compiler, and the subsequent call to parse the source files provided on the command line, the processFile method will be called for each file that has been parsed by the compiler. This method uses a custom TreeScanner, called DeclScanner, to scan the declarations in the source file that was just read, passing in a subsidiary scanner that can be used to scan any documentation comments that may be found in the file. When the scanners have completed their work, processFile prints out the URLs that were found in the documentation comments for the declarations in the source file.

DeclScanner overrides the visit... methods for declarations to call one of the forms of the processCurrentPath method, depending on whether the declaration may contain modifiers. In this example, if the declaration may have modifiers, the code checks whether either a public or protected keyword is present, and ignores the declaration if neither keyword is found. For field declarations (visitVariable) and for method and constructor declarations (visitMethod), the code suppresses the normal action of scanning the initializer or method body by not calling the corresponding super method. This is because it is uncommon to use documentation comments inside field initializers and method bodies, and even if they were present, they would not be included in any API documentation generated by the javadoc tool.
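A declaration scanner along these lines could be sketched as below. The names DeclScannerSketch and accessibleDeclarations are hypothetical, and the scanner simply records declaration names instead of calling processCurrentPath. Skipping the members of a class that is neither public nor protected is an assumption of this sketch, not something the text above specifies. In TreeScanner, the method for method and constructor declarations is visitMethod.

```java
import com.sun.source.tree.ClassTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.MethodTree;
import com.sun.source.tree.ModifiersTree;
import com.sun.source.tree.VariableTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.lang.model.element.Modifier;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class DeclScannerSketch {
    /** Records the names of declarations marked public or protected. */
    public static class DeclScanner extends TreeScanner<Void, Void> {
        public final List<String> names = new ArrayList<>();

        @Override
        public Void visitClass(ClassTree tree, Void p) {
            if (!isAccessible(tree.getModifiers())) {
                return null;                    // assumption: skip the class and its members
            }
            names.add(tree.getSimpleName().toString());
            return super.visitClass(tree, p);   // continue into the member declarations
        }

        @Override
        public Void visitMethod(MethodTree tree, Void p) {
            if (isAccessible(tree.getModifiers())) {
                names.add(tree.getName().toString());
            }
            return null;                        // no super call: the body is not scanned
        }

        @Override
        public Void visitVariable(VariableTree tree, Void p) {
            if (isAccessible(tree.getModifiers())) {
                names.add(tree.getName().toString());
            }
            return null;                        // no super call: the initializer is not scanned
        }

        private static boolean isAccessible(ModifiersTree mods) {
            Set<Modifier> flags = mods.getFlags();
            return flags.contains(Modifier.PUBLIC) || flags.contains(Modifier.PROTECTED);
        }
    }

    /** Parses one in-memory source file and returns the accessible declaration names. */
    public static List<String> accessibleDeclarations(String fileName, String code) {
        JavaFileObject file = new SimpleJavaFileObject(URI.create("string:///" + fileName),
                JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
        JavacTask task = (JavacTask) ToolProvider.getSystemJavaCompiler()
                .getTask(null, null, null, null, null, List.of(file));
        DeclScanner scanner = new DeclScanner();
        try {
            for (CompilationUnitTree unit : task.parse()) {
                scanner.scan(unit, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return scanner.names;
    }
}
```

Because the check looks only at keywords in the syntax tree, members that are implicitly public, such as interface methods declared without modifiers, would not be recorded by this sketch.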

In processCurrentPath, the code uses the instance of the DocTrees utility class obtained from the original JavacTask object to see whether the declaration has a corresponding documentation comment. If it does, the comment is scanned with the LinkScanner object created for this file.
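The lookup of a documentation comment for the current declaration can be sketched as follows. The class DocCommentLookupSketch and its method are hypothetical names. The sketch uses a TreePathScanner so that getCurrentPath is available, and it records the names of documented declarations rather than handing each comment to a LinkScanner.

```java
import com.sun.source.doctree.DocCommentTree;
import com.sun.source.tree.ClassTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.MethodTree;
import com.sun.source.util.DocTrees;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreePathScanner;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class DocCommentLookupSketch {
    /** Returns the names of the classes and methods that have a documentation comment. */
    public static List<String> documentedDeclarations(String fileName, String code) {
        JavaFileObject file = new SimpleJavaFileObject(URI.create("string:///" + fileName),
                JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
        JavacTask task = (JavacTask) ToolProvider.getSystemJavaCompiler()
                .getTask(null, null, null, null, null, List.of(file));
        DocTrees trees = DocTrees.instance(task);   // utility object obtained from the task
        List<String> documented = new ArrayList<>();

        TreePathScanner<Void, Void> scanner = new TreePathScanner<Void, Void>() {
            @Override
            public Void visitClass(ClassTree tree, Void p) {
                check(tree.getSimpleName().toString());
                return super.visitClass(tree, p);
            }

            @Override
            public Void visitMethod(MethodTree tree, Void p) {
                check(tree.getName().toString());
                return null;                        // method bodies are not scanned
            }

            private void check(String name) {
                // getDocCommentTree returns null when the declaration has no doc comment.
                DocCommentTree comment = trees.getDocCommentTree(getCurrentPath());
                if (comment != null) {
                    documented.add(name);           // the example would scan the comment here
                }
            }
        };

        try {
            for (CompilationUnitTree unit : task.parse()) {
                scanner.scan(unit, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return documented;
    }
}
```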

Analyzing Documentation Comments

When the declaration scanner finds a declaration of interest that contains a documentation comment, it scans the comment with a custom DocTreeScanner called LinkScanner. Primarily, this scanner is looking to record URLs found in href attributes in HTML a elements, and since href can be an attribute in other HTML elements (like link), the code is careful to filter the elements that are searched.
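A scanner of this kind might be written as in the sketch below. It is an illustration under stated assumptions, not the original LinkScanner: the outer class LinkScannerSketch and the helper linksInTypeComments are hypothetical, element and attribute names are compared without regard to case, and only the comments on top-level types are examined to keep the driver short.

```java
import com.sun.source.doctree.AttributeTree;
import com.sun.source.doctree.DocCommentTree;
import com.sun.source.doctree.DocTree;
import com.sun.source.doctree.StartElementTree;
import com.sun.source.doctree.TextTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.Tree;
import com.sun.source.util.DocTreeScanner;
import com.sun.source.util.DocTrees;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreePath;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class LinkScannerSketch {
    /** Collects the href values of HTML a elements; other elements are ignored. */
    public static class LinkScanner extends DocTreeScanner<Void, Void> {
        public final List<String> urls = new ArrayList<>();

        @Override
        public Void visitStartElement(StartElementTree tree, Void p) {
            if (tree.getName().toString().equalsIgnoreCase("a")) {
                for (DocTree attr : tree.getAttributes()) {
                    if (attr instanceof AttributeTree) {
                        record((AttributeTree) attr);
                    }
                }
            }
            return super.visitStartElement(tree, p);
        }

        private void record(AttributeTree attr) {
            // getValue() is null for an attribute that is written without a value.
            if (attr.getName().toString().equalsIgnoreCase("href") && attr.getValue() != null) {
                StringBuilder url = new StringBuilder();
                for (DocTree part : attr.getValue()) {
                    url.append(part instanceof TextTree ? ((TextTree) part).getBody()
                                                        : part.toString());
                }
                urls.add(url.toString());
            }
        }
    }

    /** Scans the doc comments on the top-level types of one in-memory source file. */
    public static List<String> linksInTypeComments(String fileName, String code) {
        JavaFileObject file = new SimpleJavaFileObject(URI.create("string:///" + fileName),
                JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
        JavacTask task = (JavacTask) ToolProvider.getSystemJavaCompiler()
                .getTask(null, null, null, null, null, List.of(file));
        DocTrees trees = DocTrees.instance(task);
        LinkScanner scanner = new LinkScanner();
        try {
            for (CompilationUnitTree unit : task.parse()) {
                for (Tree type : unit.getTypeDecls()) {
                    DocCommentTree comment = trees.getDocCommentTree(TreePath.getPath(unit, type));
                    if (comment != null) {
                        scanner.scan(comment, null);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return scanner.urls;
    }
}
```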

Any URLs that are found are saved in a list, and reported by the processFile method when the scanners have finished analyzing each file.


The example can be run passing any set of files on the command line, such as the following collection of trivial files which provide documentation comments in various different syntactic positions.

Demonstration Files

The example is a single class and so can be run directly by the Java launcher, with a command such as the following:

$ java examples/p/ demo-files

The output from that command is as follows, showing the URLs found in the documentation comments in each file:

*** file demo-files/m/
*** file demo-files/m/p1/
*** file demo-files/m/p1/
*** file demo-files/m/p2/
*** file demo-files/m/p2/

Example 2: Using the Language Model API

In the previous example, all documentation comments in all public API were scanned, including the comments in both packages p1 and p2. However, the example did not filter out packages that are not exported from the enclosing module. There is no way to do that directly from the parsed syntax trees without duplicating the semantic analysis that is done by javac. So, to filter out non-exported packages, it is better to adapt the example to leverage the information available from the Language Model API.
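One way to ask the Language Model API whether a package is exported is sketched below. The class ExportFilterSketch and its methods are hypothetical and are not taken from the original example. The sketch treats a package as exported only if the module has an unqualified exports directive for it, and it assumes that everything in the unnamed module should be treated as exported. The helper analyzedElements runs javac's analysis on a trivial in-memory class only to obtain a ready-to-use Elements object.

```java
import com.sun.source.util.JavacTask;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.List;
import javax.lang.model.element.Element;
import javax.lang.model.element.ModuleElement;
import javax.lang.model.element.ModuleElement.ExportsDirective;
import javax.lang.model.element.PackageElement;
import javax.lang.model.util.ElementFilter;
import javax.lang.model.util.Elements;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class ExportFilterSketch {
    /** Returns true if the module exports the named package to all modules. */
    public static boolean isExported(ModuleElement module, CharSequence packageName) {
        if (module.isUnnamed()) {
            return true;    // assumption: the unnamed module has no exports to check
        }
        for (ExportsDirective d : ElementFilter.exportsIn(module.getDirectives())) {
            // getTargetModules() is null for an unqualified "exports p;" directive.
            if (d.getTargetModules() == null
                    && d.getPackage().getQualifiedName().contentEquals(packageName)) {
                return true;
            }
        }
        return false;
    }

    /** Applies isExported to the module and package that enclose a declaration. */
    public static boolean isInExportedPackage(Elements elements, Element e) {
        PackageElement pkg = elements.getPackageOf(e);
        if (pkg == null) {
            return true;    // a module element has no enclosing package
        }
        return isExported(elements.getModuleOf(e), pkg.getQualifiedName());
    }

    /** Analyzes a trivial in-memory class so that the returned Elements is ready to use. */
    public static Elements analyzedElements() {
        JavaFileObject file = new SimpleJavaFileObject(URI.create("string:///A.java"),
                JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return "public class A { }";
            }
        };
        JavacTask task = (JavacTask) ToolProvider.getSystemJavaCompiler()
                .getTask(null, null, null, null, null, List.of(file));
        try {
            task.analyze();     // semantic analysis, unlike parse() in the first example
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return task.getElements();
    }

    public static void main(String... args) {
        Elements elements = analyzedElements();
        ModuleElement base = elements.getModuleElement("java.base");
        System.out.println(isExported(base, "java.util"));
    }
}
```

In a scanner such as DeclScanner, a check like isInExportedPackage could be applied to the element for each declaration before its documentation comment is examined.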

This example shows how you can modify the previous example to leverage the Language Model API. There are three noteworthy differences:

The rest of the example is the same as before.

The example can again be run directly by the source launcher. The output is the same as before, except that this time there is no output for any source files in the non-exported package p2.

*** file demo-files/m/
*** file demo-files/m/p1/
*** file demo-files/m/p1/

Example 3: Using the Annotation Processing API

The preceding two examples used a task listener to be invoked at specific times during a compilation. There is another way for user code to be invoked during a compilation, and that is by using an annotation processor. Despite the name, annotation processors can be used for more than just annotation processing: they provide a way to be invoked at a specific point in the compilation, after the source files have been parsed and the primary declarations have been recognized, but before any analysis of the contents of any method bodies.

An annotation processor can also be specified on the main javac command line, if so desired, without having to use the Compiler API.

This example shows how you can use an annotation processor instead of a task listener.
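A minimal annotation processor that looks at the documentation comments of the root elements might be sketched as follows. All class and method names here are hypothetical. The sketch passes -proc:only so that no class files are written, uses in-memory sources, and records only which root elements have a documentation comment rather than scanning the comments for links.

```java
import com.sun.source.util.DocTrees;
import com.sun.source.util.JavacTask;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.ProcessingEnvironment;
import javax.annotation.processing.RoundEnvironment;
import javax.annotation.processing.SupportedAnnotationTypes;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.Element;
import javax.lang.model.element.TypeElement;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class RootElementProcessorSketch {
    /** Claims "*" so that it is invoked even when the sources contain no annotations. */
    @SupportedAnnotationTypes("*")
    public static class DocCommentProcessor extends AbstractProcessor {
        public final List<String> documented = new ArrayList<>();
        private DocTrees trees;

        @Override
        public SourceVersion getSupportedSourceVersion() {
            return SourceVersion.latestSupported();
        }

        @Override
        public synchronized void init(ProcessingEnvironment env) {
            super.init(env);
            trees = DocTrees.instance(env);
        }

        @Override
        public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment round) {
            for (Element root : round.getRootElements()) {
                if (trees.getDocCommentTree(root) != null) {
                    documented.add(root.toString());   // a link scanner could run here instead
                }
            }
            return false;   // do not claim any annotations
        }
    }

    /** Creates an in-memory source file (an assumption made to keep the sketch self-contained). */
    public static JavaFileObject source(String fileName, String code) {
        return new SimpleJavaFileObject(URI.create("string:///" + fileName),
                JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
    }

    /** Runs the processor over the files and returns the documented root elements. */
    public static List<String> documentedRoots(List<? extends JavaFileObject> files) {
        DocCommentProcessor processor = new DocCommentProcessor();
        JavacTask task = (JavacTask) ToolProvider.getSystemJavaCompiler()
                .getTask(null, null, null, List.of("-proc:only"), null, files);
        task.setProcessors(List.of(processor));
        task.call();
        return processor.documented;
    }
}
```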

The noteworthy changes are:

The rest of the example is the same as before.

The example can again be run directly by the source launcher. The output is essentially the same as before, except for the order in which the files are examined and the use of the names of the root elements instead of the file names.

*** root element p1.P1C
*** root element p1
*** root element m