Analyzing Documentation Comments
The JDK Java compiler, javac, supports a variety of Java SE and JDK APIs that together can be used to analyze Java programs, including API documentation comments:
- The Java Compiler API, Language Model API, and Annotation Processing API, all defined in the java.compiler module.
- The Compiler Tree API, defined in the jdk.compiler module.
This note uses a series of simple examples to demonstrate how these APIs fit together.
See Also: Processing Code
Introduction
The examples that follow demonstrate how to use the various APIs supported by javac to analyze source code. They share a common theme: scanning documentation comments to find the URLs in the href attribute of HTML a elements, so that you could, for example, check the links for validity. The theme is deliberately simple and somewhat artificial, but the examples are sufficient to illustrate the underlying techniques for analyzing code.
To scan documentation comments, the work can generally be split into three parts:
- Configure javac, specifying the files to be read and any other options that may be required, and arranging to be notified at an appropriate point in the compilation pipeline.
- Scan the declarations in the source code, looking for those that might have documentation comments of interest.
- Scan the documentation comments.
The examples can all be compiled and run with JDK 17.
Example 1: Analyzing Syntax Trees
The first example shows how you can perform some simple analysis just by examining the basic syntax trees created by javac. To do this, you only need to pass the source files to javac, typically without any of the additional options that you might otherwise need to compile those files. The limitation is that you cannot perform any analysis that involves knowing what the names in the source code actually refer to.
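To make the limitation concrete, here is a minimal sketch (the class ParseOnly and its methods are hypothetical, not part of the example) that parses an in-memory source file referring to a type, UnknownType, that is never declared. Because parsing does not resolve names, no error is expected; a genuine syntax error, by contrast, is still reported.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.util.JavacTask;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

// Hypothetical demo: parsing succeeds even though some names cannot be resolved.
public class ParseOnly {
    // Wraps a string as an in-memory source file.
    static JavaFileObject source(String fileName, String code) {
        return new SimpleJavaFileObject(URI.create("string:///" + fileName), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
    }

    // Parses (but does not analyze) the code and returns the messages of any errors reported.
    static List<String> parseErrors(String fileName, String code) throws Exception {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        DiagnosticCollector<JavaFileObject> diags = new DiagnosticCollector<>();
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, diags, List.of(), null, List.of(source(fileName, code)));
        for (CompilationUnitTree unit : task.parse()) {
            System.out.println("parsed " + unit.getSourceFile().getName());
        }
        List<String> errors = new ArrayList<>();
        for (Diagnostic<? extends JavaFileObject> d : diags.getDiagnostics()) {
            if (d.getKind() == Diagnostic.Kind.ERROR) {
                errors.add(d.getMessage(null));
            }
        }
        return errors;
    }

    public static void main(String... args) throws Exception {
        // UnknownType is never declared, but the parser does not try to resolve it.
        System.out.println(parseErrors("Demo.java", "public class Demo { UnknownType field; }"));
    }
}
```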
Configuring javac
This example uses an instance of javax.tools.JavaCompiler obtained by calling ToolProvider.getSystemJavaCompiler.
The primary arguments for the compiler are any options that may be required and the files to be read. In this example, no options are required, and an empty list is passed for the corresponding argument. To set up the files to be read, the example takes its own command-line arguments, converts them to Path objects, and uses Files.walkFileTree to find the source files in any directories that may be given. Finally, it converts the resulting series of Path objects into the file objects required by the compiler.
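The file-collection step can be sketched as follows, assuming that each argument names a file or directory and that only names ending in .java are wanted; findSourceFiles is a hypothetical name. Files.walkFileTree visits a plain file argument directly, so files and directories are handled alike. StandardJavaFileManager.getJavaFileObjectsFromPaths accepting a Collection requires JDK 13 or later.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.StandardJavaFileManager;
import javax.tools.ToolProvider;

public class FindSources {
    // Expands each argument: directories are searched recursively for .java files.
    static List<Path> findSourceFiles(List<String> args) throws IOException {
        List<Path> result = new ArrayList<>();
        for (String arg : args) {
            Files.walkFileTree(Path.of(arg), new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                    if (file.getFileName().toString().endsWith(".java")) {
                        result.add(file);
                    }
                    return FileVisitResult.CONTINUE;
                }
            });
        }
        return result;
    }

    public static void main(String... args) throws IOException {
        List<Path> paths = findSourceFiles(List.of(args));
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        try (StandardJavaFileManager fm = compiler.getStandardFileManager(null, null, null)) {
            // Convert the paths into the file objects that JavaCompiler.getTask expects.
            Iterable<? extends JavaFileObject> files = fm.getJavaFileObjectsFromPaths(paths);
            files.forEach(f -> System.out.println(f.getName()));
        }
    }
}
```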
The result of calling JavaCompiler.getTask is a CompilationTask, which can be used to perform a complete compilation; in this example, however, the result is cast down to the JDK-specific class JavacTask, to access some additional JDK-specific functionality. First, the code registers a TaskListener to be invoked at various points in the compilation lifecycle. In this example, the task listener calls the locally defined processFile method when the compiler has finished parsing each source file. Then, after obtaining some utility objects from the JavacTask object, the code instructs the compiler to parse the source files. Subsequent activity happens within the callbacks to the task listener, until all the files have been parsed and the parse method returns.
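A minimal sketch of this configuration follows. The names ConfigureJavac and parseAll are hypothetical, and processFile here only records the file name, standing in for the real per-file analysis. The listener reacts to the finished PARSE event, which javac reports once per compilation unit while parse() is running.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.util.DocTrees;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TaskEvent;
import com.sun.source.util.TaskListener;
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.StandardJavaFileManager;
import javax.tools.ToolProvider;

public class ConfigureJavac {
    // Parses the given files, calling processFile as each one finishes parsing.
    static List<String> parseAll(List<Path> paths) throws IOException {
        List<String> processed = new ArrayList<>();
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        try (StandardJavaFileManager fm = compiler.getStandardFileManager(null, null, null)) {
            Iterable<? extends JavaFileObject> files = fm.getJavaFileObjectsFromPaths(paths);
            List<String> options = List.of(); // no options are needed just to parse
            JavacTask task = (JavacTask) compiler.getTask(null, fm, null, options, null, files);
            task.addTaskListener(new TaskListener() {
                @Override
                public void started(TaskEvent e) { }

                @Override
                public void finished(TaskEvent e) {
                    if (e.getKind() == TaskEvent.Kind.PARSE) {
                        processFile(e.getCompilationUnit(), processed);
                    }
                }
            });
            DocTrees docTrees = DocTrees.instance(task); // utility object; not used further here
            task.parse(); // the listener callbacks happen during this call
        }
        return processed;
    }

    // Hypothetical placeholder for the per-file analysis.
    static void processFile(CompilationUnitTree unit, List<String> processed) {
        processed.add(unit.getSourceFile().getName());
    }

    public static void main(String... args) throws IOException {
        List<Path> paths = new ArrayList<>();
        for (String arg : args) {
            paths.add(Path.of(arg));
        }
        parseAll(paths).forEach(System.out::println);
    }
}
```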
Analyzing Declarations
As a result of the task listener registered with the compiler, and the subsequent call to parse the source files provided on the command line, the processFile method is called for each file that has been parsed by the compiler. This method uses a custom TreeScanner, called DeclScanner, to scan the declarations in the source file that was just read, passing in a subsidiary scanner that can be used to scan any documentation comments found in the file. When the scanners have completed their work, processFile prints the URLs that were found in the documentation comments for the declarations in the source file.
DeclScanner overrides the visit... methods for declarations to call one of the forms of the processCurrentPath method, depending on whether the declaration may contain modifiers. In this example, if the declaration may have modifiers, the code checks whether either a public or protected keyword is present, and ignores the declaration if neither keyword is found. For field declarations (visitVariable) and method and constructor declarations (visitMethod), the code suppresses the normal action of scanning the initializer or method body by not calling the corresponding super method. This is because documentation comments are rarely used inside field initializers and method bodies, and even if they were present, they would not be included in any API documentation generated by the javadoc tool.
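The modifier check and the suppressed scanning of bodies and initializers can be sketched as below. This is an illustrative TreePathScanner, not the example's own DeclScanner: instead of calling processCurrentPath, it simply records the names of the declarations it would process. Note that in the Compiler Tree API, constructors are MethodTree nodes named <init>.

```java
import com.sun.source.tree.ClassTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.MethodTree;
import com.sun.source.tree.ModifiersTree;
import com.sun.source.tree.VariableTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreePathScanner;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.lang.model.element.Modifier;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class DeclScannerDemo {
    // Records public or protected declarations; a sketch, not the example's DeclScanner.
    static class DeclScanner extends TreePathScanner<Void, List<String>> {
        static boolean isPublicOrProtected(ModifiersTree mods) {
            Set<Modifier> flags = mods.getFlags();
            return flags.contains(Modifier.PUBLIC) || flags.contains(Modifier.PROTECTED);
        }

        @Override
        public Void visitClass(ClassTree tree, List<String> names) {
            if (!isPublicOrProtected(tree.getModifiers())) {
                return null; // ignore the class and its members
            }
            names.add(tree.getSimpleName().toString());
            return super.visitClass(tree, names); // go on to scan the members
        }

        @Override
        public Void visitMethod(MethodTree tree, List<String> names) {
            if (isPublicOrProtected(tree.getModifiers())) {
                names.add(tree.getName() + "()"); // constructors are named <init>
            }
            return null; // no super call: the method body is not scanned
        }

        @Override
        public Void visitVariable(VariableTree tree, List<String> names) {
            if (isPublicOrProtected(tree.getModifiers())) {
                names.add(tree.getName().toString());
            }
            return null; // no super call: the initializer is not scanned
        }
    }

    // Parses the code and returns the names recorded by DeclScanner.
    static List<String> scan(String fileName, String code) throws Exception {
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///" + fileName), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
        JavacTask task = (JavacTask) ToolProvider.getSystemJavaCompiler()
                .getTask(null, null, null, null, null, List.of(file));
        List<String> names = new ArrayList<>();
        for (CompilationUnitTree unit : task.parse()) {
            new DeclScanner().scan(unit, names);
        }
        return names;
    }

    public static void main(String... args) throws Exception {
        String code = "public class C { public int x = 1; int y;"
                + " protected C() { } private void m() { } }";
        System.out.println(scan("C.java", code));
    }
}
```

The package-private field y and the private method m are skipped because neither has a public or protected keyword.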
In processCurrentPath(), the code uses the DocTrees utility object obtained from the original JavacTask object to see whether the declaration has a corresponding documentation comment. If it does, the comment is scanned with the LinkScanner object created for this file.
Analyzing Documentation Comments
When the declaration scanner finds a declaration of interest that has a documentation comment, it scans the comment with a custom DocTreeScanner called LinkScanner. The primary purpose of this scanner is to record URLs found in href attributes of HTML a elements; since href can also be an attribute of other HTML elements (such as link), the code is careful to filter the elements that are searched. Any URLs that are found are saved in a list, and reported by the processFile method when the scanners have finished analyzing each file.
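Here is one possible shape for such a scanner, offered as a sketch rather than the example's actual LinkScanner. It overrides visitStartElement, accepts only elements named a, and joins the text parts of each href value; as a simplifying assumption, character entities such as &amp; inside a value are skipped. The harness method links is hypothetical and only scans the comments of top-level types. Pattern matching for instanceof requires JDK 16 or later.

```java
import com.sun.source.doctree.AttributeTree;
import com.sun.source.doctree.DocCommentTree;
import com.sun.source.doctree.DocTree;
import com.sun.source.doctree.StartElementTree;
import com.sun.source.doctree.TextTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.Tree;
import com.sun.source.util.DocTreeScanner;
import com.sun.source.util.DocTrees;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreePath;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class LinkScannerDemo {
    // Collects href values of <a> elements only; other elements (e.g. <link>) are ignored.
    static class LinkScanner extends DocTreeScanner<Void, List<String>> {
        @Override
        public Void visitStartElement(StartElementTree node, List<String> urls) {
            if (node.getName().toString().equalsIgnoreCase("a")) {
                for (DocTree attr : node.getAttributes()) {
                    if (attr instanceof AttributeTree a
                            && a.getName().toString().equalsIgnoreCase("href")
                            && a.getValue() != null) {
                        StringBuilder url = new StringBuilder();
                        for (DocTree part : a.getValue()) {
                            if (part instanceof TextTree t) {
                                url.append(t.getBody()); // entities such as &amp; are skipped
                            }
                        }
                        urls.add(url.toString());
                    }
                }
            }
            return super.visitStartElement(node, urls);
        }
    }

    // Parses the code and scans the doc comment of each top-level type.
    static List<String> links(String fileName, String code) throws Exception {
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///" + fileName), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
        JavacTask task = (JavacTask) ToolProvider.getSystemJavaCompiler()
                .getTask(null, null, null, null, null, List.of(file));
        DocTrees docTrees = DocTrees.instance(task);
        List<String> urls = new ArrayList<>();
        for (CompilationUnitTree unit : task.parse()) {
            for (Tree decl : unit.getTypeDecls()) {
                DocCommentTree dc = docTrees.getDocCommentTree(TreePath.getPath(unit, decl));
                if (dc != null) {
                    new LinkScanner().scan(dc, urls);
                }
            }
        }
        return urls;
    }

    public static void main(String... args) throws Exception {
        String code = "/** Class C. <a href=\"http://www.example.com/c\">c</a>"
                + " <link href=\"http://www.example.com/s\"> */ public class C { }";
        System.out.println(links("C.java", code));
    }
}
```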
Demonstration
The example can be run by passing any set of files on the command line, such as the following collection of trivial files, which provide documentation comments in various syntactic positions.
Demonstration Files
- m/module-info.java
    /**
     * Module m.
     * <a href="http://www.example.com/module/m">module m</a>
     */
    module m {
        exports p1;
    }
- m/p1/package-info.java
    /**
     * Package p1.
     * <a href="http://www.example.com/package/p1">package p1</a>
     */
    package p1;
- m/p1/P1C.java
    package p1;

    /**
     * Class P1C.
     * <a href="http://www.example.com/class/P1C">class P1C</a>
     */
    public class P1C {
        /**
         * Constructor P1C.
         * <a href="http://www.example.com/class/P1C/constructor">class P1C constructor</a>
         */
        public P1C() { }
    }
- m/p2/package-info.java
    /**
     * Package p2.
     * <a href="http://www.example.com/package/p2">package p2</a>
     */
    package p2;
- m/p2/P2C.java
    package p2;

    /**
     * Class P2C.
     * <a href="http://www.example.com/class/P2C">class P2C</a>
     */
    public class P2C {
        /**
         * Constructor P2C.
         * <a href="http://www.example.com/class/P2C/constructor">class P2C constructor</a>
         */
        public P2C() { }
    }
The example is a single class and so can be run directly by the Java launcher, with a command such as the following:
$ java examples/p/Example1.java demo-files
The output from that command is as follows, showing the URLs found in the documentation comments in each file:
*** file demo-files/m/module-info.java
http://www.example.com/module/m
*** file demo-files/m/p1/P1C.java
http://www.example.com/class/P1C
http://www.example.com/class/P1C/constructor
*** file demo-files/m/p1/package-info.java
http://www.example.com/package/p1
*** file demo-files/m/p2/P2C.java
http://www.example.com/class/P2C
http://www.example.com/class/P2C/constructor
*** file demo-files/m/p2/package-info.java
http://www.example.com/package/p2
Example 2: Using the Language Model API
In the previous example, all documentation comments in the public API were scanned, including the comments in both packages p1 and p2. However, the example did not filter out packages that are not exported from the enclosing module. There is no way to do that directly from the parsed syntax trees without duplicating the semantic analysis that is done by javac.
So, to filter out non-exported packages, it is better to adapt the example to leverage the information available from the Language Model API.
This example shows how you can modify the previous example to leverage the Language Model API. There are three noteworthy differences:
- The call to parse is replaced by a call to analyze, which instructs the compiler to analyze the source code after (implicitly) parsing it.
- A reference to the Elements utility object is obtained, which provides various utility methods.
- The task listener is updated to filter out all source files containing a package declaration for a package that is in a module but not exported from it.
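The export check can be expressed with the Language Model API alone. In this sketch, isExported and exportedClasses are hypothetical names, and two design choices are assumptions rather than the example's own logic: packages in the unnamed module count as exported, and only unqualified exports directives (those whose getTargetModules() is null) count, since a qualified export is not part of the public API. The driver calls analyze() directly rather than using a task listener, to keep the sketch short.

```java
import com.sun.source.util.JavacTask;
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.lang.model.element.Element;
import javax.lang.model.element.ModuleElement;
import javax.lang.model.element.PackageElement;
import javax.lang.model.element.TypeElement;
import javax.lang.model.util.ElementFilter;
import javax.lang.model.util.Elements;
import javax.tools.JavaCompiler;
import javax.tools.StandardJavaFileManager;
import javax.tools.ToolProvider;

public class ExportFilter {
    // True if pkg is in the unnamed module, or is exported to all modules by its named module.
    static boolean isExported(PackageElement pkg, Elements elements) {
        ModuleElement module = elements.getModuleOf(pkg);
        if (module == null || module.isUnnamed()) {
            return true; // no module information, or the unnamed module
        }
        for (ModuleElement.ExportsDirective d : ElementFilter.exportsIn(module.getDirectives())) {
            // getTargetModules() is null for an unqualified "exports p;" directive
            if (d.getPackage().equals(pkg) && d.getTargetModules() == null) {
                return true;
            }
        }
        return false;
    }

    // Analyzes the files and returns the names of top-level classes in exported packages.
    static List<String> exportedClasses(List<Path> paths) throws IOException {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        try (StandardJavaFileManager fm = compiler.getStandardFileManager(null, null, null)) {
            JavacTask task = (JavacTask) compiler.getTask(
                    null, fm, null, List.of(), null, fm.getJavaFileObjectsFromPaths(paths));
            Elements elements = task.getElements();
            List<String> result = new ArrayList<>();
            for (Element e : task.analyze()) { // parses implicitly, then analyzes
                if (e instanceof TypeElement t && isExported(elements.getPackageOf(t), elements)) {
                    result.add(t.getQualifiedName().toString());
                }
            }
            return result;
        }
    }

    public static void main(String... args) throws IOException {
        List<Path> paths = new ArrayList<>();
        for (String arg : args) {
            paths.add(Path.of(arg));
        }
        exportedClasses(paths).forEach(System.out::println);
    }
}
```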
The rest of the example is the same as before.
The example can again be run directly by the source launcher.
The output is the same as before, except that this time there is no output for any source files in the non-exported package p2.
*** file demo-files/m/module-info.java
http://www.example.com/module/m
*** file demo-files/m/p1/P1C.java
http://www.example.com/class/P1C
http://www.example.com/class/P1C/constructor
*** file demo-files/m/p1/package-info.java
http://www.example.com/package/p1
Example 3: Using the Annotation Processing API
The preceding two examples used a task listener to be invoked at specific times during a compilation. There is another way for user code to be invoked during a compilation: an annotation processor. Despite their name, annotation processors can be used for more than just processing annotations. They provide a way to be invoked at a specific point in the compilation: after the source files have been parsed and the primary declarations have been recognized, but before any analysis of the contents of method bodies.
An annotation processor can also be specified on the main javac command line, if so desired, without having to use the Compiler API.
This example shows how you can use an annotation processor instead of a task listener.
The noteworthy changes are:
- An annotation processor is registered instead of a task listener. As in the preceding example, we filter out the contents of non-exported packages. The processor is invoked with root elements corresponding to the source files specified on the command line.
- The DeclScanner class is rewritten to use the Language Model API, and to scan declarations in the form of elements instead of syntax trees, as was previously the case. In the context of this example, it is best to subtype one of the "simple element visitor" series of classes instead of one of the "element scanner" series of classes, because we only want to scan the enclosed elements of a type element, and not those of module and package elements. The primary work, scanning any documentation comments, is performed in the defaultAction method, which is used for all elements except type elements; those are handled by visitType. That method first calls defaultAction and then visits all the enclosed elements, which in this case are all the members of the class or interface.
- The -proc:only compiler option is specified to instruct the compiler to stop after annotation processing has been completed.
- As of JDK 17, it is not convenient to determine the source file for an element, including the root elements passed to a processor. So, for simplicity, the file names in the output of the prior examples are replaced here with the names of the root elements. (JDK 18 provides Elements.getFileObjectOf to obtain the file object for an element, which would make it easier for the example to revert to reporting the name of the file.)
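The processor-based structure described above can be sketched as follows. The names ProcessorDemo, DocProcessor, and run are hypothetical, and for brevity the sketch omits the filtering of non-exported packages and only records elements that have a documentation comment, where a real tool would scan the comment. It uses DocTrees.instance(ProcessingEnvironment) and DocTrees.getDocCommentTree(Element); SimpleElementVisitor14 is chosen so that record components also fall through to defaultAction, which requires JDK 16 or later.

```java
import com.sun.source.util.DocTrees;
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.RoundEnvironment;
import javax.annotation.processing.SupportedAnnotationTypes;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.Element;
import javax.lang.model.element.TypeElement;
import javax.lang.model.util.SimpleElementVisitor14;
import javax.tools.JavaCompiler;
import javax.tools.StandardJavaFileManager;
import javax.tools.ToolProvider;

public class ProcessorDemo {
    // "*" means the processor is invoked for all root elements, annotated or not.
    @SupportedAnnotationTypes("*")
    static class DocProcessor extends AbstractProcessor {
        final List<String> documented = new ArrayList<>();

        @Override
        public SourceVersion getSupportedSourceVersion() {
            return SourceVersion.latest();
        }

        @Override
        public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
            DeclScanner scanner = new DeclScanner(DocTrees.instance(processingEnv));
            for (Element root : roundEnv.getRootElements()) {
                scanner.visit(root, documented);
            }
            return false; // do not claim any annotations
        }
    }

    // Element-based declaration scanner: only type elements have their members visited.
    static class DeclScanner extends SimpleElementVisitor14<Void, List<String>> {
        private final DocTrees docTrees;

        DeclScanner(DocTrees docTrees) {
            this.docTrees = docTrees;
        }

        @Override
        protected Void defaultAction(Element e, List<String> documented) {
            if (docTrees.getDocCommentTree(e) != null) {
                documented.add(e.toString()); // a real tool would scan the comment here
            }
            return null;
        }

        @Override
        public Void visitType(TypeElement e, List<String> documented) {
            defaultAction(e, documented);
            for (Element member : e.getEnclosedElements()) {
                visit(member, documented);
            }
            return null;
        }
    }

    // Runs only annotation processing (-proc:only) over the files and returns what was found.
    static List<String> run(List<Path> paths) throws IOException {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        try (StandardJavaFileManager fm = compiler.getStandardFileManager(null, null, null)) {
            JavaCompiler.CompilationTask task = compiler.getTask(null, fm, null,
                    List.of("-proc:only"), null, fm.getJavaFileObjectsFromPaths(paths));
            DocProcessor processor = new DocProcessor();
            task.setProcessors(List.of(processor));
            task.call();
            return processor.documented;
        }
    }

    public static void main(String... args) throws IOException {
        List<Path> paths = new ArrayList<>();
        for (String arg : args) {
            paths.add(Path.of(arg));
        }
        run(paths).forEach(System.out::println);
    }
}
```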
The rest of the example is the same as before.
The example can again be run directly by the source launcher. The output is essentially the same as before, except for the order in which the files were examined, and the use of the names of the root elements instead of the file names.
*** root element p1.P1C
http://www.example.com/class/P1C
http://www.example.com/class/P1C/constructor
*** root element p1
http://www.example.com/package/p1
*** root element m
http://www.example.com/module/m