JEP draft: Support Markdown in Documentation Comments
Author | jjg |
Owner | Jonathan Gibbons |
Type | Feature |
Scope | JDK |
Status | Submitted |
Component | tools / javadoc(tool) |
Reviewed by | Ron Pressler |
Created | 2023/09/11 17:45 |
Updated | 2023/11/13 19:44 |
Issue | 8316039 |
JEP TBD: Markdown Documentation Comments
Summary
Enable JavaDoc documentation comments to be written in Markdown rather than solely in a mixture of HTML and JavaDoc @-tags.
Goals
- Make API documentation comments easier to write and to read in source form by introducing the ability to use Markdown syntax in documentation comments, alongside HTML elements and JavaDoc tags.
- Do not adversely affect the interpretation of any existing documentation comments.
- Extend the Compiler Tree API so that other tools that analyze documentation comments can also handle Markdown content in those comments.
Non-Goals
- It is not a goal to provide any automated conversion of existing documentation comments to use Markdown syntax.
Motivation
Documentation comments are stylized comments appearing in source code, near to the declarations that they serve to document. Documentation comments in Java source code use a combination of HTML and custom JavaDoc tags to mark up the text.
The choice of HTML for a markup language was reasonable in 1995. HTML is powerful, standardized, and was very popular at the time. But while it is no less popular today as a markup language consumed by web browsers, in the years since 1995 HTML has become much less popular as markup that is manually produced by humans because it is rather tedious to write and hard to read in its raw text form. These days it is more commonly generated from some other markup that is more suitable for humans. Because HTML is tedious to write, nicely-formatted documentation comments are also tedious to write, and even more tedious since many new programmers are not fluent in HTML due to its decline as a human-produced format.
Inline JavaDoc tags, such as {@link} and {@code}, are also rather cumbersome and are even less familiar to programmers, often requiring the author to consult the documentation for their usage. In a recent analysis of the documentation comments in JDK source code, over 95% of the uses of inline tags were for code fragments and links to elsewhere in the documentation, suggesting that simpler forms of these constructs would be welcome.
Markdown is a popular documentation format that is easy to read and write as plain text and can easily be transformed to HTML. Documentation comments are typically not complicated structured documents, and for the constructs that typically appear in them, such as paragraphs, lists, styled text, and links, Markdown provides simpler forms than the corresponding forms in HTML. For those constructs that are not directly supported in Markdown, Markdown also allows the use of HTML. This makes it easier to read and write documentation comments in source code, while retaining the ability to generate the same sort of API documentation as before.
Introducing the ability to use Markdown in documentation comments can bring together the best of both worlds, allowing for concise syntax of the most common constructs, reducing the need for HTML markup and JavaDoc tags, while retaining the ability to support specialized tags for features that are not directly available in Markdown.
Description
As an example of the use of Markdown in a documentation comment, consider the comment for java.lang.Object.hashCode.
/**
* Returns a hash code value for the object. This method is
* supported for the benefit of hash tables such as those provided by
* {@link java.util.HashMap}.
* <p>
* The general contract of {@code hashCode} is:
* <ul>
* <li>Whenever it is invoked on the same object more than once during
* an execution of a Java application, the {@code hashCode} method
* must consistently return the same integer, provided no information
* used in {@code equals} comparisons on the object is modified.
* This integer need not remain consistent from one execution of an
* application to another execution of the same application.
* <li>If two objects are equal according to the {@link
* #equals(Object) equals} method, then calling the {@code
* hashCode} method on each of the two objects must produce the
* same integer result.
* <li>It is <em>not</em> required that if two objects are unequal
* according to the {@link #equals(Object) equals} method, then
* calling the {@code hashCode} method on each of the two objects
* must produce distinct integer results. However, the programmer
* should be aware that producing distinct integer results for
* unequal objects may improve the performance of hash tables.
* </ul>
*
* @implSpec
* As far as is reasonably practical, the {@code hashCode} method defined
* by class {@code Object} returns distinct integers for distinct objects.
*
* @return a hash code value for this object.
* @see java.lang.Object#equals(java.lang.Object)
* @see java.lang.System#identityHashCode
*/
The same comment could be written by expressing its structure and styling in Markdown, with little or no use of HTML and JavaDoc inline tags:
/// Returns a hash code value for the object. This method is
/// supported for the benefit of hash tables such as those provided by
/// [java.util.HashMap].
///
/// The general contract of `hashCode` is:
///
/// - Whenever it is invoked on the same object more than once during
/// an execution of a Java application, the `hashCode` method
/// must consistently return the same integer, provided no information
/// used in `equals` comparisons on the object is modified.
/// This integer need not remain consistent from one execution of an
/// application to another execution of the same application.
/// - If two objects are equal according to the
/// [equals][#equals(Object)] method, then calling the
/// `hashCode` method on each of the two objects must produce the
/// same integer result.
/// - It is _not_ required that if two objects are unequal
/// according to the [equals][#equals(Object)] method, then
/// calling the `hashCode` method on each of the two objects
/// must produce distinct integer results. However, the programmer
/// should be aware that producing distinct integer results for
/// unequal objects may improve the performance of hash tables.
///
/// @implSpec
/// As far as is reasonably practical, the `hashCode` method defined
/// by class `Object` returns distinct integers for distinct objects.
///
/// @return a hash code value for this object.
/// @see java.lang.Object#equals(java.lang.Object)
/// @see java.lang.System#identityHashCode
(For the purpose of this example, cosmetic changes like reflowing the text are deliberately avoided, to aid any before and after comparison.)
Here is a screenshot highlighting the differences:
Note the following:
- The use of Markdown is indicated by a new form of documentation comment in which each line begins with ///, instead of the current /** ... */ comment.
- The HTML <p> element is not required; a blank line is enough to indicate a paragraph break.
- The HTML <ul> and <li> elements are replaced by Markdown bullet-list markers, using - to indicate the beginning of each bullet in the list.
- The HTML <em> element is replaced by using underscores (_) to indicate the font change.
- Instances of the {@code ...} tag are replaced by the equivalent use of backticks (`...`) to indicate monospace font.
- Instances of {@link ...} to link to other program elements are replaced by extended forms of Markdown reference links.
- Block tags, such as @implSpec, @return, and @see, are generally unaffected. However, because the entire documentation comment is in Markdown, the content of these tags is also Markdown, as with the backticks in the @implSpec tag.
Comment Delimiters
Documentation comments containing Markdown use a new form of documentation comment, composed of a series of consecutive lines, each beginning with optional whitespace followed by ///.
The overall syntax of a Markdown documentation comment is that of CommonMark. There are enhancements to links, to allow convenient linking to other program elements; simple GFM pipe tables are supported, as are all JavaDoc tags.
Links
You can create a link to an element declared elsewhere in your API by using an extended form of a Markdown reference link, in which the label for the reference is derived from a reference to the element itself.
To create a simple link whose text is derived from the identity of the element, simply enclose a reference to the element in square brackets. For example, to link to java.util.List, you can write [java.util.List], or just [List] if there is an import statement for java.util.List in the code. The text of the link will be displayed in monospace font. The link is equivalent to using the standard JavaDoc {@link ...} tag.
You can link to any kind of program element, as shown in the following examples:
/// * a module [java.base/]
/// * a package [java.util]
/// * a class [String]
/// * a field [String#CASE_INSENSITIVE_ORDER]
/// * a method [String#chars()]
To create a link with alternative text, use the form [text][element]. For example, to create a link to java.util.List with the text "a list", you can write [a list][List]. The link will be displayed in the current font, although you can use formatting details within the given text. The link is equivalent to using the standard JavaDoc {@linkplain ...} tag.
For example,
/// * [the `java.base` module][java.base/]
/// * [the `java.util` package][java.util]
/// * [a class][String]
/// * [a field][String#CASE_INSENSITIVE_ORDER]
/// * [a method][String#chars()]
According to the standard rules for reference links, you must escape any use of square brackets within a reference. This might occur when you are creating a reference to a method with an array parameter. The following shows a link to String.copyValueOf(char[]):
[String#copyValueOf(char\[\])]
You can use all other forms of Markdown links as well, although links to other program elements are generally likely to be the most common.
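The correspondence between the two link forms and the {@link ...} and {@linkplain ...} tags can be sketched in a few lines of Java. This is only an illustration of the mapping described above: the class LinkSketch, its method toInlineTags, and the deliberately loose ELEMENT pattern are hypothetical and are not part of the JDK. Unlike the real tool, the sketch works on raw text, does not check for link reference definitions, and cannot tell whether a name actually resolves to a program element.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only; not part of the JDK or the javadoc tool. */
final class LinkSketch {
    // Matches [first] or [first][second]; backslash escapes are allowed inside the brackets.
    private static final Pattern LINK = Pattern.compile(
            "\\[((?:\\\\.|[^\\[\\]\\\\])+)\\](?:\\[((?:\\\\.|[^\\[\\]\\\\])+)\\])?");

    // Rough syntactic test for a program element reference (an assumption, far looser than javadoc).
    private static final Pattern ELEMENT = Pattern.compile(
            "(?:[A-Za-z_$][\\w.$/]*)?(?:#[A-Za-z_$][\\w$]*(?:\\([^()]*\\))?)?");

    static String toInlineTags(String markdown) {
        Matcher m = LINK.matcher(markdown);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            // With one bracket pair the label is the reference; with two, the first is the text.
            String text = m.group(2) == null ? null : m.group(1);
            String ref = unescape(m.group(2) == null ? m.group(1) : m.group(2));
            String replacement;
            if (ref.isEmpty() || !ELEMENT.matcher(ref).matches()) {
                replacement = m.group();                  // not an element reference: leave as-is
            } else if (text == null) {
                replacement = "{@link " + ref + "}";      // monospace link
            } else {
                replacement = "{@linkplain " + ref + " " + text + "}";  // plain-font link
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Removes the Markdown escapes that are required around array brackets.
    private static String unescape(String ref) {
        return ref.replace("\\[", "[").replace("\\]", "]");
    }
}
```

For instance, under these assumptions LinkSketch.toInlineTags("[a list][List]") yields {@linkplain List a list}, while bracketed text containing spaces, such as [a list] on its own, is left untouched.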
Tables
Simple tables are supported, using the syntax defined in GitHub Flavored Markdown. For example, a simple table might be written as follows:
/// | Latin | Greek |
/// |-------|-------|
/// | a | alpha |
/// | b | beta |
/// | c | gamma |
Captions and other features that may be required for accessibility are not supported. In such situations, the use of HTML tables is still recommended.
JavaDoc tags
JavaDoc tags, both inline tags (like {@inheritDoc}) and block tags (like @param and @return), may be used in Markdown documentation comments, although neither may be used within literal text, such as in a code span (inline text enclosed within backticks) or a code block (a block of text that is either indented or enclosed within fences, like ``` or ~~~).
For example, the following shows how JavaDoc tags can be mixed with Markdown.
/// {@inheritDoc}
/// In addition, this method calls [#wait()].
///
/// @param i the index
public void m(int i) ...
The following examples illustrate that the character sequences @... and {@...} have no special meaning within code spans and code blocks.
/// The following code span contains literal text, and not a JavaDoc tag:
/// `{@inheritDoc}`
///
/// In the following indented code block, `@Override` is an annotation,
/// and not a JavaDoc tag:
///
///     @Override
///     public void m() ...
///
/// Likewise, in the following fenced code block, `@Override` is an annotation,
/// and not a JavaDoc tag:
///
/// ```
/// @Override
/// public void m() ...
/// ```
For those tags that may contain text with markup, in a Markdown documentation comment that markup will also be in Markdown format.
For example, the following shows the use of Markdown within a JavaDoc @param tag:
/// @param l the list, or `null` if no list is available
The {@inheritDoc} tag is used to include documentation for a method from one or more supertypes. The format of the comment containing the tag does not need to be the same as the format of the comment containing the documentation to be inherited. For example:
interface Base {
    /** A method. */
    void m();
}
class Derived implements Base {
    /// {@inheritDoc}
    public void m() { }
}
User-defined JavaDoc tags may be used in Markdown documentation comments, as well as the standard JavaDoc tags. For example, in the JDK documentation we define and use {@jls ...} as a short form for links to the Java Language Specification, and block tags like @implSpec and @implNote to introduce sections of particular information.
/// For more information on comments, see {@jls 3.7 Comments}.
///
/// @implSpec
/// This implementation does nothing.
public void doSomething() { }
Standalone Markdown files
Markdown files in doc-files subdirectories are processed appropriately, in a similar manner to HTML files in such directories. JavaDoc tags in such files will be processed. The page title will be inferred from the first heading. YAML metadata, such as that supported by the Pandoc Markdown processor, is not supported.
The file containing the content for the generated top-level ("overview") page may also be a Markdown file.
Syntax highlighting and embedded languages
The opening fence in a fenced code block may be followed by an info string, the first word of which is used to derive the CSS class name in the corresponding generated HTML, and which may be used by JavaScript libraries to enable syntax highlighting (such as with Prism) or rendering diagrams (such as with Mermaid).
For example, in conjunction with the appropriate libraries, the following shows how to display a fragment of CSS code with syntax highlighting:
/// ```css
/// p { color: red }
/// ```
You can add JavaScript libraries to your documentation by using the javadoc --add-script option.
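As a rough sketch of how a class name might be derived from an info string, the following assumes the common CommonMark convention of prefixing the first word with language-. The JEP text does not state the exact prefix, so both the prefix and the helper name InfoStringSketch.cssClass are assumptions made for illustration.

```java
/** Hypothetical helper for illustration only; not part of the JDK or the javadoc tool. */
final class InfoStringSketch {
    /** Returns a CSS class for a fence info string, or an empty string if there is none. */
    static String cssClass(String infoString) {
        String trimmed = infoString.strip();
        if (trimmed.isEmpty()) {
            return "";
        }
        // Only the first word of the info string contributes to the class name.
        String firstWord = trimmed.split("\\s+", 2)[0];
        return "language-" + firstWord;   // assumed prefix, following the CommonMark convention
    }
}
```

Libraries such as Prism typically look for a class of this shape on the generated code element.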
Comment details
Because horizontal whitespace at the beginning and end of each line of Markdown code may be significant, the content of such a comment is determined as follows:
- Any leading whitespace and the three initial / characters are removed from each line.
- The lines are shifted left, by removing leading whitespace characters, until the non-blank line with the least leading whitespace has no remaining leading whitespace.
- Additional leading whitespace and any trailing whitespace in each line are not removed, because they may be significant. For example, whitespace at the beginning of a line may indicate an indented code block or the continuation of a list item, and whitespace at the end of a line may indicate a hard line break.
(The policy to remove leading incidental whitespace is similar to that for String.stripIndent, except that there is no need for any special treatment for a trailing blank line.)
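The steps above can be sketched as follows. This is a minimal illustration rather than the javadoc implementation: the class CommentContentSketch and its method content are hypothetical names, and the treatment of whitespace-only lines as empty is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of the JDK or the javadoc tool. */
final class CommentContentSketch {
    static List<String> content(List<String> sourceLines) {
        // Step 1: remove leading whitespace and the three initial '/' characters.
        List<String> stripped = new ArrayList<>();
        for (String line : sourceLines) {
            String s = line.stripLeading();
            if (!s.startsWith("///")) {
                throw new IllegalArgumentException("not a /// line: " + line);
            }
            stripped.add(s.substring(3));
        }
        // Step 2: find the least leading whitespace among the non-blank lines.
        int indent = Integer.MAX_VALUE;
        for (String s : stripped) {
            if (!s.isBlank()) {
                indent = Math.min(indent, s.length() - s.stripLeading().length());
            }
        }
        if (indent == Integer.MAX_VALUE) {
            indent = 0;   // every line was blank
        }
        // Step 3: shift left by that amount; any further leading whitespace and all
        // trailing whitespace is kept. Whitespace-only lines become empty (an assumption).
        List<String> result = new ArrayList<>();
        for (String s : stripped) {
            result.add(s.isBlank() ? "" : s.substring(indent));
        }
        return result;
    }
}
```

Given the lines "/// Title", "///", and "///     code", the sketch returns "Title", an empty line, and "    code", so the four extra spaces that mark an indented code block survive.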
There are no restrictions on the characters that may appear after the /// on each line of the comment. In particular, the comment may contain code samples which may contain comments of their own:
/// Here is an example:
///
/// ```
/// /** Hello World! */
/// public class HelloWorld {
///     public static void main(String... args) {
///         System.out.println("Hello World!"); // the traditional example
///     }
/// }
/// ```
As well as serving to visually distinguish the new kind of documentation comment, the use of end-of-line (//) comments eliminates the restrictions on the content of the comment that are inherent with the use of traditional (/* ... */) comments. In particular, it is not possible to use the character sequence */ within a traditional comment (JLS 3.7), although it may be desirable to do so when writing example code containing traditional comments, strings containing "glob" expressions, and strings containing regular expressions.
Note that for a blank line to be included in the comment, it must consist of optional whitespace followed by ///. A completely blank line will cause the preceding and following lines to be treated as separate comments, in which case all but the last comment will be discarded, and only the last comment will be considered as a documentation comment for any declaration that may follow.
For example:
/// This is an example ...
///
/// ... of a 3-line comment containing a blank line.
The blank line in the following example will terminate the first comment, so that the comment that comes after the blank line will be treated as a separate comment.
/// This comment will be treated as a "dangling comment" and will be ignored.

/// This is the comment for the following declaration.
public void m() { }
The same is true for any other comment not beginning with /// that may appear between two /// comments.
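The rule can be sketched as a small function that groups the lines preceding a declaration into runs of consecutive /// lines and keeps only the last run. The class DanglingCommentSketch and its method docCommentFor are hypothetical names; the sketch assumes its input contains only the lines that appear before the declaration, and it is not how javac itself is implemented.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of the JDK or the javadoc tool. */
final class DanglingCommentSketch {
    /** Returns the last run of consecutive /// lines; earlier runs are "dangling". */
    static List<String> docCommentFor(List<String> linesBeforeDeclaration) {
        List<String> current = new ArrayList<>();
        List<String> last = List.of();
        for (String line : linesBeforeDeclaration) {
            if (line.stripLeading().startsWith("///")) {
                current.add(line);            // extends the current comment
            } else if (!current.isEmpty()) {
                last = current;               // a blank line or other comment ends the run
                current = new ArrayList<>();
            }
        }
        return current.isEmpty() ? last : current;
    }
}
```

Applied to a first /// line, a completely blank line, and a second /// line, the sketch returns only the second line.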
API and implementation
The representation of parsed documentation comments is covered by the com.sun.source.doctree package in the Compiler Tree API.
We introduce a new type of tree node, RawTextTree, that just contains uninterpreted text, together with an associated new kind, DocTree.Kind.MARKDOWN, to indicate the kind of content in a RawTextTree. Corresponding new visitRawText methods are added to DocTreeVisitor and its subtypes, DocTreeScanner and DocTreePathScanner.
RawTextTree nodes with a kind of MARKDOWN are used to represent Markdown content, including HTML constructs, but excluding any JavaDoc tags, like {@inheritDoc} or @param.
We also introduce a new functional interface to support the ability to transform a DocCommentTree after it has been parsed and before it is accessed by any downstream software, such as the javadoc tool or other utility software.
public interface DocCommentTreeTransformer {
DocCommentTree transform(DocTrees docTrees, DocCommentTree tree);
}
Processing Markdown takes place in three phases:
- Parsing
  Markdown comments are minimally parsed into a sequence of RawTextTree nodes, each with a kind of DocTree.Kind.MARKDOWN and containing Markdown content, interspersed with standard DocTree nodes for inline and block tags. The individual inline and block tags are parsed in the same way as for traditional documentation comments, except that any tags that may contain descriptive content will also treat that content as Markdown content.
  The sequence of nodes is stored in a DocCommentTree node, in the normal manner.
  Unlike a traditional documentation comment, HTML constructs are not parsed into appropriate DocTree nodes, because too much of the surrounding context needs to be taken into account as well.
- Transformation
  The DocCommentTree obtained by the initial parse is passed to an optional transformer, to analyze the Markdown content and replace any non-standard constructs with equivalent DocTree nodes, such as those for JavaDoc inline and block tags. The result of the transformation is an equivalent DocCommentTree with any appropriate constructs replaced.
  The standard transformer scans the tree for links to other API declarations, by looking for reference links with no associated link reference definition, and where the link label syntactically matches a reference to a program element. Any such links are replaced by an equivalent {@link ...} or {@linkplain ...} node in the overall tree.
  The standard transformer is enabled by default when using the javadoc tool, but is opt-in for code using the Compiler Tree API directly, such as in any custom utilities to analyze Java source code and documentation comments, without necessarily generating class files or API documentation. (This avoids having a direct module dependency from the jdk.compiler module to the module containing the internal Markdown library.)
  The transformer can be set by calling DocTrees.setDocCommentTreeTransformer, allowing for custom use in specialized applications. Alternate transformers could convert more Markdown constructs to HTML and JavaDoc tags. Like the existing DocTrees.setBreakIterator method, setDocCommentTreeTransformer should be called early in the compilation lifecycle, before any documentation comments are accessed.
- Rendering
  The DocCommentTree that is parsed and transformed is rendered by the javadoc tool into HTML that is suitable for inclusion in the enclosing page that is being generated.
  For any sequence of RawTextTree and other nodes, the sequence is converted to a single string containing the text of the RawTextTree nodes, and Unicode OBJECT REPLACEMENT CHARACTER (U+FFFC) to indicate the position of non-Markdown content. The resulting string is rendered by the Markdown processor, and the U+FFFC characters are replaced in the resulting output by the rendered forms of the non-Markdown content nodes.
  While most of the rendering is straightforward, special attention is given to Markdown headings.
  - The heading level is adjusted according to the enclosing context. This applies whether the heading was initially written in the documentation comment as an ATX-style heading (using a prefix of # characters to indicate the level), or as a Setext-style heading (using underlining with = or - to indicate the level).
    For example, a level 1 heading in the documentation comment for a module, package or class will be rendered as a level 2 heading in the generated page, while a level 1 heading in the documentation comment for a field, constructor or method will be rendered as a level 4 heading in the generated page.
    The adjustment only applies to Markdown headings, and not to any direct use of HTML headings.
  - An id attribute will be included in the rendered HTML node, so that the heading can easily be referenced from elsewhere. The id will be generated from the content of the heading, in the same manner as other ids generated by javadoc.
    (You can easily obtain a link to the heading by clicking on the popup link icon when viewing the heading in a browser.)
  - The text of the heading is added to the main "Search" index for the overall documentation.
Markdown support leverages the use of an internal copy of the standard commonmark-java library. By design, the use of the library does not show in any public supported JDK API.
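The placeholder technique used in the rendering phase can be sketched as follows. This is an illustration only: the class PlaceholderSketch, its nested Node type, and the markdownToHtml function parameter are hypothetical stand-ins for the internal DocTree nodes and Markdown processor, and the sketch assumes the processor passes each U+FFFC character through unchanged and in order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

/** Hypothetical helper for illustration only; not part of the JDK or the javadoc tool. */
final class PlaceholderSketch {
    static final char PLACEHOLDER = '\uFFFC';   // OBJECT REPLACEMENT CHARACTER

    /** Either Markdown text or the already-rendered form of a non-Markdown node. */
    static final class Node {
        final boolean markdown;
        final String text;
        Node(boolean markdown, String text) {
            this.markdown = markdown;
            this.text = text;
        }
    }

    static String render(List<Node> nodes, UnaryOperator<String> markdownToHtml) {
        // Build one string: Markdown text verbatim, one placeholder per other node.
        StringBuilder source = new StringBuilder();
        List<String> others = new ArrayList<>();
        for (Node n : nodes) {
            if (n.markdown) {
                source.append(n.text);
            } else {
                source.append(PLACEHOLDER);
                others.add(n.text);
            }
        }
        // Render the whole string once, then substitute the placeholders in order.
        String html = markdownToHtml.apply(source.toString());
        StringBuilder out = new StringBuilder();
        int next = 0;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == PLACEHOLDER && next < others.size()) {
                out.append(others.get(next++));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Rendering the Markdown as a single string, rather than node by node, lets constructs such as emphasis or list items span an inline tag without being broken at the tag boundary.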
Future Work
The design is intended to support the following possible features:
Block tags
It would be possible to detect some stylized uses of a heading followed by appropriate content as something that could be converted into equivalent JavaDoc tags.
For example, a heading of Parameters followed by a list of parameter names and their descriptions could be converted into equivalent @param tags.
- Comment
  # Parameters
  * x the x coordinate
  * y the y coordinate
- Translation
  @param x the x coordinate
  @param y the y coordinate
A similar policy could be adopted for the list of exceptions that may be thrown by a method:
- Comment
  # Throws
  * NullPointerException if the first parameter is `null`
  * NullPointerException if the second parameter is `null`
  * IllegalArgumentException if an argument is not accepted
- Translation
  @throws NullPointerException if the first parameter is `null`
  @throws NullPointerException if the second parameter is `null`
  @throws IllegalArgumentException if an argument is not accepted
There should only ever be a single description of the return value for a method, so there is no need to use a list in this case:
- Comment
  # Returns
  the square root of the argument
- Translation
  @return the square root of the argument
Note that while the proposed forms do look like "normal Markdown", they also take up more vertical space, and so developers may prefer to stay with the more concise forms, using old-style JavaDoc tags.
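To make the idea concrete, here is a minimal sketch of how the Parameters form might be translated. Since this is possible future work, nothing here reflects an existing or planned API: the class BlockTagSketch, the method toParamTags, and the choice to fold continuation lines into the preceding item are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of the JDK or the javadoc tool. */
final class BlockTagSketch {
    /** Translates a "# Parameters" heading and its list into @param lines. */
    static List<String> toParamTags(List<String> lines) {
        if (lines.isEmpty() || !lines.get(0).strip().equals("# Parameters")) {
            return lines;   // not the stylized form: leave unchanged
        }
        List<String> tags = new ArrayList<>();
        for (String line : lines.subList(1, lines.size())) {
            String s = line.strip();
            if (s.startsWith("* ") || s.startsWith("- ")) {
                // Each list item starts a new tag: name first, then the description.
                tags.add("@param " + s.substring(2).strip());
            } else if (!s.isEmpty() && !tags.isEmpty()) {
                // Continuation line: append to the description of the last tag.
                int last = tags.size() - 1;
                tags.set(last, tags.get(last) + " " + s);
            }
        }
        return tags;
    }
}
```

A similar function could handle the Throws and Returns forms; a real transformer would operate on the parsed Markdown tree rather than on raw lines.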
While it may be difficult to extend this strategy to all block tags, including user-specified tags, it is worth noting that for the JDK code base, just 5 tags (@param, @return, @see, @throws, and @since) account for over 90% of all uses of block tags.
Inline tags
While the uses of most block tags could be replaced by stylized use of headings and ensuing content, there is no such equivalent for most of the less common inline tags. Of these, {@inheritDoc} is the most common, and there is no obvious analog in Markdown. Rather than invent an alternative syntax for the sake of it, it seems better to continue with the existing inline tag syntax.
Alternatives
Pluggable implementation
Instead of leveraging a specific Markdown parser implementation, an alternative could be to support the use of other user-specified Markdown processors, providing different flavors of Markdown. However, such an approach could lead to inconsistencies when generating documentation spanning different libraries for little perceived gain.
Translating more Markdown to HTML
The standard transformer is deliberately minimal, primarily to handle the extended link nodes. A more powerful transformer could translate additional Markdown nodes into equivalent DocTree nodes, representing plain text, HTML, and JavaDoc tags.
While such an approach would have the advantage that API clients may not need to be aware that the original source for the comment was in Markdown, there are also a number of disadvantages.
- The more removed the representation is from the original syntax tree, the harder it is to give accurate, relevant diagnostics, should any be necessary. For example, messages about a (synthetic) <table> element may be confusing if there is no such item explicitly in the original comment.
- When synthesizing DocTree nodes for HTML elements derived from Markdown constructs, it is difficult to give accurate position information that relates the node back to its position in the original comment, since the node has no representation in the original comment. At best, you can give a nearby position. This problem has an analog in the Java compiler, javac, when assigning positions for synthetic elements such as the default no-args constructor, or for bridge methods, which all typically appear to be defined at the opening brace of the class declaration.
- A general solution is difficult because it would require knowledge of any and all the JavaDoc tags that may be involved, because many tags permit rich content, such as Markdown or HTML, as part but not all of their content. For example, an @param tag is followed by a parameter name before the description, and the name may be enclosed in <...> if the name is that of a type parameter: it would be wrong to interpret that name as a fragment of HTML. Likewise, the @serialField tag is followed by a name and a type before the description. And while these are standard tags known to the standard doclet, the doclet also allows the use of user-defined tags.
While it may not be desirable for the default transformer to convert more Markdown nodes to DocTree nodes, it would be possible to provide additional transformers that a client may choose to use if they are sufficient for its needs. This would primarily be for applications other than javadoc that want to do custom processing of documentation comments.
Using /**...*/ Comments
The somewhat more interesting alternative to consider is to parse Markdown in the existing form of /**...*/ comments instead of, or in addition to, introducing the use of /// comments. While Markdown allows the use of embedded HTML, Markdown and HTML are different languages with different syntax rules. In HTML, whitespace is only significant as literal text in a <pre> element, whereas in Markdown, vertical whitespace may indicate a paragraph break, leading horizontal whitespace may indicate an indented code block or a nested list, and trailing whitespace may indicate a hard line break, equivalent to <br> in HTML. Additionally, there are numerous examples in the JDK code of square brackets being used in narrative text and so at risk of being interpreted as a link to a program element (for example, "The information is returned as a two-dimensional array (array[x][y])").
Configurable comment styles
While it would be possible to build a way to configure a system that accepts some /** ... */ documentation comments in Markdown format and others in HTML format, it is not clear that such a mechanism would have any significant advantage over the more overt use of /// comments for comments in Markdown format and the continued use of /** ... */ for comments in HTML format.
Risks and Assumptions
Because the implementation employs a third-party library, commonmark-java, to transform Markdown to HTML, if external support for the library is dropped, we will have to maintain a fork of the library for use in our tools or find an equivalent alternative.
An additional risk is that of an increased presence of errors in generated API specifications, because of the reduced ability to check for bad code, and because authors sometimes omit to check the generated form of their documentation. (JBS reports over forty issues in the JDK containing the words "bad" or "malformed" and "HTML" in their Summary description, with over ten in the main core-libs area. Many involve mismatched tags that render the page effectively unusable, and all of them, presumably, were not detected up front by the authors or their reviewers.)
Dependencies
The primary dependency is on the chosen third-party library. There are no JDK features that are dependent on this work at this time.