JEP 467: Markdown Documentation Comments

Owner: Jonathan Gibbons
Type: Feature
Scope: SE
Status: Candidate
Component: tools / javadoc(tool)
Discussion: javadoc dash dev at openjdk dot org
Reviewed by: Ron Pressler
Created: 2023/09/11 17:45
Updated: 2024/04/16 17:37
Issue: 8316039

Summary

Enable JavaDoc documentation comments to be written in Markdown rather than solely in a mixture of HTML and JavaDoc @-tags.

Goals

Non-Goals

Motivation

Documentation comments are stylized comments appearing in source code, near to the declarations that they serve to document. Documentation comments in Java source code use a combination of HTML and custom JavaDoc tags to mark up the text.

The choice of HTML for a markup language was reasonable in 1995. HTML is powerful, standardized, and was very popular at the time. But while it is no less popular today as a markup language consumed by web browsers, in the years since 1995 HTML has become much less popular as markup that is manually produced by humans because it is tedious to write and hard to read. These days it is more commonly generated from some other markup language that is more suitable for humans. Because HTML is tedious to write, nicely-formatted documentation comments are also tedious to write, and even more tedious since many new developers are not fluent in HTML due to its decline as a human-produced format.

Inline JavaDoc tags, such as {@link} and {@code}, are also cumbersome and are even less familiar to developers, often requiring the author to consult the documentation for their usage. A recent analysis of the documentation comments in the JDK source code showed that over 95% of the uses of inline tags were for code fragments and links to elsewhere in the documentation, suggesting that simpler forms of these constructs would be welcome.

Markdown is a popular markup language for simple documents that is easy to read, easy to write, and easily transformed into HTML. Documentation comments are typically not complicated structured documents, and for the constructs that typically appear in documentation comments, such as paragraphs, lists, styled text, and links, Markdown provides simpler forms than HTML. For those constructs that Markdown does not directly support, Markdown allows the use of HTML as well.

Introducing the ability to use Markdown in documentation comments would bring together the best of both worlds. It would allow concise syntax for the most common constructs and reduce the need for HTML markup and JavaDoc tags, while retaining the ability to use specialized tags for features not available in Markdown. It would make documentation comments easier to write and easier to read in source code, while retaining the ability to generate the same sort of API documentation as before.

Description

As an example of the use of Markdown in a documentation comment, consider the comment for java.lang.Object.hashCode:

/**
 * Returns a hash code value for the object. This method is
 * supported for the benefit of hash tables such as those provided by
 * {@link java.util.HashMap}.
 * <p>
 * The general contract of {@code hashCode} is:
 * <ul>
 * <li>Whenever it is invoked on the same object more than once during
 *     an execution of a Java application, the {@code hashCode} method
 *     must consistently return the same integer, provided no information
 *     used in {@code equals} comparisons on the object is modified.
 *     This integer need not remain consistent from one execution of an
 *     application to another execution of the same application.
 * <li>If two objects are equal according to the {@link
 *     #equals(Object) equals} method, then calling the {@code
 *     hashCode} method on each of the two objects must produce the
 *     same integer result.
 * <li>It is <em>not</em> required that if two objects are unequal
 *     according to the {@link #equals(Object) equals} method, then
 *     calling the {@code hashCode} method on each of the two objects
 *     must produce distinct integer results.  However, the programmer
 *     should be aware that producing distinct integer results for
 *     unequal objects may improve the performance of hash tables.
 * </ul>
 *
 * @implSpec
 * As far as is reasonably practical, the {@code hashCode} method defined
 * by class {@code Object} returns distinct integers for distinct objects.
 *
 * @return  a hash code value for this object.
 * @see     java.lang.Object#equals(java.lang.Object)
 * @see     java.lang.System#identityHashCode
 */

The same comment can be written by expressing its structure and styling in Markdown, with no use of HTML and just a few JavaDoc block tags:

/// Returns a hash code value for the object. This method is
/// supported for the benefit of hash tables such as those provided by
/// [java.util.HashMap].
///
/// The general contract of `hashCode` is:
///
///   - Whenever it is invoked on the same object more than once during
///     an execution of a Java application, the `hashCode` method
///     must consistently return the same integer, provided no information
///     used in `equals` comparisons on the object is modified.
///     This integer need not remain consistent from one execution of an
///     application to another execution of the same application.
///   - If two objects are equal according to the
///     [equals][#equals(Object)] method, then calling the
///     `hashCode` method on each of the two objects must produce the
///     same integer result.
///   - It is _not_ required that if two objects are unequal
///     according to the [equals][#equals(Object)] method, then
///     calling the `hashCode` method on each of the two objects
///     must produce distinct integer results.  However, the programmer
///     should be aware that producing distinct integer results for
///     unequal objects may improve the performance of hash tables.
///
/// @implSpec
/// As far as is reasonably practical, the `hashCode` method defined
/// by class `Object` returns distinct integers for distinct objects.
///
/// @return  a hash code value for this object.
/// @see     java.lang.Object#equals(java.lang.Object)
/// @see     java.lang.System#identityHashCode

(For the purpose of this example, cosmetic changes such as reflowing the text are deliberately avoided, to aid in before-and-after comparison.)

Key differences to observe:

Here is a screenshot highlighting the differences between the two versions, side by side:

[Figure: a screenshot of the differences]

Using /// for Markdown comments

We use /// for Markdown comments to overcome several issues with traditional /** comments.

  1. Any block comment beginning /* cannot contain the character sequence */ (JLS §3.7). It is becoming increasingly common to put examples of code in documentation comments, and this restriction precludes any examples containing embedded /*...*/ comments or any expression containing the characters */, without the use of very disruptive workarounds.

    In any // comment, there is no restriction on the characters that may appear on the rest of the line.

  2. In a documentation comment beginning /**, the use of leading whitespace followed by one or more asterisks on each line is optional. When the asterisks are omitted, there is an ambiguity with Markdown constructs that themselves begin with an asterisk, such as emphasis, list items, or thematic breaks.

    Using /// comments, there is never any such ambiguity.

  3. In a documentation comment that uses leading whitespace and asterisks on each line, it is common practice to use an additional space after any asterisks, to separate the content of the line from the marker at the beginning of the line. In Markdown, the amount of leading whitespace on each line is significant, with particular significance given to lines beginning with up to three spaces. It is a loss to give up one of those characters to separate the content from the marker at the beginning of the line, and might even be misleading for authors used to counting whitespace characters. Conversely, if no space is used, the issue of ambiguity again arises when the Markdown content begins with an asterisk.

    Using /// comments, new rules are introduced to strip leading indentation from the lines of the comment, so that whitespace may be used after the initial comment marker, without affecting the use of whitespace to begin some of the lines of Markdown content.

It is not an option to change the syntax of the Java language to allow new forms of comment. Therefore, any new style of documentation comment must be in the form of either a traditional comment or a series of end-of-line comments. While the preceding points justify the use of end-of-line comments instead of traditional comments, the question remains of how to disambiguate documentation comments from other uses of end-of-line comments. The use of an additional / echoes the use of an additional * when using a traditional comment for a documentation comment. In addition, while not a primary consideration, other languages that support end-of-line documentation comments (like Dart and Rust) have successfully been using /// for some time now.

Syntax

Markdown documentation comments are written in the CommonMark variant of Markdown. There are enhancements to links, to allow convenient linking to other program elements. Simple GFM pipe tables are supported, as are all JavaDoc tags.

You can create a link to an element declared elsewhere in your API by using an extended form of Markdown reference link, in which the label for the reference is derived from a standard JavaDoc reference to the element itself.

To create a simple link whose text is derived from the identity of the element, simply enclose a reference to the element in square brackets. For example, to link to java.util.List, you can write [java.util.List], or just [List] if there is an import statement for java.util.List in the code. The text of the link will be displayed in the monospace font. The link is equivalent to using the standard JavaDoc {@link ...} tag.

You can link to any kind of program element:

/// - a module [java.base/]
/// - a package [java.util]
/// - a class [String]
/// - a field [String#CASE_INSENSITIVE_ORDER]
/// - a method [String#chars()]

To create a link with alternative text, use the form [text][element]. For example, to create a link to java.util.List with the text "a list", you can write [a list][List]. The link will be displayed in the current font, although you can use formatting markup within the text. The link is equivalent to using the standard JavaDoc {@linkplain ...} tag.

For example:

/// - [the `java.base` module][java.base/]
/// - [the `java.util` package][java.util]
/// - [a class][String]
/// - [a field][String#CASE_INSENSITIVE_ORDER]
/// - [a method][String#chars()]

In reference links, you must escape any use of square brackets. This might occur in a reference to a method with an array parameter; for example, a link to String.copyValueOf(char[]) is written as [String#copyValueOf(char\[\])].

You can use all other forms of Markdown links, including links to URLs, but links to other program elements are likely to be the most common.
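As an illustration, the link forms described above might be combined in a single comment as follows. This is only a sketch: the class LinkExamples and its method first are hypothetical names invented for this example, and the URL is used only to show an ordinary Markdown link.

```java
import java.util.List;

// Hypothetical class, used only to carry the example comment.
final class LinkExamples {
    /// Returns the first element of a [List], or `null` if the list is empty.
    ///
    /// This comment links to [a class][String], to a method [String#chars()],
    /// and to a method with an array parameter, whose square brackets are
    /// escaped: [String#copyValueOf(char\[\])].
    /// It also contains an ordinary Markdown link to a URL:
    /// [the CommonMark site](https://commonmark.org).
    ///
    /// @param list the list to examine
    /// @return the first element, or `null`
    static <T> T first(List<T> list) {
        return list.isEmpty() ? null : list.get(0);
    }
}
```

Because [List] is written without a package name, the example relies on the import statement for java.util.List at the top of the file.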

Tables

Simple tables are supported, using the syntax of GitHub Flavored Markdown. For example:

/// | Latin | Greek |
/// |-------|-------|
/// | a     | alpha |
/// | b     | beta  |
/// | c     | gamma |

Captions and other features that may be required for accessibility are not supported. In such situations, the use of HTML tables is still recommended.

JavaDoc tags

JavaDoc tags, both inline tags such as {@inheritDoc} and block tags such as @param and @return, may be used in Markdown documentation comments:

/// {@inheritDoc}
/// In addition, this method calls [#wait()].
///
/// @param i the index
public void m(int i) ...

JavaDoc tags may not be used within literal text, such as code spans (`...`) or code blocks, that is, blocks of text that are either indented or enclosed within fences such as ``` or ~~~. In other words, the character sequences @... and {@...} have no special meaning within code spans and code blocks:

/// The following code span contains literal text, and not a JavaDoc tag:
/// `{@inheritDoc}`
///
/// In the following indented code block, `@Override` is an annotation,
/// and not a JavaDoc tag:
///
///     @Override
///     public void m() ...
///
/// Likewise, in the following fenced code block, `@Override` is an annotation,
/// and not a JavaDoc tag:
///
/// ```
/// @Override
/// public void m() ...
/// ```

For those tags that may contain text with markup, in a Markdown documentation comment that markup is also in Markdown:

/// @param l   the list, or `null` if no list is available

The {@inheritDoc} tag incorporates documentation for a method from one or more supertypes. The format of the comment containing the tag does not need to be the same as the format of the comment containing the documentation to be inherited:

interface Base {
    /** A method. */
    void m();
}

class Derived implements Base {
    /// {@inheritDoc}
    public void m() { }
}

User-defined JavaDoc tags may be used in Markdown documentation comments. For example, in the JDK documentation we define and use {@jls ...} as a short form for links to the Java Language Specification, and block tags such as @implSpec and @implNote to introduce sections of particular information:

/// For more information on comments, see {@jls 3.7 Comments}.
///
/// @implSpec
/// This implementation does nothing.
public void doSomething() { }

Standalone Markdown files

Markdown files in doc-files subdirectories are processed appropriately, in a similar manner to HTML files in such directories. JavaDoc tags in such files are processed. The page title is inferred from the first heading. YAML metadata, such as that supported by the Pandoc Markdown processor, is not supported.

The file containing the content for the generated top-level overview page may also be a Markdown file.

Syntax highlighting and embedded languages

The opening fence in a fenced code block may be followed by an info string. The first word of the info string is used to derive the CSS class name in the corresponding generated HTML, and may also be used by JavaScript libraries to enable syntax highlighting (such as with Prism) and rendering diagrams (such as with Mermaid).

For example, in conjunction with the appropriate libraries, this would display a fragment of CSS code with syntax highlighting:

/// ```css
/// p { color: red }
/// ```

You can add JavaScript libraries to your documentation by using the javadoc --add-script option.
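The following sketch shows one way the first word of an info string could be mapped to a CSS class name. It assumes the language- prefix that appears in the examples of the CommonMark specification; the class InfoStrings and the method cssClass are hypothetical names, and this is not the javadoc implementation.

```java
// Hypothetical helper illustrating the mapping from info string to CSS class.
final class InfoStrings {
    /// Derives a CSS class name from the info string of a fenced code block.
    /// Returns an empty string if the info string is blank.
    static String cssClass(String infoString) {
        String trimmed = infoString.strip();
        if (trimmed.isEmpty()) {
            return "";
        }
        // Only the first word is significant; anything after it is ignored.
        String firstWord = trimmed.split("\\s+", 2)[0];
        // Assumption: the "language-" prefix follows the CommonMark convention.
        return "language-" + firstWord;
    }
}
```

Under these assumptions, an info string of css yields the class language-css, which is the kind of class name that a highlighting library can select on.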

Syntactical details

Because horizontal whitespace at the beginning and end of each line of Markdown text may be significant, the content of a Markdown documentation comment is determined as follows:

(The policy to remove leading incidental whitespace is similar to that for String.stripIndent(), except that there is no need for any special treatment for a trailing blank line.)
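The removal of incidental whitespace can be approximated with a short sketch. It assumes that the /// marker is removed from each line first and that the smallest indentation found on any non-blank line is then removed from every line, in the spirit of String.stripIndent(). The class MarkdownCommentContent and the method extract are hypothetical names, and the treatment of blank lines is a simplification.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of how the content of a /// comment might be determined.
final class MarkdownCommentContent {
    /// Removes the comment marker and the common leading whitespace
    /// from the lines of a `///` comment.
    static List<String> extract(List<String> commentLines) {
        // Step 1: drop any whitespace before the marker, and the marker itself.
        List<String> stripped = new ArrayList<>();
        for (String line : commentLines) {
            String s = line.stripLeading();
            if (!s.startsWith("///")) {
                throw new IllegalArgumentException("not a /// line: " + line);
            }
            stripped.add(s.substring(3));
        }
        // Step 2: find the least indentation among the non-blank lines.
        int indent = Integer.MAX_VALUE;
        for (String s : stripped) {
            if (!s.isBlank()) {
                indent = Math.min(indent, s.length() - s.stripLeading().length());
            }
        }
        // Step 3: shift every non-blank line left by that amount.
        // Assumption: blank lines are reduced to empty strings.
        List<String> result = new ArrayList<>();
        for (String s : stripped) {
            result.add(s.isBlank() ? "" : s.substring(indent));
        }
        return result;
    }
}
```

In this model, the single space that authors usually type after /// is removed, while any additional indentation, such as the four spaces that introduce an indented code block, is kept.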

There are no restrictions on the characters that may appear after the /// on each line of the comment. In particular, the comment may contain code samples which may contain comments of their own:

/// Here is an example:
///
/// ```
/// /** Hello World! */
/// public class HelloWorld {
///     public static void main(String... args) {
///         System.out.println("Hello World!"); // the traditional example
///     }
/// }
/// ```

As well as serving to visually distinguish the new kind of documentation comment, the use of end-of-line (//) comments eliminates the restrictions on the content of the comment that are inherent with the use of traditional (/* ... */) comments. In particular, it is not possible to use the character sequence */ within a traditional comment (JLS §3.7) although it may be desirable to do so when writing example code containing traditional comments, strings containing glob expressions, and strings containing regular expressions.
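A glob expression is a typical case. In the following sketch, the comment mentions a glob that contains the character sequence */ and so could not be written in a traditional comment without a workaround. The class GlobExample and the method isNestedJavaSource are hypothetical names chosen for this illustration.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;

// Hypothetical class whose comment mentions a glob containing "*/".
final class GlobExample {
    /// Tests whether a path names a Java source file in some directory.
    ///
    /// The glob used is `**/*.java`. It contains the character sequence
    /// `*/`, which would end a traditional `/** ... */` comment early.
    ///
    /// @param path the path to test
    /// @return `true` if the path matches the glob
    static boolean isNestedJavaSource(Path path) {
        PathMatcher matcher =
                FileSystems.getDefault().getPathMatcher("glob:**/*.java");
        return matcher.matches(path);
    }
}
```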

For a blank line to be included in the comment, it must begin with any optional whitespace and then ///:

/// This is an example ...
///
/// ... of a 3-line comment containing a blank line.

A completely blank line will cause any preceding and following comment to be treated as separate comments. In that case, all but the last comment will be discarded, and only the last comment will be considered as a documentation comment for any declaration that may follow:

/// This comment will be treated as a "dangling comment" and will be ignored.

/// This is the comment for the following declaration.
public void m() { }

The same is true for any other comment not beginning with /// that may appear between two /// comments.
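The grouping rule can be modeled with a simplified sketch that works on the source lines preceding a declaration. It treats each run of consecutive /// lines as one comment and keeps only the last run. The class DanglingComments and the method lastComment are hypothetical names, and the model ignores details of real tokenization, such as a /// sequence that appears inside a string literal or a block comment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of which /// lines form the comment for a declaration.
final class DanglingComments {
    /// Returns the lines of the last run of consecutive `///` lines.
    /// Earlier runs are treated as dangling comments and are discarded.
    static List<String> lastComment(List<String> linesBeforeDeclaration) {
        List<String> run = new ArrayList<>();
        boolean inRun = false;
        for (String line : linesBeforeDeclaration) {
            if (line.stripLeading().startsWith("///")) {
                if (!inRun) {
                    // A new run begins, so any earlier run is discarded.
                    run = new ArrayList<>();
                    inRun = true;
                }
                run.add(line);
            } else {
                // A completely blank line or any other line ends the run.
                inRun = false;
            }
        }
        return run;
    }
}
```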

API and implementation

Parsed documentation comments are represented by elements of the com.sun.source.doctree package in the Compiler Tree API.

We introduce a new type of tree node, RawTextTree, which contains uninterpreted text, together with a new tree-node kind, DocTree.Kind.MARKDOWN, which indicates Markdown content in a RawTextTree. We add corresponding new visitRawText methods to DocTreeVisitor and its subtypes, DocTreeScanner and DocTreePathScanner.

RawTextTree nodes with a kind of MARKDOWN represent Markdown content, including HTML constructs but excluding any JavaDoc tags such as {@inheritDoc} and @param.

Markdown text is processed in two phases:

  1. Parsing — Markdown comments are parsed into a sequence of RawTextTree nodes, each with a kind of DocTree.Kind.MARKDOWN and containing Markdown content, interspersed with standard DocTree nodes for inline and block tags. The inline and block tags are parsed in the same way as for traditional documentation comments, except that tag content is also parsed as Markdown. The sequence of nodes is stored in a DocCommentTree node, in the normal manner.

    Unlike a traditional documentation comment, HTML constructs are not parsed into corresponding DocTree nodes, because too much of the surrounding context needs to be taken into account.

    The Markdown content in the DocCommentTree resulting from the initial parse is then examined for any reference links with no associated link reference definition, and for which the link label syntactically matches a reference to a program element. Any such link is replaced by an equivalent node representing either {@link ...} or {@linkplain ...}.

  2. Rendering — The DocCommentTree is rendered by the javadoc tool into HTML that is suitable for inclusion in the page being generated.

    Any sequence of RawTextTree nodes and other nodes is converted into a single string containing the text of the RawTextTree nodes with the Unicode OBJECT REPLACEMENT CHARACTER (U+FFFC) standing in for non-Markdown content. The resulting string is rendered by the Markdown processor and then the U+FFFC characters are replaced in the resulting output by the rendered forms of the non-Markdown content nodes.

    While most of the rendering is straightforward, special attention is given to Markdown headings:

    • The heading level is adjusted according to the enclosing context. This applies whether the heading was initially written in the documentation comment as an ATX-style heading (using a prefix of # characters to indicate the level) or as a Setext-style heading (using underlining with = or - to indicate the level).

      For example, a level 1 heading in the documentation comment for a module, package, or class is rendered as a level 2 heading in the generated page, while a level 1 heading in the documentation comment for a field, constructor, or method is rendered as a level 4 heading in the generated page.

      This adjustment applies only to Markdown headings, not to any direct use of HTML headings.

    • An id identifier attribute is included in the rendered HTML so that the heading can easily be referenced from elsewhere. The identifier is generated from the content of the heading, in the same manner as other identifiers generated by javadoc. (You can easily obtain a link to the heading by clicking on the popup link icon when viewing the heading in a browser.)

    • The text of the heading is added to the main search index for the generated documentation.
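The placeholder substitution used in the rendering phase can be sketched as follows. This is a minimal model under stated assumptions, not the javadoc implementation: the record Node stands in for the real tree nodes, the two renderer functions are supplied by the caller, and the sketch assumes that the Markdown text does not itself contain U+FFFC and that the Markdown renderer preserves the order of the placeholders.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical model of rendering mixed Markdown and non-Markdown content.
final class PlaceholderRendering {
    // Unicode OBJECT REPLACEMENT CHARACTER, standing in for non-Markdown content.
    static final char PLACEHOLDER = '\uFFFC';

    /// A simplified stand-in for a node in a documentation comment.
    record Node(boolean markdown, String text) { }

    static String render(List<Node> nodes,
                         UnaryOperator<String> markdownRenderer,
                         UnaryOperator<String> tagRenderer) {
        // Build a single string, with one placeholder per non-Markdown node.
        StringBuilder source = new StringBuilder();
        List<String> renderedTags = new ArrayList<>();
        for (Node n : nodes) {
            if (n.markdown()) {
                source.append(n.text());
            } else {
                source.append(PLACEHOLDER);
                renderedTags.add(tagRenderer.apply(n.text()));
            }
        }
        // Render the Markdown once, with the placeholders in place.
        String html = markdownRenderer.apply(source.toString());
        // Replace each placeholder, in order, with the rendered node.
        StringBuilder out = new StringBuilder();
        int next = 0;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == PLACEHOLDER && next < renderedTags.size()) {
                out.append(renderedTags.get(next++));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Rendering the whole sequence as one string lets the Markdown processor see the full context of a paragraph or list, even when inline tags appear in the middle of it.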

The implementation leverages an internal copy of the well-known commonmark-java library. By design, the use of the library is not revealed in any public supported JDK API.

While most of the features described here are part of the JDK javadoc tool and the Compiler Tree API in the jdk.javadoc module, there is one place where the use of a new style for documentation comments will be observable in the Java SE java.compiler module. This is the method Elements.getDocComment, which returns the normalized text of the documentation comment, if any, for a declaration. This method is updated to encompass /// comments. In addition, because the kind of comment affects its potential interpretation, a new method is provided to determine whether the documentation comment for a declaration uses the /** ... */ form using traditional comments or the /// form using end-of-line comments.

Future Work

It would be possible to detect some stylized uses of headings followed by appropriate content and convert them into equivalent JavaDoc tags.

For example, a heading of Parameters followed by a list of parameter names and their descriptions could be converted into equivalent @param tags:

A similar policy could be adopted for the list of exceptions that may be thrown by a method:

There should only ever be a single description of the return value for a method, so there is no need to use a list in this case:

The proposed forms do look like normal Markdown, but they also take up more vertical space. Developers may prefer to stay with the more concise forms, using old-style JavaDoc tags.

It may be difficult to extend this strategy to all block tags, including user-specified tags, but in the JDK code base just five tags (@param, @return, @see, @throws, and @since) account for over 90% of all uses of block tags.

Alternatives

Pluggable implementation

Instead of leveraging a specific Markdown parser implementation, we could instead support the use of other user-specified Markdown processors, providing different flavors of Markdown. However, such an approach could lead to inconsistencies when generating documentation spanning different libraries for little perceived gain.

Translating more Markdown to HTML

We could translate additional Markdown constructs into equivalent DocTree nodes, representing plain text, HTML, and JavaDoc tags. While such an approach would have the advantage that API clients may not need to be aware that the original source for the comment was in Markdown, there are also a number of disadvantages:

Inline tags

While the uses of most block tags could be replaced by stylized use of headings and ensuing content, there is no such equivalent for most of the less common inline tags. Of these, {@inheritDoc} is the most common, and there is no obvious analog in Markdown. Rather than invent an alternative syntax for the sake of it, it seems better to continue with the existing inline tag syntax.

Markdown in /**...*/ comments

As described above, there are many advantages to using /// for documentation comments. Setting those reasons aside, we could parse Markdown embedded in traditional /**...*/ comments instead of, or in addition to, introducing /// comments.

There are two possibilities: either encode within each /** comment a way to distinguish between a Markdown comment and a traditional comment, or treat all existing /** comments as Markdown comments.

Treating existing comments as Markdown is untenable, however, because Markdown and HTML are different languages with different syntax rules. In HTML, whitespace is only significant as literal text in a <pre> element. In Markdown, by contrast, vertical whitespace may indicate a paragraph break, leading horizontal whitespace may indicate an indented code block or a nested list, and trailing whitespace may indicate a hard line break, equivalent to <br> in HTML. Additionally, the rules for using HTML in Markdown documents are somewhat convoluted and non-intuitive. Finally, there are numerous examples in the JDK code of square brackets in narrative text, which would risk being interpreted as links to program elements; for example, "The information is returned as a two-dimensional array (array[x][y])".

It would be possible to encode the kind of comment by placing a short string immediately after the initial /** to indicate when the ensuing text should be treated as Markdown. For example:

/**md
 * Hello _World!_
 */

When this was prototyped, it was generally unpopular, and was seen to be too intrusive in small comments, and too insignificant in big comments.

Configurable comment styles

We could build a configurable system that accepts some /** ... */ documentation comments in Markdown and others in HTML. It is not clear, however, that such a mechanism would have any significant advantage over the more overt use of /// comments for comments in Markdown and the continued use of /** ... */ for comments in HTML.

Risks and Assumptions