JEP draft: Escape Sequences For Line Continuation and White Space (Preview)

Owner: Jim Laskey
Type: Feature
Scope: SE
Status: Closed / Withdrawn
Component: specification / language
Discussion: amber dash dev at openjdk dot java dot net
Effort: S
Duration: S
Created: 2019/07/17 14:12
Updated: 2019/10/29 12:52
Issue: 8227870

Summary

Add two new escape sequences for string literals and text blocks for managing explicit white space and carriage control.

Goals

Motivation

JEP 355 - Text Blocks (Preview) made great strides in improving the readability of complex string literals and string expressions. Nonetheless, a few issues were left to be resolved. Specifically:

Discarding Implicit Newlines

In text blocks, newlines (U+000A) are not typically declared explicitly using \n. Instead, newlines are inserted implicitly wherever content breaks to the next line. What if an implicit newline is not desired?

For example, it is common practice to split a very long string literal into a concatenation of smaller substrings and then hard wrap the resulting string expression onto multiple lines.

String literal = "Lorem ipsum dolor sit amet, consectetur adipiscing " +
                   "elit, sed do eiusmod tempor incididunt ut labore " +
                   "et dolore magna aliqua.";

This is exactly the form of complex string expression that text blocks express more readably.

String text = """
                Lorem ipsum dolor sit amet, consectetur adipiscing
                elit, sed do eiusmod tempor incididunt ut labore
                et dolore magna aliqua.
                """;

However, using text blocks to represent long strings has a drawback. An implicit newline is inserted on every line.

It would be helpful to be able to selectively denote which lines do not pick up the implicit newline.
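The difference between the two forms above can be checked with a minimal sketch. The class and method names are hypothetical and chosen only for illustration; the sketch assumes a JDK in which text blocks are available.

```java
// Illustrative only: ImplicitNewlines is a hypothetical name, not part of this JEP.
final class ImplicitNewlines {
    // Hard-wrapped concatenation: the result contains no newline characters.
    static String concatenated() {
        return "Lorem ipsum dolor sit amet, consectetur adipiscing " +
               "elit, sed do eiusmod tempor incididunt ut labore " +
               "et dolore magna aliqua.";
    }

    // Text block with the same content: each content line ends with an implicit \n.
    static String textBlock() {
        return """
               Lorem ipsum dolor sit amet, consectetur adipiscing
               elit, sed do eiusmod tempor incididunt ut labore
               et dolore magna aliqua.
               """;
    }

    // Counts the newline (U+000A) characters in a string.
    static long newlineCount(String s) {
        return s.chars().filter(c -> c == '\n').count();
    }

    public static void main(String[] args) {
        System.out.println("concatenated: " + newlineCount(concatenated()));
        System.out.println("text block:   " + newlineCount(textBlock()));
    }
}
```

The concatenated form yields no newlines, while the text block yields one per content line, so the two strings are not interchangeable without further processing.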

The position of a text block's closing delimiter can be used either to discard the final newline or to manage content indentation, but not both in the same text block. That is:

It is sometimes desirable to position the closing delimiter to preserve leading white space without a final newline in the string.
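The trade-off can be sketched as follows. The names are hypothetical, and the sketch only illustrates the existing text block rules; it does not show the proposed escape sequences.

```java
// Illustrative only: ClosingDelimiter is a hypothetical name, not part of this JEP.
final class ClosingDelimiter {
    // Closing delimiter on its own line, two columns to the left of the content:
    // two spaces of indentation are preserved, but so is the final newline.
    static String indentedWithNewline() {
        return """
                 red
                 green
               """;
    }

    // Closing delimiter directly after the content: the final newline is gone,
    // but all leading white space is now incidental and is removed.
    static String noNewlineNoIndent() {
        return """
                 red
                 green""";
    }

    public static void main(String[] args) {
        System.out.println("[" + indentedWithNewline() + "]");
        System.out.println("[" + noNewlineNoIndent() + "]");
    }
}
```

Neither form produces indented content without a final newline, which is the case described above.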

Retaining Trailing White Space In Text Blocks

The space (U+0020) character's lack of observability creates a problem for text blocks. Text blocks are missing the per line delimiters, like those found in string literals, that clearly indicate where the content of a line begins and where the content of a line ends.

This lack of direct observability is the main reason text blocks strip trailing white space by default. However, this decision raises a counter issue: how does a developer retain trailing white space in a text block?

The simplest solution is to use an observable placeholder, such as \040, the octal escape sequence for space (ASCII character 32):

String colors = """
    red\040\040\040\040\040
    green\040\040\040
    blue\040\040\040\040
    """;

This works because escape sequences are converted after incidental white space is removed. The above text block can be reduced to:

String colors = """
    red    \040
    green  \040
    blue   \040
    """;

We can do this because only the last space needs to be observable. This observable character sequence acts as a fence, preventing the stripping of trailing white space from going beyond the sequence. Any white space to the left of the fence is not stripped away. Retention of trailing white space can be provided by using a character sequence fence.
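A minimal sketch of the fence technique follows, using the reduced text block from above. The class and method names are hypothetical. Because the \040 at the end of each line is not white space until escapes are interpreted, the literal spaces to its left survive stripping.

```java
// Illustrative only: TrailingSpaceFence is a hypothetical name, not part of this JEP.
final class TrailingSpaceFence {
    // Each line is padded with literal spaces and closed with the \040 fence,
    // so every line of the result is exactly eight characters wide.
    static String colors() {
        return """
               red    \040
               green  \040
               blue   \040
               """;
    }

    public static void main(String[] args) {
        // Brackets make the retained trailing spaces visible.
        colors().lines().forEach(line -> System.out.println("[" + line + "]"));
    }
}
```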

Still, this use of the \040 octal escape sequence is rather arcane. Besides being verbose, these sequences can perplex readers not fully versed in ASCII. Readability is enhanced when a more intuitive escape sequence is available for observable space.

Description

Change JLS 3.10.6 Escape Sequences for Character and String Literals and String::translateEscapes to recognize two new categories of escape sequences: \<line-terminator> for line continuation and \s for observable space.
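The intended behavior of \<line-terminator> and \s, as they are discussed in the Alternatives section, can be sketched as follows. This assumes a JDK that accepts both escapes in source code and in String::translateEscapes (JDK 15 or later); the class and method names are hypothetical.

```java
// Illustrative only: NewEscapes is a hypothetical name, not part of this JEP.
final class NewEscapes {
    // \<line-terminator>: the backslash and the line terminator are both
    // discarded, so the three source lines form one line with no final newline.
    static String continued() {
        return """
               Lorem ipsum dolor sit amet, consectetur adipiscing \
               elit, sed do eiusmod tempor incididunt ut labore \
               et dolore magna aliqua.\
               """;
    }

    // \s: an observable space that also acts as a fence against the
    // stripping of trailing white space; every line is six characters wide.
    static String padded() {
        return """
               red  \s
               green\s
               blue \s
               """;
    }

    // The same two escapes applied at run time to a string holding
    // a backslash followed by 's' and a backslash followed by a newline.
    static String translated() {
        return "red\\s|one\\\ntwo".translateEscapes();
    }

    public static void main(String[] args) {
        System.out.println(continued());
        padded().lines().forEach(line -> System.out.println("[" + line + "]"));
        System.out.println(translated());
    }
}
```

In this sketch the space before each continuation backslash is kept, because the backslash rather than the space is the last character on the source line.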


Alternatives

Line Continuation

Alternative escape sequences were evaluated, such as \+, \-, and \c. It is felt that \<line-terminator> is the least obscuring sequence and is consistent with other languages (e.g., bash).

Having \<line-terminator> defined as a generalized continuation sequence in the Java language was also evaluated. It is felt that, unlike in other languages (e.g., C macros), continuation would only be relevant to string literals and text blocks.

One of the main advantages of using the \<line-terminator> escape sequence over other line continuation techniques is its zero runtime cost. Most alternatives require some kind of runtime computation, with the cost increasing as the string gets larger. While we could reduce or eliminate this cost with optimization, developers are loath to use method invocation for prosaic idioms because it gets in their way.

Even so, a straightforward approach for line wrapping long string literals would be to simply replace the newlines with spaces or the empty string.

String text = """
                Lorem ipsum dolor sit amet, consectetur adipiscing
                elit, sed do eiusmod tempor incididunt ut labore
                et dolore magna aliqua.""".replace('\n', ' ');

or

String text = """
                Lorem ipsum dolor sit amet, consectetur adipiscing
                 elit, sed do eiusmod tempor incididunt ut labore
                 et dolore magna aliqua.
                """.replace("\n", "");

Either approach may get the desired result, but both are still encumbered with a runtime cost and the need for an explicit call. Also, we have little control over line terminator retention or trailing white space stripping.
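The cost difference can be illustrated with a small sketch, assuming a JDK that accepts \<line-terminator> (JDK 15 or later). The names are hypothetical. A text block that uses line continuation remains a compile-time constant, so it is resolved by the compiler and shares the interned instance of an equal literal, whereas the replace variant is computed on every call.

```java
// Illustrative only: RuntimeCost is a hypothetical name, not part of this JEP.
final class RuntimeCost {
    static final String LITERAL = "Lorem ipsum dolor sit amet";

    // Constant expression: the continuation is resolved at compile time.
    static final String CONTINUED = """
            Lorem ipsum \
            dolor sit amet""";

    // Same value, but produced by a method call at run time.
    static String replaced() {
        return """
               Lorem ipsum
               dolor sit amet""".replace('\n', ' ');
    }

    public static void main(String[] args) {
        // Identity comparison is used deliberately to show constant interning.
        System.out.println(CONTINUED == LITERAL);
        System.out.println(replaced().equals(LITERAL));
    }
}
```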

Another approach is to use a visible fence sequence, such as $ or .... The fence sequence, in combination with the line terminator, does provide control over line terminator retention or trailing white space stripping, but is still encumbered with runtime cost and the need for an explicit call.

String text = """
                Lorem ipsum dolor sit amet, consectetur adipiscing $
                elit, sed do eiusmod tempor incididunt ut labore $
                et dolore magna aliqua.""".replace("$\n", "");

or

String text = """
                Lorem ipsum dolor sit amet, consectetur adipiscing ...
                elit, sed do eiusmod tempor incididunt ut labore ...
                et dolore magna aliqua.""".replace("...\n", "");

Observable Space

Observable space could be downplayed as an aesthetic change but we feel that \s provides significant code clarity. The following examples equivalently represent five spaces:

"     "
"\040\040\040\040\040"
"\u0020\u0020\u0020\u0020\u0020"
"\s\s\s\s\s"
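The equivalence of the four forms can be checked with a short sketch, assuming a JDK that accepts \s (JDK 15 or later). The class and field names are hypothetical.

```java
// Illustrative only: FiveSpaces is a hypothetical name, not part of this JEP.
final class FiveSpaces {
    static final String PLAIN = "     ";
    static final String OCTAL = "\040\040\040\040\040";
    static final String UNICODE = "\u0020\u0020\u0020\u0020\u0020";
    static final String ESCAPE_S = "\s\s\s\s\s";

    // True when all four spellings denote the same five-space string.
    static boolean allEqual() {
        return PLAIN.equals(OCTAL)
            && PLAIN.equals(UNICODE)
            && PLAIN.equals(ESCAPE_S);
    }

    public static void main(String[] args) {
        System.out.println(allEqual());
    }
}
```

Only the last three spellings let a reader count the spaces without relying on the editor's rendering.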

Other escape sequences were evaluated. Most other sequences don't add value and the association of \s to space is clear (as \t is for tab).

\<space> was also considered.

"\ \ \ \ \ "

While aesthetically acceptable, the interpretation of \<space> at the end of line becomes perplexing. Is this a \<space> or a \␊? Using the observable character s removes any ambiguity.

Alternative character sequence fences can be used for trailing white space retention, but they incur a runtime cost when stripped away.

String colors = """
        red   $
        green $
        blue  $
        """.replace("$\n", "\n");

It is also possible to change the text block rules to not remove incidental white space. However, the strong argument for removing incidental white space remains.

Testing

Tests will be added to cover various permutations of the new escape sequences, along with their interaction with existing escape sequences.

Risks and Assumptions

The primary risk is that there may be tools in the field that cannot be modified to accept these new escape sequences.

Dependencies

Dependent on JEP 355 - Text Blocks (Preview) moving forward.