Issues
Performance
javac using Java.g is significantly slower than javac using the standard parser. There are two potential solutions.
- Tweak Java.g where possible to improve performance. There is a trade-off here between clarity and performance, since it is also a goal to try to unify the grammars used by javac and presented in the JLS.
- Improve the code generated by ANTLR. See, for example, Faster expression parsing for Antlr by Terence Parr.
The following table compares the performance of the ANTLR javac compiler against that of the standard compiler. Two bodies of code were tested:
- The OpenJDK langtools repository: 156,336 lines in 662 files
- The OpenJDK jdk repository: 2,563,605 lines in 7569 files
For each body of code, the following measurements were taken:
- The time taken to just scan (lex) the .java source files
- The time taken to scan and parse the .java source files
- The time taken to compile the .java source files
- The time taken to complete a standard build
Times in the following table are in seconds. For the first three rows, the times were measured as elapsed time using System.currentTimeMillis(); for the full build, the times were measured with the Unix time command, using the sum of user time and system time.
Measurement | langtools: standard javac | langtools: ANTLR javac | langtools: ANTLR / standard | jdk: standard javac | jdk: ANTLR javac | jdk: ANTLR / standard |
---|---|---|---|---|---|---|
lex source files | 0.111 | 1.223 | 11.02 | 18.181 | 190.54 | 10.48 |
lex and parse source files | 0.366 | 2.609 | 7.12 | 47.123 | 346.34 | 7.35 |
compile source files | 9.176 | 18.730 | 2.04 | 84.481 | 168.161 | 1.99 |
full build | 47.915 | 58.751 | 1.23 | 487.68 | 574.96 | 1.17 |
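Although the ANTLR lexer and parser are significantly slower than their hand-written counterparts (roughly 7 to 11 times slower for lexing and parsing), the impact is much smaller in typical real-world usage: compilation takes about twice as long, and a full build is only about 1.2 times slower.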
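The elapsed-time measurements and the ANTLR-to-standard ratios can be illustrated with a minimal sketch. The class and method names here (ElapsedTimer, elapsedSeconds, slowdown) are hypothetical and are not taken from the actual benchmark harness; the sketch only assumes that a measurement is a single wall-clock timing with System.currentTimeMillis(), as described above.

```java
// Hypothetical helper, not part of javac or the original benchmark.
public class ElapsedTimer {

    /** Runs the task once and returns the elapsed wall-clock time in seconds. */
    static double elapsedSeconds(Runnable task) {
        long start = System.currentTimeMillis();
        task.run();
        return (System.currentTimeMillis() - start) / 1000.0;
    }

    /** Slowdown ratio: ANTLR time divided by standard time. */
    static double slowdown(double standardSeconds, double antlrSeconds) {
        return antlrSeconds / standardSeconds;
    }

    public static void main(String[] args) {
        // Stand-in workload; a real run would lex or parse the source files here.
        double seconds = elapsedSeconds(() -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 1_000_000; i++) {
                sb.append(i % 10);
            }
        });
        System.out.println("elapsed seconds: " + seconds);

        // Ratio for the langtools "lex source files" row of the table.
        System.out.println("slowdown: " + slowdown(0.111, 1.223));
    }
}
```

A single millisecond-resolution timing is noisy for sub-second workloads such as the langtools lexing row, so repeated runs would give a more stable figure.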
Error handling
The error messages generated by ANTLR are typically not as detailed as those generated by the standard parser, which can often give more hints about what is expected at any point.
The standard parser has features that make it suitable for use in an IDE like NetBeans but that are not necessarily required for a batch compiler. In particular, it supports improved error recovery after a syntax error has been found, and it can retain valid subtrees in the context of a syntax error. For example, consider this input:
for (int i = 0; i < 10; i++) ?
The trees for "int i = 0;", "i < 10", and "i++" should not necessarily be discarded just because there is an error in the body of the for-loop. In a case like this, the standard parser will return an ErrorTree containing any trees which were successfully parsed before the syntax error was discovered. This allows an IDE to analyze those trees even though the complete statement is syntactically malformed.
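One way a tool might inspect what survives a syntax error is through the public Compiler Tree API, sketched below. This assumes a JDK (not just a JRE) and assumes that the error nodes described above surface to clients as com.sun.source.tree.ErroneousTree; the class name ErroneousTreeDemo and the method report are hypothetical. Which subtrees end up inside an error node, and which remain attached to an otherwise ordinary tree, depends on the javac version, so the sketch simply lists whatever the parser retained.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.ErroneousTree;
import com.sun.source.tree.Tree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;

import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

// Hypothetical demo class; not part of javac.
public class ErroneousTreeDemo {

    /** Parses the source and returns one line per retained subtree and per syntax error. */
    static List<String> report(String source) throws IOException {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler; run on a JDK");
        }
        // In-memory source file, so nothing is read from disk.
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///Demo.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, diagnostics, null, null, List.of(file));

        List<String> lines = new ArrayList<>();
        // Parse only; no attribution or code generation is performed.
        for (CompilationUnitTree unit : task.parse()) {
            new TreeScanner<Void, Void>() {
                @Override
                public Void visitErroneous(ErroneousTree node, Void unused) {
                    List<? extends Tree> retained = node.getErrorTrees();
                    if (retained != null) {
                        for (Tree t : retained) {
                            lines.add("retained " + t.getKind() + ": " + t);
                        }
                    }
                    return super.visitErroneous(node, unused);
                }
            }.scan(unit, null);
        }
        for (Diagnostic<? extends JavaFileObject> d : diagnostics.getDiagnostics()) {
            if (d.getKind() == Diagnostic.Kind.ERROR) {
                lines.add("error at line " + d.getLineNumber() + ": " + d.getMessage(null));
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        String source = "class Demo { void m() { for (int i = 0; i < 10; i++) ? } }";
        report(source).forEach(System.out::println);
    }
}
```

An IDE would typically walk the whole compilation unit in the same way, treating the valid parts of the tree as usable for analysis and reporting the collected diagnostics alongside them.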
Interaction with IDEs and other tools
While not encouraged or necessarily supported, some downstream clients of javac may depend on internal APIs, such as the lexer and parser classes. These clients may be significantly affected by a change to an ANTLR-based lexer and parser.