JEP draft: Safer Process Launch by ProcessBuilder and Runtime.exec
Author | rriggs |
Owner | Roger Riggs |
Type | Feature |
Scope | JDK |
Status | Draft |
Release | tbd |
Component | core-libs / java.lang |
Effort | S |
Duration | M |
Created | 2021/03/16 19:15 |
Updated | 2022/03/24 16:40 |
Issue | 8263697 |
Summary
Improve safety of process launch by ProcessBuilder and Runtime.exec on Windows. The arguments are checked to ensure they can be passed to the launched application without the possibility of splitting or combining arguments.
Motivation
The java.lang.Runtime.exec and java.lang.ProcessBuilder APIs are used to launch an application in a separate operating system process. On Windows, the arguments of the caller are encoded into a single command line, to be decoded by the application. It is natural to expect the list of arguments supplied to be passed to the child application intact. While this is true for most operating systems, on Windows it is not always possible to pass an arbitrary string due to the details of command line encoding and decoding.
The details of encoding arguments by Runtime.exec and ProcessBuilder have previously been described only as implementation specific, leaving it to developers to discover, sometimes by trial and error, what works for their particular program. For example, if an argument contains or may contain space or tab, applications find it appealing to add double-quotes around the argument. ProcessBuilder already handles the encoding of arguments with space and tab to provide some measure of portability across operating systems. The application-supplied quotes are not necessary and create an ambiguity about whether the quotes themselves are intended to be passed to the application or are only present to ensure the argument is passed as a single string.
On Windows, the arguments are assembled and encoded into a single string passed to the new process when it is created using CreateProcess. In the newly created program, the arguments are parsed from the command line using one of the common conventions for the meaning of quotes, backslashes and special characters. This can lead to cases where the intent is unclear and raises the possibility of a mismatch between the encoding of the arguments and the parsing of the command line into the corresponding arguments. To resolve the ambiguity, the encoding should be well-defined with respect to quotes and special characters and a good match to the parsing of the command line in the application.
Quotes are typically balanced, but if there are unmatched quotes in an argument and the argument is added to the command line, the matching quote may not be present and subsequent arguments may be merged into the malformed argument. For example, a list of three arguments { "abc", "\"def", "xyz" } could be naively inserted into the command line as: abc "def xyz and would later be parsed as: {"abc", "def xyz"}, joining three arguments into two.
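The merge described above can be reproduced with a small sketch. The class and method names (NaiveJoinDemo, naiveJoin, simpleParse) are hypothetical, and the parser is a deliberately simplified stand-in in which a double-quote only toggles quoted mode. It is not the parser used by any particular Windows runtime.

```java
import java.util.ArrayList;
import java.util.List;

final class NaiveJoinDemo {
    /** Joins arguments with single spaces and applies no quoting at all. */
    static String naiveJoin(List<String> args) {
        return String.join(" ", args);
    }

    /** Simplified parser: a double-quote toggles quoted mode; space and tab split only outside quotes. */
    static List<String> simpleParse(String commandLine) {
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        boolean started = false;   // true once the current argument has any content
        for (char c : commandLine.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;
                started = true;
            } else if ((c == ' ' || c == '\t') && !inQuotes) {
                if (started) {
                    out.add(cur.toString());
                    cur.setLength(0);
                    started = false;
                }
            } else {
                cur.append(c);
                started = true;
            }
        }
        if (started) {
            out.add(cur.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> original = List.of("abc", "\"def", "xyz");
        String line = naiveJoin(original);
        System.out.println(line);               // abc "def xyz
        System.out.println(simpleParse(line));  // [abc, def xyz]
    }
}
```

The unmatched quote in the second argument stays open until the end of the line, so the third argument is absorbed into it.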
Similarly, if an argument is File with Space and BackSlash\ (without quotes), the resulting command line string should be quoted to keep it as a single string. With first and last quotes added, the string becomes: "File with Space and BackSlash\". This seems reasonable, but we need to look at how that will be parsed by the application. One of the common parsers for the command line considers a backslash before a double quote to be a literal quote instead of an opening or closing quote. In this case, the resulting command line is parsed as one argument, File with Space and BackSlash", which contains a final quote instead of a backslash.
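The backslash rule can be illustrated with a minimal sketch of the convention documented for C and C++ programs on Windows: a run of 2n backslashes followed by a double-quote yields n backslashes and a quote delimiter, while 2n+1 backslashes followed by a double-quote yield n backslashes and a literal quote. The class name BackslashQuoteDemo is hypothetical, and the sketch omits other details of the real parsers, such as the treatment of two adjacent quotes inside a quoted region.

```java
import java.util.ArrayList;
import java.util.List;

final class BackslashQuoteDemo {
    /** Splits a command line using a simplified form of the C/C++ backslash and quote rules. */
    static List<String> parse(String commandLine) {
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        boolean started = false;
        int i = 0;
        int n = commandLine.length();
        while (i < n) {
            char c = commandLine.charAt(i);
            if (c == '\\') {
                int start = i;
                while (i < n && commandLine.charAt(i) == '\\') {
                    i++;
                }
                int count = i - start;
                started = true;
                if (i < n && commandLine.charAt(i) == '"') {
                    cur.append("\\".repeat(count / 2));
                    if (count % 2 == 1) {
                        cur.append('"');   // odd run: the quote is literal, consume it here
                        i++;
                    }
                    // even run: the quote is a delimiter, handled on the next iteration
                } else {
                    cur.append("\\".repeat(count));   // backslashes not before a quote are literal
                }
            } else if (c == '"') {
                inQuotes = !inQuotes;
                started = true;
                i++;
            } else if ((c == ' ' || c == '\t') && !inQuotes) {
                if (started) {
                    out.add(cur.toString());
                    cur.setLength(0);
                    started = false;
                }
                i++;
            } else {
                cur.append(c);
                started = true;
                i++;
            }
        }
        if (started) {
            out.add(cur.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        // The command line text is: "File with Space and BackSlash\"
        System.out.println(parse("\"File with Space and BackSlash\\\""));
        // prints: [File with Space and BackSlash"]
    }
}
```

Doubling the trailing backslash before the closing quote avoids the problem: the same sketch parses "C:\Program Files\\" back to C:\Program Files\.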
With the most common parsing syntax, used by applications implemented in C, C++, C# and others, arguments with spaces, special characters, literal quotes, and backslashes can be encoded so that they are decoded by the application to the original argument strings.
Other programs, such as .cmd and .bat scripts whose command lines are processed by a command shell such as cmd.exe, use simpler rules that do not have an effective way to encode literal quotes, so some argument strings containing embedded quotes cannot be encoded.
The command shell, cmd.exe, that handles .cmd and .bat also implicitly enables interpretation of special characters for redirection that allow file access and pipelines that invoke other programs. Both of these cases of dubious encoding pose a security risk if not carefully used and reviewed.
In JDK 18 and earlier on Windows, the default encoding of arguments by ProcessBuilder and Runtime.exec is quite lenient, allowing unmatched quotes and unquoted special characters that can merge or split arguments. The absence of checks allows various forms of ambiguous commands that cannot be reliably parsed to recover the original arguments.
Stricter checking and argument encoding is supported in earlier versions, but the safety features are opt-in, requiring the application to explicitly set a system property or to use a security manager. The safer modes help applications apply the recommendations of the Secure Coding Guidelines for Java SE to avoid risks such as injection attacks and unintended execution. However, most developers do not take advantage of the additional safety checks. Changing the default to be more secure can reduce the risk of unintended execution and file access.
Description
We are changing the default for ProcessBuilder and Runtime.exec to require quoted arguments to have properly balanced quotes and to guard against splitting or merging of arguments.
For scripts such as .cmd and .bat, executed by shell programs, the encoding of special characters such as < > & | is modified to prevent implicit access to shell features such as redirection and pipelines. This restriction is not applicable when the shell is explicitly invoked, for example with cmd.exe.
Most existing programs work as before, as the argument encoding is straightforward and unchanged.
To opt out of checking for a particular argument, wrap it in triple double-quotes. An application that disables the argument checking must be carefully reviewed to avoid potential security risks.
The specific argument checks and encoding depend on whether the executable is an .exe or not. The executable is recognized by Windows and ProcessBuilder as an .exe if the file name ends in case-insensitive .EXE or does not have a dot in the filename. The special characters for .exe and non-.exe that are to be quoted are defined by Windows for C++ command line arguments and Cmd arguments as:
- .exe: space tab
- non-.exe: space tab & < > [ ] | { } ^ = ; ! ' + , ~
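The classification rule and the two character sets can be written down as a short sketch. The names ExeClassifier, isExe, and specialCharsFor are hypothetical, and taking the file name as the text after the last path separator is an assumption of this sketch, not a statement about how the JDK inspects the path.

```java
import java.util.Locale;

final class ExeClassifier {
    // Characters that force quoting, as listed in the text above.
    static final String EXE_SPECIAL = " \t";
    static final String NON_EXE_SPECIAL = " \t&<>[]|{}^=;!'+,~";

    /** True if the file name ends in .exe (any case) or contains no dot at all. */
    static boolean isExe(String executablePath) {
        int sep = Math.max(executablePath.lastIndexOf('\\'), executablePath.lastIndexOf('/'));
        String name = executablePath.substring(sep + 1);   // assumption: name follows the last separator
        return name.toLowerCase(Locale.ROOT).endsWith(".exe") || name.indexOf('.') < 0;
    }

    /** Returns the set of special characters that applies to the given executable. */
    static String specialCharsFor(String executablePath) {
        return isExe(executablePath) ? EXE_SPECIAL : NON_EXE_SPECIAL;
    }

    public static void main(String[] args) {
        System.out.println(isExe("cmd.exe"));   // true
        System.out.println(isExe("java"));      // true, no dot in the file name
        System.out.println(isExe("log.cmd"));   // false
    }
}
```

Under these rules an argument such as >log.out needs no quoting for cmd.exe but must be quoted for log.cmd.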
ProcessBuilder and Runtime.exec handle argument encoding and passing without the application needing to add or modify the argument except in a few unusual cases. It is strongly recommended to supply arguments without attention to double-quotes or special characters, allowing ProcessBuilder to handle any necessary encoding.
Argument encoding is performed as follows:
- Any argument without double-quotes or special characters does not require encoding and is passed as-is.
- Any argument containing a double-quote or any of the special characters but not starting with a single double-quote is encoded by adding leading and trailing quotes. For .exe programs, embedded quotes and backslashes are encoded as per main command arguments. For non-.exe programs, backslash is not special and the argument can not contain quotes; an exception is thrown if quotes are present.
- Any argument starting with a single double-quote must correctly balance double-quotes according to main command arguments or cmd.exe Reference as appropriate for the .exe or non-.exe. Special characters in the argument must be within double-quotes including space and tab. An exception is thrown if the double-quotes are not matched or if special characters are not within quotes. The quoting ensures the argument is not split into two arguments or joined with another argument. For compatibility with earlier versions, if there are one or more backslashes before the final quote it is encoded as per main command arguments.
- Any argument that begins and ends with triple double-quotes (""") is inserted into the command line without the beginning and end triple double-quotes. This raw-string syntax can be used to opt-out of checking for quotes and special characters; it raises the visibility of the unique requirements of the argument and invites review.
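The four rules above can be sketched as a single function. This is an illustration under stated assumptions, not the JDK implementation: the class and method names are hypothetical, IllegalArgumentException is an assumed exception type (the text only says an exception is thrown), the check for caller-quoted arguments treats every double-quote as a toggle, and the compatibility handling of backslashes before a final quote is omitted.

```java
final class ArgEncodingSketch {
    // Special-character sets as listed in the text above.
    static final String EXE_SPECIAL = " \t";
    static final String NON_EXE_SPECIAL = " \t&<>[]|{}^=;!'+,~";
    static final String RAW = "\"\"\"";

    /** Encodes one argument for the command line; isExe selects the rule set. */
    static String encode(String arg, boolean isExe) {
        String special = isExe ? EXE_SPECIAL : NON_EXE_SPECIAL;
        // Raw-string syntax: strip the triple quotes and skip all checks.
        if (arg.length() >= 6 && arg.startsWith(RAW) && arg.endsWith(RAW)) {
            return arg.substring(3, arg.length() - 3);
        }
        boolean hasQuote = arg.indexOf('"') >= 0;
        // No quotes and no special characters: pass as-is.
        if (!hasQuote && !containsAny(arg, special)) {
            return arg;
        }
        // Quoted by the caller: verify only, do not re-encode (simplified check).
        if (arg.charAt(0) == '"') {
            checkQuoted(arg, special);
            return arg;
        }
        if (!isExe) {
            if (hasQuote) {
                throw new IllegalArgumentException("embedded quote cannot be encoded: " + arg);
            }
            return "\"" + arg + "\"";   // backslash is not special for non-.exe
        }
        return quoteForExe(arg);
    }

    static boolean containsAny(String s, String chars) {
        for (int i = 0; i < s.length(); i++) {
            if (chars.indexOf(s.charAt(i)) >= 0) return true;
        }
        return false;
    }

    /** Simplified check: quotes must pair up and special characters must sit inside them. */
    static void checkQuoted(String arg, String special) {
        boolean inQuotes = false;
        for (int i = 0; i < arg.length(); i++) {
            char c = arg.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;
            } else if (!inQuotes && special.indexOf(c) >= 0) {
                throw new IllegalArgumentException("special character outside quotes: " + arg);
            }
        }
        if (inQuotes) {
            throw new IllegalArgumentException("unbalanced quotes: " + arg);
        }
    }

    /** Adds surrounding quotes, escaping embedded quotes and doubling backslashes before a quote. */
    static String quoteForExe(String arg) {
        StringBuilder sb = new StringBuilder("\"");
        int backslashes = 0;
        for (int i = 0; i < arg.length(); i++) {
            char c = arg.charAt(i);
            if (c == '\\') {
                backslashes++;
                continue;
            }
            // Before a quote emit 2n+1 backslashes, before anything else emit n.
            int emit = (c == '"') ? backslashes * 2 + 1 : backslashes;
            sb.append("\\".repeat(emit)).append(c);
            backslashes = 0;
        }
        // Trailing backslashes are doubled so the closing quote stays a delimiter.
        return sb.append("\\".repeat(backslashes * 2)).append('"').toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("Now is the time.", true));
        System.out.println(encode("C:\\Program Files\\", true));
        System.out.println(encode(">log.out", false));
    }
}
```

The raw-string test comes first because an argument wrapped in triple double-quotes would otherwise be caught by the check for arguments that start with a double-quote.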
For non-.exe commands, the required quoting of special characters (<, >, &, |) prevents the implicit use of redirection and pipelines. For use cases where the application requires explicit use of the shell capabilities, the application can invoke cmd.exe /C with the executable and its arguments. The arguments are checked and encoded as above for .exe applications. The same technique has been supported in earlier versions.
Extra precautions should be taken in creating the argument passed to a shell to avoid
unintentional and possibly risky side effects.
Note that on Linux and macOS, there is no checking for redirection and pipeline characters in arguments. Those characters are only interpreted by command shells such as sh, bash, or zsh and pose a risk only if the executable is a shell. Note that the risk of reading or writing files can occur with any command as a normal argument or command option. The protection afforded by restricting, on Windows, redirection and pipeline characters is minimal and not consistent across operating systems. ProcessBuilder does not do any checking or encoding for specific programs on any operating system.
The setting of a security manager does not have any effect on the
interpretation or encoding of command line arguments. This is a change from earlier
versions that use a safe mode similar to jdk.lang.Process.allowAmbiguousCommands=false
when a security manager is enabled. If a security manager is enabled, the
permission to execute the program is checked; this is unchanged from previous
versions.
Examples
The Motivation section showed cases where the current command line encoding using the lenient mode may put the application at risk if it incorporates input from the environment or untrusted sources. The examples below show how an application can use ProcessBuilder to avoid or mitigate those risks.
Using a List of Arguments instead of a Single String
To run a java Hello program with a string containing spaces, use separate arguments.
An array of strings is easier to use and more reliable than a single command line. When using a single string, it must be carefully encoded such that it can be decoded back into the individual arguments. The application is more complex because it must be aware of spaces, special characters, and encoding of quotes. The following may fail if spaces are dropped, added, or special characters are not quoted.
String cmd = "java" + " " + "Hello" + " " + "\"Now is the time.\"";
Process p = Runtime.getRuntime().exec(cmd);
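As background for why the single-string form is fragile: the Runtime.exec(String) documentation states that the command string is broken into tokens with a StringTokenizer created by new StringTokenizer(command), which splits on whitespace and knows nothing about quotes. The sketch below only reproduces that tokenization step. The class name SingleStringTokens is hypothetical, and what happens to the tokens afterwards is platform specific and not modelled here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

final class SingleStringTokens {
    /** Splits a command string the way new StringTokenizer(command) does: on whitespace only. */
    static List<String> tokens(String command) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(command);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        String cmd = "java" + " " + "Hello" + " " + "\"Now is the time.\"";
        System.out.println(tokens(cmd));
        // prints: [java, Hello, "Now, is, the, time."]
    }
}
```

The quoted sentence is split into four tokens, two of which carry an unmatched quote.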
Compare with using an argument list. The runtime handles the encoding of command and arguments containing spaces and quotes to keep arguments separate and distinct.
String[] args = {"java", "Hello", "Now is the time."};
Process p = Runtime.getRuntime().exec(args);
-or-
Process p = new ProcessBuilder(args).start();
Enabling redirection and pipelines
To run the dir command and use a shell pipe to display the contents.
When invoking a normal executable, such as dir, the characters are not quoted and are passed to the program unquoted. In this example, the arguments are passed to the dir command. dir is not a shell and does not handle redirection or pipeline special characters. The output of dir will say it can't find the file "|more".
List<String> args = List.of("dir", "|more");
Process p = new ProcessBuilder(args).start();
To paginate the directory listing, cmd.exe is used as the executable. cmd.exe invokes the dir command and interprets the special character to pipe the output to more.
List<String> args = List.of("cmd.exe", "/C", "dir", "|more");
Process p = new ProcessBuilder(args).start();
Redirection and pipelines with .cmd and .bat scripts
To run a .cmd or .bat script and redirect the output to a file.
If the executable is .cmd or .bat, not an .exe, the arguments containing special characters are quoted to avoid implicitly causing redirection, pipelines, or group execution. In this case, the script will be passed the string ">log.out" and redirection will not occur.
List<String> args = List.of("log.cmd", ">log.out");
Process p = new ProcessBuilder(args).start();
To redirect the log output, cmd.exe is used as the executable shell and is passed log.cmd as the command to run and the requested redirection.
List<String> args = List.of("cmd.exe", "/C", "log.cmd", ">log.out");
Process p = new ProcessBuilder(args).start();
The Default Command Line Encoding is Good Enough
Passing a directory name with spaces to an application.
A typical directory path may or may not contain spaces and end in a backslash ("\").
The straightforward code works whether the application is an .exe or non-.exe.
Note: double backslash in the source is a single backslash in the Java string.
List<String> args = List.of("cmd.exe", "/C", "do.cmd", "C:\\Program Files\\");
Process p = new ProcessBuilder(args).start();
ProcessBuilder encodes the argument (because of the space) by surrounding it with double-quotes. The number of backslashes before the quote is doubled as required by the C++ command line encoding to ensure the quote is considered the matching final quote and not a literal quote. The argument in the command line is now: "C:\\Program Files\\\\"
In the case where the application is an .exe executable, the command line is parsed and the doubling of backslashes before the final quote is reversed, yielding the original argument "C:\\Program Files\\".
In the case where the application is a .cmd script, the argument string contains the quotes and additional backslash, "\"C:\\Program Files\\\\\"". Since the argument is a Windows directory path, the addition of the backslash is benign and ignored when doing file operations.
Compatibility
Command arguments that do not contain double-quotes or special characters work as before; there are no changes and no special encoding is needed. Arguments containing spaces or tabs also work as before. The most common case for existing applications is lenient mode, in which there has been no checking for balanced quotes or use of special characters. The only encoding done was to add quotes to arguments that contain space or tab. While most applications have well-formed arguments, with these changes exceptions are thrown for malformed arguments; the application should be corrected to balance the quotes.
In the lenient mode of previous releases, there has been no difference between the encoding of arguments of .exe and non-.exe programs with respect to special characters. With these changes, arguments for non-.exe programs containing any of the characters space, tab, & < > [ ] | { } ^ = ; ! ' + , ~ are quoted.
Existing double quotes in arguments, as long as they are balanced, are fine as is.
Arguments containing embedded quotes that should be passed as a literal
quote cannot be encoded and an exception will be thrown.
An additional fallback is to invoke the shell, such as cmd.exe, and pass the arguments as described above.
For programs that cannot be updated, backward compatibility with the more lenient mode is achieved by setting the system property jdk.lang.Process.allowAmbiguousCommands=true. The property is set on the command line, cannot be changed using System.setProperty, and applies to every use of ProcessBuilder.
Existing applications can be checked for compatibility using current releases by setting jdk.lang.Process.allowAmbiguousCommands=false on the command line. On JDK 18 and earlier, it performs stricter checking of quotes and special characters. Note: the set of special characters for non-.exe programs is expanded with this proposal.
On Linux and macOS operating systems, the jdk.lang.Process.allowAmbiguousCommands system property is unused and arguments are passed literally.
Testing
Existing tests will be updated to verify the new encodings. Compatibility tests will confirm that lenient mode, enabled with jdk.lang.Process.allowAmbiguousCommands=true, has the same behavior as previous JDK versions.
Commands will be tested for .exe, non-.exe, and cmd.exe /C use cases.
Risks and Assumptions
The argument passing to start a process is heavily dependent on the interpretation of quotes and special characters in application parameters, the subsequent encoding of those parameters to create a Windows command line and corresponding parsing of the command line to recover the arguments. There is a risk that making the interpretation stricter may throw exceptions in cases that previously were allowed or the command line may be encoded differently.
The interpretation of quotes and special characters is very close to the lenient
mode, the most frequent use case in the absence of a security manager. The lenient
mode can be re-enabled by setting the jdk.lang.Process.allowAmbiguousCommands
system property.