Notes on the Asynchronous I/O implementation

(November 2008)

Fixed Thread Pool

An asynchronous channel group associated with a fixed thread pool of size N submits N tasks that wait on I/O or completion events from the kernel. Each task simply dequeues an event, performs any necessary I/O completion, and then dispatches directly to the user's completion handler that consumes the result. When the completion handler terminates normally, the task returns to waiting on the next event. If the completion handler terminates due to an uncaught error or runtime exception, the task terminates and is immediately replaced by a new task. This is depicted in the following diagram:

[Diagram: Fixed thread pool]

This configuration is relatively simple and delivers good performance for suitably designed applications. Note that it does not support the creation of threads on-demand or trimming back of the thread pool when idle. It is also not suitable for applications with completion handler implementations that block indefinitely; if all threads are blocked in completion handlers then I/O events cannot be serviced (forcing the operating system to queue accepted connections for example). Tuning requires choosing an appropriate value for N.
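In the final java.nio.channels API (assumed here, since these notes predate it) this configuration corresponds to AsynchronousChannelGroup.withFixedThreadPool. A minimal sketch:

```java
import java.nio.channels.AsynchronousChannelGroup;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class FixedGroupSketch {
    public static void main(String[] args) throws Exception {
        // N threads (here 4) dequeue I/O events and invoke completion
        // handlers directly, as described above.
        AsynchronousChannelGroup group = AsynchronousChannelGroup
                .withFixedThreadPool(4, Executors.defaultThreadFactory());

        // Channels would normally be opened against this group here; with
        // no channels bound, shutdownNow() lets the group terminate promptly.
        group.shutdownNow();
        System.out.println("terminated: "
                + group.awaitTermination(5, TimeUnit.SECONDS));
    }
}
```

Choosing N is the tuning knob noted above; handlers that block indefinitely will starve event servicing in this configuration.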

User-supplied Thread Pool

An asynchronous channel group associated with a user-supplied thread pool submits tasks to the thread pool that simply invoke the user's completion handler. I/O and completion events from the kernel are handled by one or more internal threads that are not visible to the user application. This configuration is depicted in the following diagram:

[Diagram: User-supplied thread pool]

This configuration works with most thread pools (cached or fixed) with the following exceptions:

  1. The thread pool must support unbounded queueing.
  2. The thread that invokes the execute method must never execute the task directly. That is, internal threads do not invoke completion handlers.
  3. Thread pool keep-alive must be disabled on older editions of Windows. This restriction arises because the kernel ties outstanding I/O operations to the initiating thread, and pending operations are cancelled if that thread terminates.

This configuration delivers good performance despite the hand-off per I/O operation. When combined with a thread pool that creates threads on demand, it is suitable for use with applications that have completion handlers that occasionally need to block for long periods (or indefinitely). The value of M, the number of internal threads, is not exposed in the API and requires a system property to configure (default is 1).
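In the final API this configuration is reachable via AsynchronousChannelGroup.withCachedThreadPool (an assumption relative to these notes, which predate the API's completion). The supplied pool must satisfy the restrictions above:

```java
import java.nio.channels.AsynchronousChannelGroup;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class UserPoolSketch {
    public static void main(String[] args) throws Exception {
        // A cached pool creates threads on demand and never rejects a task,
        // satisfying the unbounded-queueing requirement; it also never runs
        // a task on the submitting (internal) thread.
        ExecutorService pool = Executors.newCachedThreadPool();

        // The internal event-dequeueing threads remain invisible to the
        // application; only completion handlers run on the supplied pool.
        AsynchronousChannelGroup group =
                AsynchronousChannelGroup.withCachedThreadPool(pool, 1);

        group.shutdownNow();
        System.out.println("terminated: "
                + group.awaitTermination(5, TimeUnit.SECONDS));
    }
}
```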

Default Thread Pool

Simpler applications that do not create their own asynchronous channel group use the default group, which has an associated thread pool that is created automatically. This thread pool is a hybrid of the above configurations: a cached thread pool that creates threads on demand, as it may be shared by different applications or libraries that use completion handlers that invoke blocking operations.

As with the fixed thread pool configuration it has N threads that dequeue events and dispatch directly to the user's completion handler. The value of N defaults to the number of hardware threads but may be configured by a system property. In addition to N threads, there is one additional internal thread that dequeues events and submits tasks to the thread pool to invoke completion handlers. This internal thread ensures that the system doesn't stall when all of the fixed threads are blocked, or otherwise busy, executing completion handlers.
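A channel opened without an explicit group is bound to this default group (in the final API, the values above are configurable via the java.nio.channels.DefaultThreadPool system properties). A sketch:

```java
import java.net.InetSocketAddress;
import java.nio.channels.AsynchronousServerSocketChannel;

public class DefaultGroupSketch {
    public static void main(String[] args) throws Exception {
        // No group is supplied, so the channel is bound to the default
        // group, whose hybrid thread pool is created automatically.
        AsynchronousServerSocketChannel server =
                AsynchronousServerSocketChannel.open()
                        .bind(new InetSocketAddress("127.0.0.1", 0));
        System.out.println("bound: " + (server.getLocalAddress() != null));
        server.close();
    }
}
```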

What happens when an I/O operation completes immediately?

When an I/O operation completes immediately then the API allows for the completion handler to be invoked directly by the initiating thread if the initiating thread itself is one of the pooled threads. This creates the possibility that there may be several completion handlers on a thread's stack. The following diagram depicts a thread stack where a read or write method has completed immediately and the completion handler invoked directly. The completion handler, in turn, initiates another I/O operation that completes immediately and so its completion handler is invoked directly, and so on.

[Diagram: Thread stack]

By default, the implementation allows up to 16 I/O operations to complete directly on the initiating thread before requiring that all completion handlers on the thread stack terminate. This policy helps to avoid stack overflow, and also the starvation that could arise if a thread initiates many I/O operations that complete immediately. Both the policy and the maximum number of completion handler frames allowed on a thread stack can be configured by a system property where required. A future addition to the API may allow an application to specify how I/O operations that complete immediately are handled.
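The pattern that can produce stacked handlers is a handler that initiates the next operation from inside completed(). The sketch below chains reads of a file this way (class and method names are illustrative; whether frames actually stack depends on operations completing immediately on a pooled thread):

```java
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class ChainedReads {
    // Reads the whole file by re-initiating a read from inside the handler.
    static int totalRead(Path file, int bufSize) throws Exception {
        AtomicInteger total = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(1);
        try (AsynchronousFileChannel ch =
                 AsynchronousFileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocateDirect(bufSize);
            ch.read(buf, 0L, 0L, new CompletionHandler<Integer, Long>() {
                public void completed(Integer n, Long pos) {
                    if (n < 0) { done.countDown(); return; }  // end of file
                    total.addAndGet(n);
                    buf.clear();
                    // Initiating the next operation from inside the handler:
                    // if it completes immediately it may be dispatched on the
                    // same thread, stacking another handler frame.
                    ch.read(buf, pos + n, pos + n, this);
                }
                public void failed(Throwable t, Long pos) { done.countDown(); }
            });
            done.await();
        }
        return total.get();
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("aio", ".bin");
        Files.write(tmp, new byte[4096]);
        System.out.println("bytes read: " + totalRead(tmp, 512));
        Files.delete(tmp);
    }
}
```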

Direct Buffers

The asynchronous I/O implementation is optimized for use with direct buffers. As with SocketChannel, all I/O operations are performed using direct buffers. If an application initiates an I/O operation with a non-direct buffer then the implementation transparently substitutes a direct buffer.
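Applications can avoid the hidden substitution by allocating direct buffers up front. A small illustration (not specific to these notes):

```java
import java.nio.ByteBuffer;

public class BufferKinds {
    public static void main(String[] args) {
        // A direct buffer can be handed to the kernel without substitution.
        ByteBuffer direct = ByteBuffer.allocateDirect(64 * 1024);
        // A heap buffer would be transparently copied into a direct buffer
        // for each I/O operation.
        ByteBuffer heap = ByteBuffer.allocate(64 * 1024);
        System.out.println("direct: " + direct.isDirect()
                + ", heap: " + heap.isDirect());
    }
}
```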

By default, the maximum memory that may be allocated to direct buffers is equal to the maximum Java heap size (Runtime.maxMemory). This may be configured, where required, using the MaxDirectMemorySize VM option (e.g. -XX:MaxDirectMemorySize=128m).

The MBean browser in jconsole can be used to monitor the resources associated with direct buffers.
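The same counters are also available programmatically through BufferPoolMXBean, the MBean that jconsole's browser displays (added to java.lang.management alongside NIO.2 in Java 7):

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

public class DirectBufferStats {
    public static void main(String[] args) {
        ByteBuffer.allocateDirect(1024 * 1024); // ensure the pool is non-empty
        // The platform exposes a pool named "direct" for direct buffers
        // (and "mapped" for memory-mapped buffers).
        for (BufferPoolMXBean pool :
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            System.out.println(pool.getName() + ": count=" + pool.getCount()
                    + " used=" + pool.getMemoryUsed() + " bytes");
        }
    }
}
```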