JEP 107: Bulk Data Operations for Collections

Owner	Mike Duigou
Type	Feature
Scope	SE
Status	Closed / Delivered
Release	8
Component	core-libs
JSR	335
Discussion	lambda dash dev at openjdk dot java dot net
Effort	L
Duration	XL
Depends	JEP 109: Enhance Core Libraries with Lambda
	JEP 126: Lambda Expressions & Virtual Extension Methods
Endorsed by	Brian Goetz
Created	2011/09/23 20:00
Updated	2024/04/22 16:06
Issue	8046097

Summary

Add functionality to the Java Collections Framework for bulk operations upon data. This is commonly referenced as "filter/map/reduce for Java." The bulk data operations include both serial (on the calling thread) and parallel (using many threads) versions of the operations. Operations upon data are generally expressed as lambda functions.

Goals

Provide new features for bulk data processing using lambda functions, including parallel operations.

Non-Goals

Convert existing usages to parallel operation.

Motivation

FlumeJava, as used internally by Google, and PLINQ, offered by Microsoft, are the most directly similar offerings. LINQ and PLINQ in particular are seen as extremely valuable by .NET developers and are a subject of much envy among Java developers.

The primary benefits are for developers currently building single-threaded business process applications. Being able to take advantage of concurrency with minimal changes to their applications is expected to be a huge benefit.

Description

The serial implementation provides a bridge from existing collections bulk-data operations to parallel operation that does not change the threading model of the application.
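As a minimal sketch of a serial filter/map/reduce, assuming the `java.util.stream` API in which this feature shipped in Java 8 (this JEP text itself does not name the API), the whole pipeline runs on the calling thread. The class and method names here are hypothetical and exist only for illustration.

```java
import java.util.Arrays;
import java.util.List;

class SerialBulkSketch {
    // Sum of the squares of the even values: filter, then map, then reduce.
    // Assumption: Collection.stream() as delivered in Java 8.
    static int sumOfEvenSquares(List<Integer> values) {
        return values.stream()               // serial: executes on the calling thread
                .filter(v -> v % 2 == 0)     // keep even values
                .map(v -> v * v)             // square each one
                .reduce(0, Integer::sum);    // fold into a single result
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3, 4, 5, 6);
        System.out.println(sumOfEvenSquares(values)); // 4 + 16 + 36 = 56
    }
}
```

Because no additional threads are involved, code written this way keeps the application's existing threading model.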

The parallel implementation is the central element of this feature. Parallel operation provides the opportunity to accelerate operations upon large amounts of data by dividing the task among multiple threads, and therefore multiple processors. The parallel implementation builds upon the java.util.concurrent Fork/Join implementation introduced in Java 7.
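The following sketch illustrates how the same bulk operation might be run serially or in parallel, assuming the Java 8 `java.util.stream` API, where a parallel stream is executed by default on the common Fork/Join pool. The class name, method name, and the `parallel` flag are hypothetical.

```java
import java.util.stream.LongStream;

class ParallelBulkSketch {
    // Sum of x*x for x in 1..n, optionally split across multiple threads.
    // Assumption: LongStream.parallel() as delivered in Java 8.
    static long sumOfSquares(long n, boolean parallel) {
        LongStream range = LongStream.rangeClosed(1, n);
        if (parallel) {
            range = range.parallel();   // request parallel execution of the pipeline
        }
        return range.map(x -> x * x).sum();
    }

    public static void main(String[] args) {
        long serial = sumOfSquares(1_000, false);
        long par = sumOfSquares(1_000, true);
        // Both modes compute the same value; only the execution strategy differs.
        System.out.println(serial == par);
    }
}
```

Whether the parallel form is actually faster depends on data size, the cost of each operation, and the available cores, which is why benchmarking matters for this feature.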

For both the serial and parallel implementations, an "eager" mode and a "lazy" mode are possible. In eager mode, operations are performed directly upon the data at the time the operation function is invoked. In lazy mode, the operations upon the data are deferred until the final result is requested. Lazy operation gives the implementation more optimization opportunities, because it can reorganize the data and the operations to be performed before executing them.
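The difference between the two modes can be sketched as follows, assuming the Java 8 library: `Iterable.forEach` stands in for an eager operation, and a `Stream` pipeline stands in for a lazy one. The class name and the counters are hypothetical and only make the timing of the work observable.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

class LazyBulkSketch {
    // Returns {eagerCalls, lazyCallsBeforeTerminalOp, lazyCallsAfterTerminalOp, total}.
    static int[] demo() {
        List<String> words = Arrays.asList("alpha", "beta", "gamma");

        // Eager: the lambda runs for every element as soon as forEach is invoked.
        AtomicInteger eagerCalls = new AtomicInteger();
        words.forEach(w -> eagerCalls.incrementAndGet());

        // Lazy: map() only records the operation; nothing runs yet.
        AtomicInteger lazyCalls = new AtomicInteger();
        Stream<Integer> lengths = words.stream().map(w -> {
            lazyCalls.incrementAndGet();
            return w.length();
        });
        int before = lazyCalls.get();

        // Requesting the final result triggers the deferred work.
        int total = lengths.reduce(0, Integer::sum);
        int after = lazyCalls.get();

        return new int[] { eagerCalls.get(), before, after, total };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo()));
    }
}
```

In this sketch the mapping function is not called until `reduce` asks for a result, which is the point at which a lazy implementation is free to reorder or fuse the pending operations.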

Testing

Benchmarking and performance regression testing are going to be critical to delivering a quality final product.

This work will require significant hardware resources to fully test, i.e., dedicated 8+ core systems for all of the primary supported platforms.

Dependences

Lambda language changes
Core libraries changes described in JEP 109
Involvement from JSR 335 EG and JSR 166 EG and Doug Lea in particular

Impact

Compatibility: Forward compatibility only
Security: Standard
Performance/scalability: Significant testing and benchmarking required
User experience: None
I18n/L10n: None
Portability: 100% Java implementation. No native code planned.
Packaging/installation: Delivered as part of JRE install
Documentation: Standard
TCK: No special requirements. New TCK tests will be required.
Internationalization: Same as JCF
Localization: None