JEP 151: Compress Time-Zone Data

Authors: Stuart Marks, Darryl Mocek, Peter Jensen
Owner: Stuart Marks
Type: Feature
Scope: JDK
Status: Closed / Withdrawn
Component: core-libs
Discussion: i18n dash dev at openjdk dot java dot net
Effort: S
Duplicates: JEP 150: Date & Time API
Endorsed by: Brian Goetz
Created: 2011/08/26 20:00
Updated: 2014/07/10 20:31
Issue: 8046141

Summary

Store time-zone data more efficiently, in a single compressed file rather than in one uncompressed file per zone.

Description

The original reason to keep time-zone data in individual uncompressed files, rather than in a single compressed file, was (we surmise) to optimize access to the data for a particular time zone and to reduce dynamic memory consumption.

Given that the data for a given zone is read only once, and that most applications use only one or a few zones, this is probably not a major concern. It is entirely possible that the current implementation was simply more convenient, and that no requirement justified the extra effort of using a compressed format.

For a large number of files of random size, each file wastes on average half a file-system block, so the expected disk overhead is number-of-files * 0.5 * file-system-block-size.

The block size on UNIX, Linux (including embedded Linuxes), and NTFS file systems is typically 4KB. There are 500+ time zone files, resulting in an expected overhead of about 1MB (in line with observations).

On a system with a smaller block size of 1KB we would still expect to see an overhead of about 250KB, or about 100% of the actual size of the file contents.
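The arithmetic behind these two estimates can be sketched as follows. This is only an illustration of the formula above: the class and method names are hypothetical, and the file count of 500 is an assumed round figure standing in for "500+".

```java
public class SlackEstimate {
    /** Expected wasted bytes, assuming each file wastes half a block on average. */
    static long expectedOverheadBytes(int fileCount, int blockSizeBytes) {
        return (long) fileCount * blockSizeBytes / 2;
    }

    public static void main(String[] args) {
        int files = 500; // assumed round figure for "500+" zone files
        // 4KB blocks: 1,024,000 bytes, about 1MB
        System.out.println(expectedOverheadBytes(files, 4096));
        // 1KB blocks: 256,000 bytes, about 250KB
        System.out.println(expectedOverheadBytes(files, 1024));
    }
}
```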

Options for reducing the disk footprint include:

  1. Store files in a zip/jar archive
  2. Use an embedded database

Option (1) requires only minimal, localized changes: the implementation would read zip-file entries rather than individual files.
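As a rough illustration of how small such a change could be, the sketch below reads one zone's bytes from a zip archive using the standard `java.util.zip` API. It is not the JDK's actual loading code: the class name, the archive layout (one entry per zone ID), and the placeholder entry content are all assumptions made for the example, and `InputStream.readAllBytes` requires Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class ZoneArchiveSketch {
    /** Returns the bytes stored for one zone ID, or null if the archive has no such entry. */
    static byte[] readZone(Path archive, String zoneId) throws IOException {
        try (ZipFile zip = new ZipFile(archive.toFile())) {
            ZipEntry entry = zip.getEntry(zoneId); // assumed layout: entry name == zone ID
            if (entry == null) {
                return null;
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return in.readAllBytes();
            }
        }
    }

    /** Builds a tiny demo archive with placeholder content (not real zone data). */
    static Path writeDemoArchive() throws IOException {
        Path archive = Files.createTempFile("tzdata-demo", ".zip");
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(archive))) {
            out.putNextEntry(new ZipEntry("Europe/Paris"));
            out.write("placeholder".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
        return archive;
    }

    public static void main(String[] args) throws IOException {
        Path archive = writeDemoArchive();
        byte[] data = readZone(archive, "Europe/Paris");
        System.out.println(new String(data, StandardCharsets.UTF_8));
        Files.delete(archive);
    }
}
```

Only the requested entry is decompressed, which matches the observation that most applications use only one or a few zones.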

Option (2) requires a database, and the performance characteristics of using one are unknown. This option may still be interesting if the future installed-module format already makes use of a database for efficient storage and access to items contained in a module.

Testing

The performance impact on retrieving time-zone data must be tested, especially for the first retrieval.

The tests of the upgrade tools must be changed to ensure that the time-zone data has been written out properly.
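A very rough way to observe the cost of a first lookup compared with a repeated one is sketched below. It is an assumption-laden illustration rather than a proper benchmark: the class name and the chosen zone ID are hypothetical, a single `System.nanoTime` measurement is noisy, and the JVM may already have loaded some zone data (for example the default zone) before `main` runs.

```java
import java.util.TimeZone;

public class FirstLookupTiming {
    /** Times a single TimeZone.getTimeZone call, in nanoseconds. */
    static long timeLookupNanos(String zoneId) {
        long start = System.nanoTime();
        TimeZone zone = TimeZone.getTimeZone(zoneId);
        long elapsed = System.nanoTime() - start;
        // Touch the result so the lookup is not trivially discarded.
        if (zone.getID().isEmpty()) {
            throw new IllegalStateException("unexpected empty zone ID");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        String zoneId = "America/Los_Angeles"; // arbitrary zone chosen for the example
        long first = timeLookupNanos(zoneId);  // may include loading the zone data
        long second = timeLookupNanos(zoneId); // typically served from already-loaded data
        System.out.println("first lookup:  " + first + " ns");
        System.out.println("second lookup: " + second + " ns");
    }
}
```

A real measurement would repeat this in fresh JVM processes, since the first-call cost can be observed only once per process.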

Risks and Assumptions

Time-zone updates using a compressed format will not apply to older JDKs. This might require some duplication of effort, to provide updates in two different formats.

There is a risk of a decrease in performance when retrieving time-zone data, in particular for the first zone requested. In the case of a zip file, the performance decrease is expected to be small.

Impact