Floating-point data compression
High-precision numerical data from computer simulations, observations, and experiments is often represented in floating point, and can easily reach terabytes to petabytes of storage. Moving such large data sets to and from disk, across the internet, between compute nodes, and even through the memory hierarchy presents a significant performance bottleneck. To address this problem, we have developed lossy and lossless high-speed data compressors that can greatly reduce the amount of data stored and moved.
For lossless compression, where every bit of every floating-point number must be preserved exactly, our memory-efficient streaming fpzip compressor typically provides 1.5x-4x data reduction, depending on data precision and smoothness.
To achieve much higher compression ratios, lossy compression is needed, where small, often imperceptible or numerically negligible, errors may be introduced. Our zfp compressor for floating-point and integer data often achieves compression ratios on the order of 100:1, i.e., less than one bit per value of compressed storage. zfp frequently gives more accurate results than competing compressors (including our own fpzip), while also being many times faster, with throughput of up to 2 GB/s per core. It can achieve an exact bit rate, ensure that reconstructed values are within an absolute error tolerance, or meet a specified precision requirement. zfp also comes with C++ compressed-array classes that support random access and can be used in place of conventional C arrays or STL vectors, e.g., for numerical computations.
zfp and fpzip were both designed for compressing logically regular 1D, 2D, or 3D arrays of single- or double-precision floating-point numbers that exhibit spatial correlation (e.g., regularly sampled functions), and should not be used to compress unstructured data such as triangle mesh geometry, unorganized point sets, or streams of unrelated numbers. Think of fpzip as the floating-point analogue of PNG image compression, and of zfp as an advanced JPEG for floating-point arrays. Source code for both compressors is available for download below.
zfp is an open source C/C++ library for compressed floating-point arrays that support very high-throughput read and write random access. zfp was written by Peter Lindstrom at Lawrence Livermore National Laboratory and is loosely based on the algorithm described in the following paper:
Peter Lindstrom, “Fixed-Rate Compressed Floating-Point Arrays,” IEEE Transactions on Visualization and Computer Graphics, 20(12):2674-2683, December 2014, doi:10.1109/TVCG.2014.2346458.
zfp was designed to achieve high compression ratios and therefore uses lossy but optionally error-bounded compression. Although bit-for-bit lossless compression is not always possible, zfp is usually accurate to within machine epsilon in near-lossless mode, and is often orders of magnitude more accurate and faster than other lossy compressors.
zfp development is funded by the US Department of Energy’s Exascale Computing Project and by the Advanced Simulation and Computing Program. Experimental features, such as variable-rate random-access arrays, are being investigated under LLNL’s Variable Precision Computing Project.
For more information on zfp, please see the tabs on the left.
fpzip is a library for lossless or lossy compression of 2D or 3D floating-point scalar fields. Although written in C++, fpzip has a C interface. fpzip was developed by Peter Lindstrom at LLNL, and is based on the algorithm described in the following paper:
Peter Lindstrom and Martin Isenburg, “Fast and Efficient Compression of Floating-Point Data,” IEEE Transactions on Visualization and Computer Graphics, 12(5):1245-1250, September-October 2006, doi:10.1109/TVCG.2006.143.
fpzip was primarily designed for lossless compression but also supports lossy compression. For lossy compression, however, our zfp compressor often outperforms fpzip.
For information on the API, usage, and licensing, please see the header file inc/fpzip.h in the tar file.
Questions and Comments on either zfp or fpzip