VPC: Variable Precision Computing

Decades ago, when memory was a scarce resource, computational scientists routinely worked in single precision and were more sophisticated in dealing with the pitfalls of finite-precision arithmetic. Today, however, we typically compute and store results in 64-bit double precision by default, even when very few significant digits are required. Often the extra precision serves as a simple guard against corruption from roundoff error, in place of taking the time to ensure that algorithms are robust to roundoff. In other cases, only isolated calculations require additional precision (e.g., tangential intersection of a ray with a surface in computational geometry). Many of the 64 bits represent errors – truncation, iteration, roundoff – instead of useful information about the solution. This over-allocation of resources wastes power, bandwidth, storage, and FLOPs; we communicate and compute on many meaningless bits and do not take full advantage of the computer hardware we purchase.
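
As a small illustration of the difference between adding precision and making an algorithm robust to roundoff, the sketch below sums the same values in single precision twice: once naively, and once with compensated (Kahan) summation. It is not taken from the VPC project; the helper names (f32, naive_sum_f32, kahan_sum_f32) are hypothetical, and single precision is emulated by rounding each intermediate result to binary32 with the standard struct module.

```python
import math
import struct

def f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def naive_sum_f32(values):
    """Left-to-right sum with every partial sum rounded to binary32."""
    total = 0.0
    for v in values:
        total = f32(total + f32(v))
    return total

def kahan_sum_f32(values):
    """Compensated (Kahan) sum, still using only binary32 storage."""
    total = 0.0
    comp = 0.0  # running estimate of the roundoff lost so far
    for v in values:
        y = f32(f32(v) - comp)
        t = f32(total + y)
        comp = f32(f32(t - total) - y)
        total = t
    return total

values = [0.1] * 100_000
# Reference: correctly rounded sum of the binary32-rounded inputs.
reference = math.fsum(f32(v) for v in values)

print("naive binary32 error:", abs(naive_sum_f32(values) - reference))
print("Kahan binary32 error:", abs(kahan_sum_f32(values) - reference))
```

Under these assumptions the compensated sum stays close to the reference without widening the data type, which is the kind of trade the paragraph above has in mind.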

Because of the growing disparity between FLOP rates and memory bandwidth and the rise of General-Purpose GPU (GPGPU) computing – which has better peak performance in single precision – there has been renewed interest in mixed-precision computing, in which tasks that can be carried out in single precision are identified and combined with double-precision work. Such static optimizations reduce data movement and FLOPs, but their implementations are time-consuming and difficult to maintain, particularly across computing platforms. Task-based mixed precision would be more common if there were tools to simplify development, maintenance, and debugging.

But why stop there? We often adapt mesh size, order, and models when simulating to focus the greatest effort only where needed. Why not do the same with precision? We envision a framework where simulations can dynamically adjust precision at a per-bit level depending on the needs of the task at hand. Just as adaptive mesh refinement frameworks adapt spatial grid resolution to the needs of the underlying solution, our system will locally provide more or less precision. It will not be easy, however, to upset the applecart of legacy floating-point. Acceptance from the community will require that we address three concerns: accuracy, efficiency, and ease of use in development, debugging, and application.

Enter the Variable Precision Computing (VPC) project, an LLNL Laboratory Directed Research and Development (LDRD) Strategic Initiative (SI) project led by the Computation directorate.  To achieve our vision within this project, we have organized our approach into three integrated and concurrent thrusts.  First, we will show significant and immediate gains by developing the algorithms and software to support the use of adaptive precision through the LLNL-developed floating-point compression algorithm, ZFP, and the hierarchical multiresolution data format IDX on data where errors do not accumulate, for example, in situ analysis, visualization, restart, data mining, tabular data, and storage.  This adaptive rate compression (ARC) will provide local adaptive precision, where the rate of compression will differ between regions of the data (temporally, spatially, and/or by variable) in response to the information content of the data.  Such adaptivity will further require the development of precision indicators.  In addition to software tools and libraries to support ARC, this thrust will also produce examples in laboratory applications that demonstrate the potential savings, which could be as high as a 10-20x compression of data for restart or up to 100x for visualization.
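
To make the idea of locally adaptive precision concrete, the following is a minimal sketch in which each block of a 1D array is stored with only as many bits as an absolute error tolerance requires. It is a plain uniform quantizer written for illustration, not the ZFP algorithm or its API, and the function names, the block size, and the use of the per-block value range as a "precision indicator" are all assumptions.

```python
import math

def compress_block(block, tol):
    """Quantize one block with the fewest bits that keep |error| <= tol."""
    lo, hi = min(block), max(block)
    span = hi - lo
    if span <= 2.0 * tol:
        # The block is nearly constant: its midpoint alone meets the tolerance.
        return {"bits": 0, "lo": 0.5 * (lo + hi), "step": 0.0,
                "codes": [0] * len(block)}
    # Levels spaced `step` apart give a maximum error of step / 2.
    levels = math.ceil(span / (2.0 * tol)) + 1
    bits = (levels - 1).bit_length()          # ceil(log2(levels))
    step = span / (2 ** bits - 1)
    codes = [round((x - lo) / step) for x in block]
    return {"bits": bits, "lo": lo, "step": step, "codes": codes}

def decompress_block(rec):
    """Reconstruct the block values from their integer codes."""
    return [rec["lo"] + c * rec["step"] for c in rec["codes"]]

def compress(data, tol, block_size=16):
    """Compress each block independently, so the bit rate varies locally."""
    return [compress_block(data[i:i + block_size], tol)
            for i in range(0, len(data), block_size)]

def decompress(records):
    return [x for rec in records for x in decompress_block(rec)]

# A smooth, nearly constant region followed by an oscillatory one.
data = ([1.0 + 1e-4 * i for i in range(64)]
        + [10.0 * math.sin(i) for i in range(64)])
records = compress(data, tol=1e-2)
restored = decompress(records)

print("bits per value, by block:", [rec["bits"] for rec in records])
print("max abs error:", max(abs(x - y) for x, y in zip(data, restored)))
```

In this toy setting the nearly constant blocks need no bits per value beyond a single stored midpoint, while the oscillatory blocks need several, which is the sense in which the rate responds to the information content of the data.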

The next challenge is to introduce techniques in applications where the effects of limited precision can accumulate. In our second thrust, we will first consider the use of mixed precision using existing data types (16-, 32-, and 64-bit), which will require careful consideration of fundamental questions in numerical analysis (error propagation and stability) and determination of classes of problems for which variable precision is appropriate. Accordingly, we have identified applications in hydrodynamics, transport, first-principles molecular dynamics, combustion, and graph analysis on which we can demonstrate the advantages (and challenges) of variable precision computing. We will explore approaches that combine adaptive layers of representation, in the spirit of adaptive mesh refinement, using techniques similar to error transport and iterative refinement. In addition, we will investigate the inline use of adaptive rate compression; prior work has shown that, with a fixed rate of 4x, inline compression in applications such as Miranda and pF3d reproduced uncompressed results with less than 1% error. For all of these techniques, to facilitate adoption, we will develop automated source-to-source code translation tools and tools to illuminate the propagation of round-off errors through algorithms. Finally, we will demonstrate in application codes that we can accurately estimate and control the errors from varying precision.
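
Iterative refinement, mentioned above, is a standard example of combining precisions: the expensive solve is done in low precision, and only the residual is evaluated in high precision. The sketch below is a textbook-style illustration on a small dense system, not code from any of the applications named here. The function names are hypothetical, single precision is emulated by rounding every operation to binary32, and for brevity the low-precision elimination is repeated at each step where a practical code would reuse a stored factorization.

```python
import struct

def f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def solve_f32(A, b):
    """Gaussian elimination with partial pivoting, rounded to binary32."""
    n = len(b)
    M = [[f32(v) for v in row] + [f32(rhs)] for row, rhs in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            m = f32(M[i][k] / M[k][k])
            for j in range(k, n + 1):
                M[i][j] = f32(M[i][j] - f32(m * M[k][j]))
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n]
        for j in range(i + 1, n):
            s = f32(s - f32(M[i][j] * x[j]))
        x[i] = f32(s / M[i][i])
    return x

def residual_f64(A, b, x):
    """Residual b - A x evaluated in binary64."""
    return [bi - sum(aij * xj for aij, xj in zip(row, x))
            for row, bi in zip(A, b)]

def refine(A, b, iterations=5):
    """Low-precision solves corrected by high-precision residuals."""
    x = solve_f32(A, b)
    for _ in range(iterations):
        r = residual_f64(A, b, x)
        d = solve_f32(A, r)                   # correction in binary32
        x = [xi + di for xi, di in zip(x, d)]  # update kept in binary64
    return x

A = [[4.0, 1.0, 2.0], [1.0, 5.0, 1.0], [2.0, 1.0, 6.0]]
x_true = [1.0 / 3.0, -2.0 / 7.0, 0.1]
b = [sum(aij * xj for aij, xj in zip(row, x_true)) for row in A]

err_single = max(abs(u - v) for u, v in zip(solve_f32(A, b), x_true))
err_refined = max(abs(u - v) for u, v in zip(refine(A, b), x_true))
print("binary32 solve error:", err_single)
print("refined solve error: ", err_refined)
```

For this well-conditioned matrix the refined solution reaches roughly double-precision accuracy even though every solve runs in single precision; for ill-conditioned systems the iteration may converge slowly or not at all, which is one of the stability questions raised above.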

In the third thrust, we will consider accumulating finite-precision errors in the context of new data representations.  In addition to improved arithmetic properties such as associativity, the recently proposed Universal number format (Unum) allows for a variable bit-length representation.  We will leverage the implementation of Unums developed in a current LDRD Feasibility study to investigate their potential within the context of lab-relevant exascale co-design mini-apps.  We will also investigate applying operations directly to the ARC data, which in effect treats the ZFP representation as a new data type.  A third research thread is to combine ARC with Unums to minimize both the compressed and uncompressed data.  This thrust contains the most revolutionary of our ideas, and its main result will be proof-of-concept demonstrations that can be used to motivate community and vendor adoption of successful techniques.
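
For context on the associativity remark, the short sketch below shows that addition in conventional IEEE double precision depends on the order of operations. It does not implement Unums; it only illustrates the behavior of the legacy format, using exact rational arithmetic from the Python standard library as a reference.

```python
import math
from fractions import Fraction

a, b, c = 1.0e16, -1.0e16, 1.0

left = (a + b) + c    # a and b cancel exactly first, so c survives
right = a + (b + c)   # c is lost when b + c is rounded back to b

# Exact reference: rational arithmetic has no rounding, so order is irrelevant.
exact = float(Fraction(a) + Fraction(b) + Fraction(c))

print("(a + b) + c =", left)                  # 1.0
print("a + (b + c) =", right)                 # 0.0
print("exact sum   =", exact)                 # 1.0
print("math.fsum   =", math.fsum([a, b, c]))  # 1.0
```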