FGFS: Fast Global File Status

Motivation

Large-scale systems typically mount many different file systems with distinct performance characteristics and capacities. Applications must use this storage efficiently to realize their full performance potential. Users must account for potential file replication throughout the storage hierarchy and for contention in lower levels of the I/O system, and must consider communicating the results of file I/O between application processes to reduce file system accesses. Addressing these issues and optimizing file accesses requires detailed runtime knowledge of file system performance characteristics and of where files reside on those file systems.

Fast Global File Status

FGFS is an open-source package that provides scalable mechanisms and programming interfaces to retrieve global information about a file, including its degree of distribution or replication and its consistency. FGFS uses a novel node-local technique (provided in a separate package called Mount Point Attributes) that turns expensive, non-scalable file system calls into simple string comparison operations. FGFS raises the namespace of a locally defined file path to a global namespace, using few or no file system calls, so that global file properties can be obtained efficiently. Our evaluation on a large multi-physics application shows that most FGFS file status queries on its executable and 848 shared library files complete in 272 milliseconds or faster at 32,768 MPI processes. Even the most expensive operation, which checks global file consistency, completes in under 7 seconds at this scale, an improvement of several orders of magnitude over the traditional checksum technique.
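The idea of raising a local path to a global namespace through string comparison can be illustrated with a minimal Python sketch. This is not the FGFS or Mount Point Attributes API: the names MountPoint, parse_mount_table, find_mount, and global_name, the sample mount table, and the set of file system types treated as remote are all assumptions made for illustration. The sketch assumes a mount table in the whitespace-separated layout of Linux /proc/mounts (source, mount directory, file system type, ...), finds the mount point of a path by longest-prefix string matching, and builds a name that is identical on every node for a file on a shared file system but node-specific for a file on local storage.

```python
import posixpath
from dataclasses import dataclass

# Assumption: file system types treated as shared across nodes.
REMOTE_FS_TYPES = {"nfs", "nfs4", "lustre", "gpfs", "cifs"}

# Hypothetical mount table in /proc/mounts-style layout.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext4 rw 0 0
nfs-server:/export/home /home nfs rw 0 0
tmpfs /tmp tmpfs rw 0 0
"""


@dataclass(frozen=True)
class MountPoint:
    source: str     # device or remote export, e.g. "nfs-server:/export/home"
    mount_dir: str  # normalized local directory where it is mounted
    fs_type: str    # file system type, e.g. "nfs" or "ext4"


def parse_mount_table(text):
    """Parse mount table text into a list of MountPoint records."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            mounts.append(
                MountPoint(fields[0], posixpath.normpath(fields[1]), fields[2])
            )
    return mounts


def find_mount(path, mounts):
    """Return the mount point with the longest directory prefix of path."""
    path = posixpath.normpath(path)
    best = None
    for m in mounts:
        # Match on whole path components so "/home" does not match "/homework".
        prefix = m.mount_dir if m.mount_dir == "/" else m.mount_dir + "/"
        if path == m.mount_dir or path.startswith(prefix):
            if best is None or len(m.mount_dir) > len(best.mount_dir):
                best = m
    return best


def global_name(path, mounts, hostname):
    """Build a name that is equal across nodes only for shared files."""
    path = posixpath.normpath(path)
    m = find_mount(path, mounts)
    if m is None:
        raise ValueError(f"no mount point found for {path}")
    if m.fs_type in REMOTE_FS_TYPES:
        # Shared storage: name the file by its remote source, not the node.
        rel = posixpath.relpath(path, m.mount_dir)
        return f"{m.fs_type}://{m.source}/{rel}"
    # Node-local storage: the hostname is part of the file's identity.
    return f"file://{hostname}{path}"


if __name__ == "__main__":
    table = parse_mount_table(SAMPLE_MOUNTS)
    for host in ("node1", "node2"):
        print(host, global_name("/home/alice/a.out", table, host))
        print(host, global_name("/tmp/a.out", table, host))
```

In this sketch, two processes on different nodes can decide whether they refer to the same physical file by comparing the returned strings, without issuing any further file system calls once the mount table has been read.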

Publications and Talks

Dong H. Ahn, Michael J. Brim, Bronis R. de Supinski, Todd Gamblin, Greg L. Lee, Matthew P. LeGendre, Barton P. Miller, Adam Moody, Martin Schulz, Efficient and Scalable Retrieval Techniques for Global File Properties, in the Proceedings of the 27th IEEE International Parallel and Distributed Processing Symposium (IPDPS), Boston, MA, May 2013. LLNL-PROC-554055

Dong H. Ahn, Efficient and Scalable Retrieval Techniques for Global File Properties, technical talk at the 27th IEEE International Parallel and Distributed Processing Symposium (IPDPS), Boston, MA, May 2013. LLNL-PRES-636652

Team

LLNL: Dong H. Ahn | Bronis de Supinski | Todd Gamblin | Greg Lee | Adam Moody | Matthew LeGendre | Martin Schulz

External: Michael J. Brim, ORNL | Barton P. Miller, UW