MACSio (pronounced “max-ee-oh”) was developed to fill a long-standing void in co-design proxy applications that allow for I/O performance testing as well as evaluation of tradeoffs in data model interfaces and parallel I/O paradigms for multi-physics, HPC applications. Two key design features of MACSio set it apart from existing I/O proxy applications and benchmarking tools. The first is the level of abstraction (LOA) at which MACSio is designed to operate.
Levels of abstraction (LOA) in the HPC I/O stack (left), typical abstraction objects (middle), and example implementations (right).
The second is the degree of flexibility MACSio is designed to provide in driving an HPC I/O workload through parameterized, user-defined data objects and a variety of parallel I/O paradigms and I/O interfaces. Combined, these features allow MACSio to closely mimic I/O workloads for a wide variety of real HPC applications and, in particular, multi-physics applications where data object distribution and composition vary dramatically both within and across parallel tasks. These data objects are then marshaled between primary and secondary storage according to a variety of application use cases (e.g. restart dump or trickle dump) using one or more I/O interfaces (plugins) and parallel I/O paradigms, allowing for direct comparisons of software interfaces, parallel I/O paradigms, and file system technologies with the same set of customizable data objects.
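To make the idea of parameterized, user-defined data objects concrete, here is a small illustrative sketch (in Python, not MACSio's actual C source) of how per-task data decomposition might be parameterized. The parameter names (average parts per task, uniform part size) are modeled loosely on the kinds of knobs MACSio exposes, but the function itself is a hypothetical illustration, not part of MACSio:

```python
import random

def generate_parts(num_tasks, avg_num_parts=2.5, part_size=80_000, seed=42):
    """Assign a varying number of mesh parts to each parallel task,
    mimicking how a multi-physics code's decomposition differs across
    tasks (illustrative sketch only; not MACSio source code)."""
    rng = random.Random(seed)
    layout = []
    for task in range(num_tasks):
        # A fractional average (e.g. 2.5) means some tasks get 2 parts
        # and others 3, so the data load varies across parallel tasks.
        base = int(avg_num_parts)
        extra = 1 if rng.random() < (avg_num_parts - base) else 0
        layout.append([part_size] * (base + extra))
    return layout

layout = generate_parts(num_tasks=8)
total_bytes = sum(sum(parts) for parts in layout)
```

A proxy application can then marshal such a synthetic layout through different I/O plugins and compare the resulting performance on identical data.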
Block diagram of MACSio main and I/O plugins. Uni-modal plugins manage data only in files, while bi-modal plugins manage data both in files and in memory.
I/O Performance Characteristics
Here we show (in red) typical I/O performance characteristics as a function of request size for various layers of software in the HPC I/O stack. For any given layer, performance typically improves with increasing request size, because fixed per-request overheads are amortized over larger and larger transfers. Because higher layers in the stack incur more overhead, due to the additional metadata needed to implement the corresponding abstractions, performance at a given request size is typically lower with each layer upwards in the stack. This is shown by the gaps between the red bandwidth vs. request size performance curves. In general, if an application is able to generate larger requests, these overheads can be amortized away to insignificance.
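The amortization argument can be captured with a simple latency/bandwidth cost model: each request pays a fixed per-request overhead plus transfer time at peak bandwidth. The specific latency and bandwidth numbers below are illustrative assumptions, not measurements of any real I/O stack:

```python
def effective_bandwidth(request_bytes, latency_s, peak_bw_bytes_per_s):
    """Effective bandwidth under a fixed-overhead-per-request model:
    each request costs a constant latency plus size/peak_bandwidth."""
    time_s = latency_s + request_bytes / peak_bw_bytes_per_s
    return request_bytes / time_s

# A higher layer carries more per-request overhead (a larger latency
# term), so its curve sits lower at any given request size.  Numbers
# here are purely illustrative.
sizes = [2**k for k in range(10, 31, 5)]          # 1 KiB .. 1 GiB
low_layer  = [effective_bandwidth(s, 1e-5, 1e9) for s in sizes]
high_layer = [effective_bandwidth(s, 1e-4, 1e9) for s in sizes]
```

Both curves rise monotonically with request size and converge toward the peak bandwidth, while the gap between them shrinks as the overhead is amortized away.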
Typical I/O performance as a function of request size (red), and a typical I/O request size histogram (yellow) as percent of total dump.
We also show (in yellow) an I/O request histogram for a typical restart dump as a percentage of total bytes in the dump. A given bar indicates the percent of total bytes in the dump that were transferred at that request size. We distinguish two categories of requests: those that originate from the application itself (solid yellow) and those that originate from one or more of the lower layers in the I/O stack (hashed yellow) on behalf of the application, typically metadata associated with the abstractions. In this example, a majority of the smaller requests originated from the application itself, which suggests the application could aggregate many of its smaller requests into fewer larger requests and see improved performance. Through appropriate use of timing and request size information gathered from within MACSio and its I/O plugins, this kind of detailed application I/O emulation and performance analysis is possible.
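The benefit of aggregating small application requests can be quantified with the same kind of fixed-overhead-per-request cost model. The latency and bandwidth values below are illustrative assumptions, chosen only to show the shape of the effect:

```python
def dump_time(request_sizes, latency_s=1e-4, peak_bw_bytes_per_s=1e9):
    """Total time to write a dump issued as the given sequence of
    request sizes, where every request pays a fixed latency plus
    transfer time at peak bandwidth (illustrative model)."""
    return sum(latency_s + s / peak_bw_bytes_per_s for s in request_sizes)

# 10,000 small 4 KiB application requests vs. one aggregated request
# of the same total size: identical bytes moved, very different cost.
small = dump_time([4096] * 10_000)
aggregated = dump_time([4096 * 10_000])
```

Under these assumed numbers the many-small-requests dump is dominated by per-request overhead, while the aggregated dump approaches the raw transfer time, which is exactly the tradeoff a MACSio-style emulation lets one measure against a real I/O stack rather than a model.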
Download MACSio at GitHub
MACSio Doxygen docs
MACSio design document
“Replicating HPC I/O workloads with proxy applications” | 2016 1st Joint International Workshop on Parallel Data Storage and Data Intensive Scalable Computing Systems (PDSW-DISCS) | James Dickson, Stephen Wright (University of Warwick), Satheesh Maheswaran, Andy Herdman (UK Atomic Weapons Establishment), Mark Miller (Lawrence Livermore National Laboratory)