Understanding and optimizing the performance of large-scale simulations generally relies on the availability of two kinds of data: hardware or system software performance measurements and the context for these measurements (such as the processor or call-path involved). While the HPC community has developed a number of tools for measurement data collection, they are often geared towards specific types of analyses and maintain only limited and specialized contextual information. Unlike the more monolithic code bases of the past, however, modern applications consist of a complex network of physics packages, numerical solvers, and support libraries. Thus, a comprehensive understanding of performance behavior requires the ability to compile, analyze, and compare measurements and contexts from many independent sources.
Caliper, a general-purpose application introspection system, makes that task easier by acting as the “glue” that connects various independent context annotations, measurement services, and data processing services. As both a flexible model for representing performance-related data and a transparent library for combining performance metrics and program context information, Caliper enables users to build customized performance measurement and analysis solutions.
Caliper’s key capability, composing information from different context annotations and measurement providers, is made possible by two features. First, Caliper is based on a generic attribute:value data model that allows storage of any kind of data. In contrast, classic performance analysis tools support only a few, non-extensible types of automatically derived or user-provided context information. Second, Caliper provides a process-wide, in-memory data store, which automatically combines information provided by individual annotations and attached measurement plug-in services at runtime. This allows developers to independently annotate application modules, libraries, or runtime system components, and to correlate measurement data and contextual information across the software stack.
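The following Python sketch illustrates the idea behind these two features; it is a conceptual model only, not Caliper's API. The `Blackboard` class, the `BLACKBOARD` instance, the attribute names, and the three functions are all hypothetical names chosen for illustration. Two independent components publish attribute:value pairs to one process-wide store, and a measurement routine takes a snapshot that combines whatever context is currently set with its own measurement.

```python
import threading


class Blackboard:
    """Hypothetical process-wide attribute:value store (illustration only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._attrs = {}

    def set(self, attribute, value):
        # Any component may publish any attribute; values are untyped.
        with self._lock:
            self._attrs[attribute] = value

    def unset(self, attribute):
        # Remove an attribute once its context no longer applies.
        with self._lock:
            self._attrs.pop(attribute, None)

    def snapshot(self, measurements=None):
        # Copy the current context and merge in the given measurements.
        with self._lock:
            record = dict(self._attrs)
        record.update(measurements or {})
        return record


# One shared store per process; components never reference each other.
BLACKBOARD = Blackboard()


def amr_regrid(domain_size):
    # Annotation placed in a (hypothetical) mesh refinement library.
    BLACKBOARD.set("amr.domain_size", domain_size)


def solver_iterate(iteration):
    # Annotation placed in a (hypothetical) solver library.
    BLACKBOARD.set("solver.iteration", iteration)


def take_measurement(elapsed_seconds):
    # A measurement service attaches its value to the combined context.
    return BLACKBOARD.snapshot({"time.elapsed": elapsed_seconds})
```

In this sketch, calling `amr_regrid(64)`, then `solver_iterate(3)`, then `take_measurement(0.5)` yields a single record that holds both annotations alongside the timing value, although neither component knows about the other.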
Caliper creates a single context stream of metadata and contextual snapshot records for each process. The streams are currently written to disk for offline analysis. Because Caliper’s data format is flexible, importing Caliper data into external data analytics and visualization tools is a straightforward task.
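As a rough sketch of what offline analysis of such a stream might look like, the Python example below writes snapshot records as one JSON object per line and reads them back for grouping. The JSON-lines encoding is an assumption made for illustration and is not Caliper's actual on-disk format; the function names `write_stream`, `read_stream`, and `group_by` are hypothetical, and metadata records are left out for brevity.

```python
import io
import json


def write_stream(records, stream):
    # Emit one self-describing attribute:value record per line.
    for record in records:
        stream.write(json.dumps(record, sort_keys=True) + "\n")


def read_stream(stream):
    # Parse the records back, skipping blank lines.
    return [json.loads(line) for line in stream if line.strip()]


def group_by(records, attribute):
    # Bucket records by the value of one attribute; records that
    # do not carry the attribute are ignored.
    groups = {}
    for record in records:
        if attribute in record:
            groups.setdefault(record[attribute], []).append(record)
    return groups
```

Because every record carries its own attribute names, an analysis tool can group or filter by any attribute without knowing in advance which annotations the application contained. An in-memory `io.StringIO` buffer can stand in for a file when trying this out.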
We have used Caliper on various projects, including the instrumentation of several components of an LLNL radiation hydrodynamics code. Although each component is instrumented individually, Caliper allows us to look at how the components affect one another. For example, the ability to annotate the domain sizes produced by the AMR library SAMRAI and the iteration count in the HYPRE solver assisted application developers in studying the effects of the domain sizes on solver convergence. Because our model handles shared context, existing instrumentation does not interfere with new instrumentation, making it easy to add new annotations as application developers see fit. This study helped to demonstrate how Caliper’s process-wide data store offers a new path towards effective and insightful performance analysis for complex HPC applications.