Exascale-class systems will exhibit a new level of complexity, both in their underlying architectures and in their system software. At the same time, the complexity of applications will rise sharply, both to implement the new science possible at exascale and to exploit the new hardware features necessary to achieve exascale performance. To manage this complexity and reach exascale performance, users will expect a new generation of tools that help them fully exploit the systems. These tools need to help users address the bottlenecks of exascale machines, work seamlessly with the programming models on the target machines, scale to the full size of the machine, provide the necessary automatic analysis capabilities, and be flexible and modular enough to keep up with the changing demands of exascale architectures.
At LLNL, we are addressing these challenges and requirements through a series of tool research and development efforts targeting performance analysis, optimization, debugging, and correctness tools, as well as automatic tuning and optimization capabilities. In addition, these efforts are supported by activities on tool infrastructures that enable us to work with more modular tool designs and support rapid tool prototyping.
Each of the following focus areas has several projects and/or pieces of software associated with it. Please see the right sidebar for a list of current projects.
At exascale, performance tools will no longer be a luxury for power users but essential infrastructure for guiding both application developers and software stack developers. Performance tools for exascale systems must help developers identify shortcomings in their codes and in the software stack, and must provide online performance feedback to guide runtime adaptation. We therefore require tool sets that not only provide a range of advanced analysis capabilities but are also intuitive, easy to use, and available across a range of platforms.
The increased complexity and core counts of exascale systems will diminish the effectiveness of traditional interactive debuggers. To cope with this complexity, application developers will need additional tools that automatically or semi-automatically either reduce the problem to smaller core counts or detect the problem itself.
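One way to picture "reducing the problem to smaller core counts" is to group processes whose call stacks are identical and then inspect a single representative per group. The following is a minimal sketch of that idea only. The function names (`group_ranks_by_stack`, `representatives`) and the trace data are hypothetical, and it does not reproduce the interface of any LLNL tool.

```python
from collections import defaultdict

def group_ranks_by_stack(traces):
    """Group ranks that share an identical call stack.

    `traces` maps a rank (int) to a sequence of frame names,
    outermost frame first. This input format is an assumption
    made for illustration.
    """
    classes = defaultdict(list)
    for rank, frames in traces.items():
        classes[tuple(frames)].append(rank)
    # Sort ranks inside each class so the output is deterministic.
    return {stack: sorted(ranks) for stack, ranks in classes.items()}

def representatives(classes):
    """Pick the lowest rank of each class as its representative."""
    return sorted(ranks[0] for ranks in classes.values())

# Hypothetical traces from a four-process job.
traces = {
    0: ("main", "solve", "MPI_Allreduce"),
    1: ("main", "solve", "MPI_Allreduce"),
    2: ("main", "io", "write"),
    3: ("main", "solve", "MPI_Allreduce"),
}
classes = group_ranks_by_stack(traces)
to_debug = representatives(classes)
```

Here four processes collapse into two behavior classes, so a developer could attach a conventional debugger to two representative ranks instead of all four. On a real machine, collecting the traces would itself have to be done scalably, which this sketch does not address.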
The use of generic and modular components will be key to achieving manageable and scalable tools. Each functionally separable part of a tool should be implemented as its own component, which is then made available as part of a component library. Tools can assemble these components into a full end-to-end solution with minimal glue code.
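The component idea described above can be sketched in a few lines. In this illustration each component is a plain function that takes and returns a collection of measurement records, and a small `compose` helper serves as the glue code. All names (`compose`, `filter_slow`, `aggregate_by_function`, `sort_by_time`) and the record layout are assumptions made for the example, not part of an existing component library.

```python
def compose(*components):
    """Chain components so each one consumes the previous one's output."""
    def pipeline(records):
        for component in components:
            records = component(records)
        return records
    return pipeline

def filter_slow(threshold_s):
    """Component factory: keep records at or above a time threshold."""
    def component(records):
        return [r for r in records if r["seconds"] >= threshold_s]
    return component

def aggregate_by_function(records):
    """Component: sum the measured time per function name."""
    totals = {}
    for r in records:
        totals[r["function"]] = totals.get(r["function"], 0.0) + r["seconds"]
    return [{"function": f, "seconds": s} for f, s in totals.items()]

def sort_by_time(records):
    """Component: order records from most to least expensive."""
    return sorted(records, key=lambda r: r["seconds"], reverse=True)

# Hypothetical measurement records.
samples = [
    {"function": "solve", "seconds": 2.0},
    {"function": "io", "seconds": 0.1},
    {"function": "solve", "seconds": 3.0},
    {"function": "halo", "seconds": 1.5},
]

# The "tool" is just an assembly of library components.
hotspot_tool = compose(filter_slow(1.0), aggregate_by_function, sort_by_time)
report = hotspot_tool(samples)
```

Because every component shares the same input and output shape, a different tool can be prototyped by reordering or swapping components without changing any of them.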
LLNL's Stack Trace Analysis Tool helps users quickly identify errors in code running on today's largest machines.
Todd Gamblin leads the PAVE project, which develops performance data visualization techniques that are more intuitive for application scientists.