Due to the rising complexity and scale of systems and applications, we require novel, sophisticated, and easy-to-use approaches for performance analysis at scale. We are developing such techniques and deploying them in close collaboration with our code teams. Our projects include Open|SpeedShop and eGprof. We are also designing novel techniques, based on Libra, to rebalance application load at scale.
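The core idea behind measurement-driven load rebalancing can be sketched in a few lines of Python. This is an illustrative assumption, not Libra's actual interface: given per-task costs gathered from performance measurements, a greedy longest-processing-time heuristic assigns each task to the currently least-loaded process.

```python
import heapq

def rebalance(task_costs, num_procs):
    """Greedy LPT assignment: hand each task to the currently
    least-loaded process, heaviest tasks first."""
    # Min-heap of (current_load, process_rank) pairs.
    loads = [(0.0, rank) for rank in range(num_procs)]
    heapq.heapify(loads)
    assignment = {}
    for task, cost in sorted(task_costs.items(), key=lambda kv: -kv[1]):
        load, rank = heapq.heappop(loads)
        assignment[task] = rank
        heapq.heappush(loads, (load + cost, rank))
    return assignment
```

In practice the costs would come from instrumented timings per work unit, and the new assignment would drive data migration between processes.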
In the traditional debugging paradigm, interactive debuggers attach to all processes of a parallel application and step through individual lines of code; this overwhelms users beyond a few dozen processes, while today's systems already run hundreds of thousands. We are therefore developing new paradigms that overcome this limitation. Our first approach uses STAT to identify process equivalence classes, reducing the process space to a manageable size. In an alternative approach, we are designing and implementing novel techniques that enable automatic root-cause analysis, in collaboration with the Cooperative Bug Isolation Project at the University of Wisconsin.
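The equivalence-class idea reduces to grouping processes with identical behavior so a user need only inspect one representative per group. The toy sketch below (the data layout is a hypothetical stand-in; STAT itself merges stack traces sampled from a live MPI job) groups ranks by call stack:

```python
from collections import defaultdict

def equivalence_classes(stack_traces):
    """Group process ranks whose call stacks are identical, so a user
    inspects one representative per class instead of every rank."""
    classes = defaultdict(list)
    for rank, stack in stack_traces.items():
        classes[tuple(stack)].append(rank)
    return list(classes.values())
```

A hang in which thousands of ranks wait in `MPI_Wait` while one rank is stuck in I/O then collapses to two classes, immediately isolating the outlier.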
Modern high-performance, scientific applications are typically composed of multiple internal and third-party libraries written in different languages. Our research in scientific middleware provides high-performance interfaces between software components written in different languages and the basic infrastructure for hybrid parallel-distributed programming models.
Our research focuses on open source compiler infrastructure to build source-to-source program transformation and analysis tools for large-scale Fortran, C, C++, OpenMP, and UPC applications. Our technology is particularly well suited for building custom tools for static analysis, program optimization, arbitrary program transformation, performance analysis, and cyber-security.
As high performance computing systems increase in scale, applications running on them experience increasingly frequent hard and soft failures due to larger component counts in the machines. Already, traditional approaches to fault tolerance are reaching practical scaling limits on today's systems. To address the needs of current and future systems, our research focuses on new scalable, hierarchical, application-level checkpointing through the SCR library, as well as algorithm-based fault tolerance techniques.
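Hierarchical checkpointing trades checkpoint cost against failure coverage: cheap node-local checkpoints are taken often, cross-node redundant copies less often, and expensive parallel-file-system checkpoints rarely. The sketch below illustrates that scheduling idea only; it is not SCR's API, and the interval values are arbitrary assumptions:

```python
def checkpoint_level(step, local_interval=10, partner_interval=50,
                     global_interval=200):
    """Pick the checkpoint level (if any) for this timestep:
    cheap node-local checkpoints most often, partner copies less
    often, parallel-file-system checkpoints rarely."""
    if step % global_interval == 0:
        return "global"   # survives whole-system failure
    if step % partner_interval == 0:
        return "partner"  # survives loss of a node
    if step % local_interval == 0:
        return "local"    # survives process failure only
    return None
```

On failure, restart proceeds from the most recent checkpoint whose level covers the failure that occurred, which is usually a fast local or partner copy rather than a global one.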
To efficiently exploit new architectures with wide multi- and many-core nodes, as well as accelerators, we require hybrid approaches to programming large-scale applications. We are investigating extensions of OpenMP and MPI that support their interoperability and fully exploit these node architectures. To achieve these goals, we lead the OpenMP Language Committee and actively participate in the MPI Forum.
Data-centric computing concerns the acquisition, processing, analysis, storage, and query of data sets and streams. Data typically originates from external sensors. We are interested in computing architectures, systems, software infrastructure, algorithms and applications optimized for the analysis of large data sets or high bandwidth data streams.
Tools for extreme-scale machines and applications must themselves be highly scalable, both in process counts and in their ability to gather and analyze huge data volumes. We are developing several modular and widely useful infrastructure components that satisfy these requirements for a range of tools. Our projects include P^nMPI, a virtualization layer for PMPI tools, and Muster, a framework for scalable parallel clustering.
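The virtualization idea behind stacking PMPI tools can be shown with a conceptual Python analogue (the real mechanism works at the MPI profiling interface in C, and the tool names below are hypothetical): each tool wraps the next one in a chain, so multiple independent tools can observe the same call.

```python
import time

def make_stack(base_fn, tools):
    """Chain tool wrappers around a base function, the way P^nMPI
    stacks multiple PMPI tool modules on a single MPI call."""
    fn = base_fn
    for tool in reversed(tools):  # first tool listed ends up outermost
        fn = tool(fn)
    return fn

def counter_tool(next_fn):
    def wrapper(*args):
        wrapper.calls = getattr(wrapper, "calls", 0) + 1
        return next_fn(*args)
    return wrapper

def timer_tool(next_fn):
    def wrapper(*args):
        t0 = time.perf_counter()
        result = next_fn(*args)
        wrapper.elapsed = time.perf_counter() - t0
        return result
    return wrapper
```

Without such a layer, only one PMPI tool can intercept an MPI call at a time; the wrapper chain removes that restriction while each tool remains unmodified.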
This work focuses on the analysis of large data sets to extract high-level features to facilitate understanding of the information contained in the data. Typical data sets include video, collections of images, streaming data from network or environmental sensors, text collections, and graph networks.
We are developing scalable methods to manipulate and analyze very large data sets. This research involves developing novel out-of-core representations that optimize data access as well as algorithms that traverse and manipulate data efficiently.
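A minimal illustration of the out-of-core principle, under the assumption of a flat file of little-endian float64 values (the function and file layout are illustrative, not one of our actual representations): data is streamed through a fixed-size buffer, so memory use is bounded by the chunk size rather than the data set size.

```python
import struct

def chunked_sum(path, chunk_bytes=4096):
    """Stream little-endian float64 values from a file and accumulate
    their sum without ever holding the whole data set in memory.
    chunk_bytes is assumed to be a multiple of 8."""
    total = 0.0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_bytes)
            if not chunk:
                break
            total += sum(struct.unpack("<%dd" % (len(chunk) // 8), chunk))
    return total
```

Real out-of-core representations go further, reordering data on disk so that the traversal pattern of the algorithm matches the layout and each block is read only once.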
Projects in this area perform research and development of scalable linear solvers, particularly multigrid and multilevel methods, as well as nonlinear solvers and robust time integrators. They also produce software for high performance computers.
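The essence of a multigrid cycle can be sketched for the 1D Poisson problem -u'' = f with homogeneous Dirichlet boundaries (a toy two-grid example, not production solver code; real multilevel solvers recurse over many grids and use far better coarse solvers):

```python
def jacobi(u, f, h, sweeps=3, w=2.0/3.0):
    """Weighted Jacobi smoothing for -u'' = f on a uniform grid."""
    n = len(u)
    for _ in range(sweeps):
        new = u[:]
        for i in range(1, n - 1):
            new[i] = (1 - w) * u[i] + w * 0.5 * (u[i-1] + u[i+1] + h*h*f[i])
        u = new
    return u

def residual(u, f, h):
    n = len(u)
    r = [0.0] * n
    for i in range(1, n - 1):
        r[i] = f[i] - (2*u[i] - u[i-1] - u[i+1]) / (h*h)
    return r

def two_grid(u, f, h):
    """One two-grid cycle: pre-smooth, restrict the residual, solve
    the coarse error equation, interpolate the correction, post-smooth."""
    u = jacobi(u, f, h)
    r = residual(u, f, h)
    # Full-weighting restriction to a grid with half the intervals.
    nc = (len(u) - 1) // 2 + 1
    rc = [0.0] * nc
    for i in range(1, nc - 1):
        rc[i] = 0.25 * r[2*i-1] + 0.5 * r[2*i] + 0.25 * r[2*i+1]
    # Approximate the coarse error equation by heavy smoothing.
    ec = jacobi([0.0] * nc, rc, 2*h, sweeps=50)
    # Linear interpolation of the coarse correction back to the fine grid.
    for i in range(nc - 1):
        u[2*i] += ec[i]
        u[2*i+1] += 0.5 * (ec[i] + ec[i+1])
    u[-1] += ec[-1]
    return jacobi(u, f, h)
```

The point of the hierarchy is that smoothing damps oscillatory error cheaply on the fine grid, while the smooth error that smoothing barely touches is well represented, and cheaply removed, on the coarse grid.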
Multiple projects in this area explore block-structured methods for focusing resolution in parts of a simulation domain where it is most needed, while avoiding the overheads associated with unstructured meshes and providing a natural decomposition for parallel architectures.
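The flag-and-cluster step at the heart of block-structured refinement can be sketched in 1D (a toy illustration with hypothetical function names; production clustering algorithms, such as Berger-Rigoutsos, are considerably more sophisticated): cells where the solution varies sharply are flagged, and flagged cells plus a buffer are merged into contiguous blocks, each of which becomes a finer structured patch.

```python
def flag_cells(values, threshold):
    """Flag cells whose jump to a neighbor exceeds the threshold."""
    flags = [False] * len(values)
    for i in range(len(values) - 1):
        if abs(values[i+1] - values[i]) > threshold:
            flags[i] = flags[i+1] = True
    return flags

def cluster_blocks(flags, buffer=1):
    """Pad flagged cells by a buffer, then merge contiguous runs
    into half-open (start, end) refinement blocks."""
    padded = [False] * len(flags)
    for i, f in enumerate(flags):
        if f:
            for j in range(max(0, i - buffer),
                           min(len(flags), i + buffer + 1)):
                padded[j] = True
    blocks, start = [], None
    for i, f in enumerate(padded):
        if f and start is None:
            start = i
        elif not f and start is not None:
            blocks.append((start, i))
            start = None
    if start is not None:
        blocks.append((start, len(padded)))
    return blocks
```

Because each resulting block is itself a structured patch, the fine-level solver reuses the regular-grid kernels, and the blocks give a natural unit of parallel work distribution.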
Research in this area focuses on novel methods for solving partial differential equations (PDEs). This includes an object-oriented framework for PDEs on complex domains as well as the development of numerical methods for wave propagation simulation.
This research focuses on the investigation and implementation of novel radiation and neutron transport solver techniques. The major focus is on developing solution methods that scale to exascale architectures, discretization methods that are free of ray effects, and improvements to existing solvers.
CASC contributes to the modeling and prediction of materials properties through challenging computational approaches. These include solving molecular dynamics equations based on first principles as well as extending traditional molecular dynamics simulations to hydrodynamics scales.
CASC contributes to several large-scale simulation codes for complex applications, both internally and in collaboration with other research groups. These simulations often involve separate packages for different physical processes, with both software interface and numerical analysis issues to be addressed in order to make all components work together effectively.