CHAI: Copy Hiding Application Interface
As Lawrence Livermore advances toward exascale computing—machines capable of one billion billion calculations per second—existing codes must scale without performance degradation. The Livermore Computing (LC) Division, responsible for the systems and tools that support the Laboratory’s first-class computational infrastructure, is keenly aware of the challenges of extreme-scale goals, such as the portability of million-line codes across diverse hardware architectures. LC’s Advanced Architecture Portability Specialists (AAPS) have developed a new abstraction model called CHAI (Copy Hiding Application Interface) that automates the movement of data between main memory and other memory locations with minimal intrusion on the source code. This streamlined process provides productivity and efficiency boosts. “One of our objectives is to develop tools that make the Laboratory’s codes flexible for future generations of hardware,” notes David Poliakoff, who co-leads the CHAI initiative with Holger Jones.
ALE3D, a prominent shock hydrodynamics code developed at the Laboratory, uses arbitrary Lagrangian-Eulerian techniques to solve three-dimensional problems. The need to simulate the fluid and elastic behavior of materials in extreme conditions, such as the thermodynamic response of an underwater explosion, makes this code a mainstay in research areas such as heat conduction and chemical kinetics. Accordingly, its evolution into an exascale environment is crucial. RAJA, a Livermore-developed portability management application, helps codes like ALE3D by reducing source code disruption during porting. Although RAJA dispatches looped constructions between graphics processing units (GPUs) and central processing units (CPUs), it is not designed to manage the data being moved. The burden is on the developer to manually designate which data to copy before moving—a process that takes time and focus away from core physics simulation tasks, such as fine-tuning the behavior of temperature exchange in a fluid. CHAI resolves the resulting productivity drag through automation. “RAJA ships the code while CHAI ships the data,” says Poliakoff of the complementary relationship.
CHAI assembles and stores data in a unique way. The physical properties of a simulated system, such as temperature data points, can be organized into an array. Arrays can be assembled into larger arrays, which in turn can be organized into even larger arrays, and so on. How arrays are stored in memory affects the movement of data between storage locations. Using a concept called a ManagedArray, CHAI maintains (or “hides”) multiple copies of an array in different storage locations, communicating access and modification information to RAJA. The user can select various memory configurations and plug-ins, and then let CHAI determine how to allocate different types of memory, including high bandwidth and shared memory. This process allows data within thousands of loops to be moved easily when needed and to remain in place when not needed.
Figure 1. For designated RAJA loop structures, CHAI can move multiple arrays of data points simultaneously between CPU and GPU architectures without consuming space on unneeded arrays.
To test these portability benefits, the AAPS team first created a CHAI prototype for use with a proxy application containing only 8,000 lines of code. According to Poliakoff, this “small playground” enabled the team to iterate quickly. “Before spending resources on moving CHAI to ALE3D, we needed to test multiple configurations and scenarios on a smaller, more accessible code base,” he explains. The team compared the CHAI storage solution to a unified memory architecture, an optimized setup in which a GPU shares system memory. The CHAI prototype outperformed expectations. “We anticipated sacrificing some efficiency in favor of productivity. We hoped the prototype would be comparable to the architecture-specific solution, but it actually ran 15 percent faster,” says Poliakoff. “The efficiency gains we saw in the prototype are huge when CHAI is integrated with larger codes.”
Scaling the CHAI solution from an 8,000-line code to a code hundreds of times larger, like ALE3D, inevitably brings complexity to LC’s exascale strategy. Such data management must adapt to more overhead—not only more data, but more complex data structures, storage and access mechanisms, and what Poliakoff calls “adventurous” execution patterns. The AAPS team adapted CHAI to fit these requirements. “We worked through the software complexity with a long series of refinements and compromises, with trades involving execution speed, portability, and elegance,” explains Jones. “The process has taken many months, some of which is still going on today.” This approach has proved successful, as ALE3D developer Peter Robinson reports: “CHAI’s programming model, when combined with RAJA, provides ALE3D with a risk mitigation strategy that, once implemented, should give us the means to easily port our code to future architectures with heterogeneous memory architectures, and gives us a strategy for prepping our code for [the Laboratory’s forthcoming advanced technology system] Sierra.”
Laboratory colleagues have begun adopting CHAI for other three-dimensional simulation codes, and Jones and Poliakoff plan to continue the ALE3D rollout, collaborate with Livermore’s Co-Design Center, and socialize CHAI with computation teams looking for memory management solutions. In addition, the AAPS team is leveraging a series of clinics at Harvey Mudd College (Claremont, California) in which students learn how to perform software engineering tasks while helping expand CHAI’s capabilities. Finally, the team has made the application available to the open-source community via GitHub.