
Multi-Lab Workshops Explore Advances in HPC Performance Portability and Productivity

Monday, March 25, 2019

The 2019 Department of Energy (DOE) Performance, Portability and Productivity meeting is slated for April 2–4, 2019, in Denver, CO, where attendees will have the opportunity to share ideas and updates on performance portability—the ability for applications to be used effectively on different systems without the need for extreme customizations—across the DOE’s current and future supercomputers.

These machines are used for high performance computing (HPC) and simulation, which have transformed the way research is conducted at the DOE labs. But as the once-consistent exponential gains in computing power slow down, architectures are changing in response and becoming more complex to program, and application teams are necessarily reacting by retargeting their software to new and diverse HPC offerings. This, in turn, is unleashing a new set of challenges around performance portability in the HPC community, and Lawrence Livermore is helping lead the discussion about them and demonstrating workable solutions.

Rob Neely, a Livermore technical manager responsible for helping coordinate activities around HPC applications, computer science and math research, and supercomputing systems critical to the Laboratory’s mission, is a co-organizer of the April meeting. It is the fourth in a series of workshops on the topic that he has helped coordinate, and it is on track to be the biggest yet. Ian Karlin of Livermore Computing has also been actively involved as a regular steering committee member for the meetings.

Workshop Evolution

The first such DOE performance portability meeting, which Neely spearheaded in response to major shifts in supercomputing and the corresponding new machines at the Argonne, Oak Ridge, Los Alamos, Lawrence Berkeley, Sandia, and Lawrence Livermore national laboratories, was held in 2016 in Glendale, AZ. This workshop, the Centers of Excellence Performance Portability Meeting, gave the laboratories’ Centers of Excellence (collaborations between each site and their HPC vendors) a chance “to discuss the challenges with making our large and complex codes run efficiently across all of these platforms when planned systems were diverging along many-core and heterogeneous-node architectures,” Neely says.

In other words, code written to be general purpose and portable should run at about the same speed and efficiency as code tailored to the machine. Similarly, an application should run as well on a system sited at Livermore as it does at other sites. This concept of performance portability stipulates that codes perform well with minimal changes across architectures, whether moving between facilities’ systems or to a next-generation computer.

The initial meeting was followed by a second meeting in 2017 in Denver, CO, which Los Alamos National Laboratory organized. Afterward, organizers decided to expand the focus of subsequent meetings to include the important concept of productivity. In November 2018, the first International Workshop on Performance, Portability and Productivity in HPC (P3HPC) debuted at the SC18 Supercomputing Conference in Dallas, TX. The half-day workshop aimed to improve supercomputing performance and develop solutions for applications’ portability while bringing the third P, productivity, into the mix.

Productivity can be viewed as how easily developers can write code that is also performance portable. This quality is difficult to measure because it is somewhat subjective, but as earlier workshops made clear, performance-portable solutions are not attractive if the effort required to develop and maintain the code is excessive.

Regarding the 2018 meeting’s attendance, Neely notes, “This P3HPC workshop at SC18 also marked the first time we’ve really expanded our audience beyond just the DOE staff and vendor partners.” Approximately 80–100 people attended, and eight researchers presented their findings and participated as panel members for a discussion. He met researchers from institutions he had not worked with before.

“I’ve been lucky enough to be immersed in the HPC community as it’s grown,” Neely says, “and there’s nothing like face-to-face interactions in a dynamic environment to spur excitement about working with others on challenging problems.”

Progress toward Portability

Several strategies are underway at Livermore to further portability efforts. RAJA is a software library initiated by computer scientists Richard Hornung and Jeff Keasler and now maintained by a growing number of Livermore computer scientists and external collaborators. It allows developers to write application code that can run efficiently on different architectures; it has been used extensively to prepare Livermore applications for Sierra.
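To give a flavor of the approach, below is a minimal, illustrative sketch in the RAJA style (not code from any Livermore application): the loop body is written once as an ordinary C++ lambda, and only the execution policy that maps it onto hardware changes between platforms.

    #include "RAJA/RAJA.hpp"

    // A daxpy loop written once in the RAJA style. The body is a plain
    // C++ lambda; the template parameter selects how, and on what
    // hardware, the iterations are executed.
    void daxpy(double* y, const double* x, double a, int n)
    {
      // Sequential CPU execution policy:
      RAJA::forall<RAJA::seq_exec>(RAJA::RangeSegment(0, n),
        [=](RAJA::Index_type i) {
          y[i] += a * x[i];
        });

      // On a GPU build, the same body could instead be dispatched with
      // a device policy such as RAJA::cuda_exec<256>, leaving the loop
      // body itself unchanged.
    }

Because only the policy changes, an application can retarget its loops from CPUs to Sierra’s GPUs without rewriting the numerical kernels themselves.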

“RAJA is a key part of the production code development environment here at Livermore,” Hornung says. “Usage is expanding outside of Livermore. So far, it seems to be doing what we hoped it would do. I think users are generally pretty pleased with it—especially those who have tried to write code using other programming models directly and understand the details of those models.”

Umpire and CHAI are two other Livermore-developed technologies the Laboratory uses in its portability efforts. While the focus of RAJA is execution portability, these complementary efforts help make memory management on heterogeneous architectures portable.
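As a rough illustration of the memory-management side, the sketch below uses Umpire’s allocator interface; the resource names and sizes are illustrative, assuming a standard Umpire build.

    #include "umpire/ResourceManager.hpp"
    #include <cstddef>

    int main()
    {
      // Umpire exposes memory resources by name, so application code can
      // request memory without calling malloc or cudaMalloc directly.
      auto& rm = umpire::ResourceManager::getInstance();

      // "HOST" is always available; on a GPU-enabled build the name
      // could instead be "DEVICE" or "UM" (unified memory).
      auto allocator = rm.getAllocator("HOST");

      constexpr std::size_t n = 1024;
      double* data =
        static_cast<double*>(allocator.allocate(n * sizeof(double)));

      // ... compute on data; CHAI's chai::ManagedArray builds on
      // allocators like these to migrate data automatically between
      // host and device as kernels execute ...

      allocator.deallocate(data);
      return 0;
    }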

Writing portable code, Neely notes, “is still one of the largest challenges as we approach the exascale era where architectures are necessarily becoming more complex and more diverse.”

Figure: LLNL computer scientists have been developing performance portability solutions to ensure that programs designed for homogeneous computer nodes, which feature only central processing units (CPUs), can run efficiently on the Sierra supercomputer’s heterogeneous nodes, which feature both CPUs and graphics processing units (GPUs).

Looking Ahead

The April meeting is open to application developers preparing code for DOE platforms, collaborators at universities and other organizations, vendors, and solution providers who are developing tools to improve performance portability and productivity. It is organized by Lawrence Berkeley, which is home to the National Energy Research Scientific Computing Center.

Having reviewed the registration list for April’s meeting, Neely notes participation from new areas, particularly in Europe, which he attributes to the previous workshop’s broader audience. Also, at two and a half days, this springtime meeting will be more comprehensive than the half-day workshop at SC18.

“We extended explicit invitations to people we knew would give good presentations and who could cover particular topics,” Neely says. “We then put out a call for abstracts for others in the community and invited them to give a talk or present a poster where they can discuss more informally what they’re doing.”

Neely explains that, since the initial 2016 meeting, in addition to graphics processing units emerging as the most competitive accelerator technology in computing, one of the major developments has been the increase in hard data providing evidence of true performance portability, as opposed to reports of what the research was merely hoped to deliver. Speakers at the April meeting will be encouraged to present their findings, including their failures, rather than only ideas.

Neely notes that it is difficult to say how long meetings on performance portability will be needed. “We wouldn’t need these anymore if architectures have ‘settled down’ enough and the languages, the language standards, and the compilers that support them have all matured enough to meaningfully solve the problem,” he says. If developers can write software in a particular language and standard, compiler vendors support that standard in their compilers, and the resulting code runs well everywhere, then these annual meetings would likely no longer be necessary.

Innovation in HPC, regardless, will continue. Neely cites auto manufacturers’ use of HPC simulation as a major tool for improving automotive safety, something of which he was recently a direct beneficiary. During a business trip with colleagues to Santa Fe, NM, in the fall, their car was broadsided by another vehicle speeding through a red light. Everyone walked away uninjured, with just a ringing in their ears from the airbags, but the experience left a strong impression about how safe modern cars have become.

“Had we been in a car developed 15 or 20 years ago, we likely could have been seriously injured,” Neely says.

For HPC to continue making these inroads outside of just the national labs, according to Neely, “We need to continue to share our experiences and help shape the conversation with this broader audience.”

HPC is critical for a range of other applications, according to Neely, such as weather prediction (he worked on climate modeling while interning at Argonne National Laboratory as an undergraduate in the early 1990s and has been at Livermore since 1994), earthquake modeling (for example, performing simulations to plan evacuation routes and where to send first responders), medical research (using modeling to design better drugs and more personalized medicine), and machine learning (supporting new scientific insights). It is also essential to the national security drivers of the Advanced Simulation and Computing Program, which supports the National Nuclear Security Administration’s mission at the Laboratory.

“The Stockpile Stewardship Program here has an insatiable appetite for computing,” Neely says, “so our ability to run our applications anywhere—whatever the architecture might be—using these performance portability techniques will be a huge benefit to our mission.”

— Steve Tanamachi