StarSapphire: Data-driven Modeling and Analysis
Exa-DM: Enabling Scientific Discovery in Exascale Simulations

Extreme-scale systems are enabling the simulation of increasingly complex phenomena, and the output of these simulations is analyzed in different ways to gain deeper insight into the phenomena being modeled. Analysis motivated by scientific discovery is particularly challenging because we may not have a precise notion of what we are looking for in the data. This makes the analysis highly iterative and interactive. It may involve re-running the simulation to write out additional data or to generate data with a different set of input parameters.

As we prepare for exascale systems, the substantial changes expected in their architecture will affect the process of scientific discovery. Because the I/O and storage systems are unlikely to provide the required capabilities at the exascale, it may not be possible to write out all the data from a simulation. A proposed solution is to move all the analysis “in situ” and write out only the results, which, we hope, will be much smaller in size. Unfortunately, this idea is in direct conflict with the process of scientific discovery, which often involves addressing questions that had not been formulated when the simulation was run.

In this project, we investigate how data mining techniques can address this conflict and identify a middle ground: reducing the amount of data output while still writing out enough data to support scientific discovery. Using the detection and tracking of coherent structures as an example problem, we consider:

  • exascale implementations of known algorithms,
  • automated detection of coherent structures,
  • general reduced representations of the data, and
  • extension of a data exploration tool to the exascale.

This work is being done using data from fusion and combustion simulations.

This is a joint project with Prof. George Karypis, co-PI, University of Minnesota. Our collaborators providing data and domain expertise are Prof. Zhihong Lin from UC Irvine for fusion and Prof. Sean Garrick from the University of Minnesota for combustion.