Working in CASC opens opportunities for me as a researcher.
Jayaraman Thiagarajan
Jay Thiagarajan presents at the BASCD conference
People highlight

Jayaraman Thiagarajan Pushes Data Science Forward

In 2013, Jayaraman “Jay” Thiagarajan happened upon a job opening for a postdoctoral researcher at LLNL’s Center for Applied Scientific Computing (CASC). He had never been a Livermore intern, nor did he have any connections among the Lab’s scientific staff before applying. Originally from India, Jay joined CASC after completing his dissertation, “Sparse Methods in Image Understanding and Computer Vision,” at Arizona State University.

Today, Jay’s research has grown to include multiple related fields. “I work with different types of large-scale, structured data that require the design of unique machine learning [ML] techniques,” he says. This exploration ranges from deep learning–based graph analysis to ML and artificial intelligence (AI) solutions for computer vision, healthcare, language modeling, and scientific applications.

For example, Jay helped develop a new approach for clinical time-series analysis of electronic health records, a data type with great potential for ML-based inference. The 2018 work was accepted at the AAAI Conference on Artificial Intelligence, a premier AI venue. In geophysics, an entirely different application area, his work characterizing geologic features with an ML regression technique was recognized in a high-impact journal.

Jay leads a Laboratory Directed Research and Development project on high-dimensional sampling, which is developing provably effective methods for fundamental data problems such as surrogate modeling and automated ML. By producing reliable, informative experiment designs, the project’s algorithms can save millions of central processing unit hours in simulations.

Jay also uses ML to improve how high performance computing (HPC) systems work. With CASC colleagues, he developed a semi-supervised ML framework that models execution time, power consumption, and other data to identify performance-optimizing configurations. He also developed PADDLE (Performance Analysis using a Data-Driven Learning Environment), which enables more efficient, user-centric workflows for tasks such as identifying the causes of network congestion. HPC codes running on these systems can likewise benefit from Jay’s optimization analysis.

“Explainable AI is another area I’m very excited about,” adds Jay. Research in this domain aims to make ML models trustworthy enough that scientists can act on their predictions, which means ensuring that a model makes the right decisions for the right reasons. He also studies the intersection of data analysis, uncertainty quantification, and visualization. “Science can’t progress without uncertainties, but many AI solutions don’t accommodate such a situation,” Jay explains. “Enabling deep learning models to quantify confidences about predictions will help us develop more realistic neural networks.”

To give back to the computing community, Jay reviews for several conferences and journals and serves on the ML technical program committee of the Supercomputing conference series. Alongside a curriculum vitae of nearly 100 publications, 6 patent applications, and 2 monographs, he still finds time to mentor the next generation of ML experts. He has advised several PhD students, including a National Science Foundation intern, Department of Energy Computational Science Graduate Fellows, and Computation’s summer interns.

“I’m surprised by how much my research has expanded since CASC hired me,” Jay states. “This breadth is possible only because of the Lab’s diversity of thought.”

My foot is constantly on the pedal to find the next big problem to solve. Generating new ideas is easy with all the great minds at the Lab. We have to move quickly in data science, and people who are interested in pushing boundaries can get ahead of the research pace.

I can do highly impactful research at the Lab and work on challenging problems firsthand. In other environments, I might rarely even meet the scientists who generate the data I am building models with. Driven by the needs of scientific discovery, the problems are much better defined here.

Jayaraman Thiagarajan