As global climate change projections have become more useful, managing the vast volumes of accompanying data has become a major challenge for the computational scientists who support them. Understanding and predicting climate change and extreme weather events requires advanced tools to securely store, manage, access, analyze, visualize, and process enormous, distributed data sets. This “big data” challenge is being met by the Earth System Grid Federation (ESGF), an international collaboration led by LLNL with a primary goal of facilitating advancements in Earth system science. Designed and maintained by dozens of American, European, Asian, and Australian research institutions, ESGF now powers most global climate change research, notably including assessments by the United Nations’ Intergovernmental Panel on Climate Change.
ESGF offers an immense, computerized climate database that standardizes and organizes observational and simulation data from 21 countries, allowing scientists to compare models against actual observations and reanalysis. Key capabilities include
- National and international network infrastructure integrating the world’s climate model and measurement archives;
- Shared resources across multiple centers for high performance computing and storage of tens of petabytes of transportable data;
- Easy-to-use and secure, federated web-based application programming interface and data infrastructure;
- Flexible infrastructure allowing participants to customize parameters;
- High performance search, analysis, and visualization tools;
- Access to a broad set of data and tools for comparative and exploratory analysis; and
- Virtual collaborative environment for analysis tasks demanding large, varied datasets.
Virtually all climate science researchers worldwide use ESGF to discover, access, and compute on data. In fact, many of today’s most recognized climate projects employ the software and services developed by the ESGF team and its community. ESGF is designed to remain robust even as data volumes continue to grow exponentially. Currently, 25,000 users from 2,700 sites on six continents are sharing data through ESGF. More than 5 petabytes of data have been downloaded by the climate community through ESGF, making it one of the most complex, successful big data systems in existence. The federation will continue to expand access to relevant data, integrated with analysis and visualization tools and supported by the hardware and network capabilities needed to interpret peta- and exascale scientific data.
ESGF combines grid-based computing with a distributed architecture, keeping participating members sovereign while simultaneously linking them together. To achieve this, ESGF developers created a unique system of nodes that requires very little explicit coordination while still providing a robust “data space” for storage and computation. Teams work in highly distributed research environments, using unique scientific instruments, exascale-class computers, and extreme amounts of data. Users can access ESGF data using Web browsers, scripts, and client applications. A key to ESGF’s success is its ability to effectively produce, validate, and analyze research results collaboratively, so that, for example, new results generated by one team member are immediately accessible to the rest of the team, who can annotate, comment on, and otherwise interact with those results.
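As a rough illustration of script-based access, the following Python sketch builds a query URL for a search service on an ESGF node. It is a minimal sketch under stated assumptions, not a definitive client: the host name `esgf-node.example.org` is a placeholder, the `/esg-search/search` path and the `format`, `limit`, and `distrib` parameters are assumed to match the node's RESTful search interface, and the facet values (`CMIP5`, `tas`) are example inputs only. The helper name `build_search_url` is hypothetical.

```python
from urllib.parse import urlencode


def build_search_url(node_host, facets, limit=10, distributed=True):
    """Build a search URL for a (hypothetical) ESGF node.

    Assumptions: the node exposes a search endpoint at /esg-search/search
    and accepts 'format', 'limit', and 'distrib' query parameters.
    """
    params = {
        "format": "application/solr+json",  # assumed JSON response format
        "limit": limit,                      # maximum number of results
        "distrib": "true" if distributed else "false",  # query peers too
    }
    params.update(facets)  # search constraints such as project or variable
    return f"https://{node_host}/esg-search/search?{urlencode(params)}"


# Placeholder host and example facet values; no network request is made here.
url = build_search_url(
    "esgf-node.example.org",
    {"project": "CMIP5", "variable": "tas"},
)
print(url)
```

A script would then fetch this URL with any HTTP client and parse the response. The sketch stops at URL construction so that it runs without network access.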
The ESGF peer-to-peer architecture is based on a dynamic system of nodes—independently administered yet united by common protocols and interfaces—that interact on an equal basis and offer a broad range of user and data services, depending on how each is set up. Data are published, stored, and served from dozens of nodes around the globe, yet they are searchable and accessible as if they were stored in a single global archive. Metadata shared among projects help fully integrate the repository of data and components for usability and interoperability. ESGF also promotes standard conventions for data transformation, quality control, and data validation across processes and projects.
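The idea that independently administered nodes can be searched as if they were a single archive can be sketched with a toy in-memory model. This is an illustrative simplification, not ESGF's actual implementation: the `Node` class, its methods, the record fields, and the node names are all hypothetical. Each node answers queries from its own catalog, and any node can fan a query out to its peers and merge the results, skipping datasets it has already seen.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy stand-in for a federation node (hypothetical, for illustration)."""
    name: str
    records: list = field(default_factory=list)
    # Peers are excluded from repr/compare because nodes reference each other.
    peers: list = field(default_factory=list, repr=False, compare=False)

    def local_search(self, **facets):
        # Return only the records held by this node that match every facet.
        return [
            r for r in self.records
            if all(r.get(key) == value for key, value in facets.items())
        ]

    def federated_search(self, **facets):
        # Query this node first, then each peer, and merge the results,
        # keeping the first copy of any dataset id that appears twice.
        seen, merged = set(), []
        for node in [self, *self.peers]:
            for record in node.local_search(**facets):
                if record["id"] not in seen:
                    seen.add(record["id"])
                    merged.append({**record, "served_by": node.name})
        return merged


node_a = Node("node-a", [{"id": "ds1", "variable": "tas"},
                         {"id": "ds2", "variable": "pr"}])
node_b = Node("node-b", [{"id": "ds3", "variable": "tas"},
                         {"id": "ds1", "variable": "tas"}])  # replica of ds1
node_a.peers.append(node_b)
node_b.peers.append(node_a)

results = node_a.federated_search(variable="tas")
print([(r["id"], r["served_by"]) for r in results])
```

Because the nodes interact on an equal basis, the same query issued from `node_b` finds the same set of datasets. Only the node reported as serving a replicated dataset differs.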
The ESGF website contains extensive documentation, developer tools, implementation wikis, user tutorials, the most recent and past ESGF Annual Face-to-Face Conference Reports, and the latest news.
Development of ESGF software is led by LLNL’s Analytics and Informatics Management Systems group. Other climate projects in the Laboratory’s portfolio include
- Ultrascale Visualization Climate Data Analysis Tools (UV-CDAT)
- Community Diagnostics Package (CDP)
- Climate Model Output Rewriter (CMOR)
- Accelerated Climate Modeling for Energy (ACME) Workbench
- Distributed Resources for the Earth System Grid Federation Advanced Management (DREAM)
- Community Machine Learning (CML)
- Community Data Management System (CDMS)
ESGF was covered in the article “Dealing with Data Overload in the Scientific Realm” in the January 2013 issue of Science & Technology Review. See also Our People for a profile of ESGF Executive Committee Chair Dean N. Williams.
For more information, contact Dean N. Williams.