
Efficient data IO for a Parallel Global Cloud Resolving Model

Palmer, Bruce ; Koontz, Annette ; Schuchardt, Karen ; Heikes, Ross ; Randall, David

Environmental Modelling and Software, 2011, Vol.26(12), pp.1725-1735 [Peer-reviewed journal]

  • Title:
    Efficient data IO for a Parallel Global Cloud Resolving Model
  • Author: Palmer, Bruce ; Koontz, Annette ; Schuchardt, Karen ; Heikes, Ross ; Randall, David
  • Subjects: High Performance IO ; Parallel IO Libraries ; Data Formatting ; Geodesic Grid ; Global Cloud Resolving Model ; Grid Specifications ; Engineering ; Environmental Sciences ; Computer Science ; Ecology
  • Is part of: Environmental Modelling and Software, 2011, Vol.26(12), pp.1725-1735
  • Description: Execution of a Global Cloud Resolving Model (GCRM) at target resolutions of 2–4 km will generate, at a minimum, tens of gigabytes of data per variable per snapshot. Writing this data to disk without creating a serious bottleneck in the execution of the GCRM code, while also supporting efficient post-execution data analysis, is a significant challenge. This paper discusses an Input/Output (IO) application programmer interface (API) for the GCRM that efficiently moves data from the model to disk while maintaining support for community standard formats, avoiding the creation of very large numbers of files, and supporting efficient analysis. Several aspects of the API will be discussed in detail. First, we discuss the output data layout, which linearizes the data in a consistent way that is independent of the number of processors used to run the simulation and provides a convenient format for subsequent analyses of the data. Second, we discuss the flexible API that enables modelers to easily add variables to the output stream by specifying where in the GCRM code these variables are located and to configure the choice of outputs and the distribution of data across files. The flexibility of the API is designed to allow model developers to add new data fields to the output as the model develops and new physics is added. It also provides a mechanism for allowing users of the GCRM code to adjust the output frequency and the number of fields written depending on the needs of individual calculations. Third, we describe the mapping to the NetCDF data model with an emphasis on the grid description. Fourth, we describe our messaging algorithms and IO aggregation strategies that are used to achieve high bandwidth while writing concurrently from many processors to shared files. We conclude with initial performance results.
    ► A strategy for linearizing data on a geodesic grid is developed.
    ► A modular IO library based on this strategy is developed that can be incorporated into the GCRM with minimal effort.
    ► A subset of processors is used for IO to reduce contention with the file system.
    ► Bandwidth results for a number of different IO configurations are presented.
  • Language: English
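
The description mentions two ideas that a small example may make more concrete: a linearized layout that does not depend on the processor count, and a subset of processors that performs the IO. The sketch below is only an illustration under stated assumptions, not the paper's API: the function names, the contiguous block partition of cells, and the rule assigning consecutive ranks to one IO rank are all hypothetical.

```python
def block_range(rank, nprocs, ncells):
    """Global cell indices [start, stop) owned by `rank` (assumed block partition)."""
    base, extra = divmod(ncells, nprocs)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def aggregator_of(rank, nprocs, n_io):
    """IO rank that collects data from `rank` (assumed: consecutive ranks share one)."""
    return rank * n_io // nprocs


def write_plan(nprocs, n_io, ncells):
    """Map each IO rank to the contiguous file range it writes in linearized order."""
    plan = {}
    for rank in range(nprocs):
        io = aggregator_of(rank, nprocs, n_io)
        start, stop = block_range(rank, nprocs, ncells)
        lo, hi = plan.get(io, (start, stop))
        # Grow the aggregator's range to cover every rank it serves.
        plan[io] = (min(lo, start), max(hi, stop))
    return plan
```

In this toy model the file always covers cells 0 to `ncells` in the same order, whatever `nprocs` is; only the way the range is split among the IO ranks changes.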
