Solving global shallow water equations on heterogeneous supercomputers.

The scientific demand for more accurate modeling of the climate system calls for more computing power to support higher resolutions, inclusion of more component models, more complicated physics schemes, and larger ensembles. As the recent improvements in computing power mostly come from the increasi...

Bibliographic Details
Main Authors: Haohuan Fu, Lin Gan, Chao Yang, Wei Xue, Lanning Wang, Xinliang Wang, Xiaomeng Huang, Guangwen Yang
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2017-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC5345762?pdf=render
id doaj-c11543b7deb24bc29bbcbcbbde6d6676
record_format Article
spelling doaj-c11543b7deb24bc29bbcbcbbde6d6676; 2020-11-24T20:50:16Z; eng; Public Library of Science (PLoS); PLoS ONE; ISSN 1932-6203; 2017-01-01; 12(3): e0172583; doi:10.1371/journal.pone.0172583; Solving global shallow water equations on heterogeneous supercomputers.; Haohuan Fu, Lin Gan, Chao Yang, Wei Xue, Lanning Wang, Xinliang Wang, Xiaomeng Huang, Guangwen Yang; http://europepmc.org/articles/PMC5345762?pdf=render
collection DOAJ
language English
format Article
sources DOAJ
author Haohuan Fu
Lin Gan
Chao Yang
Wei Xue
Lanning Wang
Xinliang Wang
Xiaomeng Huang
Guangwen Yang
publisher Public Library of Science (PLoS)
series PLoS ONE
issn 1932-6203
publishDate 2017-01-01
description The scientific demand for more accurate modeling of the climate system calls for more computing power to support higher resolutions, the inclusion of more component models, more complicated physics schemes, and larger ensembles. As recent improvements in computing power come mostly from the increasing number of nodes in a system and the integration of heterogeneous accelerators, scaling computing problems onto more nodes and various kinds of accelerators has become a challenge for model development. This paper describes our efforts to develop a highly scalable framework for performing global atmospheric modeling on heterogeneous supercomputers equipped with various accelerators, such as GPU (Graphics Processing Unit), MIC (Many Integrated Core), and FPGA (Field-Programmable Gate Array) cards. We propose a generalized partition scheme for the problem domain that keeps the utilization of CPU and accelerator resources balanced. With optimizations of both computing and memory access patterns, we achieve a speedup of around 8 to 20 times when comparing one hybrid GPU or MIC node with one 12-core CPU node. Using customized FPGA-based data-flow engines, we see the potential to gain another 5 to 8 times improvement in performance. On heterogeneous supercomputers such as Tianhe-1A and Tianhe-2, our framework is capable of achieving ideally linear scaling efficiency, with sustained double-precision performance of 581 Tflops on Tianhe-1A (using 3750 nodes) and 3.74 Pflops on Tianhe-2 (using 8644 nodes). Our study also provides an evaluation of the programming paradigms of the various accelerator architectures (GPU, MIC, FPGA) for performing global atmospheric simulation, giving a picture of both the potential performance benefits and the programming effort involved.
url http://europepmc.org/articles/PMC5345762?pdf=render