An FMM Based on Dual Tree Traversal for Many-Core Architectures
Main Author: | |
---|---|
Format: | Article |
Language: | English |
Published: | SAGE Publishing, 2013-09-01 |
Series: | Journal of Algorithms & Computational Technology |
Online Access: | https://doi.org/10.1260/1748-3018.7.3.301 |
Summary: | The present work attempts to integrate the independent efforts in the fast N-body community to create the fastest N-body library for many-core and heterogeneous architectures. Focus is placed on optimizations for the low-accuracy regime, in response to recent interest in using the fast multipole method (FMM) as a preconditioner for sparse linear solvers. A direct comparison with other state-of-the-art fast N-body codes demonstrates that a performance increase of orders of magnitude can be achieved through careful selection of the optimal algorithm and low-level optimization of the code. The present N-body solver uses an FMM with an efficient strategy for finding the list of cell-cell interactions by a dual tree traversal. A task-based threading model is used to maximize thread-level parallelism and intra-node load balancing. To extract the full potential of the SIMD units on the latest CPUs, the inner kernels are optimized using AVX instructions. |
ISSN: | 1748-3018, 1748-3026 |
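The summary names a dual tree traversal as the strategy for building the cell-cell interaction lists. What follows is a minimal Python sketch of that general idea, not the authors' implementation. None of the paper's low-level optimizations (AVX kernels, task-based threading) are reproduced. Everything specific here is an assumption made for illustration: the 2-D particles, the median-bisection k-d tree standing in for whatever tree the paper builds, the acceptance test `r_a + r_b < theta * dist`, and the hypothetical names `build_tree`, `dual_tree_traversal`, `ncrit` and `theta`. Starting from the pair (root, root), a well-separated pair of cells goes to the M2L (cell-cell) list. Two nearby leaves go to the P2P (direct) list. Any other pair is refined by splitting the larger cell.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class Cell:
    """Node of a hypothetical 2-D k-d tree used only for this illustration."""
    idx: list                 # indices of the particles contained in the cell
    center: tuple             # center of the cell's bounding box
    radius: float             # half-diagonal of the bounding box
    children: list = field(default_factory=list)


def build_tree(points, idx, ncrit, depth=0):
    """Bisect at the median (alternating x/y) until a cell has <= ncrit particles.

    ncrit must be at least 1, otherwise a single particle would be split forever.
    """
    xs = [points[i][0] for i in idx]
    ys = [points[i][1] for i in idx]
    center = ((min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2)
    radius = math.hypot(max(xs) - min(xs), max(ys) - min(ys)) / 2
    cell = Cell(idx, center, radius)
    if len(idx) > ncrit:
        axis = depth % 2
        order = sorted(idx, key=lambda i: points[i][axis])
        mid = len(order) // 2
        cell.children = [build_tree(points, order[:mid], ncrit, depth + 1),
                         build_tree(points, order[mid:], ncrit, depth + 1)]
    return cell


def dual_tree_traversal(a, b, theta, m2l, p2p):
    """Sort the (target a, source b) pair into the M2L or P2P interaction list."""
    dist = math.hypot(a.center[0] - b.center[0], a.center[1] - b.center[1])
    if a.radius + b.radius < theta * dist:
        m2l.append((a, b))                      # well separated: cell-cell (M2L)
    elif not a.children and not b.children:
        p2p.append((a, b))                      # two nearby leaves: direct sum (P2P)
    elif not b.children or (a.children and a.radius >= b.radius):
        for child in a.children:                # split the target cell
            dual_tree_traversal(child, b, theta, m2l, p2p)
    else:
        for child in b.children:                # split the source cell
            dual_tree_traversal(a, child, theta, m2l, p2p)


if __name__ == "__main__":
    random.seed(0)
    n = 1000
    pts = [(random.random(), random.random()) for _ in range(n)]
    root = build_tree(pts, list(range(n)), ncrit=16)
    m2l, p2p = [], []
    dual_tree_traversal(root, root, theta=0.5, m2l=m2l, p2p=p2p)
    # Every ordered (target, source) particle pair should be covered exactly once.
    covered = sum(len(a.idx) * len(b.idx) for a, b in m2l + p2p)
    print(len(m2l), "M2L pairs,", len(p2p), "P2P pairs, full coverage:", covered == n * n)
```

The children of a split cell partition its particles, so every ordered (target, source) particle pair lands in exactly one of the two lists. The coverage check in the usage block tests exactly this property. In this sketch, a smaller `theta` demands a larger separation before a pair is accepted. That moves more work into P2P, which is the accuracy/cost trade-off an FMM tunes.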