King Fahd University of Petroleum & Minerals
Computer Engineering Department
Professor Mayez Al-Mouhamed
Research Topic 3: PARALLEL SIMULATION OF GRAVITATIONAL N-BODY PROBLEM
Introduction: Large and complex engineering problems often require too much computation time and storage to run on ordinary single-processor computers. Even when such problems can be solved, powerful computing capability is needed to obtain more accurate and reliable results within a reasonable time. Parallel computing can meet these requirements for high-performance computing (HPC). HPC refers to the use of high-speed processors (CPUs) and related technologies to solve computationally intensive problems. In recent years, HPC has become much more widely available and affordable, primarily through the use of multiple low-cost processors working in parallel on a computational task. Important advances have been made in the development of multiple-instruction, multiple-data (MIMD) multiprocessing systems, which consist of a network of processors that can be programmed independently. Such parallel architectures have revolutionized the design of computer algorithms and have had a significant influence on fields such as finite element analysis. Nowadays, clusters of affordable compute servers make large-scale parallel processing a viable strategy for running scientific applications faster. In fact, new multi-core processors have turned even desktop workstations into high-performance platforms for single-job execution.
This wider availability of HPC systems is enabling important trends in engineering simulation. The first is the development of simulation models that demand more computer resources: these models need more memory and more computational time as engineers include greater geometric detail and more realistic treatment of physical phenomena. Such higher-fidelity models are critical for reducing the need for expensive physical testing, and HPC systems make them practical by yielding results within the engineering project's required time frame. A second important trend is toward running many simulations of the same problem, so that engineers can compare multiple design ideas, conduct parametric studies, and even perform automated design optimization. HPC systems provide the throughput required to complete multiple simulations simultaneously, allowing design decisions to be made early in the project.
The Gravitational N-Body Problem: Computational methods that track the motions of bodies interacting with one another, and possibly subject to an external field as well, have been studied extensively for centuries. These methods, known as “N-body” methods, have been applied to problems in astrophysics, semiconductor device simulation, molecular dynamics, plasma physics, and fluid mechanics. The problem is stated as follows: given the initial states (position, mass, and velocity) of N bodies, compute their states after a time interval T. Problems of this kind are computationally intensive, and approximate solutions developed on HPC systems help in studying such phenomena. The simplest approach to the N-body problem is to iterate over a sequence of small time steps; within each time step, the acceleration on a body is computed by summing the contributions from each of the other N-1 bodies. This is known as the brute-force algorithm. While this method is conceptually simple, easy to parallelize on HPC systems, and the method of choice for many applications, its O(N^2) time complexity makes it impractical for large-scale simulations involving millions of bodies. To reduce this cost, many algorithms have been proposed that compute an approximate solution within a reasonable time complexity and acceptable error bounds. These include Appel's algorithm [APP85] and the Barnes-Hut (BH) algorithm; Appel's algorithm is claimed to run in O(N) time and BH in O(N log N) for bodies distributed uniformly in space. Greengard and Rokhlin later developed the fast multipole method (FMM), which runs in O(N) time and can be tuned to any fixed accuracy.
Summary: The above algorithms were initially proposed as sequential algorithms. However, with the evolution of vector machines and HPC clusters, many parallel implementations have appeared on different machine architectures. Zhao and Johnsson describe a nonadaptive 3D version of Greengard's algorithm on the Connection Machine CM-2. Salmon [SAL90] implemented the BH algorithm, with multipole approximations, on message-passing architectures including the NCUBE and Intel iPSC. Nyland et al. implemented a 3D adaptive FMM with a data-parallel methodology in Proteus, an architecture-independent language, and many other implementations exist for other machines. Our objective is to implement the BH algorithm and parallelize it on the IBM e1350 eServer cluster. We apply BH to a particular N-body problem: the gravitational N-body, or galaxy evolution, problem. First we implement the sequential algorithm; then we parallelize it using the OpenMP directive set with the Intel C++ compiler installed on a Red Hat Linux server. The sequential BH algorithm and different approaches to parallelizing it are presented. The implementation of the BH algorithm concentrates on the aspects that allow it to be parallelized easily. Different parallelization techniques offered by OpenMP are investigated. The speedup of the parallel implementation will be reported, together with some guidelines for future work.
Results: