Home Page of Mayez Al-Mouhamed

King Fahd University of Petroleum & Minerals


Computer Engineering Department

 

High-Performance Computing Research Group

Professor Mayez Al-Mouhamed

 

Research Topic 1: Parallel Programming of Semi-Static Problems on Massively Parallel Computers

Introduction: Massive parallelism is characterized by the regularity of its computing model. CUDA is an elegant solution to the problem of representing parallelism in algorithms (not all algorithms, but enough to matter). There is therefore a need to translate and formulate domain problems into computational models that can be solved efficiently by the available computing resources. This requires (1) understanding the relationship between the domain problem and the computational models, (2) understanding the strengths and limitations of the computing devices, and (3) designing the model implementations to steer away from those limitations.

Description: Graphics Processing Units (GPUs) are gaining ground in high-performance computing, especially in the arena of massively parallel computing. The basic strategy is to use multithreading to hide memory latency: a large number of threads is created, and a low-cost switching mechanism switches among threads whenever one stalls (for example, on a long-latency memory access). A few programming tools have been proposed, such as CUDA and OpenCL. These are based on establishing a mapping (a kernel) from the computational problem to the computing elements. There is therefore a need to evaluate these programming tools with respect to ease of use, programming level, expressiveness, and adequacy for computational problems exhibiting static, semi-static, and dynamic parallelism.
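To make the kernel-mapping idea concrete, the following is a minimal CUDA sketch (names are illustrative, not part of this project's code) of a vector-addition kernel: each element of the problem is mapped to one thread, and launching many more threads than there are cores gives the scheduler independent work to switch to while memory loads are in flight.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel: each thread computes one element of c = a + b.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // guard the tail block
        c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;  // enough blocks to cover all n
    vecAdd<<<blocks, threads>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

The mapping here is static: the assignment of data to threads is fixed at launch time, which is the regular computing model that massive parallelism favors.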

In this research, the plan is as follows:

1.      Explore massively parallel multiprocessors and their programming models: streaming multiprocessors, SIMD, and multithreading. Highly multithreaded architectures, thread-level parallelism, resource sharing, thread scheduling, scoreboarding, and transparent scalability.

2.      Data dependence analysis, recurrences, and race conditions. Shared memory, atomicity, mutual exclusion, barriers, and synchronization.
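As a small illustration of the synchronization primitives listed above, the following hypothetical CUDA sketch computes a histogram: atomics resolve races when several threads hit the same bin, and `__syncthreads()` barriers separate the initialization, update, and merge phases within a block (the global `bins` array is assumed zeroed before launch).

```cuda
// Block-level histogram using shared memory, barriers, and atomics.
__global__ void histogram256(const unsigned char *in, int n, unsigned int *bins) {
    __shared__ unsigned int local[256];
    // Cooperative zero-initialization of the block-local bins.
    for (int b = threadIdx.x; b < 256; b += blockDim.x)
        local[b] = 0;
    __syncthreads();                      // barrier: initialization complete

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        atomicAdd(&local[in[i]], 1u);     // atomic: threads may race on a bin
    __syncthreads();                      // barrier: all local updates visible

    // Merge block-local counts into the global result.
    for (int b = threadIdx.x; b < 256; b += blockDim.x)
        atomicAdd(&bins[b], local[b]);
}
```

Keeping most atomic traffic in fast shared memory, and touching global memory only once per bin per block, is the standard way to limit the cost of mutual exclusion on GPUs.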

3.      Memory hierarchy optimization: locality and data placement, data reuse, loop reordering transformations, shared-memory usage, and global memory bandwidth and access patterns.
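The data-reuse and shared-memory themes above are commonly illustrated by tiled matrix multiplication. The sketch below (assuming, for brevity, that n is a multiple of the tile size) stages tiles of A and B in shared memory so that each global-memory element is reused TILE times instead of being re-fetched:

```cuda
#define TILE 16

// Tiled matrix multiply: C = A * B for n x n matrices, n % TILE == 0,
// launched with a (n/TILE, n/TILE) grid of (TILE, TILE) blocks.
__global__ void matMulTiled(const float *A, const float *B, float *C, int n) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];
    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;
    for (int t = 0; t < n / TILE; ++t) {
        // Coalesced loads: adjacent threads read adjacent global addresses.
        As[threadIdx.y][threadIdx.x] = A[row * n + t * TILE + threadIdx.x];
        Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * n + col];
        __syncthreads();                  // tile fully staged
        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];  // reuse from shared memory
        __syncthreads();                  // done with this tile
    }
    C[row * n + col] = acc;
}
```

Tiling is essentially a loop-reordering transformation applied at the block level: it trades global memory bandwidth for on-chip locality.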

4.      Control flow and SIMD: thread block partitioning, vector parallel reduction, tree-structured computation, serialized gathering, predicated execution, and dynamic task queues.
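Tree-structured computation and its interaction with SIMD control flow can be sketched with a block-level sum reduction. In this hypothetical kernel the stride halves at each step; the `tid < s` condition keeps entire warps either active or idle (for strides of a warp size or more), which avoids the intra-warp divergence that a naive interleaved scheme would cause:

```cuda
// Tree-structured parallel reduction of one block's elements,
// launched with shared-memory size blockDim.x * sizeof(float).
__global__ void blockReduceSum(const float *in, float *out, int n) {
    extern __shared__ float sdata[];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    sdata[tid] = (i < n) ? in[i] : 0.0f;  // pad the tail with zeros
    __syncthreads();
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s)                      // contiguous active threads
            sdata[tid] += sdata[tid + s];
        __syncthreads();                  // level of the tree complete
    }
    if (tid == 0)
        out[blockIdx.x] = sdata[0];       // one partial sum per block
}
```

The per-block partial sums can then be reduced by a second kernel launch or on the host, a form of the serialized gathering mentioned above.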

5.      Study of CUDA and OpenCL parallel programming languages.

6.      Review methodologies of parallel programming for non-static parallelism on massively parallel computers (GPUs), such as the N-Body problem or others, and prepare a comparative analysis of these approaches. As an application, write parallel programs for the following problems:

      1. Static problems: (1) Alternating Direction Integration (ADI), (2) the Ocean red-black relaxation loop, (3) matrix multiply, and (4) solving a system of linear equations using the Jacobi method.
      2. Semi-static problems: the N-Body problem.
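For the semi-static case, a common starting point is the all-pairs N-Body force kernel, sketched below under illustrative assumptions (positions packed as float4 with mass in `.w`, softening eps2 > 0, shared memory sized `blockDim.x * sizeof(float4)` at launch). Each thread owns one body and streams tiles of the other bodies' positions through shared memory:

```cuda
// All-pairs N-body gravitational acceleration (sketch).
__global__ void nbodyForces(const float4 *pos, float4 *acc, int n, float eps2) {
    extern __shared__ float4 shPos[];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float4 pi = (i < n) ? pos[i] : make_float4(0, 0, 0, 0);
    float ax = 0, ay = 0, az = 0;
    for (int tile = 0; tile * blockDim.x < n; ++tile) {
        int j = tile * blockDim.x + threadIdx.x;
        // Stage one tile of positions; pad out-of-range slots with zero mass.
        shPos[threadIdx.x] = (j < n) ? pos[j] : make_float4(0, 0, 0, 0);
        __syncthreads();
        for (int k = 0; k < blockDim.x; ++k) {
            float4 pj = shPos[k];
            float dx = pj.x - pi.x, dy = pj.y - pi.y, dz = pj.z - pi.z;
            float r2 = dx * dx + dy * dy + dz * dz + eps2;  // softened distance
            float inv = rsqrtf(r2);
            float s = pj.w * inv * inv * inv;               // m_j / r^3
            ax += dx * s; ay += dy * s; az += dz * s;
        }
        __syncthreads();   // tile consumed; safe to overwrite
    }
    if (i < n) acc[i] = make_float4(ax, ay, az, 0.0f);
}
```

The kernel itself is regular, which is what makes the problem only semi-static: the irregularity appears when hierarchical approximations (e.g., tree codes) prune the pair interactions, and handling that efficiently is precisely the methodological question this topic addresses.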

We will collect results (such as execution time and speedup of the parallelized programs versus sequential execution) together with comments on the obtained program performance on the Tesla GPU available at the ICS department.

    Propose a programming methodology for the efficient parallel programming of semi-static problems on massively parallel computers. Carry out an analysis of these methodologies with respect to ease of use, programming level, expressiveness, and adequacy for computational problems exhibiting static, semi-static, and dynamic parallelism.

Results:

  1. Performance Evaluation of CUDA Parallel Programming on Tesla GPUs, by Mr. Ayaz Khan.

  2. Some Research Directions in Massively-Parallel Computing, by Dr. Mayez Al-Mouhamed.