TEACHING MATERIAL
- The Need and Feasibility of Parallel Computing, Technology Trends, Microprocessor Performance Attributes, Goal of Parallel Computing. Computing Elements, Programming Models, Flynn's Classification, Multiprocessors Vs. Multicomputers. Current Trends In Parallel Architectures, Communication Architecture. (PCA Chapter 1.1, 1.2) (Chapter 1 and 2) (PPT).
Reference paper: On the Future of High Performance Computing: How to Think for Peta and Exascale Computing
- Parallel Architectures Convergence: Communication Architecture, Communication Abstraction. Naming, Operations, Ordering, Replication. Communication Cost Model. (PCA Chapter 1.2, 1.3) (PPT)
- Parallel Programs: Conditions of Parallelism. Asymptotic Notations for Algorithm Analysis, PRAM. Levels of Parallelism, Hardware Vs. Software Concurrency. Data Vs. Functional Parallelism. Amdahl’s Law, Degree of Parallelism (DOP), Concurrency Profile. Steps in Creating Parallel Programs: Decomposition, Assignment, Orchestration, Mapping. (PCA Chapter 2.1, 2.2) (PPT). Reference material:
- Example of data parallel programming using CUDA: CUDA-lite paper and Program Analysis
- Getting Started with OpenMP (Intel Software Network)
- More Work-Sharing with OpenMP (Intel Software Network)
- Advanced OpenMP Programming (Intel Software Network)
- Simple loop data dependence analysis
- Sample Alternating Direction Integration (ADI) C code.
- Parallelization of An Example Program: Ocean simulation Iterative equation solver (2D Grid). (PCA Chapter 2.3)(PPT)
- Cluster Computing: Origins, Broad Issues in Heterogeneous Computing (HC). Message-Passing Programming. Overview of Message Passing Interface (MPI 1.2). (PP Chapter 2, Appendix A, MPI and HC) (PPT). Reference material on MPI and MPI timing issues.
- Considerations in Parallel Program Creation Steps for Performance. (PCA Chapter 3)(PPT)
- Basic Parallel Programming Techniques and Examples. Massively Parallel Computations: Pixel-based Image Processing. Divide-and-conquer Problem Partitioning: Parallel Bucket Sort, Numerical Integration, Gravitational N-Body Problem. Pipelined Computations: Addition, Insertion Sort, Solving Upper-triangular System of Linear Equations. Synchronous Iteration: Barriers, Iterative Solution of Linear Equations. Dynamic Load Balancing: Centralized, Distributed, Moore's Shortest Path Algorithm. (PP Chapters 3-7, 12)(PPT)
- Network Properties and Requirements For Parallel Processing. Static Point-to-point Connection Network Topologies. Network Embeddings. Dynamic Connection Networks. (PP Chapter 1.3, PCA Chapter 10)(PPT)
- Parallel System Performance: Evaluation & Scalability. Workload Selection. Parallel Performance Metrics Revisited. Application/Workload Scaling Models of Parallel Computers. Parallel System Scalability. (PP Chapter 1, PCA Chapter 4)(Perf) (PPT)
- The Cache Coherence Problem in Shared Memory Multiprocessors. Cache Coherence Approaches. Snoopy (Bus-Snooping) Cache Coherence Protocols: Write-Invalidate (MSI, MESI) and Write-Update (Dragon). (PCA Chapter 5) (PPT) (PPT)
- Cache Coherence in Scalable Distributed Memory Machines: Hierarchical Snooping, Directory-based cache coherence. (PPT)
- Compute Unified Device Architecture (CUDA): Introduction and Example.
GRADING
Student Presentations from course project
REFERENCES
Reference courses and material:
Parallel Virtual Machine (PVM/MPI and pthread):