COE 502 Parallel Processing Architectures
COE 420 Parallel Computing
ICS 446 Cluster Computing

TEACHING MATERIAL

  1. The Need and Feasibility of Parallel Computing, Technology Trends, Microprocessor Performance Attributes, Goal of Parallel Computing. Computing Elements, Programming Models, Flynn's Classification, Multiprocessors Vs. Multicomputers. Current Trends In Parallel Architectures, Communication Architecture. (PCA Chapter 1.1, 1.2) (Chapter 1 and 2) (PPT).

    Reference paper: On the Future of High Performance Computing: How to Think for Peta and Exascale Computing

    1. Designing for Power: Intel Leadership in Power Efficient Silicon and System Design, www.intel.com/technology.

    2. Practical SIMD Vectorization Techniques for Intel® Xeon Phi™ Coprocessors

    3. General-Purpose Graphics Processing Units in Service-Oriented Architectures

    4. GPU-Accelerated Scalable Solver for Banded Linear Systems

    5. Exploration of Automatic Optimization for CUDA Programming

  2. Parallel Architectures Convergence: Communication Architecture, Communication Abstraction. Naming, Operations, Ordering, Replication. Communication Cost Model. (PCA Chapter 1.2, 1.3) (PPT)
  3. Parallel Programs: Conditions of Parallelism. Asymptotic Notations for Algorithm Analysis, PRAM. Levels of Parallelism, Hardware Vs. Software Concurrency. Data Vs. Functional Parallelism. Amdahl’s Law, DOP, Concurrency Profile. Steps in Creating Parallel Programs: Decomposition, Assignment, Orchestration, Mapping. (PCA Chapter 2.1, 2.2)(PPT). Reference material:
    1. Example of data parallel programming using CUDA: CUDA-lite paper and Program Analysis
    2. Getting Started with OpenMP (Intel® Software Network)
    3. More Work-Sharing with OpenMP (Intel® Software Network)
    4. Advanced OpenMP Programming (Intel® Software Network)
    5. Simple loop data dependence analysis
    6. Sample Alternating Direction Integration (ADI) C code.
  4. Parallelization of An Example Program: Ocean simulation Iterative equation solver (2D Grid). (PCA Chapter 2.3)(PPT)
  5. Cluster Computing: Origins, Broad Issues in Heterogeneous Computing (HC). Message-Passing Programming. Overview of Message Passing Interface (MPI 1.2). (PP Chapter 2, Appendix A, MPI and HC)(PPT). Reference material on MPI and MPI timing issues.
  6. Considerations in Parallel Program Creation Steps for Performance. (PCA Chapter 3)(PPT)
  7. Basic Parallel Programming Techniques and Examples. Massively Parallel Computations: Pixel-based Image Processing. Divide-and-conquer Problem Partitioning: Parallel Bucket Sort, Numerical Integration, Gravitational N-Body Problem. Pipelined Computations: Addition, Insertion Sort, Solving Upper-triangular System of Linear Equations. Synchronous Iteration: Barriers, Iterative Solution of Linear Equations. Dynamic Load Balancing: Centralized, Distributed, Moore's Shortest Path Algorithm. (PP Chapters 3-7, 12)(PPT) 
    1. Main reference papers for OpenMP
    2. OpenUH: A Portable and Optimizing OpenMP Compiler
    3. Dragon analysis tool
    4. Reference papers on benchmarking OpenMP performance
    5. References to some applications using OpenMP
  8. Network Properties and Requirements For Parallel Processing. Static Point-to-point Connection Network Topologies. Network Embeddings. Dynamic Connection Networks. (PP Chapter 1.3, PCA Chapter 10)(PPT)
  9. Parallel System Performance: Evaluation & Scalability. Workload Selection. Parallel Performance Metrics Revisited. Application/Workload Scaling Models of Parallel Computers. Parallel System Scalability. (PP Chapter 1, PCA Chapter 4)(Perf) (PPT)
  10. The Cache Coherence Problem in Shared Memory Multiprocessors. Cache Coherence Approaches. Bus-Snooping (Snoopy) Cache Coherence Protocols: Write-Invalidate: MSI, MESI; Write-Update: Dragon. (PCA Chapter 5)(PPT) (PPT)
  11. Cache Coherence in Scalable Distributed Memory Machines: Hierarchical Snooping, Directory-based Cache Coherence. (PPT)
  12. Compute Unified Device Architecture (CUDA). Introduction and Example.

GRADING

Student Presentations from course project

REFERENCES

Reference courses and material:

Parallel Virtual Machine (PVM/MPI and pthread):