King Fahd University of Petroleum & Minerals
College of Computer Sciences and Engineering
Computer Engineering Department
COE 420 Parallel Computing
Parallel Processing Architectures (COE 502)
Cluster Computing (ICS 446)
Course Syllabus
Course Description:
Introduction to parallel processing architectures: sequential, parallel,
pipelined, and dataflow architectures. Vectorization methods, optimization,
and performance. Interconnection networks: routing, complexity, and
performance. Small-scale, medium-scale, and large-scale multiprocessors.
Data-parallel paradigm and techniques. Multithreaded architectures and
programming. Students are expected to carry out research projects in related
fields of study. (Prerequisite: COE 301 Computer Organization or equivalent.)
Course Outline:
- The Need and Feasibility of Parallel Computing, Technology Trends,
  Microprocessor Performance Attributes, Goals of Parallel Computing.
  Computing Elements, Programming Models, Flynn's Classification,
  Multiprocessors vs. Multicomputers. Current Trends in Parallel
  Architectures, Communication Architecture. (PCA Chapter 1.1, 1.2)
  (Chapters 1 and 2)
- Parallel Architectures Convergence: Communication Architecture,
  Communication Abstraction. Naming, Operations, Ordering, Replication.
  Communication Cost Model. (PCA Chapter 1.2, 1.3)
- Parallel Programs: Conditions of Parallelism. Asymptotic Notations for
  Algorithm Analysis, PRAM. Levels of Parallelism, Hardware vs. Software
  Concurrency. Data vs. Functional Parallelism. Amdahl's Law, Degree of
  Parallelism (DOP), Concurrency Profile. Steps in Creating Parallel
  Programs: Decomposition, Assignment, Orchestration, Mapping.
  (PCA Chapter 2.1, 2.2)
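Amdahl's Law from the bullet above can be illustrated with a short sketch; the function name `amdahl_speedup` and the 10% serial fraction are illustrative choices, not from the course texts:

```python
# Amdahl's Law: speedup on p processors when a fraction f of the
# work is inherently serial is S(p) = 1 / (f + (1 - f) / p).
def amdahl_speedup(f, p):
    """Speedup with serial fraction f on p processors."""
    return 1.0 / (f + (1.0 - f) / p)

# With a 10% serial fraction, speedup is capped at 1/f = 10
# no matter how many processors are added.
for p in (1, 2, 4, 8, 1024):
    print(p, round(amdahl_speedup(0.1, p), 2))
```

Note how the speedup flattens well below the processor count as p grows, which is the point the concurrency-profile discussion builds on.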
- Parallelization of an Example Program: Ocean simulation iterative
  equation solver (2D grid). (PCA Chapter 2.3)
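The kernel parallelized in this example can be sketched sequentially as a Jacobi-style sweep over a 2D grid; the function name, grid size, and boundary values below are illustrative assumptions, not taken from PCA:

```python
# Each interior point is repeatedly replaced by the average of its four
# neighbors until the summed change over a sweep falls below a tolerance.
def solve_grid(grid, tol=1e-4, max_sweeps=10_000):
    n = len(grid)
    for sweep in range(max_sweeps):
        diff = 0.0
        new = [row[:] for row in grid]        # Jacobi: read old, write new
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                new[i][j] = 0.25 * (grid[i-1][j] + grid[i+1][j] +
                                    grid[i][j-1] + grid[i][j+1])
                diff += abs(new[i][j] - grid[i][j])
        grid = new
        if diff < tol:                        # converged
            break
    return grid, sweep + 1

# Toy boundary condition: top edge held at 1.0, the rest starts at 0.0.
n = 8
g = [[1.0] * n] + [[0.0] * n for _ in range(n - 1)]
g, sweeps = solve_grid(g)
```

In the parallel version discussed in the text, the grid is partitioned among processes and each sweep requires exchanging boundary rows with neighbors, which is where decomposition and orchestration costs show up.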
- Cluster Computing: Origins, Broad Issues in Heterogeneous Computing
  (HC). Message-Passing Programming. Overview of the Message Passing
  Interface (MPI 1.2). (PP Chapter 2, Appendix A, MPI and HC references
  below)
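Real MPI programs need an MPI library and launcher (e.g. mpiexec), so as a minimal stand-in for the message-passing model, the sketch below mimics the MPI_Send/MPI_Recv ping-pong pattern with Python's standard multiprocessing pipes; this is an analogy, not MPI itself, and all names are illustrative:

```python
from multiprocessing import Pipe, Process

def rank1(conn):
    msg = conn.recv()          # analogous to a blocking MPI_Recv
    conn.send(msg.upper())     # analogous to MPI_Send back to rank 0
    conn.close()

def ping_pong():
    parent, child = Pipe()     # one point-to-point channel
    p = Process(target=rank1, args=(child,))
    p.start()
    parent.send("hello from rank 0")
    reply = parent.recv()
    p.join()
    return reply

if __name__ == "__main__":
    print(ping_pong())         # -> HELLO FROM RANK 0
```

The key property shared with MPI is that the two sides have separate address spaces and communicate only by explicit send/receive operations.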
- Performance Considerations in the Parallel Program Creation Steps.
  (PCA Chapter 3)
- Basic Parallel Programming Techniques and Examples. Massively Parallel
  Computations: Pixel-based Image Processing. Divide-and-Conquer Problem
  Partitioning: Parallel Bucket Sort, Numerical Integration, Gravitational
  N-Body Problem. Pipelined Computations: Addition, Insertion Sort,
  Solving an Upper-Triangular System of Linear Equations. Synchronous
  Iteration: Barriers, Iterative Solution of Linear Equations. Dynamic
  Load Balancing: Centralized, Distributed, Moore's Shortest-Path
  Algorithm. (PP Chapters 3-7, 12)
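The partitioning idea behind parallel bucket sort can be shown with a sequential sketch; the function name and parameters are illustrative, and the per-bucket sorts (done here in a simple loop) are the part that runs concurrently in the parallel version:

```python
# Split values in [lo, hi) into p equal-width buckets (one per
# processor in the parallel formulation), sort each bucket
# independently, and concatenate the results.
def bucket_sort(data, p, lo, hi):
    width = (hi - lo) / p
    buckets = [[] for _ in range(p)]
    for x in data:
        idx = min(int((x - lo) / width), p - 1)  # clamp the top edge
        buckets[idx].append(x)
    out = []
    for b in buckets:            # each bucket sorted in parallel in PP
        out.extend(sorted(b))
    return out
```

Because the buckets cover disjoint value ranges, no merging is needed beyond concatenation, which is what makes the decomposition attractive; its weakness, also discussed in PP, is load imbalance when the input is not uniformly distributed.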
- Network Properties and Requirements for Parallel Processing. Static
  Point-to-Point Connection Network Topologies. Network Embeddings.
  Dynamic Connection Networks. (PP Chapter 1.3, PCA Chapter 10, handout)
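The standard static-topology properties covered here (node degree, diameter, bisection width) can be tabulated with a small sketch; the function names and the choice of hypercube vs. 2D mesh are illustrative:

```python
# Binary d-cube with N = 2**d nodes: degree d, diameter d
# (= log2 N hops worst case), bisection width N/2 links.
def hypercube_props(d):
    n = 2 ** d
    return {"nodes": n, "degree": d, "diameter": d, "bisection": n // 2}

# A side x side 2D mesh for comparison: interior degree 4,
# diameter 2*(side - 1), bisection width side links.
def mesh_props(side):
    return {"nodes": side * side, "degree": 4,
            "diameter": 2 * (side - 1), "bisection": side}
```

Comparing the two at 16 nodes already shows the trade-off the handout emphasizes: the hypercube has a shorter diameter and larger bisection, at the cost of a node degree that grows with log N.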
- Parallel System Performance: Evaluation & Scalability. Workload
  Selection. Parallel Performance Metrics Revisited. Application/Workload
  Scaling Models of Parallel Computers. Parallel System Scalability.
  (PP Chapter 1, PCA Chapter 4, handout)
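The metrics revisited in this part can be written as one-liners; the Gustafson-style scaled speedup is one common workload-scaling model, included here as an illustrative assumption about which scaling model the handout uses:

```python
# Speedup S = T1 / Tp, efficiency E = S / p, and scaled speedup
# S_scaled = p - f*(p - 1) for serial fraction f when the problem
# size grows with the machine (Gustafson-style scaling).
def speedup(t1, tp):
    return t1 / tp

def efficiency(t1, tp, p):
    return speedup(t1, tp) / p

def scaled_speedup(f, p):
    return p - f * (p - 1)
```

Unlike the fixed-size Amdahl bound, the scaled model grows linearly in p for a fixed serial fraction, which is why scaled workloads are central to the scalability discussion.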
- The Cache Coherence Problem in Shared-Memory Multiprocessors. Cache
  Coherence Approaches. Bus-Snooping (Snoopy) Cache Coherence Protocols:
  Write-Invalidate: MSI, MESI; Write-Update: Dragon. (PCA Chapter 5,
  handout)
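The MSI write-invalidate protocol from this bullet can be sketched as a toy state machine for a single cache line; this omits real bus transactions, write-backs, and timing, and the function name and state encoding are illustrative:

```python
# Each cache holds the line in M(odified), S(hared), or I(nvalid).
# A processor read or write triggers a bus transaction that the other
# caches snoop and react to.
def msi_access(states, who, op):
    """Apply processor `who`'s read/write to the per-cache MSI states."""
    if op == "read":
        if states[who] == "I":           # read miss -> BusRd
            for i in range(len(states)):
                if i != who and states[i] == "M":
                    states[i] = "S"      # owner supplies data, drops to S
            states[who] = "S"
    elif op == "write":
        if states[who] != "M":           # write miss/upgrade -> BusRdX
            for i in range(len(states)):
                if i != who:
                    states[i] = "I"      # snooping caches invalidate
            states[who] = "M"
    return states
```

Tracing a read by P0, a write by P1, then a read by P0 shows the invalidate-then-reshare pattern that distinguishes write-invalidate protocols like MSI/MESI from write-update protocols like Dragon.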
- Cache Coherence in Scalable Distributed-Memory Machines: Hierarchical
  Snooping, Directory-Based Cache Coherence. (PCA Chapter 8)
Note: PCA stands for the book on Parallel Computer Architecture and PP stands
for the book on Parallel Programming.