King Fahd University of Petroleum & Minerals
College of Computer Sciences and Engineering
Computer Engineering Department
COE 520 Architecture and Design of Computer Systems

Instructor: Dr. Mayez Al-Mouhamed (Email: mayez@ccse.kfupm.sa.edu)
Office: Room 22-325 (Tel. 2934) and Lab 22-339 (Tel. 3536).
Office hours: S.M.W. from 9:00 AM to 10:00 AM and U.T. from 11:00 AM to 11:50 AM.
Text Book: Computer Architecture: A Quantitative Approach, Hennessy and Patterson, second edition, 1996, Morgan Kaufmann Publishers, Inc.
References: selected papers from IEEE T.C., IEEE T.P.D.S., etc.
Computer Architecture and Parallel Processing, K. Hwang and F. Briggs, McGraw-Hill, 1987. The Technology of Parallel Processing, A. Decegama, Prentice-Hall Inter., latest edition.
Grading Policy: Exam 1: 20/100, Exam 2: 20/100, Course project: 20/100, Homework: 10/100, and Final Exam: 30/100 (scheduled by the registrar). Late submission is subject to 10
Attendance: attendance is required of all students. An excuse for an officially authorized absence must be presented to the instructor no later than one week following the absence. Unexcused absences lead to a "DEN" grade.

Course Description: Classification of computer systems, architectural developments, computer performance. Linear and nonlinear pipeline design, instruction and arithmetic pipeline, superscalar. Memory hierarchy, cache and virtual memories, cache coherence, memory system performance. Parallel architectures, performance measures, SIMD and MIMD architectures. Interconnection networks.
Pre-requisite: graduate standing.

Course Outline:

1.: Introduction (6 lectures)
Classification of computer systems, architectural developments. Computer performance: measuring CPU time, CPI, MIPS, FLOPS. Use of benchmarks (SPEC, etc.). (Chapter 1)
2.: Pipelining (12 lectures)
Linear and nonlinear pipeline design. Instruction and arithmetic pipeline. Superscalar and superpipelined design. (Chapters 3 and 4)
3.: Memory system (12 lectures)
Memory hierarchy. Cache and virtual memories. Cache coherence. Memory system performance. (Chapter 5)
4.: Parallel architectures (12 lectures)
Motivation and performance measures, SIMD architectures. MIMD shared-memory and message-passing architectures. Interconnection networks. (Chapters 6 and 7)
5.: Miscellaneous (midterm and presentations) (3 lectures)
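
The performance measures named in topic 1 (CPU time, CPI, MIPS) are tied together by the standard CPU performance equation: CPU time = instruction count x CPI / clock rate. The following is a minimal sketch of that relation, not course material. The instruction mix, the cycle counts per class, and the 500 MHz clock are hypothetical values chosen only for illustration.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time in seconds = instruction count x CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


def average_cpi(mix):
    """Weighted CPI over (instruction_count, cycles_per_instruction) pairs."""
    total_instructions = sum(count for count, _ in mix)
    total_cycles = sum(count * cycles for count, cycles in mix)
    return total_cycles / total_instructions


def mips(clock_rate_hz, cpi):
    """Native MIPS rating = clock rate / (CPI x 10^6)."""
    return clock_rate_hz / (cpi * 1e6)


# Hypothetical instruction mix: (instruction count, cycles per instruction).
example_mix = [
    (50_000_000, 1),  # ALU operations
    (30_000_000, 2),  # loads and stores
    (20_000_000, 3),  # branches
]
clock_rate = 500e6  # assumed 500 MHz clock, for illustration only

cpi = average_cpi(example_mix)
total_instructions = sum(count for count, _ in example_mix)

print(f"CPI      = {cpi:.2f}")
print(f"CPU time = {cpu_time(total_instructions, cpi, clock_rate):.3f} s")
print(f"MIPS     = {mips(clock_rate, cpi):.1f}")
```

Because MIPS depends on CPI, and CPI depends on the instruction mix, the same machine reports a different MIPS rating for each program. This is one reason benchmark suites such as SPEC report execution time instead.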

List of active projects:

1.: Investigation of Beowulf Cluster system with PVM or MPI. (Stu. M. Razzaque (220308) and I. R. Quadri (220248)).
For the first four weeks the students will survey (1) the Beowulf cluster computer system, (2) new trends in the architectures of cluster computer systems, and (3) cluster performance. The students will prepare a presentation within four weeks.
2.: Study of the DLX simulator (Stu. S. Sirajuddin (220282) and M. Y. Shareef (220326)).
For the first four weeks the students will (1) survey the DLX processor architecture, (2) survey the DLX simulator, (3) run example assembly-language programs on the simulator, and (4) search for newer simulators for pipelined processors (acquiring and testing them). The students will prepare a presentation within four weeks.
3.: Investigation of Grid Computing Problems (Stu. L. Al-Awami (970728))
For the first four weeks the student will (1) survey some typical Grid Computing problems, (2) investigate how these problems are partitioned for parallel processing, (3) identify and investigate algorithms used for load balancing in grid computing, and (4) examine performance issues on specific parallel machines. The student will prepare a presentation within four weeks.
4.: Study of Benchmark programs used for Desktop computers. (Stu. S. Al-Mohsen (973987) and M. Khajamohiuddin (220328))
For the first four weeks the students will (1) determine representative types of benchmarks used for Desktop computers, (2) classify the benchmarks according to their objectives, (3) acquire some available benchmarks to run and demonstrate to the class, and (4) classify some known computers based on their benchmark performance. The students will prepare a presentation within four weeks.
5.: Performance of Scalable Switching Architectures (Stu. A. Shafayat (970485))
For the first four weeks the student will (1) survey high-speed switching architectures, (2) survey scalability in switching architectures, (3) study the performance of a scalable switching architecture (SSA), and (4) program an analytical model for an SSA. The student will prepare a presentation within four weeks.
6.: Study of Switching Architectures with Multicast (Stu. Th. Al-Gahtani (220134))
For the first four weeks the student will (1) survey typical switching architectures, (2) study unicast and multicast problems, and (3) classify these switches based on performance. The student will prepare a presentation within four weeks.
7.: Study of Techniques used for Improving Performance of Instruction Pipelining (Stu. Y. Al-Dilaijan (974012) and R. Mesmar (220138))
For the first four weeks the students will survey (1) instruction-pipelining (I-pipelining) techniques in at least two recent processors, (2) dynamic execution models, (3) methods for resolving hazards, and (4) compiler support techniques for I-pipelining. The students will prepare a presentation within four weeks.
8.: Investigation of Thread-Level Parallelism (Stu. Mohammed Al-Shammeri (953704) and O. Al-Saadoun (957942))
For the first four weeks the students will (1) survey instruction-level parallelism (ILP) with examples from known processors, (2) survey thread-level parallelism (TLP) with examples from known processors, (3) identify ILP and TLP in VLIW and superscalar processors, (4) survey EPIC (Explicitly Parallel Instruction Computing) architectures, (5) present a case study of some high-end processors (Intel Merced), and (6) survey new advances (Raw architecture, Simultaneous Multithreading architectures). The students will prepare a presentation within four weeks.

List of active projects:
An overall report on the project is required.

1.: Investigation of Beowulf Cluster system with PVM or MPI. (Stu. M. Razzaque (220308) and I. R. Quadri (220248)).
The students will prepare a refined presentation and a report within six weeks. Please investigate the Beowulf cluster computer systems with respect to (1) hardware issues such as the PCs, the interconnection network, the NIC, the data rates, and the communication costs, (2) software issues such as the O.S., single system image, communication library, and availability, (3) selecting a few applications and commenting on their performance, and (4) finding new trends in the architecture/network of cluster computer systems and commenting on their motivation, design, and performance.
2.: Study of the DLX simulator (Stu. S. Sirajuddin (220282) and M. Y. Shareef (220326)).
The students will prepare a refined presentation and a report within six weeks. Please investigate the DLX simulator as follows: (1) take a code such as the ADI benchmark, write it in C, generate its DLX code, and collect its run time, (2) restructure the DLX ADI assembly as best you can, run it, and compare its performance to the compiled version, (3) try loop unrolling and compare performance. You may then propose a method for compiler loop restructuring, such as loop unrolling. Explain the compiler approach with respect to a source code in DLX and give an example.
3.: Investigation of Grid Computing Problems (Stu. L. Al-Awami (970728))
The student will prepare a refined presentation and a report within six weeks. The suggested plan is: (1) determine what computations can be parallelized using the Grid Computing approach, (2) provide some examples of Grid Computing problems, (3) explain how these problems are partitioned into a grid and assigned to each computing node, (4) present load-balancing algorithms and comment on their performance. You may implement one load-balancing algorithm of your own and present its performance.
4.: Study of Benchmark programs used for Desktop computers. (Stu. S. Al-Mohsen (973987) and M. Khaja-mohiuddin (220328))
The students will prepare a refined presentation and a report within six weeks. The suggested plan is: (1) determine the representative benchmarks most commonly used for Desktop computers, (2) classify the benchmarks according to their objectives, such as benchmarking the CPU, the display system, the hard disk, and the NIC/network, (3) acquire benchmarks as stated in (2), run them during your presentation, comment on the results, and provide a copy of those benchmarks to the students.
5.: Performance of Scalable Switching Architectures (Stu. A. Shafayat (970485))
The student will prepare a refined presentation and a report within six weeks. The suggested plan is: (1) describe a few recent (after 1995) switching architectures with their motivation and performance, (2) discuss the issue of scalability in hardware and throughput for the above architectures, (3) suggest a scalable feature/architecture, and (4) build the analytical model (ask instructor), write its program, run it for different sizes, loads, and configurations, and show how performance scales up as the hardware is scaled.
6.: Study of Switching Architectures with Multicast (Stu. Th. Al-Gahtani (220134))
The student will prepare a refined presentation and a report within six weeks. The suggested plan is: (1) describe a few recent (after 1995) multicast switching architectures with their motivation and performance, (2) address the hardware complexity of each proposal and comment on whether it is practical or not, (3) use a simulator (ask instructor) to assess the performance of multicast under uniform traffic, (4) provide your own suggestions for the design of a multicast switch.
7.: Study of Techniques used for Improving Performance of Instruction Pipelining (Stu. Y. Al-Dilaijan (974012) and R. Mesmar (220138))
The students will prepare a refined presentation and a report within six weeks. The suggested plan is: (1) survey I-pipelining techniques in recently proposed micro-architectures (last ten years), (2) explain how structural, data, and control hazards are resolved and at what level, (3) describe the main proposed features such as branch prediction, speculation, etc., (4) provide a comparison of these micro-architectures with respect to major aspects, especially expected performance and limitations. Implement a branch-prediction table with 2-bit history and evaluate its performance using typical loop structures.
8.: Investigation of Thread-Level Parallelism (Stu. Mohammed Al-Shammeri (953704))
The student will prepare a refined presentation and a report within six weeks. The suggested plan is: (1) issues or different schools of instruction-level parallelism (ILP) in major micro-architectures, with examples, (2) the main research problems in each category, such as hazard resolution, branch prediction, speculative execution, hardware/compiler trends, etc., (3) thread-level parallelism (TLP) with internal organization and examples from known processors, (4) identify ILP and TLP in VLIW and superscalar processors, (5) EPIC (Explicitly Parallel Instruction Computing) architectures, (6) a case study of some high-end processors (Intel Merced), and (7) new advances (Raw architecture, Simultaneous Multithreading architectures). Please avoid simple enumeration of approaches; provide the motivation and execution philosophy of each.
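
Project 7 asks for a branch-prediction table with 2-bit history, evaluated on typical loop structures. The following is a minimal sketch of one way to set that up, not the required implementation. The table size, the initial counter state, the 4-byte instruction alignment, the branch address 0x400, and the loop shape are all assumptions chosen for illustration.

```python
class TwoBitPredictor:
    """Branch-history table of 2-bit saturating counters.

    Counter values 0 and 1 predict not taken; 2 and 3 predict taken.
    """

    def __init__(self, entries=16):
        self.entries = entries
        # Assumed initial state: every counter starts at 1 (weakly not taken).
        self.table = [1] * entries

    def _index(self, pc):
        # Assumes 4-byte instructions, so the low 2 address bits are dropped.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        return self.table[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


def loop_branch_trace(pc, iterations, repeats):
    """Yield (pc, taken) for a loop-closing branch.

    Each execution of the loop takes the branch iterations - 1 times,
    then falls through once on loop exit.
    """
    for _ in range(repeats):
        for i in range(iterations):
            yield pc, i < iterations - 1


def accuracy(predictor, trace):
    """Fraction of branches predicted correctly over the trace."""
    correct = total = 0
    for pc, taken in trace:
        correct += predictor.predict(pc) == taken
        predictor.update(pc, taken)
        total += 1
    return correct / total


# Hypothetical workload: a 10-iteration loop executed 100 times.
predictor = TwoBitPredictor()
trace = loop_branch_trace(pc=0x400, iterations=10, repeats=100)
print(f"accuracy = {accuracy(predictor, trace):.3f}")
```

After warm-up, the 2-bit counter mispredicts only on loop exit, whereas a 1-bit scheme would also mispredict the first iteration of the next execution of the loop. Varying `iterations` and `entries` shows how loop length and table aliasing affect accuracy.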