King Fahd University of Petroleum & Minerals
College of Computer Sciences and Engineering
Computer Engineering Department
COE 520 Architecture and Design of Computer Systems
- Instructor: Dr. Mayez Al-Mouhamed (Email: mayez@ccse.kfupm.sa.edu)
- Office: Room 22-325 (Tel. 2934) and Lab 22-339 (Tel. 3536).
- Office hours: S.M.W. from 9:00 AM to 10:00 AM and U.T.
from 11:00 AM to 11:50 AM.
- Text Book:
Computer architecture: a quantitative approach, Hennessy and Patterson,
second edition, 1996, Morgan Kaufmann Publishers, Inc.
References: selected papers from IEEE Transactions on Computers (T.C.),
IEEE Transactions on Parallel and Distributed Systems (T.P.D.S.), etc.
Computer Architecture and Parallel Processing, K. Hwang and F.
Briggs, McGraw-Hill, 1987.
The Technology of Parallel Processing, A. Decegama,
Prentice-Hall Inter., latest edition.
- Grading Policy:
Exam 1: 20/100 ,
Exam 2: 20/100 ,
Course project: 20/100, homeworks: 10/100,
and Final Exam: 30/100 (scheduled by the registrar).
Late submission is subject to a 10% penalty.
- Attendance: attendance is required of all students.
An official excuse for an authorized absence must be presented to the
instructor no later than one week following the absence.
Unexcused absences lead to a "DEN" grade.
Course Description:
Classification of computer systems, architectural developments,
computer performance. Linear and nonlinear pipeline design, instruction
and arithmetic pipeline, superscalar. Memory hierarchy, cache and virtual
memories, cache coherence, memory system performance.
Parallel architectures, performance measures, SIMD and MIMD architectures.
Interconnection networks.
Pre-requisite: graduate standing.
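The performance measures named in this description reduce to a few standard quantitative formulas:
- CPU time = instruction count x CPI x clock cycle time;
- average memory access time = hit time + miss rate x miss penalty;
- Amdahl's law for the speedup of a partly parallel program.

The C sketch below simply encodes these formulas. The function names and the sample numbers in the usage note are illustrative assumptions, not course material.

```c
#include <stddef.h>

/* One instruction class in a mix: dynamic count and cycles per instruction. */
typedef struct {
    double count;
    double cpi;
} InstrClass;

/* Average CPI = total clock cycles / total instructions executed. */
double average_cpi(const InstrClass *mix, size_t n) {
    double cycles = 0.0, instrs = 0.0;
    for (size_t i = 0; i < n; i++) {
        cycles += mix[i].count * mix[i].cpi;
        instrs += mix[i].count;
    }
    return cycles / instrs;
}

/* CPU time (s) = instruction count x CPI / clock rate (Hz). */
double cpu_time(double ic, double cpi, double clock_hz) {
    return ic * cpi / clock_hz;
}

/* Native MIPS = clock rate / (CPI x 10^6). */
double native_mips(double cpi, double clock_hz) {
    return clock_hz / (cpi * 1e6);
}

/* Average memory access time for one cache level, all times in cycles. */
double amat(double hit_time, double miss_rate, double miss_penalty) {
    return hit_time + miss_rate * miss_penalty;
}

/* Amdahl's law: speedup on p processors when a fraction f of the
   sequential execution time is perfectly parallelizable. */
double amdahl_speedup(double f, int p) {
    return 1.0 / ((1.0 - f) + f / p);
}
```

Example: take a mix of 50 ALU instructions (CPI 1), 20 loads (CPI 2), and 30 branches (CPI 3).
- average_cpi returns 1.8. At 500 MHz the program therefore takes 100 x 1.8 / (5 x 10^8) s = 0.36 microseconds and runs at about 278 native MIPS.
- amat(1, 0.05, 100) gives 6 cycles.
- amdahl_speedup(0.9, 10) is about 5.26, well short of 10.

MIPS ignores the instruction count, which is why it can mislead when machines with different instruction sets are compared.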
Course Outline:
1. Introduction (6 lectures)
Classification of computer systems, Architectural developments.
Computer performance, measuring CPU time, CPI, MIPS, FLOPS.
Use of benchmarks (SPEC, etc.). (Chapter 1).
2. Pipelining (12 lectures)
Linear and nonlinear pipeline design. Instruction and
arithmetic pipeline. Superscalar and superpipelined design.
(Chapters 3 and 4)
3. Memory system (12 lectures)
Memory hierarchy. Cache and virtual memories. Cache coherence.
Memory system performance. (Chapter 5)
4. Parallel architectures (12 lectures)
Motivation and performance measures, SIMD architectures.
MIMD shared-memory and message-passing architectures.
Interconnection networks. (Chapters 6 and 7)
5. Miscellaneous (midterm and presentations) (3 lectures)
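The pipelining part of the outline covers nonlinear pipeline design. A standard first step there is to derive the forbidden latencies and the collision vector from the pipeline's reservation table. A latency k is forbidden if some stage is used at two clock cycles exactly k apart. The C sketch below assumes the reservation table is stored as a 0/1 matrix with at most 16 columns. The macro and function names are hypothetical.

```c
#include <stdint.h>

#define MAX_CYCLES 16  /* assumed upper bound on reservation-table width */

/* rt[s][t] != 0 means pipeline stage s is busy at clock cycle t (0-based).
   Returns a bitmask where bit (k-1) is set if latency k is forbidden, i.e.
   some stage is used at two cycles exactly k apart. Read from the highest
   forbidden latency down to latency 1, the bits form the collision vector
   (C_m ... C_1). */
uint32_t forbidden_latencies(const int rt[][MAX_CYCLES], int stages, int cycles) {
    uint32_t mask = 0;
    for (int s = 0; s < stages; s++)
        for (int a = 0; a < cycles; a++)
            for (int b = a + 1; b < cycles; b++)
                if (rt[s][a] && rt[s][b])
                    mask |= UINT32_C(1) << (b - a - 1);
    return mask;
}

/* For k >= 1: returns 1 if starting a new task k cycles after the previous
   one causes no collision in any stage, 0 otherwise. */
int latency_permissible(uint32_t mask, int k) {
    return k > 32 || !(mask & (UINT32_C(1) << (k - 1)));
}
```

Example: consider a 3-stage, 8-cycle table in which:
- stage 1 is used at cycles 1, 6, and 8;
- stage 2 at cycles 2 and 4;
- stage 3 at cycles 3, 5, and 7.

The forbidden latencies are 2, 4, 5, and 7, so forbidden_latencies returns 90. That is binary 1011010, the collision vector. Latencies 1, 3, 6, and 8 or more remain permissible initiation intervals.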
List of active projects:
1. Investigation of Beowulf Cluster system with PVM or MPI.
(Stu. M. Razzaque (220308) and I. R. Quadri (220248)).
For the first four weeks the students will survey (1) the Beowulf
cluster computer system, (2) new trends in the architectures of
cluster computer systems, and (3) cluster performance.
The students will prepare for a presentation within four weeks.
2. Study of the DLX simulator
(Stu. S. Sirajuddin (220282) and M. Y. Shareef (220326)).
For the first four weeks the students will
(1) survey the DLX processor architecture, (2) study the DLX simulator,
(3) run example assembly-language programs on the simulator,
and (4) search for newer simulators for pipelined
processors (acquiring and testing them).
The students will prepare for a presentation within four weeks.
3. Investigation of Grid Computing Problems
(Stu. L. Al-Awami (970728))
For the first four weeks the student will
(1) survey some typical Grid Computing problems,
(2) investigate how these problems are partitioned for
parallel processing,
(3) identify and investigate algorithms used for load balancing
in grid computing, and (4) study performance issues on specific parallel
machines.
The student will prepare for a presentation within four weeks.
4. Study of Benchmark programs used for Desktop computers.
(Stu. S. Al-Mohsen (973987) and M. Khajamohiuddin (220328))
For the first four weeks the students will
(1) determine representative types of benchmarks used for
desktop computers,
(2) classify the benchmarks according to their objectives,
(3) acquire some available benchmarks, run them, and
demonstrate them to the class,
and (4) classify some known computers based on their
benchmark performance.
The students will prepare for a presentation within four weeks.
5. Performance of Scalable Switching Architectures
(Stu. A. Shafayat (970485))
For the first four weeks the student will
(1) survey high-speed switching architectures,
(2) study scalability in switching architectures,
(3) study the performance of a scalable switching architecture (SSA), and
(4) program an analytical model for an SSA.
The student will prepare for a presentation within four weeks.
6. Study of Switching Architectures with Multicast
(Stu. Th. Al-Gahtani (220134))
For the first four weeks the student will
(1) survey typical switching architectures,
(2) study unicast and multicast problems, and
(3) classify these switches based on performance.
The student will prepare for a presentation within four weeks.
7. Study of Techniques used for Improving Performance of
Instruction Pipelining
(Stu. Y. Al-Dilaijan (974012) and R. Mesmar (220138))
For the first four weeks the students will survey
(1) instruction-pipelining techniques in at least two recent processors,
(2) dynamic execution models,
(3) methods of hazard resolution, and
(4) compiler support techniques for instruction pipelining.
The students will prepare for a presentation within four weeks.
8. Investigation of Thread-Level Parallelism
(Stu. Mohammed Al-Shammeri (953704), and O. Al-Saadoun (957942))
For the first four weeks the students will survey
(1) Instruction-level parallelism (ILP) with examples from known processors,
(2) Thread-level parallelism (TLP) with examples from known processors,
(3) ILP and TLP in VLIW and superscalar processors,
(4) EPIC (Explicitly Parallel Instruction Computing) architectures,
(5) a case study of some high-end processors (Intel Merced), and
(6) new advances (Raw architecture, Simultaneous Multithreading
architectures).
The students will prepare for a presentation within four weeks.
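Project 3 above asks for the load-balancing algorithms used in grid computing. As one concrete reference point, here is a sketch of the classic greedy longest-processing-time-first (LPT) heuristic for static load balancing. It assumes that every task's cost is known in advance and that all nodes are equally fast. Real grid environments often violate both assumptions, so treat it as a baseline rather than a grid scheduler. The function names are hypothetical.

```c
#include <stdlib.h>

/* qsort comparator: orders task costs from largest to smallest. */
static int cmp_desc(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x < y) - (x > y);
}

/* Greedy LPT static load balancing: sort tasks by decreasing cost, then
   give each task to the currently least-loaded node. Sorts costs[] in
   place, writes per-node loads to load[], and returns the makespan
   (the largest node load). */
double lpt_balance(double *costs, int ntasks, double *load, int nnodes) {
    qsort(costs, (size_t)ntasks, sizeof costs[0], cmp_desc);
    for (int k = 0; k < nnodes; k++)
        load[k] = 0.0;
    for (int t = 0; t < ntasks; t++) {
        int best = 0;                 /* least-loaded node found so far */
        for (int k = 1; k < nnodes; k++)
            if (load[k] < load[best])
                best = k;
        load[best] += costs[t];
    }
    double makespan = 0.0;
    for (int k = 0; k < nnodes; k++)
        if (load[k] > makespan)
            makespan = load[k];
    return makespan;
}
```

Example: for task costs {3, 7, 2, 5, 4} on two nodes, LPT assigns 7 and 3 to one node and 5, 4, and 2 to the other. The loads are 10 and 11, so the makespan is 11. Here that equals the best possible split of the total cost of 21.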
List of active projects:
An overall report on each project is required.
1. Investigation of Beowulf Cluster system with PVM or MPI.
(Stu. M. Razzaque (220308) and I. R. Quadri (220248)).
The students will prepare for a refined presentation and a
report within six weeks.
Please investigate the Beowulf cluster computer systems with
respect to (1) hardware issues such as the PCs, the interconnection
network, the NIC, the data rates, and the communication costs,
(2) software issues such as the O.S., single system image,
communication library, and availability,
(3) the selection of a few applications, with comments
on their performance, and
(4) new trends in the architecture/network of
cluster computer systems, with comments on their motivation,
design, and performance.
2. Study of the DLX simulator
(Stu. S. Sirajuddin (220282) and M. Y. Shareef (220326)).
The students will prepare for a refined presentation and a
report within six weeks.
The suggested plan is:
(1) take a code such as the ADI benchmark, write it in C, generate
its DLX code, and measure its run time,
(2) restructure the DLX assembly code of ADI as well as you can, run it,
and compare its performance with the compiled version, and
(3) try loop unrolling and compare performance.
You may then propose a compiler loop-restructuring method,
such as loop unrolling.
Explain the compiler approach with respect to a source program
and its DLX code, and give an example.
3. Investigation of Grid Computing Problems
(Stu. L. Al-Awami (970728))
The student will prepare for a refined presentation and a
report within six weeks.
The suggested plan is:
(1) determine what computations can be parallelized using Grid
Computing approach,
(2) provide some examples of Grid Computing problems,
(3) explain how these problems are partitioned across the grid
and assigned to each computing node, and
(4) present load-balancing algorithms and comment
on their performance.
You may implement a load-balancing algorithm of your own
and present its performance.
4. Study of Benchmark programs used for Desktop computers.
(Stu. S. Al-Mohsen (973987) and M. Khajamohiuddin (220328))
The students will prepare for a refined presentation and a
report within six weeks.
The suggested plan is:
(1) determine representative benchmarks most commonly used for
Desktop computers,
(2) classify the benchmarks according to their objectives,
such as benchmarking the CPU, the display system, the hard disk,
and the NIC/network, and
(3) acquire benchmarks as stated in (2), run them during
your presentation, comment on the results, and provide copies of
those benchmarks to the students.
5. Performance of Scalable Switching Architectures
(Stu. A. Shafayat (970485))
The student will prepare for a refined presentation and a
report within six weeks.
The suggested plan is:
(1) describe a few recent (after 1995) switching architectures
with their motivation and performance,
(2) discuss the issue of scalability in hardware and throughput
for the above architectures,
(3) suggest a scalable feature/architecture, and
(4) build the analytical model (ask instructor), write its program,
run it for different sizes, loads, and configurations, and show how
performance scales up as the hardware is scaled.
6. Study of Switching Architectures with Multicast
(Stu. Th. Al-Gahtani (220134))
The student will prepare for a refined presentation and a
report within six weeks.
The suggested plan is:
(1) describe a few recent (after 1995) multicast switching
architectures with their motivation and performance,
(2) address the hardware complexity of each proposal and
comment on whether it is practical,
(3) use a simulator (ask instructor) to assess multicast
performance under uniform traffic, and
(4) provide your own suggestions for the design of a
multicast switch.
7. Study of Techniques used for Improving Performance of
Instruction Pipelining (Stu. Y. Al-Dilaijan (974012)
and R. Mesmar (220138))
The students will prepare for a refined presentation and a
report within six weeks.
The suggested plan is:
(1) survey instruction-pipelining techniques in recently proposed
micro-architectures (last ten years),
(2) explain how structural, data, and control hazards are resolved
and at what level,
(3) describe the main proposed features, such as branch prediction,
speculation, etc., and
(4) compare these micro-architectures with
respect to major aspects, especially expected performance and
limitations.
Implement a branch-prediction table with 2-bit history
and evaluate its performance using typical loop structures.
8. Investigation of Thread-Level Parallelism
(Stu. Mohammed Al-Shammeri (953704))
The student will prepare for a refined presentation and a
report within six weeks.
The suggested plan is:
(1) issues in, and different schools of, instruction-level parallelism
(ILP) in major micro-architectures, with examples,
(2) the main research problems in each category, such as
hazard resolution, branch prediction, speculative execution,
hardware/compiler trends, etc.,
(3) thread-level parallelism (TLP), with its internal organization
and examples from known processors,
(4) ILP and TLP in VLIW and superscalar processors,
(5) EPIC (Explicitly Parallel Instruction Computing) architectures,
(6) a case study of some high-end processors (Intel Merced), and
(7) new advances (Raw architecture, Simultaneous Multithreading
architectures).
Please avoid a simple enumeration of approaches; explain their
motivation and execution philosophy.
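Project 7 above asks for a branch-prediction table with 2-bit history, evaluated on typical loop structures. Below is a minimal sketch in C under these assumptions:
- a 1024-entry table indexed by the low-order bits of a word-aligned branch address;
- a single loop-closing branch.

The table size, the indexing, and all names are illustrative choices, not part of the assignment.

```c
#include <stdbool.h>
#include <stdint.h>

#define BHT_BITS 10
#define BHT_SIZE (1u << BHT_BITS)

/* Branch history table of 2-bit saturating counters:
   0, 1 = predict not taken; 2, 3 = predict taken. */
typedef struct {
    uint8_t ctr[BHT_SIZE];
} Bht;

void bht_init(Bht *t, uint8_t initial) {
    for (unsigned i = 0; i < BHT_SIZE; i++)
        t->ctr[i] = initial;
}

/* Index with the low-order bits of the word-aligned branch address. */
static unsigned bht_index(uint32_t pc) {
    return (pc >> 2) & (BHT_SIZE - 1);
}

bool bht_predict(const Bht *t, uint32_t pc) {
    return t->ctr[bht_index(pc)] >= 2;
}

/* Move the counter one step toward the actual outcome, saturating at 0 and 3. */
void bht_update(Bht *t, uint32_t pc, bool taken) {
    uint8_t *c = &t->ctr[bht_index(pc)];
    if (taken && *c < 3)
        (*c)++;
    else if (!taken && *c > 0)
        (*c)--;
}

/* Loop-closing branch at address pc: an inner loop of `iters` iterations
   (taken iters-1 times, then not taken) executed `runs` times.
   Returns the number of mispredictions. */
int simulate_loop(Bht *t, uint32_t pc, int iters, int runs) {
    int misses = 0;
    for (int r = 0; r < runs; r++)
        for (int i = 0; i < iters; i++) {
            bool taken = (i < iters - 1);
            if (bht_predict(t, pc) != taken)
                misses++;
            bht_update(t, pc, taken);
        }
    return misses;
}
```

Example: suppose all counters start at 0 (strongly not taken) and a 10-iteration loop is executed 10 times. The predictor makes 12 mispredictions in 100 branches: two while the counter warms up, then one at every loop exit. A single not-taken outcome only moves a saturated counter from 3 to 2, so the prediction stays "taken" when the loop is re-entered. A 1-bit scheme would instead mispredict both at the exit and at the next re-entry.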