King Fahd University of Petroleum & Minerals
College of Computer Sciences & Engineering

Department of Computer Engineering




COE 421: Fault Tolerant Computing (3-0-3)



Syllabus




Catalog Description

Introduction to fault tolerant computing (FTC). Goals of fault tolerance (FT). Design techniques to achieve FT. Evaluation of FT systems. Reliability modeling and analysis of FT systems. Availability modeling. Design of FT VLSI/WSI circuits. Introduction to testing.

Prerequisite: COE 308 or equivalent.

Textbook:

D. Siewiorek and R. Swarz, "Reliable Computer Systems: Design and Evaluation", Digital Press, 3rd Edition, 1998.

Course Objectives:

(1) Master the fundamental concepts in fault-tolerant computing.

(2) Master the application of the theory of reliability modeling and evaluation.

(3) Master the design of reliable and fault-tolerant computer systems.

(4) Appreciate the basic issues in yield enhancement of VLSI/WSI circuits.

Learning Outcomes:

(1) To introduce students to the fundamental concepts in fault-tolerant computing.

(2) To expose students to the theory of reliability modeling and evaluation.

(3) To introduce students to the basic principles for designing reliable computer systems.

(4) To expose students to some of the commercially available fault tolerant/highly available systems.

(5) To introduce students to the basic issues in yield enhancement of VLSI/WSI circuits.


Topics:

1.
Module 1: Introduction and Fundamental Concepts (Chapters 1 and 2)
Origins of FTC, Goals of FT, Applications of FTC, Faults, Errors, Failures, Fault characterization, Fault modeling.

2.
Module 2: Design Techniques to Achieve Fault Tolerance (Chapters 3, 7, Appendix B)
Design issues, Hardware redundancy, Information redundancy, Time redundancy, Software redundancy.

3.
Module 3: Evaluation Techniques ()
Quantitative evaluation methods, Reliability modeling, Safety modeling, Availability modeling, Maintainability modeling.

4.
Module 4: Design of Practical Fault-Tolerant Systems (Chapters 7-10)
The design process, Fault avoidance, Long-life applications, Critical-computation applications, High-availability applications.

5.
Module 5: FT Design of VLSI/WSI Circuits (Chapter 3 and Appendix A)
Failure modes, Self-checking circuits, Reliability & Yield enhancement of array processors.

6.
Module 6: Introduction to Testing (Chapter 4 and Appendix C)
Test pattern generation methods, Design-for-Testability, Testability analysis.


Computer Usage:

Use of available reliability modeling and evaluation tools.

Laboratory Experiments:

None.

Grading Policy (Tentative):

30% Assignments & Quizzes
15% Major Exam I (Tentatively during week 5)
20% Major Exam II (Tentatively during week 10)
35% Final Exam (Scheduled by the Registrar)

ABET Category Content:

Engineering Science: 50%

Engineering Design: 50%


Prepared by: Prof. Mostafa Abd-El-Barr. Date: November 2002.