# **COMPUTER ENGINEERING DEPARTMENT**

# **COE 308**

## **COMPUTER ARCHITECTURE**

### **Final Exam**

#### Second Semester (982)

### Time: 7:30-9:30 AM

Student Name : \_\_\_\_\_\_

Student ID. : \_\_\_\_\_

| Question | Max Points | Score |
|----------|------------|-------|
| Q1       | 25         |       |
| Q2       | 15         |       |
| Q3       | 10         |       |
| Q4       | 10         |       |
| Q5       | 25         |       |
| Q6       | 15         |       |
| Total    | 100        |       |

Dr. Aiman El-Maleh

#### [25 Points]

**Q.1.** Indicate whether the following is true or false, and if the answer is false indicate why:

1. (True, False) The bandwidth of a crossbar switch connecting 3 processors to 2 memories is larger than the bandwidth of a crossbar switch connecting 2

processors to 3 memories, assuming that the probability of a processor making a request is equivalent in both cases.

- **2.** (**True, False**) Suppose that a processor takes 600ns to execute a task, and that 20% of the task cannot be executed in parallel. By using a MIMD multiprocessor system with a large number of processors, it is possible to execute the task in 100ns.
- **3.** (**True, False**) In a SIMD multiprocessor system with 10 processors working on 10X10 arrays where row-wise and column-wise operations are performed on the array, it is sufficient to spread the array data across 10 memory modules to avoid memory contention. Note that each processor operates on either a single row or a single column.
- 4. (True, False) In a shared-memory multiprocessor system, the memory has a single address space so that any processor could access any memory location by using its address, and the memory access time will be non-uniform.
- **5.** (**True, False**) Message-passing MIMD multiprocessor systems are scalable, easier to program, and more flexible than shared-memory MIMD multiprocessor systems.
- 6. (True, False) In a paged system, the least recently-used page replacement algorithm produces the smallest number of page faults for any given sequence of page references in comparison to any other replacement algorithm (except the optimal replacement).
- 7. (True, False) A cache with a full-associative mapping organization has a higher hit ratio than a set-associative or direct mapping organizations when the same program is executed under the three cache organizations and the same cache size is used.
- 8. (True, False) It is possible to represent the positive integer numbers ranging from 1 to 255 using a 10-bit floating-point format.
- **9.** (**True, False**) In a floating-point representation, the bias value for a base-2 exponent and a base-8 exponent in a 5-bit field is equal to 16.
- **10.** (**True, False**) A booth multiplication algorithm always performs less number of addition/subtraction operations than a regular shift-and-add multiplication algorithm.

[15 Points]

**Q.2.** Briefly describe the functionality of the following (in not more than 4 lines for each):

- 1. Reorder buffer
- 2. Reservation station
- 3. Translation look-aside buffer (TLB)
- 4. Page table
- 5. Segment table

# [10 Points]

Q.3. Suppose that you have two processes that are executed simultaneously and that the two processes need to access a shared variable in memory. Write a sketch of the code for

the two processes such that at most one process has access to the shared variable at a time, and no deadlock can happen.

[10 Points]

**Q.4.** It is required to design a four-stage pipeline to add two 8-bit numbers,  $A_7..A_0$  and  $B_7..B_0$ , such that each stage of the pipeline uses a single full adder. You were given the following two designs:

1. In the first design, bits  $A_3$ - $A_0$  and  $B_3$ - $B_0$  are first added in 4 cycles, then the carry-out is forwarded to the 2<sup>nd</sup> stage where it is added with bits  $A_6$ - $A_4$  and  $B_6$ - $B_4$  in 3 cycles, and finally the carry-out is forwarded to the 4<sup>th</sup> stage where it is added to bits  $A_7$  and  $B_7$ , as indicated by the pipeline block diagram below.



2. In the second design, bits  $A_3$ - $A_0$  and  $B_3$ - $B_0$  are first added in 4 cycles, then the carry-out is forwarded to the 1<sup>st</sup> stage where it is added with bits  $A_7$ - $A_4$  and  $B_7$ - $B_4$  in 4 additional cycles, as indicated by the pipeline block diagram below.



Draw the reservation tables of the two pipelines, and indicate which design you would choose. Justify your answer.

**Q.5.** Suppose that it is required to interconnect 8 processors to 8 memory modules using an interconnection network such that each processor can access each memory module.

1. Given the three-stage Clos network shown below, indicate whether the network is blocking or non-blocking. If it is blocking illustrate this by an example and draw a non-blocking three-stage Clos network assuming each cell in the first stage has 2 inputs and each cell in the last stage has 2 outputs.



2. Draw a baseline network connecting the 8 processors to the 8 memory modules, and illustrate its routing strategy. Explain how the routing strategy can be used to connect processor 2 to memory module 3.

3. Draw an omega network connecting the 8 processors to the 8 memory modules, and illustrate its routing strategy. Explain how the routing strategy can be used to connect processor 2 to memory module 3.

**Q.6** Suppose that it is required to interconnect 16 processors in a multiprocessor system. Show how the processors can be connected using the static interconnection networks indicated below, and compute the maximum distance between any two processors in each. Note that it is important that you write down the processor number on each node.

- 1. Tree network
- 2. 4-cube network
- 3. A mesh network with 4X4 processors having the following connections between processors:

| Left  |
|-------|
| Right |
| Up    |
| Down  |
|       |