## COE 308, Computer Architecture, Term 982, HW #5

Due date: Monday, April 26

**Q.1.** Suppose that you are required to design hardware to execute the code given below, and you are given the implementation shown below as a possible solution. Assume that the propagation delay across the comparator (>) is 15 ns, across the multiplexor (MUX) is 5 ns, and across each adder/subtractor block (ADD/SUB) is 20 ns. Assume also that in every clock cycle the corresponding numbers in A[i] and B[i] are stored in R1 and R2, respectively.



- (ii) Modify the implementation to get a 3-stage pipeline such that the clock period for each stage is 20 ns. Then determine the total time taken to execute the code on the 3-stage pipeline implementation. What is the speedup factor compared to the original implementation?
- (iii) Modify the implementation to get a 2-stage pipeline such that the clock period for each stage is 25 ns. Note that the number of arithmetic blocks used in the modified implementation need not be the same as in the original. Then determine the total time taken to execute the code on the 2-stage pipeline implementation. What is the speedup factor compared to the original implementation?
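
The timing arithmetic behind parts (ii) and (iii) can be sketched with the standard linear-pipeline formula: `n` items through `k` stages with clock period `tau` take `(k + n - 1) * tau`. The sketch below is illustrative only. The item count `N_ITEMS` and the non-pipelined delay `ORIGINAL_DELAY_NS` are assumptions (hypothetical values, not taken from the assignment); the actual values depend on the given code and on the critical path of the given implementation.

```python
def pipelined_time_ns(num_items, num_stages, clock_ns):
    """Total time for num_items through a linear pipeline: (k + n - 1) * tau."""
    return (num_stages + num_items - 1) * clock_ns


def unpipelined_time_ns(num_items, delay_ns):
    """Total time when each item needs one full pass of delay_ns."""
    return num_items * delay_ns


def speedup(num_items, delay_ns, num_stages, clock_ns):
    """Ratio of non-pipelined time to pipelined time."""
    return unpipelined_time_ns(num_items, delay_ns) / pipelined_time_ns(
        num_items, num_stages, clock_ns
    )


# ASSUMPTION: hypothetical array length; read the real one from the given code.
N_ITEMS = 100
# ASSUMPTION: one comparator, one MUX and one ADD/SUB in series (15 + 5 + 20 ns).
# The real critical path must be read off the given implementation.
ORIGINAL_DELAY_NS = 15 + 5 + 20

for stages, clock in [(3, 20), (2, 25)]:
    total = pipelined_time_ns(N_ITEMS, stages, clock)
    ratio = speedup(N_ITEMS, ORIGINAL_DELAY_NS, stages, clock)
    print(f"{stages}-stage @ {clock} ns: {total} ns, speedup {ratio:.2f}")
```

As the item count grows, the speedup approaches the ratio of the original delay to the pipeline clock period, since the `k - 1` fill cycles become negligible.
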

**Q.2.** A microprocessor has a 3-stage instruction pipeline consisting of an instruction fetch unit (IF), an instruction decode unit (ID), and an instruction execution unit (IE). Assume that the outcome of a conditional branch instruction becomes known in the instruction execution unit. Suppose that the following code is processed through the pipeline:

| Label | No. | Instruction | Comment                |
|-------|-----|-------------|------------------------|
|       | 1   | MOV CX, 4   | CX=4                   |
| L1:   | 2   | ADD AX, CX  | AX=AX+CX               |
|       | 3   | INC AX      | AX=AX+1                |
|       | 4   | INC BX      | BX=BX+1                |
|       | 5   | DEC CX      | CX=CX-1                |
|       | 6   | JNZ L1      | Jump to L1 if not zero |
|       | 7   | ADD BX, AX  | BX=BX+AX               |

- (i) Draw a space-time diagram and compute the total code processing time assuming that the pipeline must be cleared after a branch instruction has been decoded.
- (ii) Draw a space-time diagram and compute the total code processing time assuming the use of a branch history table. Assume that when a branch instruction is executed initially it is not taken.
- (iii) Assume that there are two versions of the JNZ instruction: a version delayed by 2 cycles, called JNZD2, and a non-delayed JNZ instruction. Modify the code to take advantage of the delayed instruction, draw a space-time diagram, and compute the total processing time of the modified code.
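
As a starting point for the space-time diagrams in Q.2, the sketch below builds the dynamic instruction trace of the loop (CX starts at 4, so the loop body runs four times) and lays it out on an ideal 3-stage pipeline with no branch penalty. It deliberately does not model any of the three branch-handling schemes asked about above; each scheme adds idle slots or reorders instructions relative to this baseline. The helper names `dynamic_trace`, `space_time` and `render` are hypothetical, introduced only for illustration.

```python
STAGES = ["IF", "ID", "IE"]


def dynamic_trace(loop_count=4):
    """Instructions in execution order; the loop body repeats loop_count times."""
    trace = ["MOV CX,4"]
    for _ in range(loop_count):
        trace += ["ADD AX,CX", "INC AX", "INC BX", "DEC CX", "JNZ L1"]
    trace.append("ADD BX,AX")
    return trace


def space_time(trace, stages=STAGES):
    """Ideal diagram: instruction i (0-based) occupies stage s in cycle i + s."""
    n_cycles = len(trace) + len(stages) - 1
    rows = {}
    for s, name in enumerate(stages):
        row = ["--"] * n_cycles
        for i in range(len(trace)):
            row[i + s] = f"I{i + 1}"
        rows[name] = row
    return rows


def render(rows):
    """One text line per stage, one column per clock cycle."""
    return "\n".join(
        name + ": " + " ".join(f"{cell:>3}" for cell in row)
        for name, row in rows.items()
    )


trace = dynamic_trace()
rows = space_time(trace)
print(render(rows))
print("dynamic instructions:", len(trace))
print("ideal cycles (no branch penalty):", len(rows["IF"]))
```

With 22 dynamic instructions and 3 stages, the ideal baseline is 22 + 3 - 1 = 24 cycles; the total for each part is this baseline plus whatever cycles the chosen branch-handling scheme wastes or saves.
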

**Q.3.** Problem 5.8 on page 221. Assume that the five-stage instruction pipeline is (IF, ID, OF, EX, OS).

**Q.4.** Problem 5.9 on page 222.

**Q.5.** Problem 5.14 on page 223.

**Q.6.** Problem 5.15 on page 223.