# **COMPUTER ENGINEERING DEPARTMENT**

# **ICS 233**

# **COMPUTER ARCHITECTURE & ASSEMBLY LANGUAGE**

# **Final Exam**

First Semester (081)

Time: 7:30-10:30 AM

Student Name : \_\_\_\_\_\_

Student ID. : \_\_\_\_\_

| Question | Max Points | Score |
|----------|------------|-------|
| Q1       | 30         |       |
| Q2       | 14         |       |
| Q3       | 15         |       |
| Q4       | 10         |       |
| Q5       | 15         |       |
| Q6       | 16         |       |
| Total    | 100        |       |

Dr. Aiman El-Maleh

### [30 Points]

(Q1) Consider the single-cycle datapath and control given below, along with the designs of the ALU and Next PC blocks, for a MIPS processor implementing a subset of the instruction set:





Details of Next PC

(i) Show the control signals generated for the execution of the following instructions by filling the table given below:

| Op  | RegDst | RegWrite | ExtOp | ALUSrc | ALUOp | Beq | Bne | J | MemRead | MemWrite | MemtoReg |
|-----|--------|----------|-------|--------|-------|-----|-----|---|---------|----------|----------|
| add |        |          |       |        |       |     |     |   |         |          |          |
| ori |        |          |       |        |       |     |     |   |         |          |          |
| sw  |        |          |       |        |       |     |     |   |         |          |          |
| bne |        |          |       |        |       |     |     |   |         |          |          |
| j   |        |          |       |        |       |     |     |   |         |          |          |

The format of these instructions is given below for your reference:

| Instruction | Meaning | Format | | | | | |
|-------------|---------|--------|---|---|---|---|---|
| add rd, rs, rt | rd = rs + rt | Op<sup>6</sup> = 0 | rs<sup>5</sup> | rt<sup>5</sup> | rd<sup>5</sup> | 0 | 0x20 |
| ori rt, rs, imm<sup>16</sup> | rt = rs \| imm<sup>16</sup> | 0x0d | rs<sup>5</sup> | rt<sup>5</sup> | imm<sup>16</sup> | | |
| sw rt, imm<sup>16</sup>(rs) | MEM[rs + imm<sup>16</sup>] = rt | 0x2b | rs<sup>5</sup> | rt<sup>5</sup> | imm<sup>16</sup> | | |
| bne rs, rt, label | branch if (rs != rt) | 0x05 | rs<sup>5</sup> | rt<sup>5</sup> | imm<sup>16</sup> | | |
| j label | jump to label | 0x02 | imm<sup>26</sup> | | | | |

(ii) We wish to add the following instructions to the MIPS single-cycle datapath. Add any necessary datapath modifications and control signals needed for the implementation of these instructions. Show only the <u>modified</u> and <u>added</u> components to the datapath. Show the values of the control signals to control the execution of each instruction.

#### a. sra

| Instruction | Meaning | Format | | | | | |
|-------------|---------|--------|---|---|---|---|---|
| sra rd, rt, imm<sup>5</sup> | rd = rt >> imm<sup>5</sup> | Op<sup>6</sup> = 0 | 0 | rt<sup>5</sup> | rd<sup>5</sup> | imm<sup>5</sup> | f<sup>6</sup> = 3 |

#### b. bgtz

| Instruction | Meaning | Format | | | |
|-------------|---------|--------|---|---|---|
| bgtz rs, label | branch if (rs > 0) | Op<sup>6</sup> = 7 | rs<sup>5</sup> | 0 | imm<sup>16</sup> |

#### c. jal

| Instruction | Meaning | Format | |
|-------------|---------|--------|---|
| jal label | \$31 = PC + 4, jump | Op<sup>6</sup> = 3 | imm<sup>26</sup> |

#### d. jr

| Instruction | Meaning | Format | | | | | |
|-------------|---------|--------|---|---|---|---|---|
| jr rs | PC = rs | Op<sup>6</sup> = 0 | rs<sup>5</sup> | 0 | 0 | 0 | 8 |

(iii) Assume that the propagation delays for the major components used in the datapath are as follows:

- Instruction and data memories: 100 ps
- ALU and adders: 40 ps
- Register file access (read or write): 10 ps
- Main control: 15 ps
- ALU control: 15 ps

Ignore the delays in the multiplexers, PC access, extension logic, and wires. What is the cycle time for the single-cycle datapath given above?
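One way to approach the cycle-time question is to add the component delays along the longest instruction path. The sketch below is only an illustration under stated assumptions, since the datapath figure is not reproduced here: it assumes that the load instruction forms the critical path (instruction fetch, register read, ALU, data memory, register write) and that the ALU control waits for the main control while the register file is read in parallel. The names `DELAYS_PS` and `load_path_ps` are hypothetical and not part of the exam.

```python
# Component delays in picoseconds, taken from the list given in the question.
DELAYS_PS = {
    "memory": 100,        # instruction and data memories
    "alu": 40,            # ALU and adders
    "regfile": 10,        # register file read or write
    "main_control": 15,
    "alu_control": 15,
}


def load_path_ps(d):
    """Delay of a load instruction under the assumptions stated above."""
    # Assumption: the register read overlaps with main control followed by ALU control.
    decode = max(d["regfile"], d["main_control"] + d["alu_control"])
    # fetch -> decode / register read -> ALU -> data memory -> register write-back
    return d["memory"] + decode + d["alu"] + d["memory"] + d["regfile"]


cycle_time_ps = load_path_ps(DELAYS_PS)
print(cycle_time_ps)  # 280 under these assumptions
```

If the actual datapath orders the control blocks differently, only the `decode` term needs to change.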


### [14 Points]

(Q2) Consider the pipelined MIPS processor design given below:

- (i) Show the control signals that will be used for forwarding along with their conditions. In case both forwarding conditions from the ALU and Memory Mux are met, which one should be allowed to forward?
- (ii) Show the control signals that will be used for stalling the pipeline along with their conditions.
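The forwarding priority and the load-use stall condition asked about above can be sketched in generic form. This is a minimal illustration, not the control logic of the pictured design: the function names, argument names, and the returned labels `"ALU"`, `"MEM"`, and `"REG"` are all assumptions chosen for readability.

```python
def forward_select(src_reg, ex_dest, ex_reg_write, mem_dest, mem_reg_write):
    """Pick the source of an operand: 'ALU', 'MEM', or 'REG' (hypothetical labels)."""
    if src_reg == 0:
        return "REG"  # register $0 is hardwired to zero and is never forwarded
    # The ALU-stage result comes from the more recent instruction, so it wins
    # when both forwarding conditions are met.
    if ex_reg_write and ex_dest == src_reg:
        return "ALU"
    if mem_reg_write and mem_dest == src_reg:
        return "MEM"
    return "REG"


def must_stall(src_regs, ex_dest, ex_mem_read):
    """Load-use hazard: the instruction just ahead is a load whose result we read."""
    return ex_mem_read and ex_dest != 0 and ex_dest in src_regs


# Both conditions met for $1: the ALU result is the newer value.
print(forward_select(1, 1, True, 1, True))
# A load into $2 followed immediately by a reader of $2 requires a stall.
print(must_stall((2,), 2, True))
```

Giving priority to the later pipeline stage instead would forward a stale value whenever two consecutive instructions write the same register.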


### [15 Points]

(Q3) Consider the code given below:

    add  $1, $1, $2
    sub  $1, $1, $3
    lw   $2, ($1)
    addi $2, $2, 4
    sw   $2, ($1)

- (i) Identify all the **RAW** data dependencies in the above code. Which dependencies are data hazards that will be resolved by forwarding? Which dependencies are data hazards that will cause a stall?
- (ii) Using a multiple-clock-cycle graphical representation, show the instruction execution across the pipeline including forwarding paths and stalled cycles if any. How many clock cycles will be needed to execute the instructions?
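As a cross-check for the dependency analysis, RAW dependencies can be found mechanically by tracking the most recent writer of each register. The sketch below is an assumption-laden illustration: the tuple encoding in `PROGRAM`, the helper names, the one-cycle load-use stall rule, and the classic five-stage pipeline with full forwarding are all assumed here rather than taken from the exam's pipeline figure.

```python
# Each entry: (mnemonic, destination register or None, source registers).
PROGRAM = [
    ("add",  1,    (1, 2)),
    ("sub",  1,    (1, 3)),
    ("lw",   2,    (1,)),
    ("addi", 2,    (2,)),
    ("sw",   None, (2, 1)),   # sw reads both the data register and the base register
]


def raw_dependencies(program):
    """Return (producer index, consumer index, register) for every RAW dependency."""
    last_writer = {}
    deps = []
    for i, (_, dest, srcs) in enumerate(program):
        for reg in srcs:
            if reg in last_writer:
                deps.append((last_writer[reg], i, reg))
        if dest is not None:
            last_writer[dest] = i
    return deps


def causes_stall(program, producer, consumer):
    # Assumption: only a load immediately followed by its consumer stalls (one cycle).
    return program[producer][0] == "lw" and consumer - producer == 1


deps = raw_dependencies(PROGRAM)
stalls = sum(causes_stall(PROGRAM, p, c) for p, c, _ in deps)
# Assumption: five stages, so the pipeline needs 4 extra cycles to drain.
total_cycles = len(PROGRAM) + 4 + stalls

for producer, consumer, reg in deps:
    print(PROGRAM[producer][0], "->", PROGRAM[consumer][0], "on $%d" % reg)
print("stall cycles:", stalls, "total cycles:", total_cycles)
```

Indices are zero-based, so `(2, 3, 2)` denotes the dependency from `lw` to `addi` on register `$2`.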

### [10 Points]

(Q4) Given a 1M x 1 memory block as shown below, use this block to implement a 4M x 4 memory block.
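Before drawing the design, it can help to count the blocks and address lines involved. The following sketch assumes all sizes are powers of two; the function name `expansion_plan` and the dictionary keys are hypothetical.

```python
def expansion_plan(chip_words, chip_bits, target_words, target_bits):
    """Count the blocks and address bits needed to build a larger memory."""
    rows = target_words // chip_words      # banks selected by the high address bits
    cols = target_bits // chip_bits        # blocks side by side to widen the word
    chip_addr_bits = chip_words.bit_length() - 1   # log2 for a power of two
    select_bits = rows.bit_length() - 1            # inputs of the bank decoder
    return {
        "blocks": rows * cols,
        "rows": rows,
        "cols": cols,
        "chip_addr_bits": chip_addr_bits,
        "select_bits": select_bits,
        "total_addr_bits": chip_addr_bits + select_bits,
    }


M = 1 << 20
plan = expansion_plan(M, 1, 4 * M, 4)
print(plan)
```

For the sizes in this question the plan comes out as 16 blocks in a 4 by 4 arrangement, with 20 address bits going to every block and 2 more bits driving a 2-to-4 decoder that enables one row.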



### [15 Points]

(Q5) Assume that you have a 32-bit address and a cache with a 4 KB data size (i.e., not including tag and valid bits).

- (i) Assuming that the cache is organized as **direct-mapped** with **4-byte block size**, determine the number of bits in the offset, index and tag fields.
- (ii) Assuming that the cache is organized as **four-way set associative** with **4-byte block size**, determine the number of bits in the offset, index and tag fields.
- (iii) Show the organization of the cache organized as four-way set associative with 4-byte block size.
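The address field widths follow directly from the cache geometry. The sketch below assumes byte addressing and power-of-two sizes; `cache_fields` is a hypothetical helper name.

```python
def cache_fields(address_bits, data_bytes, block_bytes, ways):
    """Return (offset, index, tag) widths in bits for a set-associative cache."""
    num_blocks = data_bytes // block_bytes
    num_sets = num_blocks // ways            # ways == 1 means direct-mapped
    offset = block_bytes.bit_length() - 1    # log2 for a power of two
    index = num_sets.bit_length() - 1
    tag = address_bits - index - offset
    return offset, index, tag


print(cache_fields(32, 4 * 1024, 4, 1))  # direct-mapped: (2, 10, 20)
print(cache_fields(32, 4 * 1024, 4, 4))  # four-way:      (2, 8, 22)
```

Raising the associativity at a fixed data size cuts the number of sets, so index bits move into the tag.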


### [16 Points]

(Q6) A processor runs at 3 GHz and has a CPI of 2 with a perfect cache (i.e., not including the stall cycles due to cache misses). Assume that load and store instructions are 25% of the instructions. The processor has an I-cache with a 5% miss rate and a D-cache with a 2.5% miss rate. The hit time is 1 clock cycle. Assume that the time required to transfer a block of data from the RAM to the cache, i.e., the miss penalty, is 40 ns.

- (i) What is the average memory access time for instruction access in clock cycles?
- (ii) What is the average memory access time for data access in clock cycles?
- (iii) What is the number of stall cycles per instruction and the overall CPI?
- (iv) A new technology is proposed that can make the processor run at 4 GHz. The only impact of this technology is that the cache size has to be decreased to keep a hit time of one clock cycle. Assume that the time required to transfer a block of data from the RAM to the cache is reduced to 30 ns. What should the number of stall cycles per instruction be in the new processor for it to be faster by a factor of at least 1.2? What should the instruction miss rate be in the new technology if the data miss rate is 4%?
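The quantities in this question can be checked numerically with the usual relations AMAT = hit time + miss rate × miss penalty and execution time per instruction = CPI / clock rate. The sketch below is one possible setup, assuming that the instruction count and the base CPI of 2 are unchanged by the new technology; all function and variable names are hypothetical.

```python
def miss_penalty_cycles(clock_ghz, penalty_ns):
    # A clock of f GHz completes f cycles per nanosecond.
    return clock_ghz * penalty_ns


def amat(hit_cycles, miss_rate, penalty_cycles):
    return hit_cycles + miss_rate * penalty_cycles


def stalls_per_instruction(i_miss, d_miss, mem_fraction, penalty_cycles):
    # Every instruction is fetched; only loads and stores access the D-cache.
    return i_miss * penalty_cycles + mem_fraction * d_miss * penalty_cycles


BASE_CPI = 2
penalty = miss_penalty_cycles(3, 40)                    # 120 cycles
amat_instr = amat(1, 0.05, penalty)                     # about 7 cycles
amat_data = amat(1, 0.025, penalty)                     # about 4 cycles
stalls = stalls_per_instruction(0.05, 0.025, 0.25, penalty)   # about 6.75
cpi = BASE_CPI + stalls                                 # about 8.75

# Part (iv): require (cpi / 3 GHz) / (new_cpi / 4 GHz) >= 1.2.
new_penalty = miss_penalty_cycles(4, 30)                # 120 cycles
max_new_cpi = cpi * 4 / (3 * 1.2)
max_new_stalls = max_new_cpi - BASE_CPI
data_stalls = 0.25 * 0.04 * new_penalty
max_i_miss = (max_new_stalls - data_stalls) / new_penalty

print(amat_instr, amat_data, stalls, cpi)
print(round(max_new_stalls, 3), round(max_i_miss, 4))
```

Both technologies end up with the same miss penalty in cycles, so the comparison in part (iv) depends only on the clock rates and the miss rates.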