## ICS 233, Term 172

# Computer Architecture \& Assembly Language 

## HW\# 7

Q.1. Identify all of the RAW data dependencies in the following code. Which dependencies are data hazards that will be resolved by forwarding? Which dependencies are data hazards that will cause a stall? Using a multiple-clock-cycle graphical representation, show the forwarding paths and stalled cycles if any.

```
add $3, $4, $2
sub $5, $3, $1
lw $6, 200($3)
add $7, $3, $6
```

Q.2. We have a program of $10^{6}$ instructions in the format of " $1 \mathbf{w}$, add, $1 \mathbf{w}$, add, ...". The add instruction depends only on the $1 \mathbf{w}$ instruction right before it. The $1 \mathbf{w}$ instruction also depends only on the add instruction right before it. If this program is executed on the 5 -stage MIPS pipeline:
(i) Without forwarding, what would be the actual CPI?
(ii) With forwarding, what would be the actual CPI?
Q.3. We have a program core consisting of five conditional branches. The program core will be executed millions of times. Below are the outcomes of each branch for one execution of the program core ( T for taken and N for not taken).

Branch 1: T-T-T
Branch 2: T-N-N-N
Branch 3: T-N-T-N-T-N
Branch 4: T-T-T-N-T
Branch 5: T-T-N-T-T-N-T
Assume that the behavior of each branch remains the same for each program core execution. For dynamic branch prediction schemes, assume that each branch has its own prediction buffer and each buffer is initialized to the same state before each execution. List the predictions and the accuracies for each of the following branch prediction schemes:
(i) Always taken
(ii) Always not taken
(iii) 1-bit predictor, initialized to predict taken
(iv) 2-bit predictor, initialized to weakly predict taken
Q.4. Suppose a computer's address is $k$ bits (using byte addressing), the cache data size is $S$ bytes, the block size is $B=2^{b}$ bytes, and the cache is $m$-way set-associative. Figure out what the following quantities are in terms of $S, B, m, b$, and $k$.
(i) The number of sets in the cache.
(ii) The number of index bits in the address
(iii) The number of tag bits in the address
(iv) The total number of bits required to store all the valid and tag bits in the cache
Q.5. Consider a processor with a 2 ns clock cycle, a miss penalty of 20 clock cycles, a miss rate of 0.05 misses per instruction, and a cache access time (hit time) of 1 clock cycle. Assume that the read and write miss penalties are the same.
(i) Find the average memory access time (AMAT).
(ii) Suppose we can improve the miss rate to 0.03 misses per instruction by doubling the cache size. However, this causes the cache access time to increase to 1.2 cycles. Using the AMAT as a metric, determine if this is a good trade-off.
(iii) If the cache access time determines the processor's clock cycle time, which is often the case, AMAT may not correctly indicate whether one cache organization is better than another. If the processor's clock cycle time must be changed to match that of a cache, is this a good trade-off? Assume that the processors in part (i) and (ii) are identical, except for the clock rate and the cache miss rate. Assume 1.5 references per instruction (for both I-cache and D-cache) and a CPI without cache misses of 2. The miss penalty is 20 cycles for both processors.

