# Multicycle Implementation

COE 308 - Computer Architecture

Prof. Muhamed Mudawar

Computer Engineering Department

King Fahd University of Petroleum and Minerals

#### Drawbacks of Single Cycle Processor

- Long cycle time
  - All instructions take as much time as the slowest
- Functional units are duplicated, raising cost
  - Each functional unit can be used once per clock cycle

Multicycle Implementation © Muhamed Mudawar, COE 308 – KFUPM Slide 2

## Solution = Multicycle Implementation

- Break instruction execution into five steps
  - Instruction fetch
  - Instruction decode and register read
  - Execution, memory address calculation, or branch completion
  - Memory access or ALU instruction completion
  - Load instruction completion
- One step = one clock cycle (clock cycle is reduced)
  - The first 2 steps are the same for all instructions

| Instruction | # cycles | Instruction | # cycles |
|-------------|----------|-------------|----------|
| ALU         | 4        | Branch      | 3        |
| Load        | 5        | Store       | 4        |
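As a rough illustration of how these cycle counts combine, the sketch below computes the average cycles per instruction (CPI) for an instruction mix. The cycle counts come from the table above; the mix fractions and the helper name `average_cpi` are hypothetical and chosen only for illustration.

```python
# Cycle counts per instruction class, taken from the table above.
CYCLES = {"ALU": 4, "Load": 5, "Store": 4, "Branch": 3}


def average_cpi(mix):
    """Return the weighted-average CPI for a dict of instruction-class fractions."""
    total = sum(mix.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("instruction mix fractions must sum to 1")
    return sum(CYCLES[kind] * fraction for kind, fraction in mix.items())


# Hypothetical instruction mix (an assumption, not data from the lecture).
example_mix = {"ALU": 0.5, "Load": 0.2, "Store": 0.1, "Branch": 0.2}
print(average_cpi(example_mix))
```

With this assumed mix the average comes out at about 4 cycles per instruction. Whether the multicycle design is faster overall then depends on how much shorter its clock cycle is than the single-cycle one.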




#### Multicycle Datapath Changes

- Eliminating some of the components
  - Single memory unit for both instructions and data
  - Single ALU, eliminating the branch address adder and PC adder

Note: modern CPUs keep separate instruction and data memories, as well as separate address adders. They are merged here because the same component can serve different purposes in different cycles.

- Adding temporary registers
  - Instruction Register: IR
  - Memory Data Register: MDR
  - Register file output data registers: A and B
  - ALU output register: ALUout
  - Required to store the output values of major units for use in the next cycle


#### Multicycle Datapath Changes - cont'd

- This multicycle design can accommodate
  - One memory access per cycle
    - IR register saves the fetched instruction
    - MDR register saves the read memory data
  - One register file access per cycle
    - Two registers can be read concurrently into the A and B registers
  - One ALU operation per cycle
    - ALUout register saves the ALU output
- Additional multiplexers are also needed
  - Mux before the memory address to select the PC or ALUout address
  - Mux before the 1st ALU input to select the PC (to increment) or the A register
  - Extended mux before the PC to select increment, branch, or jump




#### Control Signals

| Signal   | Effect when '0'                       | Effect when '1'                   |
|----------|---------------------------------------|-----------------------------------|
| RegDst   | Destination register = Rt             | Destination register = Rd         |
| RegWrite | None                                  | Register(RW) ← BusW               |
| ExtOp    | 16-bit immediate is zero-extended     | 16-bit immediate is sign-extended |
| ALUSrcA  | 1st ALU operand is PC (upper 30 bits) | 1st ALU operand is the A register |
| ALUSrcB  | 2nd ALU operand is the B register     | 2nd ALU operand is extended-imm16 |
| MemRead  | None                                  | MemData ← Memory[address]         |
| MemWrite | None                                  | Memory[address] ← Data_in         |
| MemtoReg | BusW = ALUout                         | BusW = MDR                        |
| IorD     | Memory Address = PC                   | Memory Address = ALUout           |
| IRWrite  | None                                  | IR ← MemData                      |
| PCWrite  | None                                  | PC ← NextPC                       |

| Signal   | Value | Effect                                                           |
|----------|-------|------------------------------------------------------------------|
| PCSource | 00    | NextPC = PC[31:2] + 1 (increment upper 30 bits of PC)            |
|          | 01    | NextPC = ALUout = PC[31:2] + 1 + sign-extend(imm16) (for branch) |
|          | 10    | NextPC = PC[31:28], imm26 (for jump)                             |
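To make the multiplexer-style signals concrete, the following minimal sketch models a two-way mux (as used by IorD, ALUSrcA, MemtoReg, and similar signals) and the PCSource selection. It assumes the PC is held as a 30-bit word address, as the table suggests; the function names `mux2` and `next_pc` are hypothetical and not part of the lecture.

```python
MASK30 = (1 << 30) - 1  # assumption: the PC is modelled as PC[31:2], a 30-bit value
MASK26 = (1 << 26) - 1


def mux2(select, when0, when1):
    """Two-way multiplexer: return when0 if select is 0, otherwise when1."""
    return when1 if select else when0


def next_pc(pc_source, pc30, alu_out, imm26):
    """Select NextPC according to the 2-bit PCSource value."""
    if pc_source == 0b00:    # sequential: PC[31:2] + 1
        return (pc30 + 1) & MASK30
    if pc_source == 0b01:    # branch: the target was already computed into ALUout
        return alu_out & MASK30
    if pc_source == 0b10:    # jump: PC[31:28] concatenated with imm26
        upper4 = pc30 >> 26  # the top 4 bits of the 30-bit value are PC[31:28]
        return (upper4 << 26) | (imm26 & MASK26)
    raise ValueError("PCSource value 11 is not defined in the table")


# IorD = 0 selects the PC as the memory address, IorD = 1 selects ALUout.
print(mux2(0, "PC", "ALUout"), mux2(1, "PC", "ALUout"))
```

The same `mux2` helper covers every row of the first table that chooses between two sources, which mirrors how one hardware mux design is reused throughout the datapath.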

















#### Instruction Execution Summary

| Cycle | Action | Register Transfers |
|-------|--------|--------------------|
| 1     | Fetch instruction | IR ← Memory[PC]<br>PC ← PC + 4 |
| 2     | Decode instruction<br>Fetch registers<br>Compute branch address in advance<br>Jump completion (case of a jump) | Generate control signals<br>A ← Reg[Rs], B ← Reg[Rt]<br>ALUout ← PC[31:2] + sign-extend(Imm16)<br>PC ← PC[31:28], Imm26 |
| 3     | Case 1: Execute R-type ALU<br>Case 2: Execute I-type ALU<br>Case 3: Compute load/store address<br>Case 4: Branch completion | ALUout ← A funct B<br>ALUout ← A op extend(Imm16)<br>ALUout ← A + sign-extend(Imm16)<br>if (Branch) PC ← ALUout |
| 4     | Case 1: Write ALU result for R-type<br>Case 2: Write ALU result for I-type<br>Case 3: Access memory for load<br>Case 4: Access memory for store | Reg[Rd] ← ALUout<br>Reg[Rt] ← ALUout<br>MDR ← Memory[ALUout]<br>Memory[ALUout] ← B |
| 5     | Load instruction completion | Reg[Rt] ← MDR |
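The register transfers in the table can be traced with a small behavioural model. This is only a sketch under several assumptions: instructions are passed as pre-decoded dictionaries (a hypothetical format) rather than fetched from memory, addresses are byte addresses so the branch offset is shifted left by 2, the branch condition is taken to be A == B, and only a handful of operations are supported.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def run_instruction(state, instr):
    """Run one pre-decoded instruction step by step; return the cycles used.

    state: {'pc': byte address, 'reg': list of 32 ints, 'mem': {address: word}}
    instr: hypothetical dict form, e.g. {'op': 'lw', 'rs': 1, 'rt': 2, 'imm': 8}
    """
    reg, mem, op = state["reg"], state["mem"], instr["op"]

    # Cycle 1: fetch (IR <- Memory[PC] is modelled by receiving instr), PC <- PC + 4.
    state["pc"] = (state["pc"] + 4) & 0xFFFFFFFF

    # Cycle 2: read registers and compute the branch target in advance.
    a, b = reg[instr.get("rs", 0)], reg[instr.get("rt", 0)]
    imm = sign_extend16(instr.get("imm", 0))
    alu_out = (state["pc"] + (imm << 2)) & 0xFFFFFFFF
    if op == "j":  # jump completes here: PC <- PC[31:28], Imm26
        state["pc"] = (state["pc"] & 0xF0000000) | ((instr["target"] & 0x3FFFFFF) << 2)
        return 2

    # Cycle 3: execute, compute a memory address, or complete a branch.
    if op == "beq":  # assumed branch condition: A == B
        if a == b:
            state["pc"] = alu_out
        return 3
    if op == "add":
        alu_out = (a + b) & 0xFFFFFFFF
    elif op in ("addi", "lw", "sw"):
        alu_out = (a + imm) & 0xFFFFFFFF
    else:
        raise ValueError(f"unsupported op: {op}")

    # Cycle 4: write the ALU result or access memory.
    if op == "add":
        reg[instr["rd"]] = alu_out   # Reg[Rd] <- ALUout
        return 4
    if op == "addi":
        reg[instr["rt"]] = alu_out   # Reg[Rt] <- ALUout
        return 4
    if op == "sw":
        mem[alu_out] = b             # Memory[ALUout] <- B
        return 4
    mdr = mem.get(alu_out, 0)        # lw: MDR <- Memory[ALUout]

    # Cycle 5: load completion, Reg[Rt] <- MDR.
    reg[instr["rt"]] = mdr
    return 5


state = {"pc": 0, "reg": [0] * 32, "mem": {16: 99}}
state["reg"][1] = 8
program = [
    {"op": "lw", "rs": 1, "rt": 2, "imm": 8},
    {"op": "sw", "rs": 1, "rt": 2, "imm": 12},
    {"op": "add", "rs": 2, "rt": 2, "rd": 3},
    {"op": "beq", "rs": 2, "rt": 2, "imm": 3},
]
cycles = [run_instruction(state, instr) for instr in program]
print(cycles, state["pc"])
```

For this small example the cycle counts come out as 5, 4, 4, and 3 for the load, store, ALU, and branch instructions. The model ignores details such as a hardwired zero register, so it should be read as a trace of the table rather than a full simulator.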


## Defining the Control

- Control for the multicycle datapath is more complex
  - Because each instruction is executed as a sequence of steps
- Values of control signals depend upon:
  - What instruction is being executed
  - Which cycle is being performed
- Multicycle control is a Finite State Machine (FSM)
  - Whereas single-cycle control is combinational logic
- Two implementation techniques for multicycle control
  - Set of states and transitions implemented directly in logic
  - Microprogramming: a programming representation for control
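The FSM view of control can be sketched as a next-state function that depends on both the current step and the instruction class. The state names and the five-state simplification below are assumptions made for illustration; a hardwired controller would typically use one state per case and also emit the control signal values for each state.

```python
from enum import Enum, auto


class State(Enum):
    FETCH = auto()
    DECODE = auto()
    EXECUTE = auto()       # ALU operation, address calculation, or branch completion
    MEM_OR_WRITE = auto()  # memory access or ALU result write-back
    LOAD_WB = auto()       # load completion


def next_state(state, kind):
    """Return the next control state for instruction class `kind`."""
    if state is State.FETCH:
        return State.DECODE
    if state is State.DECODE:
        return State.FETCH if kind == "jump" else State.EXECUTE
    if state is State.EXECUTE:
        return State.FETCH if kind == "branch" else State.MEM_OR_WRITE
    if state is State.MEM_OR_WRITE:
        return State.LOAD_WB if kind == "load" else State.FETCH
    return State.FETCH  # LOAD_WB always returns to FETCH


def cycles_for(kind):
    """Count the states visited before control returns to FETCH."""
    state, count = State.FETCH, 0
    while True:
        count += 1
        state = next_state(state, kind)
        if state is State.FETCH:
            return count


for kind in ("alu", "load", "store", "branch", "jump"):
    print(kind, cycles_for(kind))
```

Walking the transitions reproduces the per-instruction cycle counts: 4 for ALU and store, 5 for load, 3 for branch, and 2 for jump.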










# Multicycle Implementation Summary

- Reduces hardware
  - One unified memory for instructions and data, and one ALU
- Reduces clock cycle time
  - When compared to the single-cycle implementation
- Breaks instruction execution into steps (step = 1 cycle)
- Internal registers in datapath
  - Save intermediate data for later cycles
- Finite State Machine (FSM) specification of control
- Implementation of control
  - Hardwired control ⇒ a sequential machine
  - Microprogramming (covered in textbook)
