Module # 5

Modules # 2 (Continued)

Reduced Instruction Set Computers

Important Architectural Development

· Processor-Memory Interface (memory interleaving, Cache, Register File)

· Pipelined Processor

· RISC Vs CISC.

· Multiprocessor Systems

Reduced Instruction Set Computers (RISCs)

1. Common characteristics shared by MOST RISC designs

· Limited and simple instruction set

· Large number of general purpose registers and/or the use of compiler technology to optimize register usage

· Optimization of the instruction pipeline

2. Instruction Execution Characteristics

· Semantic Gap

The difference between the operations provided in the HLLs and those provided in computer architecture leads to

1. execution inefficiency,

2. excessive machine program size, and

3. compiler complexity.

· Designers’ Response

Make the architecture complex

1. Large instruction sets, and

2. Dozens of addressing modes.

· Aspects of Computation Studied

I. Operations

Language	Pascal	FORTRAN	C
Assignment	74	67	38
Loop	4	3	3
Call	1	3	12
IF	20	11	43
GOTO	2	9	3
Other	-	7	1

Observations

1. Assignment statements predominate, i.e. simple movement of data is of high importance.

2. Conditional statements are substantial, i.e. sequencing of instructions is important.

3. Procedure call/return was found to be the most time-consuming, i.e. it causes the execution of the most machine-language instructions.

II. Operands

· Majority of references are to simple scalar variables (about 60%)

· More than 80% of scalars are local variables (to procedures).

Implications:

A prime candidate for optimization is the mechanism for storing and accessing local scalar variables.

Procedure call/return

· a typical procedure employs only a few passed parameters and local variables (typically < 6 arguments and < 6 local variables)

· the depth of procedure activation fluctuates within a relatively narrow range (5 to 8).

Implications:

Do not make instruction set architecture closer to HLLs, rather optimize the performance of the most time-consuming features of typical HLL programs

Make the architecture simpler, not complex

Possible Techniques

1. Use a large number of registers to optimize operand referencing by reducing the memory references

2. Pay careful attention to the design of instruction pipelines, i.e. use smart compilers to rearrange instructions so that the pipeline is used efficiently, e.g. delayed branch.

3. Simplify instruction set

The main philosophy becomes

Keep the most frequently accessed operands in registers and minimize register-memory operations

Two approaches are possible to achieve this

1. Software Approach

Use the compiler to maximize register usage by allocating registers to those variables that will be used the most in a given time period (this is the philosophy used in the Stanford MIPS machine).

2. Hardware Approach

Use more registers so that more variables can be held in registers for longer periods of time (this is the philosophy used in the Berkeley RISC machines).
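The software approach can be illustrated with a minimal sketch. It assumes the compiler already has a trace of variable references for some time period and simply gives the available registers to the most frequently referenced variables. The function names and the sample trace are hypothetical, and real compilers use more elaborate allocation techniques.

```python
from collections import Counter

def allocate_registers(references, num_registers):
    """Give registers to the most frequently referenced variables.

    `references` is the sequence of variable names touched in some time
    period; variables that do not receive a register stay in memory.
    """
    counts = Counter(references)
    # most_common() lists variables by descending reference count.
    hottest = [name for name, _ in counts.most_common(num_registers)]
    return {name: f"r{i}" for i, name in enumerate(hottest)}

def memory_references(references, allocation):
    """Count the references that still go to memory after allocation."""
    return sum(1 for name in references if name not in allocation)

# Hypothetical reference trace: i is used 4 times, sum 3 times, the rest once.
trace = ["i", "sum", "i", "a", "sum", "i", "n", "i", "sum", "b"]
allocation = allocate_registers(trace, num_registers=2)
print(allocation, memory_references(trace, allocation))
```

With only two registers, seven of the ten references in this trace are served from registers, which is the effect the software approach aims for.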

Register Windows

· Organize registers such that memory access is minimized.

· Multiple small sets of registers are used, each assigned to a different procedure.

· A procedure call automatically switches the CPU to use a different fixed-size window of registers rather than saving registers in memory at the call time

· At any time, only ONE window of registers is visible and is addressed as if it were the only set of registers

· Window Overlapping:

Temporary registers at one level are physically the same as the parameter registers at the next level. This overlap allows parameters to be passed without the actual movement of data.

Examples:

Berkeley RISC: 8 windows of 16 registers each

Pyramid RISC: 16 windows of 32 registers each

· A fixed number of registers in the CPU are identified as global registers and are available to all procedures, e.g. references to registers 0 through 7 could refer to unique global registers, and references to registers 8 through 31 could refer to registers in the current window.
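The window overlap can be sketched as a mapping from a (window, register number) pair to a physical register. The sketch below assumes 8 global registers (0 to 7), 8 windows, and a window of 24 registers split into 8 parameter, 8 local, and 8 temporary registers, so that each window contributes 16 new physical registers. That split, the wrap-around, and the convention that a call moves to the next higher window number are assumptions made for illustration.

```python
NUM_GLOBALS = 8     # registers 0-7 are shared by all procedures
NUM_WINDOWS = 8     # assumption: eight windows
WINDOW_STEP = 16    # new physical registers contributed by each window
                    # (assumed split: 8 parameter + 8 local + 8 temporary,
                    # where the temporaries overlap the next window)

def physical_register(window, reg):
    """Map register `reg` (0..31) seen in `window` to a physical register."""
    if not 0 <= reg < 32:
        raise ValueError("register number must be in 0..31")
    if reg < NUM_GLOBALS:
        return reg                      # globals ignore the window
    offset = reg - NUM_GLOBALS          # 0..23 inside the window
    # Windows are arranged in a circle, so the last one overlaps the first.
    windowed = (window * WINDOW_STEP + offset) % (NUM_WINDOWS * WINDOW_STEP)
    return NUM_GLOBALS + windowed

# Temporary register 24 of the caller is parameter register 8 of the callee.
print(physical_register(0, 24), physical_register(1, 8))
```

Because the caller's temporary registers and the callee's parameter registers resolve to the same physical registers, writing an argument in the caller is all that is needed to pass it.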

Large Register File	Cache
All local scalars	Recently-used local scalars
Individual variables	Blocks of memory
Compiler-assigned global variables	Recently-used global variables
Register addressing	Memory addressing

Characteristics of RISC Architectures

1. One Instruction per machine cycle

2. Register-to-register operations

3. Simple addressing modes

4. Simple instruction formats

Summary table of the features of several RISC processors and one CISC processor

	Motorola 88110	Alpha AXP 21064	Pentium	Power PC 601
Company	Motorola	DEC	Intel	IBM
Year	91	92	93	93
Architecture	RISC	RISC	CISC	RISC
# Registers(I)	32	32	64	32
Cache I/D	8/8 KB	8/8 KB	8/8 KB	32
# Registers (GP/FP)	32/32	32/32	8/8	32/32
# Inst/cycle	2	2	2	3
# pipelines (I/FP)	NS	7/10	5/8	4/6
Multiprocessing Support	No	Yes	Yes	Yes
Technology	CMOS	CMOS	BiCMOS	CMOS
# Transistors (millions)	1.3	1.68	3.1	2.8
Clock (MHz)	33	200	66	80

Characteristic	VAX-11 (CISC)	RISC-1 (RISC)
Year Developed	1978	1981
No. Instructions	303	31
Instruction size (bits)	16-456	32
Addressing Modes	22	3
No. General purpose registers	16	138
Control memory size	480 Kb	0
Cache Size	64 Kb	0

RISC Example 1: The SPARC (Scalable Processor ARChitecture)

· SPARC is closely based on the Berkeley RISC architecture.

· Technical Overview

¨All instructions and registers, including floating point registers, are 32 bits.

¨It is a LOAD/STORE architecture, i.e. all operations take place on operands located in the 32-bit registers.

¨It is based on a four-stage pipeline, Fetch, Decode, Execute, and Write.

¨It uses an overlapping register window scheme with 32 registers visible at any instant. A 5-bit variable, called Current Window Pointer (CWP) is used to point to the current register set.

¨All SPARC instructions occupy a full word (32 bits).

¨All arithmetic and logical instructions have three operands and have the form

Destination := source1 op source2

Field width (bits)	2	5	6	5	1	8	5
Register format	-	DST	Op-Code	SRC 1	0	FP-OP	SRC 2

Field width (bits)	2	5	6	5	1	13
Immediate format	-	DST	Op-Code	SRC 1	1	Immediate Constant

¨The LOAD and STORE instructions may use either of the above formats, with DST being the register to be loaded or stored. The low-order 19 bits of the instruction determine the effective address as follows:

1. Memory address = sum of SRC 1 and SRC 2

2. Memory address = sum of SRC 1 and a signed 13-bit constant.
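The two address calculations can be sketched as follows. The sketch assumes the fields are packed from the most significant bit downward in the order listed in the format (2, 5, 6, 5, 1, then either 8 + 5 or 13 bits), and it models the register file as a plain list. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)

def decode(word):
    """Split a 32-bit instruction word into the fields of the two formats."""
    fields = {
        "dst": (word >> 25) & 0x1F,
        "opcode": (word >> 19) & 0x3F,
        "src1": (word >> 14) & 0x1F,
        "immediate_flag": (word >> 13) & 0x1,
    }
    if fields["immediate_flag"]:
        fields["constant"] = sign_extend(word & 0x1FFF, 13)
    else:
        fields["src2"] = word & 0x1F
    return fields

def effective_address(word, registers):
    """Compute the LOAD/STORE address from the low-order 19 bits."""
    f = decode(word)
    if f["immediate_flag"]:
        operand = f["constant"]            # SRC 1 + signed 13-bit constant
    else:
        operand = registers[f["src2"]]     # SRC 1 + SRC 2
    return (registers[f["src1"]] + operand) & MASK32

registers = [0] * 32
registers[2], registers[3] = 100, 20
# DST = 1, SRC 1 = 2, immediate flag set, constant = -4 (0x1FFC in 13 bits).
word = (1 << 25) | (2 << 14) | (1 << 13) | 0x1FFC
print(effective_address(word, registers))
```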

¨Instructions load and store 8-, 16-, 32-, and 64-bit quantities into 32-bit registers.

¨Two ways are provided for calling procedures.

1. The CALL instruction uses a 30-bit PC relative offset.

Field width (bits)	2	30
CALL format	-	PC-Relative Displacement

2. The JMPL instruction, which uses any of the instruction formats used for arithmetic and logical operations and allows the return address to be put in any register.
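A minimal sketch of the CALL target calculation follows. It assumes the 30-bit displacement counts 32-bit words, so it is shifted left by two bits before being added to the PC. That interpretation and the function name are assumptions, since the notes only give the field width.

```python
MASK32 = 0xFFFFFFFF

def call_target(pc, instruction):
    """Return the target address of a CALL with a 30-bit PC-relative offset."""
    displacement = instruction & 0x3FFFFFFF    # low-order 30 bits
    # Converting words to bytes fills all 32 bits, so wrapping the sum at
    # 32 bits makes large displacements behave as negative offsets.
    return (pc + (displacement << 2)) & MASK32

# A displacement of 4 words from PC = 0x1000 reaches 0x1010.
print(hex(call_target(0x1000, 4)))
```

Since a word-aligned 30-bit displacement covers the whole 32-bit address space, a CALL can reach any instruction from any location.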

¨The SAVE and the RESTORE instructions manipulate the register window and stack pointer. Both of them trap when the next (previous) window is not available.
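How often these traps occur depends on the call depth relative to the number of windows. The simplified model below counts overflow and underflow traps for a sequence of calls and returns. It assumes every window in the register file may be used and that one window is spilled or reloaded per trap, which is a simplification rather than a description of the actual hardware. The names are hypothetical.

```python
def count_window_traps(events, num_windows):
    """Count (overflow, underflow) traps for a string of 'C'alls and 'R'eturns."""
    resident = 1          # windows currently held in the register file
    depth = 1             # current procedure nesting depth
    overflows = underflows = 0
    for event in events:
        if event == "C":
            depth += 1
            if resident == num_windows:
                overflows += 1      # oldest window is spilled to memory
            else:
                resident += 1
        elif event == "R":
            if depth == 1:
                raise ValueError("return without a matching call")
            depth -= 1
            if resident == 1:
                underflows += 1     # caller's window is reloaded from memory
            else:
                resident -= 1
        else:
            raise ValueError("events may only contain 'C' and 'R'")
    return overflows, underflows

# Depth fluctuating in a narrow range causes no traps with 8 windows.
print(count_window_traps("CCCRRCCRRR", num_windows=8))
```

Running the same sequence with only two windows produces traps, which is why the narrow range of call depth observed in typical programs matters for this scheme.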

¨A summary of the SPARC integer instructions is shown in the handout.

¨SPARC Floating Point

* The FPU contains 32 32-bit registers to hold 32 single precision (32-bit) floating point operands, 16 double-precision (64-bit) operands, or 8 extended-precision (128-bit) operands.

* The FPU can execute about 20 floating-point instructions, most of them in single-, double-, or extended-precision, using the first instruction format used for arithmetic.

* In addition to instructions for loading and storing the FPU’s registers, the CPU can also test the FPU’s registers and branch conditionally on the results.

¨ SPARC Memory Management

* A conventional MMU supporting a single paged 32-bit address space using three levels of address translation.

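Field width (bits)	12	8	6	6	12
Field	Context	Index 1	Index 2	Index 3	Offset

[Figure: three-level address translation, from the context table (4K) through the Level 1 table to a 4K page]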
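The three-level translation can be sketched as a table walk. The sketch assumes the 32-bit virtual address is split into Index 1 (8 bits), Index 2 (6 bits), Index 3 (6 bits), and Offset (12 bits), with the 12-bit context selecting the top-level table. The nested dictionaries and function names are hypothetical stand-ins for the real tables.

```python
def split_virtual_address(va):
    """Split a 32-bit virtual address into (index1, index2, index3, offset)."""
    offset = va & 0xFFF            # low 12 bits: byte within the 4K page
    index3 = (va >> 12) & 0x3F     # 6 bits
    index2 = (va >> 18) & 0x3F     # 6 bits
    index1 = (va >> 24) & 0xFF     # 8 bits
    return index1, index2, index3, offset

def translate(context_table, context, va):
    """Walk the three table levels and return the physical address."""
    index1, index2, index3, offset = split_virtual_address(va)
    level1 = context_table[context]    # selected by the 12-bit context
    level2 = level1[index1]
    level3 = level2[index2]
    frame = level3[index3]             # physical page frame number
    return (frame << 12) | offset

# Hypothetical tables mapping one page of context 7 to frame 0x400.
tables = {7: {0x12: {0x05: {0x3F: 0x400}}}}
va = (0x12 << 24) | (0x05 << 18) | (0x3F << 12) | 0xABC
print(hex(translate(tables, 7, va)))
```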

RISC Example 2: The MIPS (Microprocessor without Interlocked Pipeline Stages)

· Technical Overview

· A 32-bit five-stage pipelined LOAD/STORE machine which executes one simple instruction per cycle.

· The five stages are Fetch, Decode & read operand register, ALU operation, start data write, and store ALU output in destination register.

· MIPS has a single set of 32 general-purpose registers. It does not use overlapping register windows. The MIPS compiler optimizes the use of registers in whatever way is best for the program currently being compiled. Notice that MIPS uses the software approach.

· Consider the call graph shown below.

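[Figure: call graph with Main at the top level, A and B at the second level, and C, D, and E at the third level, followed by four successive stack snapshots: (Main), (Main, A), (Main, A, C), (Main, A, D)]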

· MIPS Instruction Set

Field width (bits)	6	5	5	5	5	6
Format 1	Op-Code	SRC 1	SRC 2	DEST	SHIFT	Function

Field width (bits)	6	5	5	16
Format 2	Op-Code	SRC	DST	Immediate Constant

Field width (bits)	6	26
Format 3	Op-Code	Jump Target

· LOAD/STORE instructions use indexed addressing by adding a 16-bit signed constant to a register using format 2.
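The three formats and the LOAD/STORE address calculation can be sketched as follows. The sketch assumes the fields are packed from the most significant bit downward in the order listed, and it models the register file as a plain list. The function names and the opcode value in the example are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def decode_format1(word):
    """Fields of the register format: 6 + 5 + 5 + 5 + 5 + 6 bits."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "src1": (word >> 21) & 0x1F,
        "src2": (word >> 16) & 0x1F,
        "dest": (word >> 11) & 0x1F,
        "shift": (word >> 6) & 0x1F,
        "function": word & 0x3F,
    }

def decode_format2(word):
    """Fields of the immediate format: 6 + 5 + 5 + 16 bits."""
    immediate = word & 0xFFFF
    if immediate & 0x8000:             # sign-extend the 16-bit constant
        immediate -= 0x10000
    return {
        "opcode": (word >> 26) & 0x3F,
        "src": (word >> 21) & 0x1F,
        "dst": (word >> 16) & 0x1F,
        "immediate": immediate,
    }

def decode_format3(word):
    """Fields of the jump format: 6 + 26 bits."""
    return {"opcode": (word >> 26) & 0x3F, "target": word & 0x3FFFFFF}

def load_store_address(word, registers):
    """Indexed addressing: register contents plus the signed constant."""
    f = decode_format2(word)
    return (registers[f["src"]] + f["immediate"]) & MASK32

registers = [0] * 32
registers[29] = 0x1000
# Hypothetical opcode 0x23, SRC = 29, DST = 8, constant = -4.
word = (0x23 << 26) | (29 << 21) | (8 << 16) | 0xFFFC
print(hex(load_store_address(word, registers)))
```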

· Surprisingly for a RISC design, complex instructions such as MULT and DIV were included; they use special functional units. The contents of two registers can be multiplied or divided, and the 64-bit product (or, for division, the quotient and remainder) is kept in two special registers, LO and HI.
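How a 64-bit result fits into two 32-bit registers can be sketched as follows. The sketch treats the operands as unsigned 32-bit values for simplicity, puts the upper half of the product in HI and the lower half in LO, and for division puts the quotient in LO and the remainder in HI. The function names are hypothetical, and raising an error on division by zero is a choice made for this sketch only.

```python
MASK32 = 0xFFFFFFFF

def multiply_unsigned(a, b):
    """Return (HI, LO) for the 64-bit product of two unsigned 32-bit values."""
    product = (a & MASK32) * (b & MASK32)
    return product >> 32, product & MASK32

def divide_unsigned(a, b):
    """Return (HI, LO) = (remainder, quotient) of an unsigned division."""
    a &= MASK32
    b &= MASK32
    if b == 0:
        # Sketch-only behaviour; this does not model the hardware.
        raise ZeroDivisionError("division by zero")
    return a % b, a // b

hi, lo = multiply_unsigned(0xFFFFFFFF, 2)
print(hex(hi), hex(lo))
```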

· MIPS Memory management

The 32-bit virtual address is divided into a 20-bit virtual page number and a 12-bit offset within the page. A TLB is used, and it is located on the CPU chip itself.
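The address split and TLB lookup can be sketched as follows. The TLB is modelled as a dictionary from virtual page number to physical frame number, and a missing entry is reported as an exception. Both are simplifications, and the names are hypothetical.

```python
PAGE_OFFSET_BITS = 12
OFFSET_MASK = (1 << PAGE_OFFSET_BITS) - 1

def split_address(va):
    """Split a 32-bit virtual address into (virtual page number, offset)."""
    return (va >> PAGE_OFFSET_BITS) & 0xFFFFF, va & OFFSET_MASK

def translate(tlb, va):
    """Translate `va` using the TLB, or raise LookupError on a miss."""
    page, offset = split_address(va)
    if page not in tlb:
        raise LookupError(f"TLB miss for virtual page {page:#x}")
    return (tlb[page] << PAGE_OFFSET_BITS) | offset

# Hypothetical TLB holding one entry: page 0x12345 maps to frame 0x42.
tlb = {0x12345: 0x42}
print(hex(translate(tlb, 0x12345678)))
```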

· A summary of the MIPS instructions is given in the attached handout.

King Fahd University of Petroleum and Minerals

College of Computer Sciences and Engineering

Department of Computer Engineering

COE 308 Computer Architecture (993)

Instructor: Mostafa Abd-El-Barr

Module # 5: RISC vs CISC

Summary

1. Common characteristics shared by MOST RISC designs

· Limited and simple instruction set

· Large number of general purpose registers and/or the use of compiler technology to optimize register usage

· Optimization of the instruction pipeline

Characteristic	VAX-11 (CISC)	RISC-1 (RISC)
Year Developed	1978	1981
No. Instructions	303	31
Instruction size (bits)	16-456	32
Addressing Modes	22	3
No. General purpose registers	16	138
Control memory size	480 Kb	0
Cache Size	64 Kb	0

2. Instruction Execution Characteristics

· Semantic Gap: The difference between the operations provided in the HLLs and those provided in computer architecture leads to (a) execution inefficiency, (b) excessive machine program size, and (c) compiler complexity.

· Designers’ Response: Make the architecture complex: (a) Large instruction sets, (b) Dozens of addressing modes

· Aspects of Computation Studied

Operations

Language	Pascal	FORTRAN	C
Assignment	74	67	38
Loop	4	3	3
Call	1	3	12
IF	20	11	43
GOTO	2	9	3
Other	-	7	1

Observations

(a) Assignment statements predominate, i.e. simple movement of data is of high importance. (b) Conditional statements are substantial, i.e. sequencing of instructions is important. (c) Procedure call/return was found to be the most time-consuming, i.e. it causes the execution of the most machine-language instructions.

Operands (a) Majority of references are to simple scalar variables (about 60%) (b) More than 80% of scalars are local variables (to procedures).

Implications:

A prime candidate for optimization is the mechanism for storing and accessing local scalar variables.

Procedure call/return (a) a typical procedure employs only a few passed parameters and local variables (typically < 6 arguments and < 6 local variables). (b) depth of procedure activation fluctuates within a relatively narrow range (5-8).

Implications:

Do not make instruction set architecture closer to HLLs, rather optimize the performance of the most time-consuming features of typical HLL programs. Make the architecture simpler, not complex

Possible Techniques: (a) Use a large number of registers to optimize operand referencing by reducing the memory references. (b) Pay careful attention to the design of instruction pipelines, i.e. use smart compilers to rearrange instructions so that the pipeline is used efficiently, e.g. delayed branch. (c) Simplify the instruction set.

The main philosophy becomes

Keep the most frequently accessed operands in registers and minimize register-memory operations

Two approaches are possible to achieve this. (a) Software Approach: Use the compiler to maximize register usage by allocating registers to those variables that will be used the most in a given time period (this is the philosophy used in the Stanford MIPS machine). (b) Hardware Approach: Use more registers so that more variables can be held in registers for longer periods of time (this is the philosophy used in the Berkeley RISC machines).

Register Windows

· Organize registers such that memory access is minimized.

· Multiple small sets of registers are used, each assigned to a different procedure.

· A procedure call automatically switches the CPU to use a different fixed-size window of registers rather than saving registers in memory at the call time

· At any time, only ONE window of registers is visible and is addressed as if it were the only set of registers

· Window Overlapping: Temporary registers at one level are physically the same as the parameter registers at the next level. This overlap allows parameters to be passed without the actual movement of data.

Berkeley RISC: 8 windows of 16 registers each

Pyramid RISC: 16 windows of 32 registers each

Large Register File	Cache
All local scalars	Recently-used local scalars
Individual variables	Blocks of memory
Compiler-assigned global variables	Recently-used global variables
Register addressing	Memory addressing

Characteristics of RISC Architectures: (a) One Instruction per machine cycle (b) Register-to-register operations (c) Simple addressing modes (d) Simple instruction formats.

RISC VS CISC Controversy

1. Performance

· Normalized Execution Times

Benchmark	RISC	VAX-11/780
C	1.0	2.1
Assembly	0.9	0.9

2. Implementation

Characteristic	RISC	MC68000
Number of devices	44K	68K
Regularity Factor	25	12
Percentage of Area consumed by Control Unit	6	50
Design time in months	19	30
Design effort in man-months	15	100
Layout effort in man-months	12	70
