Module # 2 (Continued): Reduced Instruction Set Computers
Important Architectural Developments
· Processor-Memory Interface
(memory interleaving, Cache, Register File)
· Pipelined Processor
· RISC Vs CISC.
· Multiprocessor Systems
Reduced Instruction Set Computers (RISCs)
1. Common characteristics shared by MOST RISC designs
· Limited and simple instruction set
· Large number of general-purpose registers and/or the use of compiler technology to optimize register usage
· Optimization of the instruction pipeline
2. Instruction Execution Characteristics
· Semantic Gap
The difference between the operations provided in the HLLs and those provided in the computer architecture leads to
1. execution inefficiency,
2. excessive machine program size, and
3. compiler complexity.
· Designers’ Response
Make the architecture complex:
1. Large instruction sets, and
2. Dozens of addressing modes.
· Aspects of Computation Studied
I. Operations
Frequency of statement types by language (%)
Language | Pascal | FORTRAN | C
Assignment | 74 | 67 | 38
Loop | 4 | 3 | 3
Call | 1 | 3 | 12
IF | 20 | 11 | 43
GOTO | 2 | 9 | 3
Other | - | 7 | 1
Observations
1. Assignment statements predominate, i.e. simple movement of data is of high importance.
2. Conditional statements are substantial, i.e. the sequencing of instructions is important.
3. Procedure call/return was found to be the most time-consuming operation, i.e. it causes the execution of the most machine-language instructions.
II. Operands
· The majority of references are to simple scalar variables (about 60%).
· More than 80% of scalars are local variables (local to procedures).
Implications:
A prime candidate for optimization is the mechanism for storing and accessing local scalar variables.
Procedure call/return
· A typical procedure employs only a few passed parameters and local variables (typically < 6 arguments and < 6 local variables).
· The depth of procedure activation fluctuates within a relatively narrow range (5 to 8).
Implications:
Do not make the instruction set architecture closer to HLLs; rather, optimize the performance of the most time-consuming features of typical HLL programs.
Make the architecture simpler, not more complex.
Possible Techniques
1. Use a large number of registers to optimize operand referencing by reducing memory references.
2. Pay careful attention to the design of instruction pipelines, i.e. use smart compilers to rearrange instructions to make better use of the pipeline, e.g. delayed branch.
3. Simplify the instruction set.
The main philosophy becomes:
Keep the most frequently accessed operands in registers and minimize register-memory operations.
Two approaches are possible to achieve this:
1. Software Approach
Use the compiler to maximize register usage by allocating registers to those variables that will be used the most in a given time period (this is the philosophy used in the Stanford MIPS machine).
2. Hardware Approach
Use more registers so that more variables can be held in registers for longer periods of time (this is the philosophy used in the Berkeley RISC machines).
Register Windows
· Organize registers such that memory access is minimized.
· Multiple small sets of registers are used, each assigned to a different procedure.
· A procedure call automatically switches the CPU to a different fixed-size window of registers rather than saving registers in memory at call time.
· At any time, only ONE window of registers is visible, and it is addressed as if it were the only set of registers.
· Window Overlapping:
Temporary registers at one level are physically the same as the parameter registers at the next level. This overlap allows parameters to be passed without the actual movement of data.
Examples:
Berkeley RISC: 8 windows of 16 registers each
Pyramid RISC: 16 windows of 32 registers each
· A fixed number of registers in the CPU are identified as global registers and are available to all procedures, e.g. references to registers 0 through 7 could refer to unique global registers, and references to registers 8 through 31 could refer to registers in the current window.
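The window overlap described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the class name, the 8/8/8 split of registers 8 through 31 into parameter, local, and temporary registers, and the default of 8 windows are all hypothetical choices made for illustration, and window overflow (running out of windows) is not handled.

```python
class WindowedRegisterFile:
    """Toy model of overlapping register windows (all sizes are assumptions)."""

    N_GLOBALS = 8   # r0..r7: global registers, visible to every procedure
    N_PARAMS = 8    # r8..r15: parameter registers of the current window
    N_LOCALS = 8    # r16..r23: local registers of the current window
    N_TEMPS = 8     # r24..r31: temporaries, shared with the next window
    STRIDE = N_PARAMS + N_LOCALS  # physical registers consumed per window
    VISIBLE = N_GLOBALS + N_PARAMS + N_LOCALS + N_TEMPS  # 32 visible names

    def __init__(self, n_windows=8):
        self.n_windows = n_windows
        self.global_regs = [0] * self.N_GLOBALS
        # Circular buffer: the last window's temporaries wrap around onto
        # the first window's parameter registers.
        self.windowed_regs = [0] * (n_windows * self.STRIDE)
        self.cwp = 0  # current window pointer

    def _check(self, reg):
        if not 0 <= reg < self.VISIBLE:
            raise IndexError(f"register r{reg} is not visible")

    def _physical_index(self, reg):
        """Map a visible register number (8..31) to a physical slot."""
        offset_in_window = reg - self.N_GLOBALS
        return (self.cwp * self.STRIDE + offset_in_window) % len(self.windowed_regs)

    def read(self, reg):
        self._check(reg)
        if reg < self.N_GLOBALS:
            return self.global_regs[reg]
        return self.windowed_regs[self._physical_index(reg)]

    def write(self, reg, value):
        self._check(reg)
        if reg < self.N_GLOBALS:
            self.global_regs[reg] = value
        else:
            self.windowed_regs[self._physical_index(reg)] = value

    def call(self):
        """Procedure call: switch to the next window; no data is copied."""
        self.cwp = (self.cwp + 1) % self.n_windows

    def ret(self):
        """Procedure return: switch back to the previous window."""
        self.cwp = (self.cwp - 1) % self.n_windows


rf = WindowedRegisterFile()
rf.write(24, 42)      # caller places an argument in its first temporary
rf.call()             # switch windows
argument = rf.read(8) # callee finds 42 in its first parameter register
```

In this model the caller's r24 and the callee's r8 name the same physical slot, so the argument is passed by changing the window pointer alone.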
Large Register File | Cache
All local scalars | Recently-used local scalars
Individual variables | Blocks of memory
Compiler-assigned global variables | Recently-used global variables
Register addressing | Memory addressing
Characteristics of RISC Architectures
1. One instruction per machine cycle
2. Register-to-register operations
3. Simple addressing modes
4. Simple instruction formats
Summary Table for the features of a number of RISC processors and a CISC processor
Feature | Motorola 88110 | Alpha AXP 21064 | Pentium | PowerPC 601
Company | Motorola | DEC | Intel | IBM
Year | 1991 | 1992 | 1993 | 1993
Architecture | RISC | RISC | CISC | RISC
# Registers (I) | 32 | 32 | 64 | 32
Cache I/D | 8/8 KB | 8/8 KB | 8/8 KB | 32 KB
# Registers (GP/FP) | 32/32 | 32/32 | 8/8 | 32/32
# Inst/cycle | 2 | 2 | 2 | 3
# pipelines (I/FP) | NS | 7/10 | 5/8 | 4/6
Multiprocessing Support | No | Yes | Yes | Yes
Technology | CMOS | CMOS | BiCMOS | CMOS
# Transistors | 1.3 M | 1.68 M | 3.1 M | 2.8 M
Clock (MHz) | 33 | 200 | 66 | 80
Characteristic | VAX-11 (CISC) | RISC-1 (RISC)
Year Developed | 1978 | 1981
No. Instructions | 303 | 31
Instruction size (bits) | 16-456 | 32
Addressing Modes | 22 | 3
No. General purpose registers | 16 | 138
Control memory size | 480 Kb | 0
Cache Size | 64 Kb | 0
RISC Example 1: The SPARC (Scalable
Processor ARChitecture)
· SPARC is closely based on the Berkeley RISC architecture.
· Technical Overview
  · All instructions and registers, including floating point registers, are 32 bits wide.
  · It is a LOAD/STORE architecture, i.e. only LOAD and STORE instructions access memory; all other operations take place on operands located in the 32-bit registers.
  · It is based on a four-stage pipeline: Fetch, Decode, Execute, and Write.
  · It uses an overlapping register window scheme with 32 registers visible at any instant. A 5-bit variable called the Current Window Pointer (CWP) is used to point to the current register set.
  · All SPARC instructions occupy a full word (32 bits).
  · All arithmetic and logical instructions have three operands and have the form
  Destination := source 1 op source 2
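  First format (register operand): 2 bits | DST (5) | Op-Code (6) | SRC 1 (5) | 0 (1) | FP-OP (8) | SRC 2 (5)
  Second format (immediate operand): 2 bits | DST (5) | Op-Code (6) | SRC 1 (5) | 1 (1) | Immediate Constant (13)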
  · The LOAD and STORE instructions may use either of the above formats, with DST being the register to be loaded or stored. The low-order 19 bits of the instruction determine the effective address as follows:
  1. Memory address = sum of SRC 1 and SRC 2
  2. Memory address = sum of SRC 1 and a signed 13-bit constant.
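The two address calculations above can be sketched in a few lines of Python. This is an illustrative sketch, not the handout's own material: the function names, the dictionary keys, and the `regs` list standing in for the register file are hypothetical, and the bit positions assume the fields are packed left to right in the order DST, Op-Code, SRC 1, flag bit, then either FP-OP and SRC 2 or the 13-bit constant.

```python
def decode_sparc_format(word):
    """Split a 32-bit instruction word into the fields of the two formats."""
    fields = {
        "top2": (word >> 30) & 0x3,      # leading 2-bit field
        "dst": (word >> 25) & 0x1F,      # 5-bit destination register
        "opcode": (word >> 19) & 0x3F,   # 6-bit op-code
        "src1": (word >> 14) & 0x1F,     # 5-bit first source register
        "imm_flag": (word >> 13) & 0x1,  # 0 = register form, 1 = immediate form
    }
    if fields["imm_flag"]:
        imm = word & 0x1FFF              # 13-bit constant
        if imm & 0x1000:                 # sign bit set: extend to a negative value
            imm -= 0x2000
        fields["imm13"] = imm
    else:
        fields["fp_op"] = (word >> 5) & 0xFF
        fields["src2"] = word & 0x1F
    return fields


def effective_address(word, regs):
    """Compute a LOAD/STORE address from the low-order 19 bits of the word."""
    f = decode_sparc_format(word)
    base = regs[f["src1"]]
    offset = f["imm13"] if f["imm_flag"] else regs[f["src2"]]
    return (base + offset) & 0xFFFFFFFF  # wrap to 32 bits
```

With `regs[2] = 100`, a word whose SRC 1 field is 2, flag bit is 1, and constant field encodes -4 yields the address 96.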
  · Instructions load and store 8-, 16-, 32-, and 64-bit quantities into 32-bit registers (a 64-bit quantity occupies a pair of registers).
  · Two ways are provided for calling procedures.
  1. The CALL instruction, which uses a 30-bit PC-relative offset:
  2 bits | PC-Relative Displacement (30)
  2. The JMPL instruction, which uses either of the instruction formats used for arithmetic and logical operations and which allows the return address to be put in any register.
  · The SAVE and RESTORE instructions manipulate the register window and stack pointer. Both of them trap when the next (previous) window is not available.
  · A summary of the SPARC integer instructions is shown in the handout.
  · SPARC Floating Point
    * The FPU contains 32 32-bit registers, which can hold 32 single-precision (32-bit) floating point operands, 16 double-precision (64-bit) operands, or 8 extended-precision (128-bit) operands.
    * The FPU can execute about 20 floating-point instructions, most of them in single, double, or extended precision, using the first instruction format used for arithmetic.
    * In addition to instructions for loading and storing the FPU’s registers, the CPU can also test the FPU’s registers and branch conditionally on the results.
  · SPARC Memory Management
    * A conventional MMU supporting a single paged 32-bit address space using three levels of address translation.
    Address fields (bits): Context (12) | Index 1 (8) | Index 2 (6) | Index 3 (6) | Offset (12)
    [Figure: three-level address translation; labels: Context Table (4K), Level 1, 4K page]
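The field widths above (8 + 6 + 6 + 12 = 32 bits, plus a 12-bit context) suggest how a translation could proceed. The following is a minimal sketch, assuming the tables are modelled as nested Python dictionaries; the function names and the dictionary layout are hypothetical and say nothing about how the real MMU stores its tables.

```python
def split_sparc_virtual_address(vaddr):
    """Split a 32-bit virtual address into (index1, index2, index3, offset)."""
    index1 = (vaddr >> 24) & 0xFF  # 8 bits: selects an entry at level 1
    index2 = (vaddr >> 18) & 0x3F  # 6 bits: selects an entry at level 2
    index3 = (vaddr >> 12) & 0x3F  # 6 bits: selects an entry at level 3
    offset = vaddr & 0xFFF         # 12 bits: byte within the 4K page
    return index1, index2, index3, offset


def translate(context_table, context, vaddr):
    """Walk three levels of tables; a missing entry raises KeyError."""
    index1, index2, index3, offset = split_sparc_virtual_address(vaddr)
    level1 = context_table[context]  # the 12-bit context picks the level-1 table
    level2 = level1[index1]
    level3 = level2[index2]
    frame = level3[index3]           # physical page frame number
    return (frame << 12) | offset


# Hypothetical mapping for context 7: one virtual page mapped to frame 0xABCDE.
tables = {7: {0x12: {0x0D: {0x05: 0xABCDE}}}}
physical = translate(tables, 7, 0x12345678)
```

A `KeyError` in this model plays the role of a page fault: some level of the walk has no entry for the requested address.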
RISC Example 2: The MIPS (Microprocessor without Interlocked Pipeline Stages)
· Technical Overview
· A 32-bit five-stage pipelined LOAD/STORE machine which executes one simple instruction per cycle.
· The five stages are: Fetch, Decode & read operand registers, ALU operation, start data write, and store ALU output in destination register.
· MIPS has a single set of 32 general-purpose registers. It does not have overlapping register windows. The MIPS compiler optimizes the use of registers in whatever way is best for the program currently being compiled. Notice that MIPS uses the Software Approach.
· Consider the call graph shown below.
[Figure: call graph of the procedures Main, A, B, C, D, and E]
· MIPS Instruction Set
Format 1: Op-Code (6) | SRC 1 (5) | SRC 2 (5) | DEST (5) | SHIFT (5) | Function (6)
Format 2: Op-Code (6) | SRC (5) | DST (5) | Immediate Constant (16)
Format 3: Op-Code (6) | Jump Target (26)
· LOAD/STORE instructions use indexed addressing by adding a 16-bit signed constant to a register using format 2.
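The three formats and the LOAD/STORE address calculation can be sketched as follows. The function names, the dictionary keys, and the `regs` list standing in for the register file are hypothetical; the field positions simply follow the widths listed above, packed from the most significant bit down.

```python
def sign_extend_16(value):
    """Interpret a 16-bit field as a signed two's-complement number."""
    return value - 0x10000 if value & 0x8000 else value


def decode_format1(word):
    """Format 1: register-to-register operations."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "src1": (word >> 21) & 0x1F,
        "src2": (word >> 16) & 0x1F,
        "dest": (word >> 11) & 0x1F,
        "shift": (word >> 6) & 0x1F,
        "function": word & 0x3F,
    }


def decode_format2(word):
    """Format 2: one source, one destination, 16-bit immediate constant."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "src": (word >> 21) & 0x1F,
        "dst": (word >> 16) & 0x1F,
        "imm": sign_extend_16(word & 0xFFFF),
    }


def decode_format3(word):
    """Format 3: op-code and 26-bit jump target."""
    return {"opcode": (word >> 26) & 0x3F, "target": word & 0x3FFFFFF}


def load_store_address(word, regs):
    """Indexed addressing: register contents plus signed 16-bit constant."""
    f = decode_format2(word)
    return (regs[f["src"]] + f["imm"]) & 0xFFFFFFFF  # wrap to 32 bits
```

For example, the word `0x8FA8FFF8` decodes under format 2 to source register 29 and constant -8, so with `regs[29] = 0x1000` the address is `0xFF8`.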
· Surprisingly, non-RISC instructions such as MULT and DIV were included, and they use special functional units. The contents of two registers can be multiplied or divided; the 64-bit product (or, for DIV, the quotient and remainder) is kept in two special registers, LO and HI.
· MIPS Memory management
The 32-bit virtual address is divided into a 20-bit virtual page number and a 12-bit offset within the page. A TLB is used, and it is located on the CPU chip itself.
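The 20/12 split of the virtual address can be sketched as below. The TLB is modelled here as a plain dictionary from virtual page number to physical frame number, which is an assumption made purely for illustration; the function names are hypothetical, and a real TLB has a fixed number of entries and a replacement policy that this sketch ignores.

```python
PAGE_OFFSET_BITS = 12
PAGE_OFFSET_MASK = (1 << PAGE_OFFSET_BITS) - 1  # 0xFFF


def split_virtual_address(vaddr):
    """Return (20-bit virtual page number, 12-bit offset)."""
    vaddr &= 0xFFFFFFFF
    return vaddr >> PAGE_OFFSET_BITS, vaddr & PAGE_OFFSET_MASK


def tlb_translate(tlb, vaddr):
    """Look up the page number in the TLB; return None on a TLB miss."""
    page_number, offset = split_virtual_address(vaddr)
    frame = tlb.get(page_number)
    if frame is None:
        return None  # miss: the mapping must be fetched from the page table
    return (frame << PAGE_OFFSET_BITS) | offset
```

The offset passes through translation unchanged; only the upper 20 bits are replaced by the frame number found in the TLB.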
· A summary of the MIPS
instructions is given in the attached handout.
King Fahd University of Petroleum and Minerals
College of Computer Sciences and Engineering
Department of Computer Engineering
COE 308 Computer Architecture (993)
Instructor: Mostafa Abd-El-Barr
Module # 5: RISC vs CISC Summary
1. Common characteristics shared by MOST RISC designs
· Limited and simple instruction set
· Large number of general purpose registers and/or the use of compiler technology to optimize register usage
· Optimization of the instruction pipeline
Characteristic | VAX-11 (CISC) | RISC-1 (RISC)
Year Developed | 1978 | 1981
No. Instructions | 303 | 31
Instruction size (bits) | 16-456 | 32
Addressing Modes | 22 | 3
No. General purpose registers | 16 | 138
Control memory size | 480 Kb | 0
Cache Size | 64 Kb | 0
2. Instruction Execution Characteristics
· Semantic Gap: The difference between the operations provided in the HLLs and those provided in computer architecture leads to (a) execution inefficiency, (b) excessive machine program size, and (c) compiler complexity.
· Designers’ Response: Make the architecture complex: (a) Large instruction sets, (b) Dozens of addressing modes
· Aspects of Computation Studied
Operations
Frequency of statement types by language (%)
Language | Pascal | FORTRAN | C
Assignment | 74 | 67 | 38
Loop | 4 | 3 | 3
Call | 1 | 3 | 12
IF | 20 | 11 | 43
GOTO | 2 | 9 | 3
Other | - | 7 | 1
Observations
(a) Assignment statements predominate, i.e. simple movement of data is of high importance. (b) Conditional statements are substantial, i.e. the sequencing of instructions is important. (c) Procedure call/return was found to be the most time-consuming operation, i.e. it causes the execution of the most machine-language instructions.
Operands
(a) The majority of references are to simple scalar variables (about 60%). (b) More than 80% of scalars are local variables (local to procedures).
Implications:
A prime candidate for optimization is the mechanism for storing and accessing local scalar variables.
Procedure call/return (a) A typical procedure employs only a few passed parameters and local variables (typically < 6 arguments and < 6 local variables). (b) The depth of procedure activation fluctuates within a relatively narrow range (5-8).
Implications:
Do not make the instruction set architecture closer to HLLs; rather, optimize the performance of the most time-consuming features of typical HLL programs. Make the architecture simpler, not more complex.
Possible Techniques: (a) Use a large number of registers to optimize operand referencing by reducing memory references. (b) Pay careful attention to the design of instruction pipelines, i.e. use smart compilers to rearrange instructions to make better use of the pipeline, e.g. delayed branch. (c) Simplify the instruction set.
The main philosophy becomes:
Keep the most frequently accessed operands in registers and minimize register-memory operations.
Two approaches are possible to achieve this: (a) Software Approach: Use the compiler to maximize register usage by allocating registers to those variables that will be used the most in a given time period (this is the philosophy used in the Stanford MIPS machine). (b) Hardware Approach: Use more registers so that more variables can be held in registers for longer periods of time (this is the philosophy used in the Berkeley RISC machines).
Register Windows
· Organize registers such that memory access is minimized.
· Multiple small sets of registers are used, each assigned to a different procedure.
· A procedure call automatically switches the CPU to a different fixed-size window of registers rather than saving registers in memory at call time.
· At any time, only ONE window of registers is visible, and it is addressed as if it were the only set of registers.
· Window Overlapping: Temporary registers at one level are physically the same as the parameter registers at the next level. This overlap allows parameters to be passed without the actual movement of data.
Berkeley RISC: 8 windows of 16 registers each
Pyramid RISC: 16 windows of 32 registers each
· A fixed number of registers in the CPU are identified as global registers and are available to all procedures, e.g. references to registers 0 through 7 could refer to unique global registers, and references to registers 8 through 31 could refer to registers in the current window.
Large Register File | Cache
All local scalars | Recently-used local scalars
Individual variables | Blocks of memory
Compiler-assigned global variables | Recently-used global variables
Register addressing | Memory addressing
Characteristics of RISC Architectures: (a) One Instruction per machine cycle (b) Register-to-register operations (c) Simple addressing modes (d) Simple instruction formats.
RISC vs CISC Controversy
1. Performance
· Normalized Execution Times
Benchmark | RISC | VAX-11/780
C | 1.0 | 2.1
Assembly | 0.9 | 0.9
2. Implementation
Characteristic | RISC | MC68000
Number of devices | 44K | 68K
Regularity Factor | 25 | 12
Percentage of area consumed by control unit | 6 | 50
Design time in months | 19 | 30
Design effort in man-months | 15 | 100
Layout effort in man-months | 12 | 70