Modules # 2 (Continued)

Reduced Instruction Set Computers

 

Important Architectural Development

 

 

                   Processor-Memory Interface (memory interleaving, Cache, Register File)

                   Pipelined Processor

                   RISC Vs CISC.

                   Multiprocessor Systems

 

 

Reduced Instruction Set Computers (RISCs)

 

 

 

 1.  Common characteristics shared by MOST RISC designs

 

         Limited and simple instruction set

 

         Large number of general purpose registers and/or the use of compiler technology to optimize register usage

 

         Optimization of  the instruction pipeline

 

 

 2.  Instruction Execution Characteristics

 

        Semantic Gap

 

The difference between the operations provided in the HLLs and those provided in computer architecture leads to

 

1.      execution inefficiency,

 

2.     excessive machine program size, and

 

3.     compiler complexity.

 

       Designers' Response

 

Make the architecture complex

           

1.     Large instruction sets, and

 

2.     Dozens of addressing modes.

 

       Aspects of Computation Studied

 

  I.  Operations

 

Language      Pascal   FORTRAN   C
Assignment    74       67        38
Loop          4        3         3
Call          1        3         12
IF            20       11        43
GOTO          2        9         3
Other         -        7         1

 

Observations

 

 1. Assignment statements predominate, i.e. simple movement of data is of high importance.

 2. Conditional statements are substantial, i.e. sequencing of instructions is important.

 3. Procedure call/return was found to be the most time-consuming, i.e. it causes the execution of the most machine-language instructions.

II. Operands

 

              Majority of references are to simple scalar variables (about 60%)

              More than 80% of scalars are local variables (to procedures).

 

Implications:

 

A prime candidate for optimization is the mechanism for storing and accessing local scalar variables.

 

Procedure call/return

 

          a typical procedure employs only a few passed parameters and local variables (typically < 6 arguments and < 6 local variables)

 

          the depth of procedure activation fluctuates within a relatively narrow range (5 to 8).

 

Implications:

 

Do not make instruction set architecture closer to HLLs, rather optimize the performance of the most time-consuming features of typical HLL programs

 

Make the architecture simpler, not complex

 

Possible Techniques

 

 1.  Use a large number of registers to optimize operand referencing by reducing the memory references

 

 2.  Pay careful attention to the design of instruction pipelines, e.g. use smart compilers to rearrange instructions for better pipeline utilization (delayed branch).

 

 3.  Simplify instruction set

 

The main philosophy becomes

 

Keep the most frequently accessed operands in registers and minimize register-memory operations

Two approaches are possible to achieve this

 

 1.  Software Approach

 

Use the compiler to maximize register usage by allocating registers to those variables that will be used the most in a given time period (this is the philosophy used in the Stanford MIPS machine).

 

 2.  Hardware Approach

 

Use more registers so that more variables can be held in registers for longer periods of time (this is the philosophy used in the Berkeley RISC machines).
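
The software approach can be illustrated with a deliberately simplified sketch: count how often each variable is referenced in a region of code and give the available registers to the most-used ones. The function name `allocate_registers` and the reference trace are hypothetical, and production compilers typically use more sophisticated techniques such as graph coloring.

```python
from collections import Counter


def allocate_registers(references, num_registers):
    """Assign registers to the most frequently referenced variables.

    references: list of variable names in the order they are referenced.
    Returns a dict mapping variable name -> register index; variables
    missing from the dict would stay in memory.
    """
    counts = Counter(references)
    # most_common() sorts by descending reference count.
    chosen = [name for name, _ in counts.most_common(num_registers)]
    return {name: reg for reg, name in enumerate(chosen)}


# Hypothetical reference trace for a small loop body.
trace = ["i", "sum", "i", "a", "sum", "i", "n", "i"]
print(allocate_registers(trace, 2))
```

With only two registers, the loop counter and the accumulator win; the rarely used variables remain in memory.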

 

Register Windows

 

       Organize registers such that memory access is minimized.

 

       Multiple small sets of registers are used, each assigned to a different procedure.

 

       A procedure call automatically switches the CPU to use a different fixed-size window of registers rather than saving registers in memory at the call time

 

       At any time, only ONE window of registers is visible and is addressed as if it were the only set of registers

 

       Window Overlapping:

 

Temporary registers at one level are physically the same as the parameter registers at the next level. This overlap allows parameters to be passed without the actual movement of data.

 

Examples:

 

Berkeley RISC: 8 windows of 16 registers each

 

Pyramid RISC: 16 windows of 32 registers each

 

       A fixed number of registers in the CPU are identified as global registers and are available to all procedures, e.g. references to registers 0 through 7 could refer to unique global registers, and references to registers 8 through 31 could refer to registers in the current window.
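
The overlap between windows can be sketched as a mapping from (window, visible register number) to a physical register. This is only an illustration: the split of registers 8 through 31 into 8 parameter, 8 local and 8 temporary registers, the window count, and the name `physical_register` are all assumptions, and real machines number their registers differently.

```python
NUM_GLOBALS = 8      # registers 0-7: shared by all procedures
NUM_PARAM = 8        # registers 8-15: incoming parameters (assumed split)
NUM_LOCAL = 8        # registers 16-23: locals (assumed split)
NUM_TEMP = 8         # registers 24-31: temporaries / outgoing parameters
NUM_WINDOWS = 8      # assumed window count

# Each window adds only its parameter and local registers; its temporaries
# are physically the next window's parameter registers.
STRIDE = NUM_PARAM + NUM_LOCAL
WINDOWED_TOTAL = NUM_WINDOWS * STRIDE


def physical_register(window, reg):
    """Map (window index, visible register 0-31) to a physical register."""
    if not 0 <= reg < NUM_GLOBALS + NUM_PARAM + NUM_LOCAL + NUM_TEMP:
        raise ValueError("register number out of range")
    if reg < NUM_GLOBALS:
        return reg                      # globals ignore the window
    offset = reg - NUM_GLOBALS          # position inside the window
    # The windows form a circular buffer, hence the modulo.
    return NUM_GLOBALS + (window * STRIDE + offset) % WINDOWED_TOTAL


# A caller in window 0 writes temporary register 24; the callee in
# window 1 sees the same physical register as its parameter register 8.
assert physical_register(0, 24) == physical_register(1, 8)
```

Because the mapping is pure arithmetic on the window index, a call only has to change that index; no parameter data is copied.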

 

Large Register File                  Cache
All local scalars                    Recently-used local scalars
Individual variables                 Blocks of memory
Compiler-assigned global variables   Recently-used global variables
Register addressing                  Memory addressing

 

Characteristics of RISC Architectures

 

 1. One Instruction per machine cycle

 

 2. Register-to-register operations

 

 3.  Simple addressing modes

 

 4. Simple instruction formats

 

Summary table of features for several RISC processors and one CISC processor

Feature                   Motorola 88110   Alpha AXP 21064   Pentium   PowerPC 601
Company                   Motorola         DEC               Intel     IBM
Year                      1991             1992              1993      1993
Architecture              RISC             RISC              CISC      RISC
# Registers(I)            32               32                64        32
Cache I/D                 8/8 KB           8/8 KB            8/8 KB    32 KB
# Registers (GP/FP)       32/32            32/32             8/8       32/32
# Inst/cycle              2                2                 2         3
# Pipelines (I/FP)        NS               7/10              5/8       4/6
Multiprocessing support   No               Yes               Yes       Yes
Technology                CMOS             CMOS              BiCMOS    CMOS
# Transistors             1.3 M            1.68 M            3.1 M     2.8 M
Clock (MHz)               33               200               66        80

 

 

Characteristic                     VAX-11 (CISC)   RISC-1 (RISC)
Year developed                     1978            1981
No. of instructions                303             31
Instruction size (bits)            16-456          32
Addressing modes                   22              3
No. of general-purpose registers   16              138
Control memory size                480 Kb          0
Cache size                         64 Kb           0

 


RISC Example 1: The SPARC (Scalable Processor  ARChitecture)

 

     SPARC is closely based on the Berkeley RISC architecture.

 

    Technical Overview

 

 

 

 

 

 

 

 

 

 

 


 

* All instructions and registers, including floating-point registers, are 32 bits wide.

* It is a LOAD/STORE architecture, i.e. all operations take place on operands located in the 32-bit registers.

* It is based on a four-stage pipeline: Fetch, Decode, Execute, and Write.

* It uses an overlapping register window scheme with 32 registers visible at any instant. A 5-bit variable called the Current Window Pointer (CWP) points to the current register set.

* All SPARC instructions occupy a full word (32 bits).

* All arithmetic and logical instructions have three operands and have the form

 

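Destination := source1 op source2

 Format 1 (register operand):

 Bits:    2     5      6         5       1    8        5
 Field:   --    DST    Op-Code   SRC 1   0    FP-OP    SRC 2

 Format 2 (immediate operand):

 Bits:    2     5      6         5       1    13
 Field:   --    DST    Op-Code   SRC 1   1    Immediate Constant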

 

 

 

* The LOAD and STORE instructions may use either of the above formats, with DST being the register to be loaded or stored. The low-order 19 bits of the instruction determine the effective address as follows:

 1.  Memory address = sum of SRC 1 and SRC 2

 2.  Memory address = sum of SRC 1 and a signed 13-bit constant.
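
The two address calculations can be sketched as follows. The bit positions follow the field widths listed for the two formats (5-bit SRC 1, a 1-bit format flag, then either 8 + 5 bits or a 13-bit constant); the function names and the register-file list `regs` are hypothetical.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def effective_address(instr, regs):
    """Compute a LOAD/STORE address from the low-order 19 bits of instr."""
    src1 = (instr >> 14) & 0x1F          # 5-bit SRC 1 field
    immediate_flag = (instr >> 13) & 1   # 0: register form, 1: immediate form
    if immediate_flag:
        offset = sign_extend(instr & 0x1FFF, 13)   # signed 13-bit constant
    else:
        offset = regs[instr & 0x1F]                # 5-bit SRC 2 field
    return (regs[src1] + offset) & 0xFFFFFFFF      # wrap to 32 bits


regs = [0] * 32
regs[1] = 0x1000
regs[2] = 0x20
register_form = (1 << 14) | 2                       # SRC 1 = r1, SRC 2 = r2
immediate_form = (1 << 14) | (1 << 13) | 0x1FFF     # SRC 1 = r1, constant = -1
print(hex(effective_address(register_form, regs)))
print(hex(effective_address(immediate_form, regs)))
```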

 

* Instructions load and store 8-, 16-, 32-, and 64-bit quantities into 32-bit registers (a 64-bit quantity occupies a pair of registers).

* Two ways are provided for calling procedures.

 1.  The CALL instruction uses a 30-bit PC-relative offset.

 Bits:    2     30
 Field:   --    PC-Relative Displacement

 2.  The JMPL instruction uses any of the instruction formats used for arithmetic and logical operations and allows the return address to be put in any register.
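
As a small worked example of the CALL format, the sketch below computes a call target from the 30-bit displacement. It assumes the displacement counts 32-bit words, so it is multiplied by 4 before being added to the PC; the function name `call_target` is hypothetical.

```python
def call_target(pc, disp30):
    """Target of CALL: PC plus the 30-bit word displacement times 4."""
    # Shifting a 30-bit field left by 2 yields a full 32-bit byte offset,
    # so negative displacements work through 32-bit wraparound.
    return (pc + ((disp30 & 0x3FFFFFFF) << 2)) & 0xFFFFFFFF


print(hex(call_target(0x1000, 4)))            # four words forward
print(hex(call_target(0x1000, 0x3FFFFFFF)))   # displacement of -1 word
```

Under this assumption a single CALL can reach any word-aligned address in the 32-bit address space.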

 

* The SAVE and RESTORE instructions manipulate the register window and stack pointer. Both of them trap when the next (previous) window is not available.
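
To see why such traps are rare when the call depth stays within a narrow range, the following sketch counts them for a trace of calls and returns. It is a simplified model, not a description of any real trap handler: it assumes every window can hold a frame and that one frame is spilled or reloaded per trap. The name `count_window_traps` and the trace encoding are hypothetical.

```python
def count_window_traps(trace, num_windows):
    """Count overflow and underflow traps for a call/return trace.

    trace: string of 'c' (call, i.e. SAVE) and 'r' (return, i.e. RESTORE).
    Returns (overflows, underflows).
    """
    resident = 1          # frames currently held in the register file
    overflows = underflows = 0
    for event in trace:
        if event == "c":
            if resident == num_windows:
                overflows += 1      # oldest frame is spilled to memory
            else:
                resident += 1
        elif event == "r":
            resident -= 1
            if resident == 0:
                underflows += 1     # caller's frame is reloaded from memory
                resident = 1
    return overflows, underflows


deep = "cccc" + "rrrr"              # four nested calls, then four returns
shallow = "ccc" + "rc" * 10         # depth oscillates around one level
print(count_window_traps(deep, 4), count_window_traps(shallow, 4))
```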

 

* A summary of the SPARC integer instructions is shown in the handout.


 

* SPARC Floating Point

  *  The FPU contains 32 32-bit registers, which can hold 32 single-precision (32-bit) floating-point operands, 16 double-precision (64-bit) operands, or 8 extended-precision (128-bit) operands.

  *  The FPU can execute about 20 floating-point instructions, most of them in single, double, or extended precision, using the first instruction format used for arithmetic.

  *  In addition to instructions for loading and storing the FPU's registers, the CPU can also test the FPU's registers and branch conditionally on the results.

 

* SPARC Memory Management

  *  A conventional MMU supporting a single paged 32-bit address space using three levels of address translation.

 Bits:    12        8         6         6         12
 Field:   Context   Index 1   Index 2   Index 3   Offset

 [Figure: three-level address translation, showing the Context Table (4K), the Level 1 table, and the 4K page.]
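
The field layout above can be turned into a short sketch of the table walk. The nested dictionaries standing in for the three levels of tables, and the names `split_virtual_address` and `walk_page_tables`, are hypothetical; selecting the level-1 table through the context field is left out.

```python
def split_virtual_address(va):
    """Split a 32-bit virtual address into (index1, index2, index3, offset)."""
    offset = va & 0xFFF             # low 12 bits: byte within the 4K page
    index3 = (va >> 12) & 0x3F      # next 6 bits
    index2 = (va >> 18) & 0x3F      # next 6 bits
    index1 = (va >> 24) & 0xFF      # top 8 bits
    return index1, index2, index3, offset


def walk_page_tables(va, level1_table):
    """Follow the three table levels and return the translated address."""
    index1, index2, index3, offset = split_virtual_address(va)
    page_base = level1_table[index1][index2][index3]
    return page_base | offset


# Hypothetical tables mapping a single page.
tables = {0x12: {0x0D: {0x05: 0xABCDE000}}}
print(hex(walk_page_tables(0x12345678, tables)))
```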

 

 

 


RISC Example 2: The MIPS (Microprocessor without Interlocked Pipeline Stages)

 

 

     Technical Overview

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

      A 32-bit, five-stage pipelined LOAD/STORE machine that executes one simple instruction per cycle.

       The five stages are: Fetch; Decode and read operand registers; ALU operation; start data write; and store ALU output in the destination register.

       MIPS has a single set of 32 general-purpose registers and no overlapping register windows. The MIPS compiler optimizes register usage in whatever way is best for the program being compiled, i.e. MIPS takes the software approach.
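
The benefit of the five-stage pipeline described above can be estimated with a short calculation. This is an idealized sketch that assumes no stalls or hazards, so one instruction completes every cycle once the pipeline is full; the function name `pipeline_cycles` is hypothetical.

```python
def pipeline_cycles(num_instructions, num_stages=5):
    """Cycles needed on an ideal pipeline with no stalls."""
    if num_instructions == 0:
        return 0
    # The first instruction needs num_stages cycles; each later one
    # finishes one cycle after its predecessor.
    return num_stages + num_instructions - 1


n = 100
unpipelined = n * 5                 # every instruction takes all five stages
pipelined = pipeline_cycles(n)
print(unpipelined, pipelined, unpipelined / pipelined)
```

For long instruction sequences the speedup approaches the number of stages, which is why RISC designs put so much effort into keeping the pipeline full.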

 

 

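    Consider the call graph shown below.

    [Figure: call graph with Main at the top level, A and B at the second level, and C, D and E at the third level.]

    [Figure: successive procedure activations: Main; Main, A; Main, A, C; Main, A, D.]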

 

 

 

 

      MIPS Instruction Set

 Format 1:

 Bits:    6         5       5       5      5       6
 Field:   Op-Code   SRC 1   SRC 2   DEST   SHIFT   Function

 Format 2:

 Bits:    6         5     5     16
 Field:   Op-Code   SRC   DST   Immediate Constant

 Format 3:

 Bits:    6         26
 Field:   Op-Code   Jump Target

 

 

      LOAD/STORE instructions use indexed addressing by adding a 16-bit signed constant to a register using format 2.
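
A brief sketch of format 2 in use: the first function packs the four fields into a 32-bit word, and the second recovers the base register and signed constant to form the address. The function names and the register-file list `regs` are hypothetical; 0x23 is the MIPS load-word opcode.

```python
def encode_format2(opcode, src, dst, immediate):
    """Pack Op-Code (6), SRC (5), DST (5) and a 16-bit constant into a word."""
    return (((opcode & 0x3F) << 26) | ((src & 0x1F) << 21)
            | ((dst & 0x1F) << 16) | (immediate & 0xFFFF))


def load_store_address(instr, regs):
    """Effective address = register SRC + sign-extended 16-bit constant."""
    src = (instr >> 21) & 0x1F
    constant = instr & 0xFFFF
    if constant & 0x8000:           # sign bit of the 16-bit field is set
        constant -= 0x10000
    return (regs[src] + constant) & 0xFFFFFFFF


regs = [0] * 32
regs[4] = 0x2000
instr = encode_format2(0x23, 4, 9, -8)      # load word into r9 from r4 - 8
print(hex(instr), hex(load_store_address(instr, regs)))
```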

 

      Surprisingly, non-RISC instructions such as MULT and DIV were included; they use special functional units. The contents of two registers can be multiplied or divided, and the result (the 64-bit product, or the quotient and remainder of a division) is kept in two special registers, LO and HI.
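
How a result is divided between the two special registers can be sketched for the unsigned case. The sketch follows the usual MIPS convention (the low half of a product or the quotient goes to LO, the high half or the remainder goes to HI); the function names are hypothetical and signed operands are not handled.

```python
MASK32 = 0xFFFFFFFF


def multu(a, b):
    """Unsigned 32x32 multiply: return (HI, LO) halves of the 64-bit product."""
    product = (a & MASK32) * (b & MASK32)
    return product >> 32, product & MASK32


def divu(a, b):
    """Unsigned divide: return (HI, LO) = (remainder, quotient)."""
    a &= MASK32
    b &= MASK32
    return a % b, a // b


hi, lo = multu(0xFFFFFFFF, 2)
print(hex(hi), hex(lo))
print(divu(17, 5))
```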

 

      MIPS Memory management

 

The 32-bit virtual address is divided into a 20-bit virtual page number and a 12-bit offset within the page. A TLB is used, and it is located on the CPU chip itself.
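
The address split and TLB lookup can be sketched as follows. The TLB is modeled as a plain dictionary from virtual page number to physical frame number, which ignores its limited size and replacement policy; the name `tlb_translate` and the sample mapping are hypothetical.

```python
PAGE_BITS = 12                      # 12-bit offset, i.e. 4K pages


def tlb_translate(va, tlb):
    """Translate a 32-bit virtual address using a TLB modeled as a dict.

    Returns the physical address, or None on a TLB miss.
    """
    vpn = (va >> PAGE_BITS) & 0xFFFFF           # 20-bit virtual page number
    offset = va & ((1 << PAGE_BITS) - 1)        # 12-bit offset within the page
    frame = tlb.get(vpn)
    if frame is None:
        return None                             # miss: page tables are consulted
    return (frame << PAGE_BITS) | offset


tlb = {0x12345: 0x00ABC}
print(hex(tlb_translate(0x12345678, tlb)))
print(tlb_translate(0x00001000, tlb))
```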

 

      A summary of the MIPS instructions is given in the attached handout.

 


King Fahd University of Petroleum and Minerals

College of Computer Sciences and Engineering

Department of Computer Engineering

COE 308 Computer Architecture (993)

Instructor: Mostafa Abd-El-Barr

 

Module # 5: RISC vs CISC

Summary

 

 

 

 

1.      Common characteristics shared by MOST RISC designs

         Limited and simple instruction set

         Large number of general purpose registers and/or the use of compiler technology to optimize register usage

         Optimization of  the instruction pipeline

Characteristic                     VAX-11 (CISC)   RISC-1 (RISC)
Year developed                     1978            1981
No. of instructions                303             31
Instruction size (bits)            16-456          32
Addressing modes                   22              3
No. of general-purpose registers   16              138
Control memory size                480 Kb          0
Cache size                         64 Kb           0

2. Instruction Execution Characteristics

         Semantic Gap: The difference between the operations provided in the HLLs and those provided in computer architecture leads to (a) execution inefficiency, (b) excessive machine program size, and (c) compiler complexity.

         Designers' Response: Make the architecture complex: (a) Large instruction sets, (b) Dozens of addressing modes

       Aspects of Computation Studied

Operations

Language      Pascal   FORTRAN   C
Assignment    74       67        38
Loop          4        3         3
Call          1        3         12
IF            20       11        43
GOTO          2        9         3
Other         -        7         1

Observations

(a) Assignment statements predominate, i.e. simple movement of data is of high importance. (b) Conditional statements are substantial, i.e. sequencing of instructions is important. (c) Procedure call/return was found to be the most time-consuming, i.e. it causes the execution of the most machine-language instructions.

Operands (a) Majority of references are to simple scalar variables (about 60%) (b) More than 80% of scalars are local variables (to procedures).

Implications:

A prime candidate for optimization is the mechanism for storing and accessing local scalar variables.

Procedure call/return (a) a typical procedure employs only a few passed parameters and local variables (typically < 6 arguments and < 6 local variables). (b) depth of procedure activation fluctuates within a relatively narrow range (5-8).

Implications:

Do not make instruction set architecture closer to HLLs, rather optimize the performance of the most time-consuming features of typical HLL programs. Make the architecture simpler, not complex

Possible Techniques: (a) Use a large number of registers to optimize operand referencing by reducing memory references. (b) Pay careful attention to the design of instruction pipelines, e.g. use smart compilers to rearrange instructions for better pipeline utilization (delayed branch). (c) Simplify the instruction set.

The main philosophy becomes

Keep the most frequently accessed operands in registers and minimize register-memory operations

 

Two approaches are possible to achieve this: (a) Software Approach: Use the compiler to maximize register usage by allocating registers to those variables that will be used the most in a given time period (this is the philosophy used in the Stanford MIPS machine). (b) Hardware Approach: Use more registers so that more variables can be held in registers for longer periods of time (this is the philosophy used in the Berkeley RISC machines).

Register Windows

         Organize registers such that memory access is minimized.

         Multiple small sets of registers are used, each assigned to a different procedure.

         A procedure call automatically switches the CPU to use a different fixed-size window of registers rather than saving registers in memory at the call time

         At any time, only ONE window of registers is visible and is addressed as if it were the only set of registers

         Window Overlapping: Temporary registers at one level are physically the same as the parameter registers at the next level. This overlap allows parameters to be passed without the actual movement of data.

Berkeley RISC: 8 windows of 16 registers each

Pyramid RISC: 16 windows of 32 registers each

         A fixed number of registers in the CPU are identified as global registers and are available to all procedures, e.g. references to registers 0 through 7 could refer to unique global registers, and references to registers 8 through 31 could refer to registers in the current window.

Large Register File                  Cache
All local scalars                    Recently-used local scalars
Individual variables                 Blocks of memory
Compiler-assigned global variables   Recently-used global variables
Register addressing                  Memory addressing

Characteristics of RISC Architectures: (a) One Instruction per machine cycle (b) Register-to-register operations (c) Simple addressing modes (d) Simple instruction formats.

RISC VS CISC Controversy

1.       Performance

         Normalized Execution Times

Benchmark   RISC   VAX-11/780
C           1.0    2.1
Assembly    0.9    0.9

 

2.      Implementation

Characteristic                                RISC   MC68000
Number of devices                             44K    68K
Regularity factor                             25     12
Percentage of area consumed by control unit   6      50
Design time (months)                          19     30
Design effort (man-months)                    15     100
Layout effort (man-months)                    12     70

 

 

 

 

 

 
