Mips 64

IQxplorer

MIPS 64-bit processors

Project – MSCS 521 Computer Architecture

MANAN SHAH (Block Diagram & its detailed explanation, Instruction set)

CHINTAN SHIHORA (Overview, Features, Intro to 64-bit processing, pipelining information, Pros & Cons)

OVERVIEW

Overview

The MIPS Instruction Set Architecture has evolved over time from the original MIPS 1 ISA, through the MIPS 5 ISA, to the current MIPS32 and MIPS64 Architectures. All extensions have been backward compatible with previous versions of the Instruction Set Architecture.

The MIPS 3 level of the Instruction Set Architecture added 64-bit integers and addresses to the instruction set, while the MIPS 4 and MIPS 5 levels added improved floating-point operations, as well as a set of instructions intended to improve the efficiency of generated code and of data movement.

Overview (cont.)

The 64 bit MIPS Architecture is based on the MIPS 5 ISA and is backward compatible with the MIPS32 Architecture. Both the MIPS32 and MIPS64 Architectures bring the privileged environment into the Architecture definition to address the needs of operating systems and other kernel software. The MIPS64 Architectures are intended to address the need for a high-performance but cost-sensitive MIPS instruction set.

It includes facilities for adding MIPS Application Specific Extensions, User Defined Instructions, and custom coprocessors to address specific needs.

INTRODUCTION TO 64 BIT PROCESSOR

What does 64-bit refer to?

The term refers to the number of bits that can be processed or transmitted in parallel. For a microprocessor, it indicates the width of the registers, which are special high-speed storage areas within the CPU.

64-bit therefore refers to a processor with registers that store 64-bit numbers. Compared with a 32-bit design, a 64-bit architecture can handle up to twice as much data per register operation.

Need for a 64-bit processor

It is needed for the applications that address large amounts of data and memory, such as high-performance servers, database management systems, CAD tools, and digital content creation tools.

One reason to need a 64-bit processor is its enlarged address space. A 32-bit chip can address at most 2^32 bytes, so it is limited to 2 GB or 4 GB of RAM. A 4 GB limit can be a severe problem for server machines and machines running large databases. A 64-bit chip has no such practical constraint, because a 64-bit address space of 2^64 bytes is far larger than any installed RAM.
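The size difference can be checked with a few lines of arithmetic. This is only an illustrative sketch; the function name `address_space_bytes` is made up for the example and assumes a flat, byte-addressable address space.

```python
# Illustrative arithmetic only: size of a flat, byte-addressable address space.
GIB = 2**30  # bytes in one gibibyte

def address_space_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses reachable with the given address width."""
    return 2**address_bits

space_32 = address_space_bytes(32)
space_64 = address_space_bytes(64)

print(space_32 // GIB, "GiB")                 # 4 GiB
print(space_64 // GIB, "GiB")                 # 17179869184 GiB
print(space_64 // space_32, "times larger")   # 4294967296 times larger
```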

FEATURES OF

MIPS 64-bit processors

Features:

There are 64-bit virtual addresses.

There is a 64-bit instruction pointer.

New RIP-relative data addressing mode.

Flat address space with single code, data, and stack space.

Dual-issue 64-bit superscalar architecture.

High-performance 64-bit integer unit.

High-throughput, fully pipelined 64-bit floating-point unit.

High-performance SysAD interface.

Features: (cont.)

32-bit or 64-bit multiplexed system address/data bus for optimum price/performance.

Available with 32-bit or 64-bit external bus interface.

Supports fractional clock ratios.

JTAG boundary scan.

Integrated primary caches:

32 KB instruction and 32 KB data caches, each 2-way set associative.

Virtually indexed and physically tagged.

Write-back and write-through on a per-page basis.

Index address modes (register + register).

Pipeline restart on first doubleword for data cache misses.

64-bit MIPS instruction set architecture:

Floating-point multiply-add instruction increases performance in signal processing and graphics applications.

Conditional moves to reduce branch frequency.

Features: (cont.)

INSTRUCTION SET FOR

MIPS 64-bit processors

Instructions:

BLOCK DIAGRAM FOR


Block Diagram:

It supports four floating-point multiply-add/subtract instructions, each of which allows two separate floating-point computations to be performed with one instruction. The four instructions are:

1. Multiply-add (MADD)

2. Multiply-subtract (MSUB)

3. Negative Multiply-add (NMADD)

4. Negative Multiply-subtract (NMSUB)
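As a rough illustration of what the four instructions compute, the sketch below models them on ordinary Python floats. The operand names `fr`, `fs`, and `ft` follow the MIPS IV operand order (product of `fs` and `ft`, combined with `fr`); the Python function names and the `dot` helper are hypothetical, and the model ignores rounding modes and the single/double format suffixes.

```python
# Behavioral model only: each function returns the value written to the
# destination register fd. Rounding and exception behavior are not modeled.

def madd(fr: float, fs: float, ft: float) -> float:
    return fs * ft + fr          # multiply-add

def msub(fr: float, fs: float, ft: float) -> float:
    return fs * ft - fr          # multiply-subtract

def nmadd(fr: float, fs: float, ft: float) -> float:
    return -(fs * ft + fr)       # negative multiply-add

def nmsub(fr: float, fs: float, ft: float) -> float:
    return -(fs * ft - fr)       # negative multiply-subtract

def dot(xs, ys) -> float:
    """Dot product written as a chain of multiply-adds, one per element pair."""
    acc = 0.0
    for x, y in zip(xs, ys):
        acc = madd(acc, x, y)
    return acc

print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 32.0
```

The dot-product loop is the kind of signal-processing and graphics kernel where one multiply-add per element replaces a separate multiply and add.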

Index:

1) Large On-chip Caches

2) Dual Entry TLB

3) Write Buffer

4) Pipelining

5) Dual-Issue Mechanism

6) Dedicated Integer and FP ALUs

7) Separate FP Execution Units

8) Scalable for Multiple Processors

9) Secondary Cache Support

10) Multiple Cache Sizes

11) Simultaneous Access

12) Flexible Clocking Mechanism

13) On-chip Clock Multiplication Circuitry

Detailed Explanation (For Block Diagram)

Large On-chip Caches: (Detailed explanation - Block diagram)

The MIPS 64-bit processor contains separate 32 KB data and instruction caches. Each cache is 2-way set associative, which helps to increase the hit rate over a direct-mapped implementation. Cache lines may be classified as write-through or write-back on a per-page basis. Both caches are virtually indexed and physically tagged.

a) A virtually indexed cache allows the cache access to begin as soon as the virtual address is generated, as opposed to waiting for the virtual-to-physical translation. The cache is accessed at the same time as the address translation is performed. The physical address is then compared against the corresponding instruction or data cache tag. If the tags match, the data that has been retrieved from the cache is used. If they do not match, meaning that the requested address does not reside in the cache, the data is not used and a cache miss is generated.

b) A physically tagged data cache allows for coherency between the primary and secondary caches in a system.

Having large primary caches allows more of the application to be executed on-chip, reducing accesses to slower secondary cache and main memory. This in turn reduces bus utilization and allows the application to run faster since fewer off-chip accesses are required.
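The virtually indexed, physically tagged lookup described for the primary caches can be sketched in software. This is a minimal model, not the processor's actual logic: the 32-byte line size, 4 KB page size, round-robin replacement, and the dictionary page table are all assumptions made for the example, and the index and translation steps that hardware performs in parallel run one after the other here.

```python
LINE_BYTES = 32                 # assumed line size
WAYS = 2                        # 2-way set associative, as in the text
CACHE_BYTES = 32 * 1024         # 32 KB, as in the text
SETS = CACHE_BYTES // (WAYS * LINE_BYTES)   # 512 sets
PAGE_BYTES = 4096               # assumed page size

class VIPTCache:
    def __init__(self, page_table):
        self.page_table = page_table                        # virtual page -> physical frame
        self.sets = [[None] * WAYS for _ in range(SETS)]    # physical tags, None = empty
        self.next_victim = [0] * SETS                       # round-robin replacement (assumed)

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_BYTES)
        return self.page_table[vpn] * PAGE_BYTES + offset

    def access(self, vaddr):
        """Return True on a hit; on a miss, fill the line and return False."""
        index = (vaddr // LINE_BYTES) % SETS           # index taken from the VIRTUAL address
        ptag = self.translate(vaddr) // LINE_BYTES     # tag taken from the PHYSICAL address
        ways = self.sets[index]
        if ptag in ways:                               # tag compare succeeds: hit
            return True
        victim = self.next_victim[index]               # tag compare fails: miss, refill
        ways[victim] = ptag
        self.next_victim[index] = (victim + 1) % WAYS
        return False

cache = VIPTCache({0: 7, 1: 3})
print(cache.access(0x0040))   # False: first touch misses
print(cache.access(0x0044))   # True: same 32-byte line
print(cache.access(0x1040))   # False: different page, different physical line
```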

Dual Entry TLB: (Detailed explanation - Block diagram)

The TLB of the MIPS 64-bit processor contains 48 dual entries. This implementation is equivalent to a 96-entry TLB.

Each virtual page number entry maps to two physical frame numbers (PFNs), one even and one odd.

The lowest bit of the Virtual Page Number is used to determine whether the even or odd PFN will be used.

The TLB is fully associative.
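The even/odd selection in a dual-entry TLB can be illustrated with a small model. This is a sketch under stated assumptions: the class name `DualEntryTLB`, the 4 KB page size, and the dictionary used to stand in for a fully associative match are all hypothetical, and replacement on a full TLB is not modeled.

```python
PAGE_BYTES = 4096   # assumed page size

class DualEntryTLB:
    def __init__(self, capacity=48):
        self.capacity = capacity     # 48 dual entries, as in the text
        self.entries = {}            # VPN without its low bit -> (even_pfn, odd_pfn)

    def insert(self, vpn2, even_pfn, odd_pfn):
        if vpn2 not in self.entries and len(self.entries) >= self.capacity:
            raise RuntimeError("TLB full; a real TLB would replace an entry")
        self.entries[vpn2] = (even_pfn, odd_pfn)

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_BYTES)
        pair = self.entries.get(vpn >> 1)   # any entry may match (fully associative)
        if pair is None:
            return None                     # TLB miss
        pfn = pair[vpn & 1]                 # low VPN bit picks even (0) or odd (1) PFN
        return pfn * PAGE_BYTES + offset

tlb = DualEntryTLB()
tlb.insert(vpn2=5, even_pfn=0x100, odd_pfn=0x2A0)
print(hex(tlb.translate(10 * PAGE_BYTES + 0x10)))   # 0x100010 (even page)
print(hex(tlb.translate(11 * PAGE_BYTES + 0x10)))   # 0x2a0010 (odd page)
print(tlb.translate(12 * PAGE_BYTES))               # None (miss)
```

One entry covers two adjacent virtual pages, which is why 48 dual entries map as many pages as a 96-entry TLB.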

Write Buffer: (Detailed explanation - Block diagram)

Writes to external memory:

The write buffer holds up to four 64-bit address and data pairs, or one cache line to be written out.

Since data cache writebacks are typically performed on a line basis, an entire line can be written to the buffer, allowing the CPU to resume normal execution.

Without a write buffer, the CPU would have to write a single 64-bit doubleword, then wait until the memory operation completes before writing another.

The write buffer allows the CPU to write data into the buffer without accessing the system bus.

For uncached write cycles, the write buffer can significantly increase performance by allowing the pipelining of multiple writes.

With cacheable write cycles, the buffer allows the CPU to write data to the buffer and immediately begin processing the next write data.

Without the buffer, the CPU would output the write data, then be forced to wait until the uncached write operation has completed before processing the next write.
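The benefit of the write buffer can be estimated with a toy timing model. Everything numeric here is an assumption for illustration: the function name `cpu_stall_cycles`, a memory write latency of four cycles, one cycle to place a write in the buffer, and writes that drain to memory strictly one at a time.

```python
from collections import deque

def cpu_stall_cycles(num_writes, mem_latency, buffer_slots):
    """Cycles the CPU spends waiting while issuing num_writes stores.

    buffer_slots=0 models a CPU with no write buffer.
    """
    t = 0                # current CPU cycle
    mem_free = 0         # cycle at which memory can start the next write
    pending = deque()    # completion times of writes still held in the buffer
    stalls = 0
    for _ in range(num_writes):
        if buffer_slots == 0:
            # No buffer: wait for each write to finish before continuing.
            finish = max(t, mem_free) + mem_latency
            stalls += finish - t
            t = mem_free = finish
            continue
        while pending and pending[0] <= t:     # retire writes that have completed
            pending.popleft()
        if len(pending) >= buffer_slots:       # buffer full: wait for the oldest write
            stalls += pending[0] - t
            t = pending.popleft()
        finish = max(t, mem_free) + mem_latency
        mem_free = finish
        pending.append(finish)
        t += 1                                 # one cycle to place the write in the buffer
    return stalls

# One cache line written out as four doublewords:
print(cpu_stall_cycles(4, mem_latency=4, buffer_slots=0))   # 16
print(cpu_stall_cycles(4, mem_latency=4, buffer_slots=4))   # 0
```

With four slots the whole line fits in the buffer, so the CPU resumes immediately; longer bursts still stall once the buffer fills.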

Pipelined Writes: (Detailed explanation - Block diagram)

Write cycles can be performed back-to-back without any dead clocks between cycles.

In the original R4000 architecture there is a two-clock delay between the generation of back-to-back addresses. This results in two dead clocks between back-to-back cycles.

The pipelined write protocol also uses the write buffer to allow pipelining of write cycles.

In the MIPS 64-bit processor, performance is significantly increased by eliminating the two null cycles between each write cycle.
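The effect of removing the two dead clocks can be put into numbers with a short calculation. The helper name `bus_clocks` and the figure of two bus clocks per write cycle are assumptions for the example; only the two dead clocks come from the text.

```python
def bus_clocks(num_writes, clocks_per_write, dead_clocks_between):
    """Total bus clocks for a run of back-to-back write cycles."""
    if num_writes == 0:
        return 0
    # Dead clocks occur only in the gaps between consecutive writes.
    return num_writes * clocks_per_write + (num_writes - 1) * dead_clocks_between

with_dead_clocks = bus_clocks(8, clocks_per_write=2, dead_clocks_between=2)
pipelined = bus_clocks(8, clocks_per_write=2, dead_clocks_between=0)

print(with_dead_clocks)   # 30
print(pipelined)          # 16
```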

Pipelining: (Detailed explanation - Block diagram)

A pipeline is divided into:

Fetch

Arithmetic operation

Memory access

Write back

A non-pipelined execution

Pipelined execution

Pipelining (cont.)

In the example shown in the figure, each stage takes one processor clock cycle to complete.

Thus it takes four clock cycles (ignoring delays or stalls) for the instruction to complete. In this example, the execution rate of the pipeline is one instruction every four clock cycles.

Because the next instruction cannot be fetched until the current one completes, only one stage is active at any time.

Parallel Pipelining

Instead of waiting for an instruction to complete before the next instruction can be fetched, a new instruction is fetched each clock cycle. There are four stages in the pipeline, so four instructions can be executed simultaneously, one at each stage of the pipeline. Instructions in this figure are executed at a rate four times that of the pipeline shown in the previous figure.
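The four-times figure is the limit for long instruction sequences, which a short cycle count makes visible. This sketch assumes the four one-cycle stages from the example and no stalls; the function names are made up for illustration.

```python
STAGES = 4   # four stages, one clock cycle each, as in the example

def non_pipelined_cycles(n_instructions):
    # Each instruction must finish all stages before the next one starts.
    return n_instructions * STAGES

def pipelined_cycles(n_instructions):
    # The first instruction takes STAGES cycles; each later one finishes one cycle after.
    if n_instructions == 0:
        return 0
    return STAGES + (n_instructions - 1)

n = 100
print(non_pipelined_cycles(n))                                   # 400
print(pipelined_cycles(n))                                       # 103
print(round(non_pipelined_cycles(n) / pipelined_cycles(n), 2))   # 3.88
```

The speedup approaches 4 as the number of instructions grows, because the cost of filling the pipeline is paid only once.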

SuperPipeline

The figure below shows a superpipelined architecture.

Each stage is designed to take only a fraction of an external clock cycle—in this case, half a clock.

Therefore more than one instruction can be completed each cycle.

SuperScalar Pipeline

A superscalar architecture also allows more than one instruction to be completed each clock cycle.

How Pipelining Works:

The processor fetches and decodes four instructions per cycle and then appends them to one of three instruction queues. Each queue determines the execution order based on the availability of the required functional units (FUs). Although instructions are initially fetched and decoded in order, this allows the processor to have up to 32 instructions in various stages of execution.

How Pipelining Works: (cont.)

Initially, instructions proceed through the instruction fetch pipeline, which consists of fetch, decode, and issue stages:

In the fetch stage, four instructions are fetched and aligned.

In the decode stage, the instructions are decoded, register renaming is performed, and branch instructions are predicted.

In the issue stage (first half), the instructions are written to one of three 16-entry instruction queues, and the availability of the operands is determined.

How Pipelining Works: (cont.)

Depending on its type, the instruction proceeds to one of the five instruction pipelines.

There are two integer and two floating-point pipelines, and one load/store execution pipeline.

Each of these pipelines begins when a queue issues an instruction and continues as follows: in the issue stage (second half), the processor reads operands from the register files; execution then begins and takes

a) one stage in the case of the integer pipelines

b) two stages in the case of the load/store pipeline

c) three stages in the case of the floating-point pipelines

Dual Issue Mechanism: (Detailed explanation - Block diagram)

Floating-point coprocessor:

Performance is gained on floating-point codes by allowing the integer unit to execute the necessary loads and stores of floating-point values, as well as index register updates and branching.

The issue logic allows the dual issue of an integer instruction and a floating-point instruction.

The dual-issue mechanism implemented in the 64-bit MIPS processor allows a floating-point ALU instruction to be issued simultaneously with any other instruction type.

Whenever a floating-point ALU instruction is fetched with any non-FP-ALU instruction, both instructions can be issued in the same cycle.

Load and store instructions in one pipeline usually provide enough data bandwidth to permit a new instruction to be issued every cycle for a fixed period.

Well-structured code can take full advantage of this pipeline structure.
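The pairing rule can be expressed as a small issue-cycle counter. This is a simplified sketch: it applies only the rule stated above (one FP ALU instruction plus one instruction of any other type), pairs instructions greedily in program order, and ignores data dependencies, fetch alignment, and latencies. The `FP_ALU` set is an illustrative subset of mnemonics, and the function names are hypothetical.

```python
FP_ALU = {"add.d", "sub.d", "mul.d", "madd.d"}   # illustrative subset of FP ALU mnemonics

def can_dual_issue(a, b):
    """True when exactly one of the two instructions is a floating-point ALU instruction."""
    return (a in FP_ALU) != (b in FP_ALU)

def issue_cycles(program):
    """Count issue cycles, pairing adjacent instructions whenever the rule allows it."""
    cycles = 0
    i = 0
    while i < len(program):
        if i + 1 < len(program) and can_dual_issue(program[i], program[i + 1]):
            i += 2      # both instructions issue in the same cycle
        else:
            i += 1      # single issue
        cycles += 1
    return cycles

# FP loads (ldc1) run in the integer/load-store path, so they pair with FP ALU work.
loop = ["ldc1", "madd.d", "ldc1", "madd.d", "addiu", "bne"]
print(issue_cycles(loop))   # 4
```

Interleaving loads with FP ALU instructions, as in the example loop, is the kind of well-structured code that keeps both issue slots busy.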

Dedicated Integer & FP ALU: (Detailed explanation - Block diagram)

Separate integer and FP ALUs allow instructions of both types to be performed simultaneously.

Integer instructions are not stalled while long-latency floating-point operations are being executed.

Use: running CAD-type applications, which perform both fixed-point and floating-point math calculations.

Scalable for Multiple Processors: (Detailed explanation - Block diagram)

The 64-bit MIPS processor incorporates 8 external signals.

These signals allow for arbitration and data coherency between processors.

Therefore, symmetric multiprocessing systems implementing the full Modified Exclusive Shared Invalid (MESI) cache consistency protocol in both primary and secondary caches, as well as other styles of multiprocessing, will be supported.

Separate FP Execution Units: (Detailed explanation - Block diagram)

In addition to the dual-issue mechanism, the 64-bit MIPS processor also contains separate acceleration hardware for most floating-point ALU instructions.

This allows long-latency operations such as divide and square-root to be performed in a dedicated unit, thereby allowing other shorter-latency operations such as MADD and subtract to be overlapped while the divide or square-root operation is in progress.

Secondary Cache Support: (Detailed explanation - Block diagram)

The 64-bit MIPS processor contains a dedicated secondary cache interface.

The interface signals provide an efficient connection between the processor, the secondary cache, and the secondary cache tag RAM.

All RAM interface signals, such as data and chip enables, output enable, address match, cache valid, line index, and word index, are provided by the processor.

The secondary cache also supports multiple cache sizes and both the write-through and write-back data transfer protocols.

Data transfers to the secondary cache share the 64-bit system bus.

Multiple Cache Sizes: (Detailed explanation - Block diagram)

The secondary cache can be configured as 512 KB, 1 MB, or 2 MB, allowing large applications to run within the secondary cache, reducing the number of accesses to slower main memory.

The secondary cache is accessed through the system bus.

Uncached bus cycles are not evaluated by the secondary cache control logic as they travel to the external agent.

Uncached operations such as video screen updates can be passed directly to the system logic responsible for routing the data to the screen without any delays from the secondary cache logic.

Simultaneous Access: (Detailed explanation - Block diagram)

To maximize data throughput, a main memory access can be initiated while the secondary cache tag is being compared.

If the requested address is found in the secondary cache, the memory access is aborted. If the address is not found in the secondary cache, the main memory access is already under way, so the data can be retrieved more quickly.

Flexible Clocking Mechanism: (Detailed explanation - Block diagram)

The clocking mechanism in the 64-bit MIPS processor offers a number of pipeline frequencies based on the frequency of the input clock.

Single External Clock Signal

A single clock signal is used for the system interface, as opposed to three. The processor eliminates the Rclock, Tclock, and MasterOut clock signals that existed in the previous processors.

Having only one clock simplifies system design, as well as reducing the circuit complexity of the internal clock mechanism.

On-chip Clock Multiplication Circuitry: (Detailed explanation - Block diagram)

The 64-bit processor includes on-chip clock frequency multiplication circuitry to support 200 MHz internal operation from an external 50 MHz clock.

The processor has the option of operating internally at 2, 3, or 4 times the frequency of the external clock.

Maximum bus speed of the system interface is 100 MHz.
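The relationship between the external clock and the internal pipeline clock is simple multiplication, sketched below. The function name `internal_mhz` is hypothetical; the allowed multipliers and the 100 MHz bus limit are the figures quoted in the text.

```python
ALLOWED_MULTIPLIERS = (2, 3, 4)   # multipliers listed in the text
MAX_BUS_MHZ = 100                 # maximum system interface speed given in the text

def internal_mhz(external_mhz, multiplier):
    """Internal pipeline frequency for a given external clock and multiplier."""
    if multiplier not in ALLOWED_MULTIPLIERS:
        raise ValueError("multiplier must be 2, 3, or 4")
    if external_mhz > MAX_BUS_MHZ:
        raise ValueError("external clock exceeds the 100 MHz bus limit")
    return external_mhz * multiplier

print(internal_mhz(50, 4))    # 200
print(internal_mhz(100, 2))   # 200
```

The same internal frequency can therefore be reached with a slower, cheaper external bus and a higher multiplier, or with a faster bus and a lower one.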

PROS & CONS

Advantages:

It can handle more memory and larger files.

64-bit architecture will allow systems to address up to 1 terabyte (1000 GB) of memory.

64-bit machines also offer faster I/O speeds to things like hard disk drives and video cards.

These features can greatly increase system performance.

Disadvantages:

The same data occupies more space in memory. This increases the memory requirements of a given process and can create problems for efficient processor cache utilization.

64-bit systems sometimes lack equivalents to software that is written for 32-bit architectures. The most severe problem is incompatible device drivers. Although most software can run in a 32-bit compatibility mode, it is usually impossible to run a driver in that mode.


Date post:	11-May-2015
Category:	Business
Upload:	nayakslideshare