Date post: | 11-May-2015 |
Upload: | nayakslideshare |
MIPS 64-bit processors
Project – MSCS 521 Computer Architecture
MANAN SHAH (Block Diagram & its detailed explanation, Instruction set)
CHINTAN SHIHORA (Overview, Features, Intro to 64-bit processing, pipelining information, Pros & Cons)
OVERVIEW
The MIPS Instruction Set Architecture has evolved over time from the original MIPS 1 ISA, through the MIPS 5 ISA, to the current MIPS32 and MIPS64 Architectures. All extensions have been backward compatible with previous versions of the Instruction Set Architecture.
The MIPS 3 level of the Instruction Set Architecture added 64-bit integers and addresses to the instruction set, while the MIPS 4 and MIPS 5 levels added improved floating-point operations, as well as a set of instructions intended to improve the efficiency of generated code and of data movement.
Overview (cont.)
The 64-bit MIPS Architecture (MIPS64) is based on the MIPS 5 ISA and is backward compatible with the MIPS32 Architecture. Both the MIPS32 and MIPS64 Architectures bring the privileged environment into the Architecture definition to address the needs of operating systems and other kernel software. The MIPS64 Architecture is intended to address the need for a high-performance but cost-sensitive MIPS instruction set.
It includes facilities for adding MIPS Application Specific Extensions, User Defined Instructions, and custom coprocessors to address specific needs.
INTRODUCTION TO 64 BIT PROCESSOR
What does 64-bit refer to?
The bit width of a microprocessor refers to the number of bits that can be processed or transmitted in parallel. In short, it indicates the width of the registers, which are special high-speed storage areas within the CPU.
64-bit therefore refers to a processor with registers that store 64-bit numbers. Compared with a 32-bit design, a 64-bit register holds twice as many bits, so a 64-bit architecture can double the amount of integer data the CPU handles per operation.
Need for a 64-bit processor
It is needed for applications that address large amounts of data and memory, such as high-performance servers, database management systems, CAD tools, and digital content creation tools.
One reason to need a 64-bit processor is its enlarged address space. Thirty-two-bit chips can address at most 4 GB (2^32 bytes) of memory, and often only 2 GB of that is available to a user program. A 4 GB limit can be a severe problem for server machines and machines running large databases. A 64-bit chip has none of these constraints, because a 64-bit address space of 2^64 bytes is, for practical purposes, unlimited.
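The address-space comparison above is plain arithmetic, and a short Python sketch makes the scale concrete. The function name is chosen here for illustration only.

```python
# Bytes reachable with a given number of address bits.
GIB = 2 ** 30  # one gibibyte

def addressable_bytes(address_bits: int) -> int:
    """Return the size of a flat address space with this many address bits."""
    return 2 ** address_bits

space_32 = addressable_bytes(32)
space_64 = addressable_bytes(64)

print("32-bit space in GiB:", space_32 // GIB)
print("64-bit space in GiB:", space_64 // GIB)
print("ratio 64-bit / 32-bit:", space_64 // space_32)
```

The ratio between the two spaces is itself 2^32, which is why the 64-bit limit is rarely a practical concern.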
FEATURES OF
MIPS 64-bit processors
Features:
There are 64-bit virtual addresses.
There is a 64-bit instruction pointer.
New RIP-relative data addressing mode.
Flat address space with single code, data, and stack space.
Dual-issue 64-bit superscalar architecture.
High-performance 64-bit integer unit.
High-throughput, fully pipelined 64-bit floating-point unit.
High-performance SysAD interface.
Features: (cont.)
32-bit or 64-bit multiplexed system address/data bus for optimum price/performance.
Available with 32-bit or 64-bit external bus interface.
Supports fractional clock ratios.
JTAG boundary scan.
Integrated primary caches:
32 KB instruction cache and 32 KB data cache, each 2-way set associative.
Virtually indexed and physically tagged.
Write-back and write-through on a per-page basis.
Index address modes (register + register).
Pipeline restart on first doubleword for data cache misses.
Features: (cont.)
64-bit MIPS instruction set architecture.
Floating-point multiply-add instruction increases performance in signal processing and graphics applications.
Conditional moves to reduce branch frequency.
INSTRUCTION SET FOR
MIPS 64-bit processors
Instructions:
BLOCK DIAGRAM FOR
MIPS 64-bit processors
Block Diagram:
The instruction set supports four floating-point multiply-add/subtract instructions, which allow two separate floating-point computations to be performed with one instruction. The four instructions are:
1. Multiply-add (MADD)
2. Multiply-subtract (MSUB)
3. Negative Multiply-add (NMADD)
4. Negative Multiply-subtract (NMSUB)
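As a rough illustration of what the four multiply-add/subtract instructions compute, here is a minimal Python sketch. The operand names fr, fs and ft follow the usual MIPS convention (fs and ft are multiplied, fr is the addend); rounding and exception behavior of real hardware is not modeled.

```python
# Arithmetic performed by the four multiply-add/subtract instructions.
# Python floats are used as a stand-in for floating-point registers.

def madd(fr: float, fs: float, ft: float) -> float:
    """Multiply-add: (fs * ft) + fr."""
    return fs * ft + fr

def msub(fr: float, fs: float, ft: float) -> float:
    """Multiply-subtract: (fs * ft) - fr."""
    return fs * ft - fr

def nmadd(fr: float, fs: float, ft: float) -> float:
    """Negative multiply-add: -((fs * ft) + fr)."""
    return -(fs * ft + fr)

def nmsub(fr: float, fs: float, ft: float) -> float:
    """Negative multiply-subtract: -((fs * ft) - fr)."""
    return -(fs * ft - fr)

if __name__ == "__main__":
    for op in (madd, msub, nmadd, nmsub):
        print(op.__name__, op(2.0, 3.0, 4.0))
```

Each function combines a multiply and an add or subtract, which is the "two computations in one instruction" benefit described above.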
Index:
1) Large On-chip Caches
2) Dual Entry TLB
3) Write Buffer
4) Pipelining
5) Dual-Issue Mechanism
6) Dedicated Integer and FP ALUs
7) Separate FP Execution Units
8) Scalable for Multiple Processors
9) Secondary Cache Support
10) Multiple Cache Sizes
11) Simultaneous Access
12) Flexible Clocking Mechanism
13) On-chip Clock Multiplication Circuitry
Detailed Explanation (For Block Diagram)
Large On-chip Caches: (Detailed explanation - Block diagram)
The MIPS 64-bit processor contains separate 32 KB data and instruction caches. Each cache is 2-way set associative, which helps to increase the hit rate over a direct-mapped implementation. Cache lines may be classified as write-through or write-back on a per-page basis. Both caches are virtually indexed and physically tagged.
a) A virtually indexed cache allows the cache access to begin as soon as the virtual address is generated, as opposed to waiting for the virtual-to-physical translation. The cache is accessed at the same time as the address translation is performed. The physical address is then compared against the corresponding instruction or data cache tag. If the tags match, the data retrieved from the cache is used. If they do not match, meaning that the requested address does not reside in the cache, the data is not used and a cache miss is generated.
Large On-chip Caches: (cont.) (Detailed explanation - Block diagram)
b) A physically tagged data cache allows for coherency between the primary and secondary caches in a system.
Having large primary caches allows more of the application to be executed on-chip, reducing accesses to the slower secondary cache and main memory. This in turn reduces bus utilization and allows the application to run faster, since fewer off-chip accesses are required.
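The virtually indexed, physically tagged lookup can be sketched in Python as follows. The 32 KB size and 2-way associativity come from the slides; the 32-byte line size, the 4 KB page size, the class name and the `translate` callback are assumptions made for illustration. In hardware the set selection and the translation happen in parallel, while this model runs them one after the other.

```python
# Minimal model of a virtually indexed, physically tagged (VIPT) cache.
CACHE_BYTES = 32 * 1024   # from the slides
WAYS = 2                  # from the slides
LINE_BYTES = 32           # assumption, not stated in the slides
PAGE_BYTES = 4096         # assumption, not stated in the slides
NUM_SETS = CACHE_BYTES // (WAYS * LINE_BYTES)

class VIPTCache:
    def __init__(self):
        # Each way of each set stores a physical tag, or None when empty.
        self.sets = [[None] * WAYS for _ in range(NUM_SETS)]

    @staticmethod
    def set_index(vaddr: int) -> int:
        """The set is chosen from the virtual address alone."""
        return (vaddr // LINE_BYTES) % NUM_SETS

    @staticmethod
    def phys_tag(paddr: int) -> int:
        """The tag is the physical frame number."""
        return paddr // PAGE_BYTES

    def lookup(self, vaddr: int, translate) -> bool:
        ways = self.sets[self.set_index(vaddr)]  # can start before translation
        paddr = translate(vaddr)                 # TLB lookup, parallel in hardware
        return self.phys_tag(paddr) in ways      # tag compare decides hit or miss

    def fill(self, vaddr: int, translate, way: int = 0) -> None:
        self.sets[self.set_index(vaddr)][way] = self.phys_tag(translate(vaddr))

if __name__ == "__main__":
    translate = lambda va: va + 0x100000  # hypothetical page-aligned mapping
    cache = VIPTCache()
    print("before fill:", cache.lookup(0x2040, translate))
    cache.fill(0x2040, translate)
    print("after fill:", cache.lookup(0x2040, translate))
```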
Dual Entry TLB: (Detailed explanation - Block diagram)
The TLB of the MIPS 64-bit processor contains 48 dual entries. This implementation is equivalent to a 96-entry TLB.
Each virtual page number entry equates to two physical frame numbers (PFNs), one even and one odd.
The lowest bit of the Virtual Page Number is used to determine whether the even or odd PFN will be used.
The TLB is fully associative.
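The even/odd selection in a dual-entry TLB can be sketched as below. The 48-entry capacity comes from the slides; the 4 KB page size, the class and method names, and the use of a dictionary to stand in for a fully associative search are assumptions for illustration. Replacement policy is not modeled.

```python
# Minimal model of a dual-entry TLB: one entry covers an even/odd page pair.
PAGE_BYTES = 4096    # assumption, not stated in the slides
TLB_ENTRIES = 48     # from the slides

class DualEntryTLB:
    def __init__(self):
        # Key: virtual page number without its lowest bit.
        # Value: (even_pfn, odd_pfn).
        self.entries = {}

    def insert(self, vpn_pair: int, even_pfn: int, odd_pfn: int) -> None:
        if vpn_pair not in self.entries and len(self.entries) >= TLB_ENTRIES:
            raise RuntimeError("TLB full; replacement is not modeled here")
        self.entries[vpn_pair] = (even_pfn, odd_pfn)

    def translate(self, vaddr: int):
        vpn, offset = divmod(vaddr, PAGE_BYTES)
        pair = self.entries.get(vpn >> 1)   # match on the page pair
        if pair is None:
            return None                     # TLB miss
        pfn = pair[vpn & 1]                 # lowest VPN bit picks even or odd PFN
        return pfn * PAGE_BYTES + offset

if __name__ == "__main__":
    tlb = DualEntryTLB()
    tlb.insert(5, even_pfn=0x100, odd_pfn=0x200)
    print(hex(tlb.translate(0xA034)))  # virtual page 10 (even)
    print(hex(tlb.translate(0xB034)))  # virtual page 11 (odd)
```

With 48 entries each covering two pages, the structure maps up to 96 pages, which matches the "equivalent to a 96-entry TLB" statement.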
Write Buffer: (Detailed explanation - Block diagram)
The write buffer is used for writes to external memory.
The write buffer holds up to four 64-bit address and data pairs, or one cache line to be written out.
Since data cache writebacks are typically performed on a line basis, an entire line can be written to the buffer, allowing the CPU to resume normal execution.
Without a write buffer, the CPU would have to write a single 64-bit doubleword, then wait until the memory operation completes, before writing another.
Write Buffer: (cont.) (Detailed explanation - Block diagram)
The write buffer allows the CPU to write data into the buffer without accessing the system bus.
For uncached write cycles, the write buffer can significantly increase performance by allowing the pipelining of multiple writes.
With cacheable write cycles, the buffer allows the CPU to write data to the buffer and immediately begin processing the next write data.
Without the buffer, the CPU would output the write data, then be forced to wait until the uncached write operation has completed before processing the next write.
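The benefit of a four-entry write buffer can be illustrated with a small cycle-count model. This is a simplified sketch, not a model of the actual bus protocol: it assumes the CPU can deposit one write per cycle while the buffer has space, that memory retires one buffered write every `mem_latency` cycles, and that without a buffer the CPU waits the full latency for each write. The function name and the latency value are hypothetical.

```python
# Cycles until the CPU has handed off all of its writes.

def cycles_until_cpu_free(num_writes: int, mem_latency: int, buffer_depth: int) -> int:
    if buffer_depth == 0:
        # No buffer: the CPU waits for every write to complete.
        return num_writes * mem_latency

    queued = 0       # writes currently held in the buffer
    remaining = 0    # cycles left for the write at the head of the buffer
    pending = num_writes
    cycle = 0
    while pending > 0:
        cycle += 1
        # Memory side: the head of the buffer makes progress.
        if queued > 0:
            remaining -= 1
            if remaining == 0:
                queued -= 1
                if queued > 0:
                    remaining = mem_latency
        # CPU side: deposit one write if there is room.
        if queued < buffer_depth:
            queued += 1
            pending -= 1
            if queued == 1:
                remaining = mem_latency
    return cycle

if __name__ == "__main__":
    for depth in (0, 4):
        print("depth", depth, "->", cycles_until_cpu_free(4, mem_latency=5, buffer_depth=depth))
```

In this model a burst that fits in the buffer costs the CPU one cycle per write, while longer bursts are eventually limited by how fast memory drains the buffer.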
Pipelined Writes: (Detailed explanation - Block diagram)
Write cycles can be performed back-to-back without any dead clocks between cycles.
In the original R4000 architecture there is a two-clock delay between the generation of back-to-back addresses. This results in two dead clocks between back-to-back cycles.
The pipelined write protocol also uses the write buffer to allow pipelining of write cycles.
In the MIPS 64-bit processor, performance is significantly increased by eliminating the two null cycles between each write cycle.
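The saving from removing the dead clocks can be estimated with a one-line formula. The sketch below assumes, for illustration only, that each write occupies one bus clock; the function name is hypothetical.

```python
# Bus clocks for a burst of back-to-back writes, with a given number of
# dead clocks between consecutive write cycles.

def write_burst_clocks(num_writes: int, dead_clocks_between: int) -> int:
    if num_writes == 0:
        return 0
    return num_writes + dead_clocks_between * (num_writes - 1)

if __name__ == "__main__":
    n = 8
    print("two dead clocks:", write_burst_clocks(n, 2))
    print("no dead clocks: ", write_burst_clocks(n, 0))
```

Under this assumption, long bursts with two dead clocks take close to three times as many clocks as fully pipelined writes.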
Pipelining: (Detailed explanation - Block diagram)
A pipeline is divided into:
Fetch
Arithmetic operation
Memory access
Write back
[Figure: a non-pipelined execution]
[Figure: pipelined execution]
Pipelining (cont.)
In the non-pipelined example shown in the figure, each stage takes one processor clock cycle to complete.
Thus it takes four clock cycles (ignoring delays or stalls) for an instruction to complete. In this example, the execution rate of the pipeline is one instruction every four clock cycles.
Because the next instruction cannot be fetched until the current one completes, only one stage is active at any time.
Parallel Pipelining
Instead of waiting for an instruction to be completed before the next instruction can be fetched, a new instruction is fetched each clock cycle. There are four stages in the pipeline, so four instructions can be executed simultaneously, one at each stage of the pipeline. Instructions in the pipelined figure are executed at a rate four times that of the pipeline shown in the previous figure.
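The four-times figure can be checked with a short calculation. This sketch assumes an ideal pipeline with one clock per stage and no stalls, as in the slides; the function names are chosen here for illustration.

```python
# Total clock cycles to run n instructions through a pipeline with a given
# number of one-cycle stages, with and without overlap.

def non_pipelined_cycles(n: int, stages: int = 4) -> int:
    # Each instruction must finish all stages before the next one starts.
    return n * stages

def pipelined_cycles(n: int, stages: int = 4) -> int:
    # The first instruction takes `stages` cycles; each later one completes
    # one cycle after its predecessor.
    return stages + (n - 1) if n > 0 else 0

if __name__ == "__main__":
    for n in (1, 4, 100):
        slow = non_pipelined_cycles(n)
        fast = pipelined_cycles(n)
        print(n, "instructions:", slow, "vs", fast, "cycles, speedup", round(slow / fast, 2))
```

The speedup approaches the number of stages as the instruction count grows, which is where the factor of four comes from.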
SuperPipeline
The figure below shows a superpipelined architecture.
Each stage is designed to take only a fraction of an external clock cycle—in this case, half a clock.
Therefore more than one instruction can be completed each cycle.
SuperScalar Pipeline
A superscalar architecture also allows more than one instruction to be completed each clock cycle.
How Pipelining Works:
The processor fetches and decodes four instructions per cycle and then appends them to one of the three instruction queues. Each queue determines the execution order based on the availability of the required functional units (FUs). Though instructions are initially fetched and decoded in order, each queue can issue them in a different order, which allows the processor to have up to 32 instructions in various stages of execution.
How Pipelining Works: (cont.)
Initially, instructions proceed through the instruction fetch pipeline, which consists of fetch, decode, and issue stages:
In the fetch stage, four instructions are fetched and aligned.
In the decode stage, the instructions are decoded, register renaming is performed, and branch instructions are predicted.
In the issue stage (first half), the instructions are written to one of three 16-entry instruction queues, and the availability of the operands is also determined.
(The second half of the issue stage is on the next slide.)
How Pipelining Works: (cont.)
Depending on its type, the instruction proceeds to one of the five instruction pipelines.
There are two integer pipelines, two floating-point pipelines, and one load/store execution pipeline.
Each of these pipelines begins when a queue issues an instruction and continues as follows: in the issue stage (second half), the processor reads operands from the register files, then execution begins and takes
a) one stage in the case of the integer pipelines
b) two stages in the case of the load/store pipeline
c) three stages in the case of the floating-point pipelines
Floating point Co-processor:
Performance is gained on floating-point code by allowing the integer unit to execute the necessary loads and stores of floating-point values, as well as index register updates and branching.
The issue logic allows the dual issue of an integer instruction and a floating-point instruction.
Dual Issue Mechanism: (Detailed explanation - Block diagram)
The dual-issue mechanism implemented in the 64-bit MIPS processor allows a floating-point ALU instruction to be issued simultaneously with any other instruction type.
Whenever a floating-point ALU instruction is fetched with any non-FP-ALU instruction, both instructions can be issued in the same cycle.
Load and store instructions in one pipeline usually provide enough data bandwidth to permit a new instruction to be issued every cycle for a fixed period.
Well-structured code can take full advantage of this pipeline structure.
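The pairing rule described for dual issue can be sketched as a small Python check. The instruction class labels ("fp_alu", "int", "load", and so on) and the greedy pairing of adjacent instructions are assumptions made for illustration; the slides do not say how the hardware aligns instruction pairs.

```python
# One FP ALU instruction may issue together with one instruction of any
# other class; two instructions of the same side cannot pair.

def can_dual_issue(first: str, second: str) -> bool:
    return (first == "fp_alu") != (second == "fp_alu")

def issue_cycles(stream: list) -> int:
    """Count issue cycles, pairing adjacent instructions when allowed."""
    cycles = 0
    i = 0
    while i < len(stream):
        if i + 1 < len(stream) and can_dual_issue(stream[i], stream[i + 1]):
            i += 2   # both instructions issue in this cycle
        else:
            i += 1   # single issue
        cycles += 1
    return cycles

if __name__ == "__main__":
    interleaved = ["load", "fp_alu", "load", "fp_alu"]
    integer_only = ["int", "int", "int", "int"]
    print("interleaved: ", issue_cycles(interleaved))
    print("integer only:", issue_cycles(integer_only))
```

Code that interleaves floating-point arithmetic with loads, stores, and integer work pairs well under this rule, which is the sense in which well-structured code benefits.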
Dedicated Integer & FP ALU: (Detailed explanation - Block diagram)
Separate integer and FP ALUs allow instructions of both types to be performed simultaneously.
Integer instructions are not stalled while long-latency floating-point operations are being executed.
Use: running CAD-type applications, which involve both fixed-point and floating-point math calculations.
Scalable for Multiple Processors: (Detailed explanation - Block diagram)
The 64-bit MIPS processor incorporates 8 external signals.
These signals allow for arbitration and data coherency between processors.
Therefore, symmetric multiprocessing systems implementing the full Modified Exclusive Shared Invalid (MESI) cache consistency protocol in both primary and secondary caches, as well as other styles of multiprocessing, are supported.
Separate FP Execution Units: (Detailed explanation - Block diagram)
In addition to the dual-issue mechanism, the 64-bit MIPS processor also contains separate acceleration hardware for most floating-point ALU instructions.
This allows long-latency operations such as divide and square-root to be performed in a dedicated unit, thereby allowing other shorter-latency operations such as MADD and subtract to be overlapped while the divide or square-root operation is in progress.
Secondary Cache Support: (Detailed explanation - Block diagram)
The 64-bit MIPS processor contains a dedicated secondary cache interface.
Its signals provide an efficient interface between the processor, the secondary cache, and the secondary cache tag RAM.
All RAM interface signals, such as data and chip enables, output enable, address match, cache valid, line index, and word index, are provided by the processor.
The secondary cache also supports multiple cache sizes and both the write-through and write-back data transfer protocols.
Data transfers to the secondary cache share the 64-bit system bus.
Multiple Cache Sizes: (Detailed explanation - Block diagram)
The secondary cache can be configured as 512 KB, 1 MB, or 2 MB, allowing large applications to run within the secondary cache and reducing the number of accesses to slower main memory.
The secondary cache is accessed through the system bus.
Uncached bus cycles are not evaluated by the secondary cache control logic as they travel to the external agent.
Uncached operations such as video screen updates can be passed directly to the system logic responsible for routing the data to the screen, without any delays from the secondary cache logic.
Simultaneous Access: (Detailed explanation - Block diagram)
To maximize data throughput, main memory accesses can be initiated while the secondary cache tag is being compared.
If the requested address is found in the secondary cache, the memory access is aborted. If the address is not found in the secondary cache, the main memory access is already under way, so the data can be retrieved more quickly.
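The gain from starting the memory access during the tag compare can be sketched with a simple latency model. The cycle counts used below are hypothetical values for illustration, and the function names are not taken from any datasheet.

```python
# Latency of a secondary cache miss with and without overlapping the main
# memory access with the secondary cache tag compare.

def miss_latency(tag_cycles: int, memory_cycles: int, overlapped: bool) -> int:
    if overlapped:
        # Memory access starts together with the tag compare.
        return max(tag_cycles, memory_cycles)
    # Memory access starts only after the tag compare reports a miss.
    return tag_cycles + memory_cycles

def hit_latency(tag_cycles: int) -> int:
    # On a hit the speculative memory access is aborted, so only the
    # tag compare time is paid in either scheme.
    return tag_cycles

if __name__ == "__main__":
    tag, mem = 3, 20  # hypothetical cycle counts
    print("sequential miss:", miss_latency(tag, mem, overlapped=False))
    print("overlapped miss:", miss_latency(tag, mem, overlapped=True))
    print("hit:            ", hit_latency(tag))
```

In this model the overlap hides the whole tag compare time on a miss and costs nothing on a hit.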
Flexible Clocking Mechanism: (Detailed explanation - Block diagram)
The clocking mechanism in the 64-bit MIPS processor offers a number of pipeline frequencies based on the frequency of the input clock.
Single External Clock Signal
A single clock signal is used for the system interface, as opposed to three. The processor eliminates the Rclock, Tclock, and MasterOut clock signals that existed in the previous processors.
Having only one clock simplifies system design, as well as reducing the circuit complexity of the internal clock mechanism.
On-chip Clock Multiplication Circuitry: (Detailed explanation - Block diagram)
The 64-bit processor includes on-chip clock frequency multiplication circuitry to support 200 MHz internal operation from an external 50 MHz clock.
The processor has the option of operating internally at 2, 3, or 4 times the frequency of the external clock.
The maximum bus speed of the system interface is 100 MHz.
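The clock multiplication options can be written out as a small Python helper. The multipliers and the 100 MHz bus limit come from the slides; the function and constant names are chosen here for illustration.

```python
# Internal pipeline frequency derived from the external clock.
ALLOWED_MULTIPLIERS = (2, 3, 4)   # from the slides
MAX_BUS_MHZ = 100                 # from the slides

def internal_clock_mhz(external_mhz: float, multiplier: int) -> float:
    if multiplier not in ALLOWED_MULTIPLIERS:
        raise ValueError("multiplier must be 2, 3, or 4")
    if external_mhz > MAX_BUS_MHZ:
        raise ValueError("external clock exceeds the maximum bus speed")
    return external_mhz * multiplier

if __name__ == "__main__":
    for m in ALLOWED_MULTIPLIERS:
        print("50 MHz x", m, "=", internal_clock_mhz(50, m), "MHz")
```

The 200 MHz figure quoted above corresponds to a 50 MHz external clock with the multiplier set to 4.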
PROS & CONS
Advantages:
It can handle more memory and larger files.
A 64-bit architecture allows systems to address 1 terabyte (about 1,000 GB) of memory or more.
64-bit machines can also offer faster I/O speeds to things like hard disk drives and video cards.
These features can greatly increase system performance.
Disadvantages:
The same data can occupy more space in memory (pointers, for example, grow from 4 to 8 bytes). This increases the memory requirements of a given process and can create problems for efficient processor cache utilization.
64-bit systems sometimes lack equivalents to software that is written for 32-bit architectures. The most severe problem is incompatible device drivers. Although most software can run in a 32-bit compatibility mode, it is usually impossible to run a driver in that mode.