Date post: | 23-Dec-2015 |
Upload: | kimberly-grant |
PIPELINED PROCESSORS
Chapter No. 5
Pipeline Evolution in Processors
Pipelining first appeared at the end of the 1960s in the first supercomputers of that time, such as the IBM 360/91 (1967) and the CDC 7600 (1970).
In 1970, pipelining was introduced at the instruction level in the B7700 mainframe.
Principle of Pipelining
A number of functional units are employed in sequence to perform a single computation.
Each functional unit represents a certain stage of the computation.
A pipeline allows overlapped execution of instructions, that is, temporal overlapping of processing.
It increases the processor's overall throughput.
In pipelined operation, each task is divided into a number of subtasks.
Each pipeline stage is associated with one subtask and performs the operation required for it.
In a basic pipeline, the same amount of time is available in each stage for performing its task.
All pipeline stages operate like an assembly line, that is, each stage receives its input, typically from the previous stage, and delivers its output to the next stage.
The basic pipeline is clocked (synchronous), that is, each stage accepts a new input at the start of the clock cycle.
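The assembly-line behaviour described above can be illustrated with a small sketch. It assumes an ideal basic pipeline (every stage takes exactly one cycle, no stalls); the function names pipeline_schedule and total_cycles are hypothetical and chosen only for this illustration.

```python
def pipeline_schedule(num_tasks, num_stages):
    """Map each clock cycle to the (task, stage) pairs active in it.

    Assumes an ideal clocked pipeline: task i enters stage 0 in cycle i
    and moves one stage forward at every clock cycle.
    """
    schedule = {}
    for task in range(num_tasks):
        for stage in range(num_stages):
            cycle = task + stage
            schedule.setdefault(cycle, []).append((task, stage))
    return schedule


def total_cycles(num_tasks, num_stages, pipelined=True):
    """Cycles needed to complete all tasks, with or without overlapping."""
    if num_tasks == 0:
        return 0
    if pipelined:
        # The first task fills the pipeline, then one task completes per cycle.
        return num_stages + num_tasks - 1
    # Unpipelined: each task passes through all stages before the next starts.
    return num_stages * num_tasks


if __name__ == "__main__":
    for cycle, active in sorted(pipeline_schedule(5, 4).items()):
        print(cycle, active)
    print("pipelined:", total_cycles(5, 4))
    print("unpipelined:", total_cycles(5, 4, pipelined=False))
```

With 5 tasks and 4 stages, the overlapped schedule needs 8 cycles, while processing the tasks one after another needs 20.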
Pipelined Operation
Pipelined and Unpipelined Processing
Processor Pipelines in Reality
A real pipeline may include a few extensions to the basic pipeline.
Pipelined execution is also often performed using half-cycles, and in certain cases one or more pipeline stages may have to be recycled to accomplish a given task.
These additional cycles may be required to perform certain arithmetic operations.
Logical Layout of Pentium Pipeline
Logical Layout of PowerPC 604 Pipeline
General Structure of Pipelines
A pipeline consists of a number of stages, one for each subtask. The stages are decoupled from each other by registers, called latches.
As each clock cycle ends, the latches gate in their inputs and forward them to the associated stage, where the required operation is performed.
In reality, each stage is often implemented by a number of different functional units (FUs) or execution units (EUs) that perform the required operations.
The latches are extended with multiplexers that select and transfer data from the outputs of the preceding EUs to the inputs of the subsequent EUs.
Pipeline Performance Measures
Non-pipelined processor: characterized by instruction cycle time and execution time.
Pipelined processor: execution time is of no importance; three different measures are used instead: cycle time, latency and repetition rate.
Cycle time specifies the time available for each stage to accomplish the required operations.
It is determined by the worst-case processing time of the longest stage.
Latency specifies the amount of time that the result of a particular instruction takes to become available in the pipeline for a subsequent dependent instruction.
It is used in the context of processing a subsequent RAW-dependent instruction.
There are two kinds of latency, define-use latency and load-use latency, corresponding to the two types of RAW dependencies (define-use and load-use).
Define-use latency:
    mul r1, r2, r3
    add r5, r1, r4
Define-use delay: the time a subsequent RAW-dependent instruction has to be stalled in the pipeline.
Load-use latency:
    load r1, x
    add r5, r1, r2
Load-use delay: interpreted in the same way as the define-use delay.
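The relation between latency and delay can be sketched as follows. The sketch assumes the common convention that a latency of one cycle lets a dependent instruction follow without stalling, so the delay for an immediately following instruction is one cycle less than the latency. The function name raw_stall_cycles, its distance parameter and the latency values used are hypothetical and serve only as an illustration.

```python
def raw_stall_cycles(latency, distance=1):
    """Stall cycles seen by a RAW-dependent instruction.

    latency:  define-use or load-use latency of the producer, in cycles
              (assumed >= 1; a latency of 1 means no stall is needed).
    distance: how many instruction slots after the producer the consumer
              is issued (1 = immediately following instruction).
    """
    if latency < 1 or distance < 1:
        raise ValueError("latency and distance must be at least 1")
    return max(0, latency - distance)


if __name__ == "__main__":
    # mul r1, r2, r3 followed directly by add r5, r1, r4,
    # with an assumed define-use latency of 3 cycles for mul.
    print(raw_stall_cycles(3))
    # The same pair with two independent instructions placed in between.
    print(raw_stall_cycles(3, distance=3))
```

The second call shows why instruction scheduling helps: independent instructions placed between producer and consumer hide the latency.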
Repetition rate, also known as throughput, specifies the shortest possible time interval between subsequent instructions in the pipeline.
The repetition rate of a basic pipeline is one cycle.
The repetition rate determines the performance potential of a pipeline.
The performance potential of a pipeline, when no define-use or load-use delays exist between instructions, can be calculated as:
$P = \frac{1}{R \cdot t_c}$
where:
$R$ is the repetition rate of the pipeline in cycles
$t_c$ is the cycle time of the pipeline
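As a worked example of the formula above, the following sketch evaluates the performance potential for two repetition rates. The function name performance_potential and the 10 ns cycle time are assumptions made only for illustration.

```python
def performance_potential(repetition_rate_cycles, cycle_time_seconds):
    """Return P = 1 / (R * tc) in instructions per second.

    Valid only when no define-use or load-use delays occur
    between instructions.
    """
    if repetition_rate_cycles <= 0 or cycle_time_seconds <= 0:
        raise ValueError("R and tc must be positive")
    return 1.0 / (repetition_rate_cycles * cycle_time_seconds)


# Basic pipeline: one instruction per cycle, assumed cycle time of 10 ns.
basic = performance_potential(1, 10e-9)
# Same cycle time, but a new instruction can start only every second cycle.
slower = performance_potential(2, 10e-9)

if __name__ == "__main__":
    print(f"R = 1: {basic:.3e} instructions/s")
    print(f"R = 2: {slower:.3e} instructions/s")
```

Under these assumptions the basic pipeline has a potential of about 100 million instructions per second, and doubling the repetition rate halves it.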
Application Scenarios of Pipelines
Design space of pipelines
Key aspects of the design space of pipelines
Basic Pipeline Layout
The number of pipeline stages: when more pipeline stages are used, more parallel execution and thus higher performance can be expected.
Disadvantage: a larger number of stages results in more frequent data and control dependencies, which decreases performance.
Specification of the subtasks to be performed in each stage: the subtasks are specified at a number of levels of increasing detail.
Number of Pipeline Stages
Basic Pipeline Layout
Layout of the stage sequence: concerns how the pipeline stages are used.
Use of bypassing: intended to reduce or eliminate pipeline stalls due to RAW dependencies.
Problem: unless special arrangements are made, the result of an instruction is first written into the register file or into memory, and is then fetched from there as a source operand.
Solution: the result of the EU is immediately forwarded to its input for use in the next pipeline cycle.
Layout of the Stage Sequence
Bypassing
Basic Pipeline Layout
Implementing bypassing requires an additional data bus for forwarding the results of the execution stage to its input, and an appropriate extension of the associated multiplexers and latches.
Timing of the pipeline operations: self-timed (asynchronous) or clocked (synchronous).
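The effect of bypassing on RAW stalls can be sketched with a simple in-order issue model. This is only an illustration under stated assumptions: one instruction is issued per cycle, and result_delay (a hypothetical parameter) is the number of cycles after a producer issues before a dependent instruction may issue. A value of 1 models forwarding the EU result straight back to its input; the value 3 for the path through the register file is an assumed figure, not taken from any real processor.

```python
def count_stalls(program, result_delay):
    """Count stall cycles for an in-order, single-issue pipeline model.

    program:      list of (dest_register, source_registers) tuples.
    result_delay: cycles between issuing a producer and the earliest
                  cycle in which a dependent instruction may issue.
    """
    ready_at = {}  # register -> earliest cycle a consumer may issue
    cycle = 0
    stalls = 0
    for dest, sources in program:
        earliest = max([ready_at.get(reg, 0) for reg in sources], default=0)
        if earliest > cycle:
            stalls += earliest - cycle  # wait until the operand is available
            cycle = earliest
        ready_at[dest] = cycle + result_delay
        cycle += 1
    return stalls


# mul r1, r2, r3 followed by add r5, r1, r4 (RAW dependency on r1)
program = [("r1", ("r2", "r3")), ("r5", ("r1", "r4"))]

if __name__ == "__main__":
    print("with bypassing:   ", count_stalls(program, result_delay=1))
    print("without bypassing:", count_stalls(program, result_delay=3))
```

In this model the dependent add issues without stalling when the result is forwarded, and waits two cycles when the result has to travel through the register file.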
Timing of Pipeline Operations
Dependency Resolution
Method of dependency resolution:
Static resolution: performed by the compiler
Combined resolution: performed partly by the compiler and partly by the hardware
Dynamic resolution: performed by extra hardware
Trend
Overview of Pipelined Instructions
Logical Layout
The logical layout specifies the tasks to be accomplished. This includes:
the declaration of the pipelines to be implemented: usually separate pipelines for the processing of FX and logical data (the FX pipeline), for FP data (the FP pipeline), for loads and stores (the L/S pipeline), and for branches (the B pipeline)
the DEC 21164 provides two types of FX (integer) pipelines
the detailed specification of the subtasks to be performed and their execution sequence for each pipeline
the detailed description of the subtasks to be performed in each stage
PowerPC 601 Example
Detailed Description of FX Pipeline
Implementation of Instruction Pipeline
Layout of the Physical Pipelines
Multifunction pipeline: only one published design of a multifunction pipeline is available, the MIPS R4200, which implements all the FX, FP, L/S and B instructions.
Classical approach (master pipeline approach): implemented in the IBM 801, MIPS, MIPS-X, the MIPS R-series (up to the R6000), the i486 and the Pentium.
Dedicated pipelines: implemented in the PowerPC 603, PowerPC 604, DEC, etc.
Multiplicity of Pipelines
Multiplicity refers to whether a single instance of a physical pipeline or multiple instances of physical pipelines are used.
Two aspects should be considered regarding pipeline multiplicity:
the frequency of instructions
out-of-order execution of instructions due to multiple pipelines
Preserving Sequential Consistency
Implementation of Pipelined Instruction Processing