Microprocessors & Microcontrollers 1 RN Biswas
Cache and Pipeline
Prof. R. N. Biswas
Improvement of Speed by Cache
A Cache is a high-speed memory interposed between the processor and the slower Main Memory, enabling faster access to data/code.
Primary or L1 cache is at the chip level
Secondary or L2 cache is at the board level
Cache reduces access time by exploiting Locality of Reference
Holds the more frequently used data/code
Frees the external bus for other operations
Direct-mapped Cache
[Figure: direct-mapped cache. The 16-bit main memory address is split into Tag (5 bits) | Block (7 bits) | Word (4 bits). Main memory blocks M0-M4095 map onto cache blocks C0-C127: M0-M127 carry tag 0, M128-M255 tag 1, ..., M3968-M4095 tag 31.]
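The field split above can be sketched in C. This is a minimal illustration assuming the 16-bit address layout Tag(5) | Block(7) | Word(4) from the figure; the function names are illustrative, not from the slides.

```c
/* Minimal sketch of the direct-mapped address split:
 * a 16-bit main-memory address laid out as
 * Tag (5 bits) | Block (7 bits) | Word (4 bits).
 * Function names are illustrative, not from the slides. */
#include <stdint.h>

static uint16_t word_field(uint16_t addr)  { return addr & 0x000F; }         /* bits 0-3   */
static uint16_t block_field(uint16_t addr) { return (addr >> 4) & 0x007F; }  /* bits 4-10  */
static uint16_t tag_field(uint16_t addr)   { return (addr >> 11) & 0x001F; } /* bits 11-15 */
```

For example, the first word of block M128 has address 128 x 16 = 0x0800, which decodes to tag 1, cache block 0, word 0 - consistent with M128 competing with M0 for cache block C0.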
Fully Associative Cache
[Figure: fully associative cache. The 16-bit main memory address is split into Tag (12 bits) | Word (4 bits). Any main memory block M0-M4095 (tags 0-4095) can occupy any cache block C0-C127.]
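Hit detection in a fully associative cache can be sketched as follows; since the 12-bit tag is the entire block number, a lookup must compare it with the tag of every cache block (in hardware all 128 comparisons happen in parallel; here, a loop). The names and the valid-bit array are illustrative assumptions.

```c
/* Minimal sketch of fully associative lookup: the tag is the
 * whole 12-bit block number, so every stored tag must be
 * compared. Hardware does this in parallel; software loops.
 * Names and the valid-bit array are illustrative. */
#include <stdint.h>

#define CACHE_BLOCKS 128

/* Returns the cache block index holding addr, or -1 on a miss. */
static int fa_lookup(const uint16_t tags[CACHE_BLOCKS],
                     const uint8_t valid[CACHE_BLOCKS],
                     uint16_t addr)
{
    uint16_t tag = (addr >> 4) & 0x0FFF;   /* layout: Tag(12) | Word(4) */
    for (int i = 0; i < CACHE_BLOCKS; i++)
        if (valid[i] && tags[i] == tag)
            return i;
    return -1;
}
```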
Set-associative Cache (4 blocks/set)
[Figure: 4-way set-associative cache. The 16-bit main memory address is split into Tag (7 bits) | Set (5 bits) | Word (4 bits). The 128 cache blocks C0-C127 are grouped into 32 sets of 4 (Set 0: C0-C3, ..., Set 31: C124-C127). Main memory blocks M0-M4095 map to a set by the Set field of their address, with tags 0-127 distinguishing blocks within a set.]
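A lookup in the 4-way organisation above can be sketched as follows: the Set field of the address selects one of the 32 sets, and only the 4 tags of that set are compared. The names and the valid-bit array are illustrative assumptions.

```c
/* Minimal sketch of 4-way set-associative lookup. The address
 * is laid out Tag(7) | Set(5) | Word(4); the Set field picks
 * one of 32 sets and only its 4 tags are compared.
 * Names and the valid-bit array are illustrative. */
#include <stdint.h>

#define SETS 32
#define WAYS 4

/* Returns the way (0-3) holding addr within its set, or -1 on a miss. */
static int sa_lookup(uint16_t tags[SETS][WAYS],
                     uint8_t valid[SETS][WAYS],
                     uint16_t addr)
{
    uint16_t set = (addr >> 4) & 0x001F;   /* bits 4-8: selects the set      */
    uint16_t tag = (addr >> 9) & 0x007F;   /* bits 9-15: compared in the set */
    for (int w = 0; w < WAYS; w++)
        if (valid[set][w] && tags[set][w] == tag)
            return w;
    return -1;
}
```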
Cache Access and Update Sequence
CPU places the memory address on the bus.
Cache Controller compares the tag field of the address with the tags in the selected set:
Cache miss - main memory is accessed and the fetched contents are stored in the cache.
Cache hit - the cache is accessed.
A cache write requires a memory update:
Write-back - memory is updated only when the cache location is replaced by a new block from memory.
Write-through - memory is updated on every write.
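The two write policies can be contrasted in a short C sketch. The line structure, the dirty bit, and all names here are illustrative assumptions, not from the slides.

```c
/* Sketch contrasting the two write policies on a cache write
 * hit. Line structure, dirty bit and names are illustrative
 * assumptions, not from the slides. */
#include <stdbool.h>
#include <stdint.h>

enum policy { WRITE_THROUGH, WRITE_BACK };

typedef struct {
    uint8_t data;
    bool    dirty;   /* used only by write-back */
} cache_line;

/* Write hit: the cache line is always updated; write-through
 * also updates memory immediately, write-back only marks the
 * line dirty. */
static void write_hit(enum policy p, cache_line *line,
                      uint8_t *mem_loc, uint8_t value)
{
    line->data = value;
    if (p == WRITE_THROUGH)
        *mem_loc = value;    /* memory updated on every write */
    else
        line->dirty = true;  /* update deferred until replacement */
}

/* Replacement under write-back: flush the line to memory only
 * if it was written while cached. */
static void evict(cache_line *line, uint8_t *mem_loc)
{
    if (line->dirty) {
        *mem_loc = line->data;
        line->dirty = false;
    }
}
```

The trade-off visible here is the usual one: write-through keeps memory always current at the cost of bus traffic on every write, while write-back batches updates into a single flush at replacement time.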
Speed Improvement by Pipelining
Processor speed can be enhanced by having separate hardware units for the different functional blocks, with buffers between the successive units.
The number of unit operations into which the instruction cycle of a processor can be divided for this purpose defines the number of stages in the pipeline.
A processor with an n-stage pipeline can have up to n instructions being processed simultaneously by its different functional units.
Effective processor speed ideally increases by a factor equal to the number of pipeline stages.
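The ideal-speedup claim above can be made concrete with the standard idealisation: with k equal-delay stages and n instructions, an unpipelined processor needs n*k time slots while the pipeline needs k + (n - 1), so the speedup tends to k as n grows. The formula is the textbook idealisation, assumed here rather than taken from the slides.

```c
/* Minimal sketch of ideal pipeline speedup: unpipelined time
 * n*k slots versus pipelined time k + (n - 1) slots. The
 * ratio approaches k (the stage count) for large n. */
static double speedup(int k, long n)
{
    return (double)(n * k) / (double)(k + n - 1);
}
```

For a 4-stage pipeline, a single instruction sees no gain (speedup 1), while a long run of instructions approaches the full factor of 4.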
Typical Pipeline Organisation
A common choice is to have four such units :
Fetch: Fetch the instruction code from the memory;
Decode: Decode the Op Code and fetch operand(s);
Operate: Perform operation required by the op code;
Write: Store the result in the destination location.
A four-stage pipeline would require three buffers, each separating two functional units of the processor.
The Write cycle of I1, the Operate cycle of I2, the Decode cycle of I3 and the Fetch cycle of I4 take place in the same time slot, and each has to be completed within the slot time prescribed by the pipeline design.
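The overlap described above follows a simple pattern that can be sketched in C: instruction i enters stage s (1 = Fetch ... 4 = Write) in time slot i + s - 1, so slot 4 holds W1, O2, D3 and F4 at once. The 1-based indexing convention is an assumption for illustration.

```c
/* Minimal sketch of stage occupancy in an ideal 4-stage
 * pipeline. Instruction i occupies stage s (1 = Fetch,
 * 2 = Decode, 3 = Operate, 4 = Write) in slot i + s - 1,
 * so slot t, stage s holds instruction t - s + 1.
 * Indexing convention assumed, not from the slides. */
static int instr_in_stage(int slot, int stage)
{
    int i = slot - stage + 1;   /* instruction number, 1-based */
    return i >= 1 ? i : 0;      /* 0 means the stage is still empty */
}
```

In slot 4 this gives instruction 1 in the Write stage through instruction 4 in the Fetch stage, matching the text.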
A Four-stage Pipeline
Data Dependency in Pipelining
If the input data for an instruction depends on the outcome of the previous instruction, the Write cycle of the previous instruction has to be over before the Operate cycle of the next instruction can start. The pipeline effectively idles through one instruction, creating a bubble in the pipeline which persists for several instructions.
[Figure: a data-dependency bubble in the four-stage pipeline; "-" marks an idle cycle]
Slot:  1   2   3   4   5   6   7   8
I1:    F1  D1  O1  W1
I2:        F2  D2  -   O2  W2
I3:            F3  -   D3  O3  W3
I4:                    F4  D4  O4  W4   (bubble ends here)
Branch Dependency in Pipelining
A Branch instruction can cause a pipeline stall if the branch is taken, as the next instruction has to be aborted in that case. If I1 is an unconditional branch instruction, the next Fetch cycle (F2) can start after D1. But if I1 is a conditional branch instruction, F2 has to wait until O1 for the decision as to whether the branch will be taken or not.
[Figure: fetch timing of instruction I2 following a branch instruction I1]
Slot:         1   2   3   4   5   6   7
I1 (branch):  F1  D1  O1  W1
I2:               F2  D2  O2  W2           executed if branch is not taken
I2:                   F2  D2  O2  W2       executed for unconditional branch (F2 after D1)
I2:                       F2  D2  O2  W2   for conditional branch, if taken (F2 after O1)
Avoidance of Pipeline Bubbles
Data Dependency - An instruction unaffected by the write operation has to be placed in the Load Delay Slot.
Branch Dependency - The branch instruction has to perform a delayed branch, with instructions preceding the branch placed in the Branch Delay Slots.
This requires optimising compilers to be written along with the design of the microprocessor.