Date post: | 18-Jan-2016 |
Upload: | roderick-shaw |
Priority encoder
Overview
• Priority encoder: theoretical view
• Other implementations
• The chosen implementation: simulations
• Calculations and comparisons
The target of the project
Building a priority encoder using the multilevel lookahead and folding techniques
Uses of priority encoding
• INR - interconnection network router
• Design of the SAE (sequential address encoder) of a content-addressable memory (CAM)
• Microcontrollers and microprocessors (incrementer / decrementer)
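Basic concepts of priority encoders
• The i-th output bit: $EP_i = D_i \cdot P_i$
  $D_i$: the input data; $P_i$: the priority token passed into this bit
• The relationship between $P_i$ and $P_{i-1}$: $P_i = \overline{D_{i-1}} \cdot P_{i-1}$
• The generated $EP_i$ is $EP_i = D_i \cdot \overline{D_{i-1}} \cdot \overline{D_{i-2}} \cdots \overline{D_1} \cdot \overline{D_0}$
  (an overbar denotes the logical complement)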
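The token-passing relation and the expanded product can be compared with a small bit-level sketch. It assumes that the priority token is blocked by any set higher-priority bit (so the D terms that pass the token are taken in complemented form), that bit 0 has the highest priority, and that the initial token is P_0 = 1. The function names are hypothetical and are not part of the project.

```python
def encode_recurrence(d):
    """Ripple form: EP_i = D_i AND P_i, with P_i = NOT D_{i-1} AND P_{i-1}."""
    ep = []
    p = 1  # assumed initial token P_0 = 1 (bit 0 has the highest priority)
    for bit in d:
        ep.append(bit & p)
        p = (1 - bit) & p  # the token survives only if this bit is 0
    return ep


def encode_closed_form(d):
    """Expanded form: EP_i = D_i AND NOT D_{i-1} AND ... AND NOT D_0."""
    return [
        d[i] & int(all(d[j] == 0 for j in range(i)))
        for i in range(len(d))
    ]


data = [0, 0, 1, 0, 1, 1, 0, 1]
print(encode_recurrence(data))   # [0, 0, 1, 0, 0, 0, 0, 0]
print(encode_closed_form(data))  # [0, 0, 1, 0, 0, 0, 0, 0]
```

Only the first set bit survives in the output, which is the one-hot result a priority encoder is expected to produce.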
Different implementations
For a 4-bit priority encoder
Matrix
Because a minimal distance is needed between the lines, the layout is large and complicated.
Sum of minterms, the straightforward implementation.
Basic units
The structure is built from identical units. Each unit calculates yi and xpi for the i-th bit.
Then, by chaining the units, we construct the output.
In this implementation we save silicon area but pay in propagation delay.
Tree
Tree of multiplexers implemented by butterflies.
Efficient implementation in area and power, but it has a longer propagation delay than the folding technique.
the multilevel lookahead structure
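The third-level lookahead output signal of the i-th 8-bit macro ($i = 0, \dots, n-1$) is:
$LA3_i = D_{8i+7} + D_{8i+6} + D_{8i+5} + D_{8i+4} + D_{8i+3} + D_{8i+2} + D_{8i+1} + D_{8i} + LA3_{i-1}$
with $LA3_{-1} = 0$ and $n = N/8$, where $N$ is the number of input bits.
The i-th 4-bit sub-macro:
$LA2_i = D_{8i+3} + D_{8i+2} + D_{8i+1} + D_{8i} + LA3_{i-1}$
The outputs (an overbar denotes the logical complement):
$EP_{8i} = D_{8i} \cdot \overline{LA3_{i-1}}$
$EP_{8i+1} = D_{8i+1} \cdot \overline{D_{8i}} \cdot \overline{LA3_{i-1}}$
$EP_{8i+2} = D_{8i+2} \cdot \overline{D_{8i+1}} \cdot \overline{D_{8i}} \cdot \overline{LA3_{i-1}}$
$EP_{8i+3} = D_{8i+3} \cdot \overline{D_{8i+2}} \cdot \overline{D_{8i+1}} \cdot \overline{D_{8i}} \cdot \overline{LA3_{i-1}}$
$EP_{8i+4} = D_{8i+4} \cdot \overline{LA2_i}$
$EP_{8i+5} = D_{8i+5} \cdot \overline{D_{8i+4}} \cdot \overline{LA2_i}$
$EP_{8i+6} = D_{8i+6} \cdot \overline{D_{8i+5}} \cdot \overline{D_{8i+4}} \cdot \overline{LA2_i}$
$EP_{8i+7} = D_{8i+7} \cdot \overline{D_{8i+6}} \cdot \overline{D_{8i+5}} \cdot \overline{D_{8i+4}} \cdot \overline{LA2_i}$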
The 8-bit macro formulas
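The 8-bit macro formulas can be checked in software with a minimal sketch. It assumes that a high lookahead signal blocks the outputs it feeds (so the lookahead and the higher-priority D terms enter each EP product in complemented form), that bit 0 has the highest priority, and that the input length is a multiple of 8. The names `encode_lookahead`, `none_set` and `reference` are hypothetical helpers, not part of the project.

```python
def none_set(bits):
    """Return 1 if every bit in the list is 0, else 0."""
    return int(not any(bits))


def encode_lookahead(d):
    """Priority-encode d (length a multiple of 8) with the 8-bit macro formulas."""
    assert len(d) % 8 == 0
    ep = [0] * len(d)
    la3_prev = 0  # LA3_{-1} = 0
    for i in range(len(d) // 8):
        m = d[8 * i: 8 * i + 8]
        la2 = int(any(m[:4])) | la3_prev  # LA2_i: OR of the lower 4 bits and LA3_{i-1}
        la3 = int(any(m)) | la3_prev      # LA3_i: OR of all 8 bits and LA3_{i-1}
        for k in range(4):
            # lower sub-macro: gated by the complement of LA3_{i-1}
            ep[8 * i + k] = m[k] & none_set(m[:k]) & (1 - la3_prev)
            # upper sub-macro: gated by the complement of LA2_i
            ep[8 * i + 4 + k] = m[4 + k] & none_set(m[4:4 + k]) & (1 - la2)
        la3_prev = la3
    return ep


def reference(d):
    """One-hot vector marking the first set bit (all zeros if none is set)."""
    out = [0] * len(d)
    if 1 in d:
        out[d.index(1)] = 1
    return out


data = [0] * 32
data[13] = 1
data[20] = 1
print(encode_lookahead(data).index(1))  # 13
```

Bit 20 is suppressed because the lookahead of the second macro is already high when the third macro is evaluated.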
8-bit macro cell
Diagram of 32-bit chain designed encoder
The folding technique: first-level folding
• The $LA3_i$ generated by a macro with higher priority can be connected to other macros with lower priority.
• Such a connection can shorten the critical path.
• With such a connection we lose the advantage in layout arrangement and wiring complexity.
• We connect $LA3_0$ to the second and the fourth macros (not to the third) and get a 2x2 matrix.
• In this way the fourth macro is connected to 2 neighboring macros.
• The number of gate delays is reduced to 4 (< log2 32 = 5).
Folding - implementation
Block diagram of a 32-bit priority encoder with folding
64 bit priority encoder with first level folding
Multilevel folding
• To reduce the gate delay below log2 N in larger priority encoders, we can apply the folding technique repeatedly. For example, for N = 128:
• First-level folding: 8 gate delays
• Second-level folding: 7 gate delays
• Third-level folding: < 7 gate delays
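The comparison against log2 N can be made explicit with a short sketch. The gate-delay counts are the ones quoted on the slides; the helper `below_log2` is a hypothetical name introduced only for this illustration.

```python
import math


def below_log2(gate_delays, n):
    """True if the gate-delay count is strictly below log2(n)."""
    return gate_delays < math.log2(n)


N = 128
quoted = {"first-level folding": 8, "second-level folding": 7}
for level, delays in quoted.items():
    print(f"{level}: {delays} gate delays, "
          f"below log2({N}) = {math.log2(N):.0f}? {below_log2(delays, N)}")
```

Since log2 128 = 7, neither of the first two levels is strictly below the bound for N = 128; only the third level, quoted as fewer than 7 gate delays, reaches that target.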
64-bit priority encoder with 2 levels of folding
For a 256-bit priority encoder, the new design can achieve about 10 times the performance while consuming half the power.
The implementation
We decided to implement the project bottom-up, starting with a 1-bit unit.
Each stage is checked separately.
We move to the next stage only after the previous stage is finished.
1-bit unit
First we implemented a 1-bit unit and checked it.
The circuit:
The simulation (traces: the output, the lookahead bit, the input, the clock):
The 4-bit unit
The 4-bit unit circuit:
The input signals:
The outputs:
When the lookahead is high, all the outputs equal zero.
(Traces: lookahead, outputs)
The 8-bit unit
The output signals (trace labels: v0, v3, not valid, the next lookahead, v4, v7)
The 32-bit chain encoder
The results
The problem we encountered: “glitches”
• The glitch starts after the rising edge of the clock.
• The widest glitch appears at the higher bits.
(Trace labels: clock, bit #60)
32-bit folding
64-bit first-level folding
64-bit second-level folding
64-bit second-level folding with one critical path
Propagation delay reduction
To minimize the propagation delay of the EP we made the following changes:
- Reduced the clock period from 200 ns to 20 ns.
- Divided the clock pulse into different durations for low time and high time.
These changes were made under the following constraints:
- Keeping the high pulse length at 80% of the base pulse.
- Making sure all the requested changes and currents are stable before the clock rises.
The optimum clock period we found: 5 ns low time and 15 ns high time.
Results – 32 bit
Results – 64 bit
Results – 64 bit (high)
80% high pulse
The VHDL simulation
The VHDL simulation of a 32-bit priority encoder
Here the LSB of the input changes from 0 to 1, and the output changes.
Compare table
                    unit     matrix   tree     folding
Area [mm²]          0.076    0.076    0.043    0.053
Power [10^-11fw]    149.6    173.4    112.8    127.5
Time [ns]           241.275188
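To make the area and power comparison easier to read, the figures from the table can be normalized to one implementation. This is only an illustrative sketch: it assumes the four values in each row follow the column order unit, matrix, tree, folding, it uses the matrix implementation as an arbitrary baseline, and it leaves out the time row. The helper `relative_to` is a hypothetical name.

```python
# Values copied from the comparison table (area in mm², power in the table's units).
# Assumption: each row lists unit, matrix, tree, folding in that order.
area = {"unit": 0.076, "matrix": 0.076, "tree": 0.043, "folding": 0.053}
power = {"unit": 149.6, "matrix": 173.4, "tree": 112.8, "folding": 127.5}


def relative_to(values, baseline):
    """Divide every entry by the baseline entry."""
    base = values[baseline]
    return {name: value / base for name, value in values.items()}


rel_area = relative_to(area, "matrix")
rel_power = relative_to(power, "matrix")
for name in area:
    print(f"{name:8s} area x{rel_area[name]:.2f}  power x{rel_power[name]:.2f}")
```

Under these assumptions, both the tree and the folding designs come out below the matrix baseline in area and in power, with the tree being the smallest of the four.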
Charts (one bar each for unit, matrix, tree, folding): power, pd (propagation delay), area.