Implementation of a Simple 8-bit Microprocessor with Reversible
Energy Recovery Logic
Seokkee Kim and Soo-Ik Chae
System Design Group, School of Electrical Engineering
Seoul National University 2005 / 05 / 05
SDGroup, School of Electrical Engineering, SNU 2/15
Contents
• Introduction to nRERL
• 8-bit nRERL Microprocessor
• Phase Scheduling
• Reversibility Breaking
• Measurement Results
• Future Works
Introduction to nRERL (1)
• nRERL is nMOS Reversible Energy Recovery Logic *)
– A fully adiabatic circuit using reversible logic
– Only nMOS switches are used, by exploiting bootstrapping
– Phase pipelining using 6-phase clocked power
*) J. Lim, D.-G. Kim, and S.-I Chae, “nMOS reversible energy recovery logic for ultra-low-energy
applications,” IEEE Journal of Solid-State Circuits, vol. 35, no. 6, pp. 865-875, June, 2000.
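[Figure: nRERL phase pipeline. Logic blocks F, G, H and their inverse blocks F-1, G-1, H-1, connected between signals Xi and Xi+1 and driven by clocked-power phases i to i+5.]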
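The pairing of each logic block with an inverse block (F with F-1, G with G-1, H with H-1) can be illustrated in software. The following is a minimal sketch, not a model of the circuit: `forward` is a hypothetical reversible function (a controlled-NOT) chosen only for illustration, and `reverse` plays the role of the inverse block that reconstructs the stage input from the stage output.

```python
from typing import Tuple

Bits = Tuple[int, int]

def forward(x: Bits) -> Bits:
    """Hypothetical reversible stage F: (a, b) -> (a, a XOR b)."""
    a, b = x
    return (a, a ^ b)

def reverse(y: Bits) -> Bits:
    """Inverse stage F^-1: recovers the input of forward() from its output."""
    a, c = y
    return (a, a ^ c)

all_inputs = [(a, b) for a in (0, 1) for b in (0, 1)]

# A reversible function is a bijection: distinct inputs give distinct outputs.
is_bijection = len({forward(x) for x in all_inputs}) == len(all_inputs)

# Applying the inverse after the forward function restores every input.
round_trip_ok = all(reverse(forward(x)) == x for x in all_inputs)

print(is_bijection, round_trip_ok)
```

Because no input information is lost, the value held by the previous stage can always be regenerated from the next stage, which is the property a reversible pipeline relies on.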
Introduction to nRERL (2)
[Figure: nRERL buffer schematic and timing diagram. Transistors MFL, MFLB, MFI, MFIB, MRL, MRLB, MRI, MRIB; switch groups labeled ForwardLogic, ForwardIsolation, ReverseLogic, ReverseIsolation; clamp; internal nodes n1 to n4; signals Xi and Xi+1; clocked-power phases i to i+3. The timing diagram spans T0 to T6, with Xi, Xi+1, n1, n3 and the phases swinging between 0 and Vdd, and levels marked Vdd-Vthb.]
8-bit nRERL Microprocessor (1)
• Issues
– Area vs. reversibility: How should we control the reversibility to integrate the microprocessor in a limited silicon area?
– Pipelining vs. energy: How should we schedule the phase pipelining to minimize the total energy consumption of the microprocessor?
– Energy vs. reversibility: How can we control the reversibility without increasing the total energy consumption of the microprocessor?
8-bit nRERL Microprocessor (2)
• A subset of the DLX Instruction Set Architecture
– No floating-point instructions
– 19 instructions
• 5 macro-blocks:
– IF, ID, EXE, MEM, WB
– Fully adiabatic circuit
• 6-phase CPG is also integrated
– A shared off-chip inductor is used
[Figure: block diagram of the 8-bit adiabatic microprocessor. Blocks: Program Counter (PC), Branch PC Generator, ROM (64w x 20b), Controller, Register File (16w x 8b), ALU, RAM (128w x 8b), and the 6-phase Clocked Power Generator. Off-chip signals: fOSC, fREF. Legend: clocked power, data flow path.]
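The storage sizes on this slide imply the capacities and minimum address widths computed below. This is only arithmetic on the listed word counts and word widths; the address widths actually used in the chip are not stated on the slide, so the `address_bits` values are an assumption (the smallest width that can select every word).

```python
# (words, bits per word) as listed on the slide.
memories = {
    "ROM": (64, 20),
    "RAM": (128, 8),
    "Register file": (16, 8),
}

def address_bits(words: int) -> int:
    """Minimum number of address bits needed to select one of `words` entries."""
    return (words - 1).bit_length()

summary = {
    name: {"total_bits": words * width, "address_bits": address_bits(words)}
    for name, (words, width) in memories.items()
}

for name, info in summary.items():
    print(f"{name}: {info['total_bits']} bits, "
          f"{info['address_bits']}-bit address (minimum)")
```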
Phase scheduling (1)
[Figure: three schedules of Register File, ALU, Memory (and a Buffer) with the writeback data path, drawn against time slots T0 to T17 and phases 0 to 5 repeating every 6 slots.]
CASE I: Cycle-based scheduling
CASE II: Phase-based scheduling (best case)
CASE III: Phase-based scheduling (worst case)
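The time axis of the schedules maps each slot T0 to T17 onto one of the 6 clocked-power phases. The sketch below shows that mapping, plus a hypothetical helper `next_cycle_boundary`. The helper rests on an assumption not spelled out on the slide: that a cycle-based schedule starts a block only at phase 0, whereas a phase-based schedule may start it at any phase.

```python
from typing import Tuple

PHASES_PER_CYCLE = 6  # 6-phase clocked power

def slot_to_cycle_phase(t: int) -> Tuple[int, int]:
    """Map time slot T<t> to (cycle index, phase index)."""
    return divmod(t, PHASES_PER_CYCLE)

def next_cycle_boundary(t: int) -> int:
    """First slot at or after T<t> whose phase is 0 (ceiling to a multiple of 6)."""
    return -(-t // PHASES_PER_CYCLE) * PHASES_PER_CYCLE

# Phase row of the figure for T0..T17.
phases = [slot_to_cycle_phase(t)[1] for t in range(18)]
print(phases)

# Assumed illustration: data ready at T7 waits until T12 if blocks may only
# start on a cycle boundary, but can be consumed at T7 if any phase is allowed.
ready_slot = 7
wait_slots = next_cycle_boundary(ready_slot) - ready_slot
print(wait_slots)
```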
Phase scheduling (2)
[Figure: phase schedule of the microprocessor over time slots T0 to T23 (phases 0 to 5, four cycles). Stages along the time axis: Instruction Fetch, Instruction Decoding / Register Fetch, Execution, Memory Access, Writeback, Overhead. Blocks: External Instruction, ROM, PC register, Page register, pc increment, branch pc generation, Control, Register File, Eqcheck, forward, ALU, RAM, write to register. Signals: Decoded Instructions, Branch, Flush, Forward data, Write data, Memdata. Legend: Buffer, MUX, Data path, Control Signal.]
Reversibility Breaking (1)
• SERC: Self-Energy Recovery Circuit
– Energy recovery with its own data instead of using reversible logic
– Nonadiabatic loss exists ($\frac{1}{2} C V_{thb}^{2}$)
[Figure: SERC schematic and timing diagram over time slots T2 to T8. Signals: Data, Data*, internal nodes n7 and n8, clocked-power phases 0 to 5. Voltage levels marked: 0, Vthb, Vdd-Vthb, Vdd.]
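To give a feel for the size of the SERC nonadiabatic loss, the sketch below evaluates (1/2)·C·Vthb² and compares it with the (1/2)·C·Vdd² dissipated when a node is discharged abruptly from Vdd. The node capacitance and the value of Vthb are hypothetical numbers chosen only for illustration; they do not come from the slides.

```python
def nonadiabatic_loss(c_farads: float, v_thb: float) -> float:
    """SERC residual loss (1/2) * C * Vthb^2, in joules."""
    return 0.5 * c_farads * v_thb ** 2

def abrupt_discharge_loss(c_farads: float, v_dd: float) -> float:
    """Loss (1/2) * C * Vdd^2 when a node at Vdd is discharged abruptly."""
    return 0.5 * c_farads * v_dd ** 2

C_NODE = 10e-15  # hypothetical node capacitance: 10 fF
V_THB = 0.5      # hypothetical Vthb, in volts
V_DD = 1.8       # supply voltage, in volts

serc = nonadiabatic_loss(C_NODE, V_THB)
full = abrupt_discharge_loss(C_NODE, V_DD)

print(f"SERC residual loss:        {serc * 1e15:.2f} fJ")
print(f"Abrupt discharge from Vdd: {full * 1e15:.2f} fJ")
print(f"Ratio (Vthb / Vdd)^2:      {serc / full:.3f}")
```

The ratio depends only on (Vthb / Vdd)², so the residual loss shrinks quickly as Vthb becomes small relative to Vdd.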
Reversibility Breaking (2)
• Infinite memory cannot be implemented in a limited silicon area.
[Figure: SERC in a memory cell. Write port signals: wd[m]_wr_iso_1, wd[m]_wr_2. Read port signals: wd[m]_rd_iso_2, wd[m]_rd_3, with output bit[n]_out. SERC signals: wd[m]_unwr_4 (ref_4), wd[m]_unwr_5 (ref_4).]
• SERC is used for unwrite and refresh operations.
Measurement Results
[Figure: chip micrograph. Labeled regions: bias generator, CPG, memory, ALU & register file, ROM & PC, control, 6-phase clocked power routing, microprocessor core.]
• ANAM 0.18 μm (1P6M)
– Core: 2.62 x 2.03 mm²
– CPG: 1.0 x 0.6 mm²
– Vdd = 1.8 V, Vth0 = 0.35 V
– E = 8.5 pJ/cycle (P = 7.5 μW) @ Vdd = 1.8 V, f = 880 kHz
• E_cpg = 4.97 pJ/cycle (58.5%)
• E_core = 3.53 pJ/cycle (41.5%)
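The measured figures are mutually consistent, as the short check below shows. It uses only the numbers on this slide: power is energy per cycle times frequency, and the two shares are each component divided by the total.

```python
E_TOTAL_PJ = 8.5   # total energy per cycle, pJ
E_CPG_PJ = 4.97    # clocked power generator, pJ per cycle
E_CORE_PJ = 3.53   # microprocessor core, pJ per cycle
F_HZ = 880e3       # operating frequency, Hz

# P = E * f, converted from watts to microwatts.
power_uw = E_TOTAL_PJ * 1e-12 * F_HZ * 1e6

cpg_share = 100 * E_CPG_PJ / E_TOTAL_PJ
core_share = 100 * E_CORE_PJ / E_TOTAL_PJ

print(f"P = {power_uw:.2f} uW")
print(f"E_cpg share  = {cpg_share:.1f}%")
print(f"E_core share = {core_share:.1f}%")
```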
Hardware Complexity
Block                     # of transistors     Area
                          (portion of core)    (portion of core)
ROM (64w x 20b)           10,000 (13.3%)       0.60 x 0.50 mm² (7.9%)
PC                        17,000 (22.6%)       0.60 x 0.58 mm² (9.2%)
ALU                        5,200 (6.9%)        0.50 x 0.60 mm² (7.9%)
Reg. file (16w x 8b)       7,600 (10.1%)       0.36 x 0.50 mm² (4.8%)
Forward                      400 (0.5%)        0.70 x 0.24 mm² (4.4%)
RAM (128w x 8b)           28,000 (37.2%)       0.65 x 1.30 mm² (22.3%)
Control                    5,700 (7.6%)        1.60 x 0.70 mm² (22.2%)
Phase aligning buffers     1,400 (1.9%)        -
Microprocessor core       75,300 (100%)        2.62 x 2.03 mm² (100%)
CPG                        2,700               1.00 x 0.60 mm²
Clock routing              -                   0.4 x 7.0 mm²
Total chip                78,000               4.0 x 4.0 mm²
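The transistor counts in the table can be cross-checked with a few lines of arithmetic. The sketch below recomputes the core total, the chip total, and each block's share of the core from the listed counts; it does not attempt to recompute the area shares.

```python
# Transistor counts per block, as listed in the table.
core_blocks = {
    "ROM": 10_000,
    "PC": 17_000,
    "ALU": 5_200,
    "Reg. file": 7_600,
    "Forward": 400,
    "RAM": 28_000,
    "Control": 5_700,
    "Phase aligning buffers": 1_400,
}
CPG_TRANSISTORS = 2_700

core_total = sum(core_blocks.values())
chip_total = core_total + CPG_TRANSISTORS

# Share of the core, in percent, rounded to one decimal as in the table.
shares = {name: round(100 * n / core_total, 1) for name, n in core_blocks.items()}

print(core_total, chip_total)
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```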
Energy Partitions
• The CPG accounts for more than half of the total energy.
– More optimization is required for the CPG design.
• At the optimal condition, the adiabatic, leakage, and CPG rail-driver energy losses should be equal.
[Figure: two pie charts for the nRERL microprocessor, E_total = 8.5 pJ/cycle, split into E_cpg (58.5%) and E_core (41.5%).]
<Partitioned by functional blocks>
– CPG (clk. driver): 58%
– 128x8b RAM: 21%
– 64x20b ROM: 9%
– ALU8b & reg. file: 6%
– Control & others: 6%
<Partitioned by energy components>
– CPG, rail-driver: 35%
– CPG, controller: 16%
– adiabatic: 21%
– leakage: 20%
– SERC: 8%
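One way to see why the adiabatic and leakage losses balance at the optimum is a simple two-term model, sketched below. The model itself is an assumption, not something stated on the slides: adiabatic loss per cycle is taken to grow linearly with frequency (a·f) and leakage energy per cycle to fall as 1/f (b/f). The coefficients `a` and `b` are fitted to the measured 21% and 20% shares at 880 kHz. The CPG rail-driver term is deliberately left out, so the result is illustrative only.

```python
import math

E_TOTAL_PJ = 8.5     # measured total energy per cycle, pJ
F_MEAS_HZ = 880e3    # measured operating frequency, Hz

adiabatic_pj = 0.21 * E_TOTAL_PJ  # adiabatic share from the pie chart
leakage_pj = 0.20 * E_TOTAL_PJ    # leakage share from the pie chart

# Assumed model: E(f) = a*f + b/f, with a and b fitted at the measured point.
a = adiabatic_pj / F_MEAS_HZ
b = leakage_pj * F_MEAS_HZ

def energy_pj(f_hz: float) -> float:
    """Adiabatic plus leakage energy per cycle under the assumed model."""
    return a * f_hz + b / f_hz

# dE/df = a - b/f^2 = 0  =>  f_opt = sqrt(b/a), where a*f_opt equals b/f_opt.
f_opt = math.sqrt(b / a)

print(f"f_opt = {f_opt / 1e3:.0f} kHz")
print(f"adiabatic at f_opt = {a * f_opt:.3f} pJ, leakage at f_opt = {b / f_opt:.3f} pJ")
```

Under this model the minimum lies where the two terms are equal, which matches the near-equal adiabatic and leakage shares reported at the measured operating point.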
Comparisons (1): CMOS vs. nRERL
• Minimum Energy Consumption
[Figure: stacked bar chart of energy loss per cycle [pJ/cycle], axis from 0 to 60, for CMOS (52.0 pJ) and nRERL (8.5 pJ). Legend: ALU8b & reg. file, 64x20b ROM, Control & others, 128x8b RAM, CPG (clk. driver). Segment labels: 47.3%, 22.1%, 16.3%, 9.7%, 4.6%.]
Summary
                               8-bit nRERL microprocessor     8-bit CMOS microprocessor
Hardware complexity
  # of transistors             78,000                         15,000
  Core area                    2.62 x 2.03 mm²                0.82 x 0.51 mm²
Operating region
  Supply voltage               1.8 V                          0.8 V ~ 1.8 V
  Frequency                    200 kHz ~ 10 MHz               ~ 1 GHz
Minimum energy consumption     8.5 pJ/cycle @ Vdd = 1.8 V,    52.0 pJ/cycle @ Vdd = 0.65 V,
(optimal condition)            Vbias = 1.5 V, f = 880 kHz     f = 200 kHz ~ 1 MHz
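The trade-off in the summary can be expressed as ratios. The sketch below uses only the figures from the table; note that the nRERL transistor count is the total-chip figure, while the areas are core areas.

```python
# Figures from the summary table.
nrerl = {"transistors": 78_000, "area_mm2": 2.62 * 2.03, "energy_pj": 8.5}
cmos = {"transistors": 15_000, "area_mm2": 0.82 * 0.51, "energy_pj": 52.0}

# nRERL relative to CMOS for each metric.
ratios = {key: nrerl[key] / cmos[key] for key in nrerl}

# Reduction in minimum energy per cycle relative to CMOS, in percent.
energy_saving_pct = 100 * (1 - nrerl["energy_pj"] / cmos["energy_pj"])

print(f"Transistor count: {ratios['transistors']:.1f}x")
print(f"Core area:        {ratios['area_mm2']:.1f}x")
print(f"Minimum energy:   {ratios['energy_pj']:.2f}x "
      f"({energy_saving_pct:.1f}% lower)")
```

In these terms, the lower minimum energy of the nRERL design comes at the cost of several times more transistors and core area.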
Future Works
• A more energy-efficient CPG design is required.
• More study on complexity reduction is required for the implementation of more complex circuits. *)
*) Seokkee Kim and S.-I Chae, “Complexity reduction in an adiabatic microprocessor using reversible logic,” to appear in Proc. International Symposium on Low Power Electronics and Design, Aug. 2005.