CONFERENCE ON “SIGNAL PROCESSING AND REAL TIME OPERATING SYSTEM (SPRTOS)” MARCH 26-27 2011
VLP0101-1
Low power On-Chip Amplifier for CCD Array
Er. Rahul Malhotra*, Er. Amit Kumar** Bhai Maha Singh College of Engineering, Sri Muktsar Sahib, India
*[email protected], **[email protected]
Abstract— Analog VLSI design is an essential part of any electronic system because the real world is analog. In this paper a low-power amplifier is presented for a CCD array [1]. CCDs are used to capture images, and modern digital and high-resolution cameras are built around CCD arrays, but the performance of a CCD array depends on the performance of the on-chip amplifier placed at the end of the array. In this paper single-stage and two-stage amplifiers are simulated, and results for power dissipation and bandwidth are presented for different transistor sizes. All results are verified using the Tanner tool (version 7.1) [11]. A number of analyses have been presented in the literature to reduce power dissipation, but most structures compromise either area or bandwidth. Here a lower power dissipation is achieved while a good bandwidth is maintained; detailed results supporting this claim are presented in the results section.
Keywords: Gain, power dissipation, bandwidth, capacitance
INTRODUCTION
Charge-Coupled Devices (CCDs) were invented around 1970 and originally found application as memory devices. CCDs have many applications, but the most important is imaging [3]. The basic operation of the sensor is to convert light into electrons. When light is incident on the active area of the image sensor, it interacts with the atoms that make up the silicon crystal. The energy transmitted by the light (photons) enables an electron to escape from the tight control of one atom and roam more freely about the device as a “conduction” electron, leaving behind an atom short of one electron. Modern CCDs have two types of architecture:
1. Full-Frame (FF)
2. Frame-Transfer (FT)
FF CCDs have the simplest architecture and are the easiest to fabricate and operate. They consist of a parallel CCD shift register, a serial CCD shift register and a signal-sensing output amplifier. Images are optically projected onto the parallel array, which acts as the image plane. The architecture is shown in Fig. 1.
FT CCDs are very much like FF architectures. The difference is that a separate and identical parallel register, called a storage array, is added, which is not light sensitive. The idea is to shift a captured scene very quickly from the photosensitive (image) array to the storage array [5]. Readout off chip from the storage register is then performed as described for the FF device, while the image array is integrating the next frame. The architecture is shown in Fig. 2.
Fig. 1 Full Frame architecture
Fig. 2 Frame transfer architecture
Both of the above architectures are widely used, but the performance of both depends on the type and quality of the on-chip (output) amplifier, which is fabricated at the last stage of the structure, as shown in the figures above.
ARCHITECTURE OF ON-CHIP AMPLIFIER
The output amplifier also has two types of architecture:
1. Single stage amplifier
2. Two stage amplifier
Fig. 3 Single Stage CCD On-Chip amplifier (schematic labels: out, M1, Mc, VRG, VRD, VDD, FD (detection node); three transistors, each with L = 2u, W = 22u)
The single-stage amplifier consists of a source follower M1 and a load transistor Mc for biasing. The reset FET is connected to the detection node, which consists of the floating diffusion [6, 7] and the gate of M1. In the ON state the reset FET resets the detection node to a reference voltage (VRD), and in the OFF state the floating diffusion can receive the next charge packet. The voltage source between the gate and source of the current-sink transistor Mc determines the bias current of the first stage. It can also be used as a signal injection point to measure the ratio between the total capacitance and the effective sense capacitance, as well as the bandwidth in the OFF state.
The two-stage amplifier further improves the characteristics of the amplifier and gives better results, as shown in the results section of the paper. Its architecture is shown in Fig. 4. The two-stage amplifier also improves the sensitivity of the amplifier and reduces the noise level of the overall CCD.
Fig. 4 Two Stage CCD On-Chip amplifier (schematic labels: Mr, M1, Mc, M2, M3, Vdd, VCS, VRD, FD (detection node), reset gate pulse, output; five transistors, each with L = 2u, W = 22u)
OPTIMIZATION
For optimization of the on-chip amplifier, the length and width of the individual transistors are varied and the resulting performance is recorded. The effect of increasing and decreasing the length and width of each transistor is summarized below.
To achieve maximum gain:
Transistor 'M1': The gain can be maximized by increasing the width of this transistor, as this increases the difference in the output voltage amplitude.
Transistor 'MC': The gain can be maximized by decreasing the width of this transistor, as this increases the difference in the output voltage amplitude.
Transistor 'M2': The gain can be maximized by increasing the width of this transistor, as this increases the difference in the output voltage amplitude.
Transistor 'M3': The gain can be maximized by decreasing the width of this transistor, as this increases the difference in the output voltage amplitude.
To achieve maximum bandwidth:
Transistor 'M1': The bandwidth of the circuit can be increased by increasing the width of this transistor. A larger width increases the transconductance, which lowers the impedance and so raises the bandwidth.
Transistor 'MC': The bandwidth of the circuit can be increased by increasing the width of this transistor. A larger width increases the transconductance, which lowers the impedance and so raises the bandwidth.
Transistor 'M2': The bandwidth of the circuit can be increased by increasing the width of this transistor.
Transistor 'M3': The bandwidth of the circuit can be increased by increasing the width of this transistor. A larger width increases the transconductance, which lowers the impedance and so raises the bandwidth, although the resulting change is not large.
To achieve minimum power dissipation:
Transistor 'M1': The power dissipation of the circuit can be reduced by reducing the width of this transistor, as the current flowing through it falls with the width. It can also be reduced by increasing the length, because a longer channel lowers the transconductance, which in turn reduces the current flowing through the transistor.
Transistor 'MC': The power dissipation of the circuit can be reduced by reducing the width of this transistor, as the current flowing through it falls with the width. It can also be reduced by increasing the length, because a longer channel lowers the transconductance, which in turn reduces the current flowing through the transistor.
Transistor 'M2': The power dissipation of the circuit can be reduced by reducing the width of this transistor, as the current flowing through it falls with the width.
Transistor 'M3': The power dissipation of the circuit can be reduced by reducing the width of this transistor, as the current flowing through it falls with the width.
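The sizing trends above can be illustrated with the textbook long-channel (square-law) MOSFET model, in which both the saturation current and the transconductance scale with W/L. This is only a rough sketch and not the model used in the simulations: the process transconductance `K_PRIME` and the overdrive voltage `V_OV` are assumed values chosen for illustration.

```python
# Illustrative square-law MOSFET model in saturation.
# K_PRIME and V_OV are hypothetical values, not taken from the paper.
K_PRIME = 50e-6  # process transconductance mu*Cox in A/V^2 (assumed)
V_OV = 1.0       # overdrive voltage Vgs - Vt in volts (assumed)


def drain_current(w_um, l_um, k_prime=K_PRIME, v_ov=V_OV):
    """Saturation drain current I_D = 0.5 * k' * (W/L) * Vov^2."""
    return 0.5 * k_prime * (w_um / l_um) * v_ov ** 2


def transconductance(w_um, l_um, k_prime=K_PRIME, v_ov=V_OV):
    """Small-signal transconductance g_m = k' * (W/L) * Vov at fixed overdrive."""
    return k_prime * (w_um / l_um) * v_ov


# Compare the nominal 22u x 2u device with a narrower and a longer one.
for w, l in [(22, 2), (11, 2), (22, 4)]:
    print(f"W={w}u L={l}u  I_D={drain_current(w, l):.3e} A  "
          f"g_m={transconductance(w, l):.3e} S")
```

Under these assumptions, halving the width or doubling the length halves both the bias current (and hence the static power) and the transconductance, which is the trade-off between power and bandwidth described above.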
RESULTS
Table 1: When the width of the transistor M3 is varied
M1 (W×L) μm | Mc (W×L) μm | M2 (W×L) μm | M3 (W×L) μm | Power Dissipation (mW) | Bandwidth (MHz)
15×25 | 12×10 | 20×10 | 10×25 | 5.9  | 302
15×25 | 12×10 | 20×10 | 12×25 | 5.95 | 320
15×25 | 12×10 | 20×10 | 15×25 | 6.0  | 242
15×25 | 12×10 | 20×10 | 18×25 | 6.1  | 207
Table 2: When the width of the transistor M2 is varied
M1 (W×L) μm | Mc (W×L) μm | M2 (W×L) μm | M3 (W×L) μm | Power Dissipation (mW) | Bandwidth (MHz)
15×25 | 12×10 | 20×10 | 10×25 | 5.15 | 69
15×25 | 12×10 | 18×10 | 10×25 | 5.25 | 62
15×25 | 12×10 | 16×10 | 10×25 | 5.2  | 78
15×25 | 12×10 | 14×10 | 10×25 | 5.3  | 70
15×25 | 12×10 | 12×10 | 10×25 | 5.4  | 87
15×25 | 12×10 | 10×10 | 10×25 | 5.7  | 122
15×25 | 12×10 | 8×10  | 10×25 | 5.8  | 148
Table 3: When the length of the transistor M3 is varied
M1 (W×L) μm | Mc (W×L) μm | M2 (W×L) μm | M3 (W×L) μm | Power Dissipation (mW) | Bandwidth (MHz)
15×25 | 12×10 | 20×10 | 10×5  | 7.0 | 580
15×25 | 12×10 | 20×10 | 10×10 | 6.4 | 594
15×25 | 12×10 | 20×10 | 10×15 | 6.1 | 596
15×25 | 12×10 | 20×10 | 10×18 | 6.0 | 365
15×25 | 12×10 | 20×10 | 10×20 | 5.9 | 270
15×25 | 12×10 | 20×10 | 10×25 | 5.7 | 122
15×25 | 12×10 | 20×10 | 10×30 | 5.8 | 109
Table 4: When the length of the transistor M2 is varied
M1 (W×L) μm | Mc (W×L) μm | M2 (W×L) μm | M3 (W×L) μm | Power Dissipation (mW) | Bandwidth (MHz)
15×25 | 12×10 | 20×5  | 10×15 | 6.4  | 150
15×25 | 12×10 | 20×10 | 10×15 | 6.1  | 490
15×25 | 12×10 | 20×15 | 10×15 | 5.9  | 550
15×25 | 12×10 | 20×18 | 10×15 | 5.8  | 570
15×25 | 12×10 | 20×20 | 10×15 | 5.8  | 326
15×25 | 12×10 | 20×25 | 10×15 | 5.75 | 380
The results in the above tables were obtained from the Tanner T-Spice tool using the MOSIS 2.0 μm model file for enhancement-mode MOSFETs. The power dissipation and the bandwidth were measured directly from the waveform editor in the Tanner EDA tool.
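As a small worked example of how such tabulated results can be scanned for the best operating point, the sketch below takes the rows of Table 3 (M3 length varied) and picks the row with the highest bandwidth and the row with the lowest power dissipation. The data are copied from the table; the helper names are illustrative only.

```python
# Rows copied from Table 3 (M1 = 15x25, Mc = 12x10, M2 = 20x10, M3 width = 10 um).
# Each tuple: (M3 length in um, power dissipation in mW, bandwidth in MHz).
TABLE_3 = [
    (5, 7.0, 580),
    (10, 6.4, 594),
    (15, 6.1, 596),
    (18, 6.0, 365),
    (20, 5.9, 270),
    (25, 5.7, 122),
    (30, 5.8, 109),
]


def best_bandwidth(rows):
    """Return the row with the largest bandwidth."""
    return max(rows, key=lambda row: row[2])


def lowest_power(rows):
    """Return the row with the smallest power dissipation."""
    return min(rows, key=lambda row: row[1])


print("max bandwidth:", best_bandwidth(TABLE_3))
print("min power:", lowest_power(TABLE_3))
```

Within this table the bandwidth peaks at an M3 length of 15 μm and the power dissipation is lowest at 25 μm.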
CONCLUSION AND FUTURE SCOPE
It is observed from the results that, for the single-stage on-chip amplifier, minimum power dissipation and maximum bandwidth are achieved when the width of the M1 transistor is 18 μm and its length is 25 μm, and the width of the Mc transistor is 10 μm and its length is 16 μm. In this case the power dissipation is 4.3 mW, the gain of the amplifier is 0.82 and the bandwidth is 617 MHz. For the two-stage amplifier, maximum bandwidth is achieved with transistor dimensions M1 (15 μm × 25 μm), M2 (20 μm × 10 μm), M3 (10 μm × 15 μm) and Mc (12 μm × 10 μm), and minimum power dissipation with M1 (15 μm × 25 μm), M2 (20 μm × 10 μm), M3 (10 μm × 25 μm) and Mc (12 μm × 10 μm). The whole design was simulated in the MOSIS/Orbit 2.0 μm process using the Tanner tool.
In this work the analog simulation was done with the Tanner tool using enhancement-type MOSFETs. The work can be extended to depletion-type MOSFETs, in which the noise level is expected to be lower. A further improvement would be to account for semiconductor and environmental noise effects, which are not considered in the present work.
REFERENCES
[1] Sol M. Gruner, Mark W. Tate and Eric F. Eikenberry, “Charge-coupled device area x-ray detectors”, Review of Scientific Instruments, Vol. 73, Issue 8, pp. 2815-2842.
[2] M. J. Howes and D. V. Morgan, “Charge-Coupled Devices and Systems”, John Wiley & Sons.
[3] James R. Janesick, “Scientific Charge-Coupled Devices”, SPIE Press Monograph Vol. 85.
[4] M. S. Tyagi, “Introduction to Semiconductor Materials and Devices”, John Wiley & Sons, Inc., 1991.
[5] Dalsa web site; CCD Technology Primer; http://www.dalsa.com/corp/markets/ccd_vs_cmos.aspx
[6] Kodak CCD Primer, #KCP-001, “Charge coupled device (CCD) Image Sensors”, Eastman Kodak Company - Microelectronics Technology Division.
[7] D. Barbe, “Imaging Devices Using the Charge-Coupled Concept”, Proceedings of the IEEE, pp. 38-67, Jan. 1975.
[8] Stuart A. Taylor, “CCD and CMOS Imaging Array Technologies: Technology Review”, Technical Report EPC106, Xerox Research Centre Europe, 1998.
[9] J. D. E. Beynon, “The Basic Principles of Charge Coupled Devices”, Microelectronics, Vol. 7, No. 2, 1975, Mackintosh Publications Ltd., Luton.
[10] P. Centen and E. Roks, “Characterization of Surface- and Buried-Channel Detection Transistors for On-Chip Amplifiers”, Technical Digest IEDM97, pp. 193-196, San Francisco, Dec. 7-10, 1997.
[11] http://www.mosis.com/products/fab/vendors/tsmc/tsmc-kits.html
VLP0102-1
Programmable Input Output Resistances of FET Amplifier
Mrs. Meena Singh
Lecturer, Deptt. of ECE, University Polytechnic, B.I.T. Mesra, Ranchi
+91-9279265054
Arun Kumar Singh
Deptt. of ECE, Madan Mohan Malaviya Engg. College, Gorakhpur
+91-9312801316
Dr. B. P. Singh
Professor, Deptt. of ECE & EEE, Mody Institute of Technology & Science, Lakshmangarh
([email protected]) +91-9468688102
Abstract— A mathematical model provides insight into the complete behavior of a physical system by reducing the problem to its essential characteristics. The floating admittance matrix (FAM) approach is a neat method for the mathematical modeling of electronic devices and their use in circuits. The zero-sum property of the floating admittance matrix provides a check on whether to proceed further or to re-examine the first equation itself. All transfer functions are represented as cofactors of the floating admittance matrix of the circuit.
Keywords: Amplifier, Common Source FET, Floating Admittance Matrix, Zero Sum property, Cofactors, Plots
INTRODUCTION
The most commonly used amplifier configuration of MOSFETs is the common-source amplifier. The common-source (CS) amplifier may be viewed as a transconductance amplifier or as a voltage amplifier. As a transconductance amplifier, the input voltage is seen as modulating the current going to the load. As a voltage amplifier, the input voltage modulates the amount of current flowing through the FET, changing the voltage across the output resistance accordingly.
This paper aims to develop the mathematical model of the common-source amplifier. The floating admittance matrix of the FET is used to derive its voltage gain, input resistance and output resistance in the common-source configuration.
MATHEMATICAL MODEL OF FET
The two-stage common-source FET amplifier can be represented as in Fig. 1.
Fig.1 Two-stage Common Source Amplifier (schematic labels: VDD, nodes 1 to 4, RD1, RD2, RG1, RG2, RS2, RF, rs, vi, coupling capacitors C)
The a.c. equivalent circuit of Fig. 1 is shown in Fig. 2.
Fig.2 ac circuit of two-stage Common Source Amplifier (schematic labels: nodes 1 to 3, R12, R21, R22, RD1, RD2, RF, RL, rs)
The matrix representation of the FET as a two-port network (four terminals) is written as
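\[
\begin{bmatrix} i_g \\ i_d \\ i_s \end{bmatrix}
=
\begin{bmatrix}
g_g & 0 & -g_g \\
g_m & g_d & -(g_m + g_d) \\
-(g_g + g_m) & -g_d & g_g + g_m + g_d
\end{bmatrix}
\begin{bmatrix} v_g \\ v_d \\ v_s \end{bmatrix}
\tag{1}
\]
Rows and columns are ordered as terminals 1 (gate), 2 (drain) and 3 (source).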
The admittance matrix of the FET as a device is expressed in
(1). Its coefficient matrix is expressed as
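\[
Y =
\begin{bmatrix}
g_g & 0 & -g_g \\
g_m & g_d & -(g_m + g_d) \\
-(g_g + g_m) & -g_d & g_g + g_m + g_d
\end{bmatrix}
=
\begin{bmatrix}
0 & 0 & 0 \\
g_m & g_d & -(g_m + g_d) \\
-g_m & -g_d & g_m + g_d
\end{bmatrix}
\tag{2}
\]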
The gate-to-source resistance of the FET is assumed to be very large (ideally infinite) as the gate is always reverse biased, hence g_g = 0 S. The coefficient matrix of the FET in (1) then reduces to the second form in (2). Thus, the admittance matrices of the two FETs (device 1 and device 2) connected as in Fig. 2 can be written as
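\[
Y_{device1} =
\begin{bmatrix}
0 & 0 & 0 \\
g_{m1} & g_{d1} & -(g_{m1} + g_{d1}) \\
-g_{m1} & -g_{d1} & g_{m1} + g_{d1}
\end{bmatrix}
\tag{3}
\]
(rows and columns ordered as nodes 1, 2, 3)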
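\[
Y_{device2} =
\begin{bmatrix}
0 & 0 & 0 \\
g_{m2} & g_{d2} & -(g_{m2} + g_{d2}) \\
-g_{m2} & -g_{d2} & g_{m2} + g_{d2}
\end{bmatrix}
\tag{4}
\]
(rows and columns ordered as nodes 2, 4, 3)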
Now the composite matrix of two devices (device1 and
device2) is written as
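\[
Y_{devices} =
\begin{bmatrix}
0 & 0 & 0 & 0 \\
g_{m1} & g_{d1} & -(g_{m1} + g_{d1}) & 0 \\
-g_{m1} & -(g_{d1} + g_{m2}) & g_{m1} + g_{d1} + g_{m2} + g_{d2} & -g_{d2} \\
0 & g_{m2} & -(g_{m2} + g_{d2}) & g_{d2}
\end{bmatrix}
\tag{5}
\]
(rows and columns ordered as nodes 1, 2, 3, 4)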
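The overall admittance matrix for Fig. 2 is written as
\[
Y =
\begin{bmatrix}
g_s + G_{G1} + G_F & 0 & -(g_s + G_{G1}) & -G_F \\
g_{m1} & g_{d1} + G_{D1} + G_{G2} & -(g_{m1} + g_{d1} + G_{D1} + G_{G2}) & 0 \\
-(g_{m1} + g_s + G_{G1}) & -(g_{d1} + g_{m2} + G_{D1} + G_{G2}) & g_{m1} + g_{d1} + g_{m2} + g_{d2} + g_s + G_{G1} + G_{D1} + G_{G2} + G_L & -(g_{d2} + G_L) \\
-G_F & g_{m2} & -(g_{m2} + g_{d2} + G_L) & g_{d2} + G_L + G_F
\end{bmatrix}
\tag{6}
\]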
Equation (6) represents the floating admittance matrix [3], [4], [5] of the two-stage common-source amplifier.
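The zero-sum property mentioned in the abstract can be checked numerically. The sketch below assembles a 4-node floating admittance matrix by adding a three-terminal FET block (with g_g = 0) for each device and a two-terminal block for each passive conductance, then verifies that every row and every column sums to zero. The node assignment (device 1: gate 1, drain 2, source 3; device 2: gate 2, drain 4, source 3), the placement of the passive conductances and the helper names are assumptions made for illustration; the numeric values are arbitrary.

```python
def zeros(n):
    """Return an n x n matrix of zeros as nested lists."""
    return [[0.0] * n for _ in range(n)]


def add_fet(Y, gate, drain, source, gm, gd):
    """Add the FET floating admittance block (g_g = 0) at 0-based nodes."""
    nodes = (gate, drain, source)
    block = [
        [0.0, 0.0, 0.0],
        [gm, gd, -(gm + gd)],
        [-gm, -gd, gm + gd],
    ]
    for r, node_r in enumerate(nodes):
        for c, node_c in enumerate(nodes):
            Y[node_r][node_c] += block[r][c]


def add_conductance(Y, a, b, g):
    """Add a two-terminal conductance g between 0-based nodes a and b."""
    Y[a][a] += g
    Y[b][b] += g
    Y[a][b] -= g
    Y[b][a] -= g


def build_fam(gm1, gd1, gm2, gd2, gs, GG1, GG2, GD1, GL, GF):
    """Assemble the 4-node matrix; list indices 0..3 stand for nodes 1..4."""
    Y = zeros(4)
    add_fet(Y, 0, 1, 2, gm1, gd1)   # device 1: gate 1, drain 2, source 3
    add_fet(Y, 1, 3, 2, gm2, gd2)   # device 2: gate 2, drain 4, source 3
    add_conductance(Y, 0, 2, gs)    # source conductance, node 1 to 3
    add_conductance(Y, 0, 2, GG1)   # gate bias of stage 1, node 1 to 3
    add_conductance(Y, 1, 2, GD1)   # drain load of stage 1, node 2 to 3
    add_conductance(Y, 1, 2, GG2)   # gate bias of stage 2, node 2 to 3
    add_conductance(Y, 3, 2, GL)    # load, node 4 to 3
    add_conductance(Y, 0, 3, GF)    # feedback, node 1 to 4
    return Y


def is_zero_sum(Y, tol=1e-12):
    """True if every row and every column of Y sums to zero."""
    rows_ok = all(abs(sum(row)) < tol for row in Y)
    cols_ok = all(abs(sum(col)) < tol for col in zip(*Y))
    return rows_ok and cols_ok


Y = build_fam(gm1=5.0, gd1=0.1, gm2=5.0, gd2=0.1, gs=1.0,
              GG1=0.001, GG2=0.001, GD1=1.0, GL=1.0, GF=0.05)
print("zero-sum property holds:", is_zero_sum(Y))
```

If an entry is mistyped while building the matrix by hand, the corresponding row or column no longer sums to zero, which is the practical use of the check.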
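Now from (6) the input impedance (input resistance Ri) of the circuit in Fig. 2 can be expressed as [1], [2]
\[
R_i = \frac{(g_{d1} + g_{g2} + G_D + G_G)(g_{d2} + G_L + G_F)}
{(g_{g1} + G_G + G_F)(g_{d1} + g_{g2} + g_{m2} + G_D + G_G)(g_{d2} + G_L + G_F) - G_F\left[g_{m1} g_{m2} + (g_{d1} + g_{g2} + g_{m2} + G_D + G_G)\,G_F\right]}
\tag{7}
\]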
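Similarly, its output impedance (output resistance Ro) and voltage gain can be expressed as [1], [2]
\[
R_o = \frac{(g_{d1} + g_{g2} + G_D + G_G)(g_{g1} + G_G + G_F)}
{(g_{g1} + g_s + G_G + G_F)(g_{d1} + g_{g2} + g_{m2} + G_D + G_G)(g_{d2} + G_F) - G_F\left[g_{m1} g_{m2} + (g_{d1} + g_{g2} + g_{m2} + G_D + G_G)\,G_F\right]}
\tag{8}
\]
\[
A_V\big|^{43}_{13} = \mathrm{Sgn}(4-3)\,\mathrm{Sgn}(1-3)\,(-1)^{11}\,\frac{\left|Y^{13}_{43}\right|}{\left|Y^{13}_{13}\right|}
\]
\[
A_V = \frac{g_{m1} g_{m2} + G_F (g_{d1} + G_D + G_G)}{(g_{d1} + g_{g2} + G_D + G_G)(g_{d2} + G_L + G_F)}
\tag{9}
\]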
VERIFICATION ON MATLAB
The values of Ri, Ro and the voltage gain AV for different values of source conductance and load conductance (0 mS, 1 mS and 2 mS) have been programmed in MATLAB. The outputs of the MATLAB programs have been plotted for Ri, Ro and AV with respect to the feedback conductance, Gf.
If we assume that the two MOSFETs of Fig. 2 are properly biased to yield the same values of their internal parameters (gd1 = gd2 and gm1 = gm2), then, for plotting the on-demand values of the simulated input and output resistances, typical values of the external and internal parameters can be given as:
gd1 = gd2 = 0.1 mS, gm1 = gm2 = 5 mS, GL = GD = 1 mS, GG1 = GG2 = GG = 0.001 mS, gg1 = gg2 = 0.0001 mS, GF = variable (0 mS to 0.15 mS).
The plots show that the simulated input and output resistances can take on-demand values, both negative and positive, controlled by the feedback conductance between the two stages of the amplifier.
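The change of sign of the input resistance can be reproduced with a few lines of code. The sketch below is written in Python rather than MATLAB and is not the authors' program. It evaluates one reading of the input-resistance expression (7) with the parameter values listed above and uses bisection to find the feedback conductance at which the denominator crosses zero. The grouping of terms and the signs are reconstructed from the printed equation and should be treated as an assumption. With conductances in mS, the resistance comes out in kΩ.

```python
# Parameter values quoted in the text, all in mS.
gm1 = gm2 = 5.0
gd1 = gd2 = 0.1
GD_EXT = 1.0      # external drain conductance G_D
GG_EXT = 0.001    # external gate bias conductance G_G
gg1 = gg2 = 0.0001


def ri_denominator(gf, gl):
    """Denominator of the assumed form of (7), in mS^3."""
    b_m = gd1 + gg2 + gm2 + GD_EXT + GG_EXT
    return ((gg1 + GG_EXT + gf) * b_m * (gd2 + gl + gf)
            - gf * (gm1 * gm2 + b_m * gf))


def input_resistance_kohm(gf, gl):
    """Input resistance in kOhm for feedback gf and load gl (both in mS)."""
    numerator = (gd1 + gg2 + GD_EXT + GG_EXT) * (gd2 + gl + gf)
    return numerator / ri_denominator(gf, gl)


def critical_gf(gl, lo=0.0, hi=0.15, iterations=60):
    """Bisect for the gf (in mS) where the denominator changes sign."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if ri_denominator(mid, gl) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for gl in (0.0, 1.0, 2.0):
    print(f"GL = {gl} mS: Ri changes sign near Gf = {critical_gf(gl):.6e} mS")
```

Under this reading the input resistance is positive below the critical feedback conductance and negative above it, and the magnitude grows without bound as the critical value is approached from either side.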
The plot of input resistance as a function of feedback
conductance is shown in Figs.3, 4, and 5 for 0 S, 1 mS and 2
mS of load conductance respectively as per (7).
Following observations are recorded from the plots in Fig. 3,
4 and 5:
Fig.3 Input resistance as a function of feedback conductance for
GL= 0 S
a) For GL = 0 S, the input resistance is almost constant (1.148e+06 Ω) from the initial values of Gf until Gf reaches 2.7520e-05 mS; thereafter it rises steeply (from 1.148e+06 Ω to 4.837e+06 Ω) as Gf varies from 2.7520e-05 mS to 2.7523e-05 mS. It is interesting to note that Ri suddenly jumps down (from 4.837e+06 Ω to -6.828e+07 Ω) as Gf varies from 2.7523e-05 mS to 2.7524e-05 mS. Ri then increases abruptly to -4.237e+06 Ω as Gf approaches 2.7525e-05 mS, increases linearly (from -4.237e+06 Ω to -1.473e+06 Ω) from Gf = 2.7525e-05 mS to Gf = 2.7527e-05 mS, and remains constant thereafter at -1.473e+06 Ω for higher values of Gf.
Fig.4 Input resistance as a function of feedback conductance for
GL= 1 mS
b) For GL = 1 mS, the input resistance is almost constant at 3.289e+05 Ω from the initial values of Gf until Gf reaches 0.0004036 mS; thereafter Ri increases linearly (from 3.289e+05 Ω to 4.393e+07 Ω) from Gf = 0.0004036 mS to Gf = 0.0004038 mS and suddenly jumps down (to -7.805e+06 Ω) as Gf reaches 0.00040381 mS. Ri then rises again (from -7.805e+06 Ω to -6.729e+05 Ω) from Gf = 0.00040381 mS to Gf = 0.0004039 mS, and remains constant thereafter at -6.729e+05 Ω for higher values of Gf.
Fig.5 Input resistance as a function of feedback conductance for
GL = 2 mS
c) For GL = 2 mS, the input resistance rises steeply (from 216.5 Ω to 3331 Ω) from Gf = 0.0001 mS to Gf = 0.0011 mS, then suddenly jumps down to Ri = -4418 Ω at Gf = 0.0012 mS, rises steeply again (to -225.4 Ω) until Gf = 0.002 mS, and remains constant thereafter at -225.4 Ω for higher values of Gf.
The plot of output resistance as a function of feedback
conductance (Gf) is shown in Figs.6, 7, and 8 for 0 S, 1 mS
and 2 mS of source conductance respectively as per (8).
Following observations are recorded from the plots in Fig. 6,
7 and 8:
Fig.6 Output resistance as a function of feedback conductance for
GS = 0 S
a) For gs = 0 S, the output resistance is almost constant (1.735e+04 Ω) from the initial values of Gf until Gf reaches 2.752e-05 mS; thereafter it rises steeply (from 1.735e+04 Ω to 5.452e+04 Ω) as Gf varies from 2.7520e-05 mS to 2.7522e-05 mS. It is interesting to note that Ro suddenly jumps down (from 5.452e+04 Ω to -7.697e+05 Ω) as Gf varies from 2.7522e-05 mS to 2.75242e-05 mS. Ro then increases abruptly to -4.776e+05 Ω as Gf reaches 2.75262e-05 mS, continues to rise (from -4.776e+05 Ω to -1.252e+04 Ω) from Gf = 2.75262e-05 mS to Gf = 2.753e-05 mS, and remains constant thereafter at -1.252e+04 Ω for higher values of Gf.
Fig.7 Output resistance as a function of feedback conductance for
GS = 1 mS
b) For gs = 1 mS, the output resistance is almost constant at 237.9 Ω from the initial values of Gf until Gf reaches 0.03340 mS; thereafter Ro rises steeply (from 237.9 Ω to 2829 Ω) from Gf = 0.03340 mS to Gf = 0.03341 mS and suddenly jumps down (to -7836 Ω) as Gf reaches 0.033411 mS. Ro then rises again (from -7836 Ω to -22.83 Ω) from Gf = 0.033411 mS to Gf = 0.0335 mS, and remains constant thereafter at -22.83 Ω for higher values of Gf.
Fig.8 Output resistance as a function of feedback conductance for
GS = 2 mS
c) For gs = 2 mS, the output resistance rises steeply (from 0.805 Ω to 39.85 Ω) from Gf = 0.09 mS to Gf = 0.1 mS, then suddenly jumps down to Ro = -1.028 Ω at Gf = 0.11 mS and remains constant thereafter at -1.028 Ω for higher values of Gf.
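The feedback conductance at which the output resistance changes sign can also be written in closed form. In one reading of (8) (term grouping and signs reconstructed from the printed equation, so an assumption), the G_F² terms in the denominator cancel and the denominator is linear in G_F. With b = g_d1 + g_g2 + g_m2 + G_D + G_G and a = g_g1 + g_s + G_G, the zero lies at G_F* = b·a·g_d2 / (g_m1·g_m2 − b·(a + g_d2)). A short Python sketch with the parameter values quoted in the text:

```python
def critical_gf_output(gs, gm1=5.0, gm2=5.0, gd1=0.1, gd2=0.1,
                       GD=1.0, GG=0.001, gg1=0.0001, gg2=0.0001):
    """Feedback conductance (mS) at which the assumed denominator of (8) is zero.

    All conductances are in mS; gs is the source conductance.
    """
    b = gd1 + gg2 + gm2 + GD + GG
    a = gg1 + gs + GG
    return b * a * gd2 / (gm1 * gm2 - b * (a + gd2))


for gs in (0.0, 1.0, 2.0):
    print(f"gs = {gs} mS: Ro changes sign near Gf = {critical_gf_output(gs):.6e} mS")
```

Under these assumptions the critical value moves to larger feedback conductances as the source conductance increases.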
The plot of voltage gain as a function of feedback
conductance is shown in Figs.9 and 10 for 0 S, 1 mS and 2
mS of load conductance respectively as per (9).
Fig.9 Voltage gain as a function of feedback conductance for
GL = 0 S
Fig.10 Voltage gain as a function of feedback conductance for
GL = 1 mS and 2 mS
Plots in Figs. 9 and 10 reveal that the voltage gain (AV) is an inverse function of the feedback conductance (Gf). Further, the voltage gain decreases as the value of the load conductance (GL) increases, due to their inverse relationship given by (9).
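The trend of the gain can be checked with a short calculation. The sketch below evaluates one reading of the gain expression (9), A_V = [g_m1·g_m2 + G_F·(g_d1 + G_D + G_G)] / [(g_d1 + g_g2 + G_D + G_G)·(g_d2 + G_L + G_F)], using the parameter values quoted earlier. The grouping of terms is reconstructed from the printed equation and is an assumption.

```python
def voltage_gain(gf, gl, gm1=5.0, gm2=5.0, gd1=0.1, gd2=0.1,
                 GD=1.0, GG=0.001, gg2=0.0001):
    """Voltage gain for feedback gf and load gl, all conductances in mS."""
    numerator = gm1 * gm2 + gf * (gd1 + GD + GG)
    denominator = (gd1 + gg2 + GD + GG) * (gd2 + gl + gf)
    return numerator / denominator


for gl in (1.0, 2.0):
    gains = [voltage_gain(gf, gl) for gf in (0.0, 0.05, 0.10, 0.15)]
    print(f"GL = {gl} mS:", ", ".join(f"{g:.3f}" for g in gains))
```

With these values the gain falls as either the feedback conductance or the load conductance is increased.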
CONCLUSION
Plots in Figs. 3 to 8 reveal a region of very sudden change in the values of input resistance and output resistance, from very high positive values to large negative values, for a very small change, of the order of 10^-5, in the value of the feedback conductance Gf. This zone of very high variation in input and output resistances can be used for compensation of resistances to obtain a very high Q-factor in lossy networks.
REFERENCES
[1] Wai-Kai Chen, On second order cofactors and null return difference in feedback amplifier theory, International Journal of Circuit Theory and Applications, Vol. 6, Issue 3, pp. 305-312, Dec. 2006.
[2] Otso Juntunen, A two port S-parameter data transformation, Circuit Theory Laboratory Report Series, CT-35, Helsinki University of Technology, Espoo, Finland, 1998.
[3] B.P. Singh, Unified approach to electronics circuit analysis, IJEEE, pp. 276-285, July 1978.
[4] B.P. Singh, Active bridge for measurement of admittance parameters of the transistors, Indian Journal of Pure and Applied Physics, Vol. 15, pp. 783-786, Nov. 1976.
[5] B.P. Singh, A new active bridge for measuring FET parameters, J. Phys. E: Scientific Instruments, Vol. II, pp. 667-670, 1978.
[6] Jacob Millman and Christos C. Halkias, Integrated Electronics: Analog and Digital Circuits and Systems, Tata McGraw-Hill, pp. 471-475, 2004.
[7] B.P. Singh, Meena Singh, Sanjay Kumar Roy and S.N. Shukla, Mathematical modeling of electronic devices and its integration, Proceedings of National Seminar on Recent Advances on Information Technology, Allied Publishers Pvt. Ltd., Indian School of Mines University, Dhanbad, pp. 494-502, Feb. 6-7, 2009.
[8] B.P. Singh and Arun Kumar Singh, Verification of transfer functions of BJT obtained by using MATLAB, Proceedings of IEEE National Symposium on Innovative Development in Electronics Arena, Arya College of Engineering, pp. 92-96, Dec. 12, 2009.
VLP0103-1
RELIABILITY PREDICTION FOR IGBT BASED INVERTERS UNDER
DIFFERENT SWITCHING PATTERNS
Fuzail Ahmad #1, S.K. Singh *2, Amit Kumar Verma
#DOEACC CENTRE GORAKHPUR,INDIA, DOEACC CENTRE GORAKHPUR,INDIA
IBM GURGAON,INDIA
[email protected] [email protected]
Abstract—Due to the increasing importance of power electronics in the control of devices, particularly in electric vehicles, reliability analysis becomes important. The reliability of a component is the probability that this component will perform its intended function after a time ‘t’ in a given operating condition. Considering only the power losses is no longer sufficient to assess component reliability. For predicting the reliability of power electronic components, the temperature and the temperature cycles have to be determined.
The Military Handbook [3] released by the US Department of Defense is generally accepted and often used to determine reliability [1]. The handbook is no longer revised, new components like IGBTs are not considered in it, and its values are too conservative for currently available devices. Some manufacturers provide information for estimating reliability, but it is limited to finding switching losses and total power losses; very few of them give a thermal model of the devices. A method for calculating the power losses and for thermal modelling, based on a PWM reconstruction technique, is presented in [5]. This method is useful for large simulation time steps and particularly for long mission profiles. D. Hirschman presented an approach with simple formulas for reliability prediction of inverters in HEVs. Work presented in the literature so far has developed reliability models for power electronic components but has not addressed the effect of the PWM method on reliability. This work compares a six-step PWM based inverter and an SVPWM based IGBT inverter in terms of reliability. Reliability is found by the conventional method and also by considering thermal cycles. MATLAB/Simulink based models for finding the switching losses and temperature cycles are developed.
I. INTRODUCTION
The use of power electronic components in automobile
applications is increasing day-by-day. Due to this it becomes
important to determine the reliability of power electronic
components used in automotive applications.
Inverters are used in hybrid electric vehicles to convert the DC supply coming from the battery into AC for the motor that runs the vehicle. Inverters are made up of semiconductors and capacitors, so it is important to assure the reliability of these components, because the malfunction of any of the power electronic components may prevent the vehicle from operating.
Mainly three phase voltage source inverters are used
in these types of applications. Here IGBTs are used as
switching devices. For designing an inverter, it is important to
make a good thermal design such that on the one hand the
temperature of the components never exceeds their specified
maximum temperature and on the other hand the cooling
system is not oversized.
II. BASICS OF RELIABILITY CALCULATION
2.1 INTRODUCTION
“The reliability of a component is the probability that this component will perform its intended function after a time t in a given working condition.”
The global reliability of the system is the product of the reliabilities of all its components:
\[
R_{sys} = \prod_{i=1}^{n} R_i
\]
Here n is the number of components and R_i is the reliability of the i-th component. Since each R_i is at most 1, adding a component reduces the system reliability.
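A minimal numeric illustration of the product rule is given below. The component reliabilities are made-up values used only to show that each added component lowers the system reliability.

```python
import math


def system_reliability(component_reliabilities):
    """Reliability of a series system: the product of component reliabilities."""
    return math.prod(component_reliabilities)


# Hypothetical component reliabilities at some fixed mission time.
two_parts = [0.99, 0.98]
three_parts = [0.99, 0.98, 0.97]

print("two components:  ", system_reliability(two_parts))
print("three components:", system_reliability(three_parts))
```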
The starting point in reliability analysis is the
evaluation of reliability of a device or a component. This is
generally done from the available failure data. That is, a large
number of identical components are subjected to identical
operating conditions and the frequency of their failures is
tabulated.
Let N_0 be the total number of identical components under test.
N_s(t) is the number of components surviving at time ‘t’.
N_f(t) is the number of components that have failed by time ‘t’.
Then at any time N_s(t) + N_f(t) = N_0, and the reliability is given as
\[
R(t) = \frac{N_s(t)}{N_0}
\tag{2.1}
\]
Reliability is characterized by the failure rate λ. The failure rate is the probability that a component which is still operational at time t fails in the time interval (t, t + Δt), where Δt → 0. Thus, it gives the fraction of failures in a certain time interval for defined boundary conditions. The unit of the failure rate is FIT (failures in time):
\[
1\ \text{FIT} = 10^{-9}\ \text{failures per hour}
\tag{2.2}
\]
The total failure rate for a system consisting of k components is the sum of all single failure rates λ_i, given as
\[
\lambda_{total} = \sum_{i=1}^{k} \lambda_i
\tag{2.3}
\]
The mean time to failure (MTTF) is also used to characterize reliability. MTTF is the mean time elapsed before the first failure occurs and is equal to the area under the reliability curve:
\[
MTTF = \int_{0}^{\infty} R(t)\,dt
\tag{2.4}
\]
For a constant failure rate it can be calculated easily as
\[
MTTF = \frac{1}{\lambda}
\tag{2.5}
\]
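The relations between failure rate, FIT and MTTF can be tied together in a short sketch. It assumes a constant failure rate, so that R(t) = exp(−λt) and MTTF = 1/λ. The individual failure rates are hypothetical numbers chosen for illustration.

```python
import math

FIT = 1e-9  # one failure in time = 1e-9 failures per hour


def total_failure_rate(rates_per_hour):
    """Total failure rate of a system: sum of the single failure rates."""
    return sum(rates_per_hour)


def mttf_hours(rate_per_hour):
    """Mean time to failure for a constant failure rate."""
    return 1.0 / rate_per_hour


def reliability(t_hours, rate_per_hour):
    """Reliability at time t, assuming a constant failure rate."""
    return math.exp(-rate_per_hour * t_hours)


# Hypothetical component failure rates: 100, 250 and 150 FIT.
rates = [100 * FIT, 250 * FIT, 150 * FIT]
lam = total_failure_rate(rates)

print(f"total failure rate: {lam / FIT:.0f} FIT")
print(f"MTTF: {mttf_hours(lam):.3e} hours")
print(f"R(10000 h): {reliability(10000, lam):.6f}")
```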
Different approaches can be used to calculate reliability. A well-known method is to use failure rate catalogs. Various failure rate catalogs are available, e.g. the Military Handbook (MIL-HDBK-217F) and the Recueil de Données de Fiabilité (RDF 2000).
2.2 Military Handbook (MIL-HDBK-217F) Method
Military Handbook 217F was released in 1995 by the US Department of Defense, Washington DC. This revised version is also the last version, as the Department of Defense has discontinued updating this standard. Hence, new electronic devices like IGBTs are not considered in this standard, and many reference values are too conservative for currently available devices. Regardless, MIL-HDBK-217F is generally accepted and often used to determine reliability. The models were developed from historical part failure rates.
2.2.1 Component failure rate for IGBT
The component failure rate λ_p is computed by multiplying a component base failure rate with application-specific π-factors:
\[
\lambda_p = \lambda_b\,\pi_T\,\pi_A\,\pi_Q\,\pi_E \quad \text{failures}/10^{6}\ \text{hours}
\tag{2.6}
\]
Here λ_b is the base failure rate,
π_T is the temperature factor,
π_A is the application factor,
π_Q is the quality factor,
π_E is the environmental factor.
However, no π-factor exists that takes temperature cycles into consideration.
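The multiplicative structure of the handbook model is easy to evaluate. In the sketch below every numeric value is a placeholder chosen for illustration; real base failure rates and π-factors have to be read from the handbook tables for the actual device, temperature, application, quality level and environment.

```python
def part_failure_rate(lambda_b, pi_t, pi_a, pi_q, pi_e):
    """Part failure rate in failures per 10^6 hours (product of all factors)."""
    return lambda_b * pi_t * pi_a * pi_q * pi_e


def to_fit(failures_per_million_hours):
    """Convert failures per 10^6 hours to FIT (failures per 10^9 hours)."""
    return failures_per_million_hours * 1000.0


# Placeholder values, NOT taken from MIL-HDBK-217F.
lambda_p = part_failure_rate(lambda_b=0.012, pi_t=2.0, pi_a=8.0,
                             pi_q=1.0, pi_e=6.0)

print(f"lambda_p = {lambda_p:.3f} failures per 10^6 hours")
print(f"lambda_p = {to_fit(lambda_p):.0f} FIT")
```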
III. ELECTRICAL MODELING AND
CALCULATION OF POWER LOSSES
During the design phase of an inverter, it is important to make a good thermal design such that, on the one hand, the temperatures of the components never exceed their specified maximum temperature and, on the other hand, the cooling system is not oversized. In hybrid electric vehicles, the inverter load cannot be derived directly from the current load status. Instead, the inverter load is computed by a complex algorithm that considers the motor speed, the required torque, the state of charge of the traction battery, etc.
The electrical simulation includes the inverter model and
computes the currents and voltages at the terminals of the
inverter. These values are stored in a file which is used as
input for the thermal simulation. The advantage of this
procedure is that the results of the electrical simulation can be
reused for different thermal simulations, if nothing in the
model is changed.
SIMULATION AND RESULTS
5.1 INTRODUCTION
A block diagram representation of the whole work is shown in Fig. 5.1. The figure shows a three-phase inverter with IGBT/diode switching devices, a constant DC supply as the input to the inverter model and a three-phase load.
The losses in the IGBT, i.e. conduction loss and switching loss, are calculated and fed to the thermal model. It should be noted that the switching losses in an IGBT can be found from datasheets. The thermal model gives the junction temperature as an output, which is later used in calculating the reliability of the devices.
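The chain from losses to junction temperature can be sketched in a few lines. This is a simplified steady-state illustration and not the Simulink model of this work. The conduction loss uses the common threshold-plus-resistance approximation, the switching loss is the switching frequency times the datasheet switching energies, and the thermal model is a single junction-to-ambient thermal resistance. All numeric values except the 1 mΩ on-resistance are hypothetical.

```python
def conduction_loss(i_avg, i_rms, v_ce0, r_on):
    """Conduction loss in W: threshold voltage term plus resistive term."""
    return v_ce0 * i_avg + r_on * i_rms ** 2


def switching_loss(f_sw, e_on, e_off):
    """Switching loss in W: switching frequency times energy per cycle."""
    return f_sw * (e_on + e_off)


def junction_temperature(p_total, t_ambient, r_th_ja):
    """Steady-state junction temperature in deg C for a single thermal resistance."""
    return t_ambient + p_total * r_th_ja


# Hypothetical operating point; r_on = 1 mOhm as in the simulation model.
p_cond = conduction_loss(i_avg=30.0, i_rms=50.0, v_ce0=1.0, r_on=1e-3)
p_sw = switching_loss(f_sw=10e3, e_on=2e-3, e_off=3e-3)
t_j = junction_temperature(p_cond + p_sw, t_ambient=40.0, r_th_ja=0.5)

print(f"conduction loss: {p_cond:.1f} W")
print(f"switching loss:  {p_sw:.1f} W")
print(f"junction temperature: {t_j:.2f} deg C")
```

A time-varying thermal model would replace the single thermal resistance with an RC network so that temperature cycles, not only the mean temperature, can be extracted.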
Fig.5.1 Block Diagram Representation of Model
5.2 THREE PHASE INVERTER
The Universal Bridge block used in simulation model
implements a universal three-phase power converter that consists of
up to six power switches connected in a bridge configuration. The
types of power switch and converter configuration are selectable
from the dialog box. The Universal Bridge block allows simulation
of converters using both naturally commutated (line-commutated)
power electronic devices (diodes or thyristors) and
forced-commutated devices (GTO, IGBT, MOSFET).
5.2.1 DESCRIPTION OF IGBT
The important specifications of IGBTs are as follows:
INPUT (g) - PWM switching signal to control the opening and
closing of the IGBT.
OUTPUT (m) - The Simulink output of the block is a vector
containing two signals. These signals are demultiplexed by using the
Bus Selector block provided in the Simulink library. These signals
are -
1. IGBT Current (A)
2. IGBT Voltage (V)
The Parameters of the IGBT used in simulation model are as follows:
1. Internal resistance (Ron) - The internal resistance Ron of the IGBT
device, in ohms (Ω). In this model it is 1mΩ.
2. Snubber resistance (Rs) - The snubber resistance, in ohms (Ω).
The Snubber resistance Rs is set to infinite to eliminate the snubber
from the model.
3. Snubber capacitance (Cs) - The snubber capacitance in farads (F).
The Snubber capacitance Cs is set to zero to eliminate the snubber.
5.3 CONTROL CIRCUIT
The control circuit generates the pulses of a carrier-based pulse
width modulator (PWM) for the IGBTs. For each arm the pulses are generated by
comparing a triangular carrier waveform to a reference modulating
signal. The modulating signals can be generated by the PWM
generator itself, or they can be a vector of external signals connected
at the input of the block. Three reference signals are needed to
generate the pulses for a three-phase bridge.
The amplitude modulation ratio, phase, and frequency of
the reference signals can be changed to control the output voltage of
the bridge connected to the PWM Generator block on the AC
terminals. The two pulses firing the two devices of a bridge arm are
complementary to each other; for example, when pulse 1 is low (0)
then pulse 2 is high (1), and likewise for pulses 3 and 4 and for
pulses 5 and 6.
INPUT – Internal generation of modulating signals.
OUTPUT - Six pulses are generated for a three-arm bridge. Pulses 1,
3, and 5 fire the upper devices of the first, second, and third arms.
Pulses 2, 4, and 6 fire the lower devices.
The parameters of the control circuit used in simulation are as
follows:
1. Carrier Frequency (Hz) – 1080 Hz.
2. Sample Time (sec) – 5.14 µsec.
3. Modulation Index – 0.8. The amplitude of the internal
sinusoidal modulating signal. The Modulation index must be greater
than 0 and lower than or equal to 1. This parameter is used to control
the amplitude of the fundamental component of the output voltage of
the controlled bridge.
4. Frequency of Output Voltage – 60 Hz. The frequency, in hertz, of
the internal modulating signals. This parameter is used to control the
fundamental frequency of the output voltage of the
controlled bridge.
5. Phase of Output Voltage (degrees) – 0.
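The comparison of a triangular carrier with three sinusoidal references can be sketched in a few lines of Python. This is only an illustrative sketch using the parameter values listed above (carrier 1080 Hz, modulation index 0.8, output frequency 60 Hz, phase 0); the function names `triangle` and `pwm_pulses` are hypothetical and are not part of the simulation model, and the carrier is assumed to be a symmetric triangle between -1 and 1.

```python
import math

def triangle(t, f_carrier):
    """Symmetric triangular carrier in [-1, 1] with frequency f_carrier (Hz)."""
    phase = (t * f_carrier) % 1.0
    return 4.0 * phase - 1.0 if phase < 0.5 else 3.0 - 4.0 * phase

def pwm_pulses(t, m=0.8, f_out=60.0, f_carrier=1080.0, phase_deg=0.0):
    """Return the six gate pulses [p1, p2, p3, p4, p5, p6] at time t (0 or 1 each)."""
    carrier = triangle(t, f_carrier)
    pulses = []
    for arm in range(3):
        # The reference of each arm lags the previous one by 120 degrees.
        angle = (2.0 * math.pi * f_out * t
                 + math.radians(phase_deg)
                 - arm * 2.0 * math.pi / 3.0)
        reference = m * math.sin(angle)
        upper = 1 if reference > carrier else 0
        # Upper device pulse first, then the complementary lower device pulse.
        pulses.extend([upper, 1 - upper])
    return pulses

if __name__ == "__main__":
    sample_time = 5.14e-6  # sample time from the parameter list
    for k in range(0, 5):
        print(k * sample_time, pwm_pulses(k * sample_time))
```

Pulses 1, 3 and 5 in the returned list belong to the upper devices and pulses 2, 4 and 6 to the lower devices, so each pair always sums to one.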
5.4 CALCULATION OF LOSSES IN IGBT
5.4.1 CONDUCTION LOSSES
As described in detail in chapter 4, the conduction loss in an
IGBT is the product of the collector-to-emitter voltage of the
IGBT while it is conducting and the collector current:
$P_{cond} = V_{CE} \cdot I_C$ (5.1)
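As a small illustration of this product, the sketch below computes the instantaneous and average conduction loss from paired samples of collector-to-emitter voltage and collector current. The function names and the sample values are hypothetical and serve only to show the arithmetic.

```python
def conduction_loss(v_ce, i_c):
    """Instantaneous conduction loss (W) for paired samples of V_CE (V) and I_C (A)."""
    return [v * i for v, i in zip(v_ce, i_c)]

def average_loss(losses):
    """Mean of the sampled losses (W); zero for an empty list."""
    return sum(losses) / len(losses) if losses else 0.0

if __name__ == "__main__":
    # Hypothetical samples taken while the IGBT is conducting.
    v_ce_samples = [1.5, 2.0]
    i_c_samples = [10.0, 20.0]
    losses = conduction_loss(v_ce_samples, i_c_samples)
    print(losses, average_loss(losses))
```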
5.4.2 SWITCHING LOSSES
The best way to find the switching losses in an IGBT is to
use the datasheets provided by the manufacturer. For this model the IXER
35N120D1 by IXYS is used. In this datasheet
5.5 THERMAL MODEL
In the thermal model, the transient thermal impedance curve
(Fig. 5.3) provided in every IGBT/diode datasheet is used to find
the parameters of the thermal network (given in Fig. 4.2).
Fig.5.2 Block Diagram Representation of THERMAL MODEL.
Some manufacturers provide the values of thermal
resistance and capacitance in their datasheets. In most
datasheets, however, the information required to obtain the thermal
network parameters is given in the form of a transient thermal
impedance curve ($Z_{th}$).
Fig. 5.3 Transient Thermal Impedance Curve
5.5.1 CURVE FITTING
Here the curve fitting technique is used to approximate the
curve by eq. 4.10. The points taken for data fitting are:
For IGBT
t = [0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0]
Zth = [0 0.38 0.5 0.58 0.59 0.595 0.6 0.6 0.6 0.6 0.6]
For Diode
t = [0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0]
Zth = [0 0.79 1.01 1.19 1.25 1.28 1.3 1.3 1.3 1.3 1.3]
The curve is smoothed with the moving average method. An
exponential type of fit is then applied to the curve; the
governing equation is
a+b*exp(-c*x)+d*exp(-e*x)
The values of the variables a, b, c, d, e give the coefficients for
the thermal network equations.
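The two steps described here, smoothing and evaluating the exponential model, can be sketched as follows. This is a minimal sketch, not the fitting tool used for the paper: the window length of 3 is an assumption (the text does not state it), and the helper names `moving_average`, `fit_model` and `sum_squared_error` are hypothetical. The error function is the quantity a least-squares fit of a, b, c, d, e would minimise.

```python
import math

# Data points for the IGBT as listed above.
T_POINTS = [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
ZTH_IGBT = [0, 0.38, 0.5, 0.58, 0.59, 0.595, 0.6, 0.6, 0.6, 0.6, 0.6]

def moving_average(values, window=3):
    """Centered moving average; the window shrinks at the two ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed

def fit_model(x, a, b, c, d, e):
    """Governing equation a + b*exp(-c*x) + d*exp(-e*x)."""
    return a + b * math.exp(-c * x) + d * math.exp(-e * x)

def sum_squared_error(params, xs, ys):
    """Least-squares objective for a candidate parameter set (a, b, c, d, e)."""
    return sum((fit_model(x, *params) - y) ** 2 for x, y in zip(xs, ys))

if __name__ == "__main__":
    smoothed = moving_average(ZTH_IGBT)
    # Hypothetical starting guess for a fit, used here only to evaluate the objective.
    guess = (0.6, -0.3, 10.0, -0.3, 20.0)
    print(smoothed)
    print(sum_squared_error(guess, T_POINTS, smoothed))
```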
Table 5.1 gives the values of the coefficients of the equation
fitted by the exponential curve fitting technique.
Table 5.2 gives the values of the coefficients of the transfer
function found from the transient thermal impedance curves.
The calculated thermal resistance and thermal capacitance
values are given in Table 5.3.
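To show how such resistance and capacitance values translate into a thermal response, the sketch below evaluates the step response of a two-element network for the IGBT values listed in Table 5.3. It assumes a Foster-type network in which each resistance is paired with the capacitance in the same position; the topology of Fig. 4.2 and the exact form of eq. 4.10 are not reproduced in this section, so that pairing is an assumption. The 40 W power step is an illustrative value only.

```python
import math

# IGBT values as listed in Table 5.3 (degC/W and J/degC).
R_IGBT = [0.238, 0.362]
C_IGBT = [0.095, 0.240]

def zth_foster(t, resistances, capacitances):
    """Transient thermal impedance (degC/W) of an assumed Foster network at time t (s)."""
    return sum(r * (1.0 - math.exp(-t / (r * c)))
               for r, c in zip(resistances, capacitances))

def temperature_rise(t, power, resistances, capacitances):
    """Junction temperature rise (degC) after a constant power step of `power` watts."""
    return power * zth_foster(t, resistances, capacitances)

if __name__ == "__main__":
    for t in (0.0, 0.05, 0.1, 0.5, 1.0):
        print(t, zth_foster(t, R_IGBT, C_IGBT), temperature_rise(t, 40.0, R_IGBT, C_IGBT))
```

For long times the impedance tends to the sum of the two resistances, 0.6 °C/W, which is the steady-state value of the IGBT data points used for the fit.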
TABLE 5.1
Table 5.3
5.6 RESULTS AND DISCUSSION
The simulation was carried out for a three-phase IGBT
inverter used in two different applications: a six-step VSI induction
motor drive and a space vector PWM VSI induction motor drive. The
junction temperatures of six IGBTs and six diodes are simulated. In
this case the temperatures of the IGBT and diode junctions do not
differ significantly. Hence, the temperatures of one IGBT junction
and one diode junction are presented here. The simulations are
carried out for both cases for different simulation times; the
speed and torque values are also changed during the simulations to
better represent driving cycles. The results improve for long
simulation run times.
5.6.1 SIX STEP GENERATION TECHNIQUE
The results shown here are for the values of thermal coefficients
given in datasheets. The results for the values of thermal coefficients
calculated from the transient thermal impedance curve by the curve
fitting technique are given in Appendix III.
Table 5.1 Parameters of fitted curve
                        IGBT          DIODE
                       -0.499        -1.099
                        7.772         6.901
                       69.518        40.212

Table 5.3 Calculated thermal resistance and thermal capacitance
                        IGBT          DIODE
Thermal resistance      0.238 °C/W    0.650 °C/W
Thermal resistance      0.362 °C/W    0.650 °C/W
Thermal capacitance     0.095 J/°C    0.064 J/°C
Thermal capacitance     0.240 J/°C    0.133 J/°C

Table 5.2 Values of coefficients
                        IGBT          DIODE
                        0.60          1.30
                        0.0202        0.0562
                        0.1431        0.1698
[Figure: three panels plotted against Time (sec): Stator current (A), Rotor speed (rpm), and Torque (Nm)]
Fig.5.4 Stator Current, Speed and Torque Curve
In Fig. 5.4 the changes in torque and speed are shown,
which clearly indicates the changes that occur at 1 s, 1.5 s,
2.5 s and 4 s. The stator current curve changes
in accordance with the changes in speed and torque.
A speed reference step from 0 to 1800 rpm is applied at t =
0. The speed set point does not go instantaneously to 1800 rpm but
follows the acceleration ramp. The motor reaches steady state at t = 1
s.
At t = 1.5 s, a decelerating torque is applied on the
motor's shaft, and a decrease in speed can be observed. Since the
rotor speed is higher than the synchronous speed, the motor is
working in the generator mode. The braking energy is transferred to
the DC link and the bus voltage tends to increase. However, the
overvoltage activates the braking chopper, which causes the voltage
to reduce. In this example, the braking resistance is not big enough
to avoid a voltage increase, but the bus is maintained within
tolerable limits.
At t = 2.5 s, the torque applied to the motor's shaft
steps from 30 Nm to 0 Nm. A drop in the DC bus voltage and in the
speed can be observed. At this point, the DC bus controller switches
from braking to motoring mode.
At t = 4 s, the load torque is switched from 0 to 15 Nm and the
motor speed again starts following the acceleration ramp. The
motor again reaches a steady state at t = 4.4 s.
[Plots for Figs. 5.6 and 5.7: temperature curves (temperature in degrees Celsius versus time in seconds) and numbers of detected temperature cycles (number of times versus difference in temperature)]
[Figure: three panels plotted against Time (sec): Power Loss (W), Temp (deg cel), and Temp (deg cel)]
Fig. 5.5 Power Loss, Junction Temperature Curve
Fig. 5.5 shows the power losses that occur in the IGBTs.
The total power loss is the sum of the conduction losses and the
switching losses. Since the switching losses are constant at
40 W, as shown in Fig. 5.5, the fluctuation in the curve is due only
to the variation of the conduction losses. The losses in a diode are
the same as those in the IGBTs.
The junction temperatures of the IGBT and the diode are also shown;
they indicate that the temperature in a diode is higher than that
in the IGBTs. The curves show the variation in power and the
temperature cycles due to the variation in the speed of the motor.
Fig. 5.6 Detected Temperature Cycles for IGBT
Fig. 5.7 Detected Temperature Cycles for Diode
The temperature cycles of the junction temperature are
detected by the algorithm given in chapter 4. A total of 210
temperature cycles are detected. The curve clearly indicates that
the number of temperature cycles is high at low values of
temperature difference and decreases as the difference grows.
Temperature cycles below 15 ºC are not very harmful for
semiconductors, but they should be considered because of their
large number.
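The detection algorithm itself is given in chapter 4 and is not reproduced here. As a rough stand-in, the sketch below counts temperature swings between consecutive turning points of a sampled junction temperature trace and groups them by their rounded temperature difference. This is a simplified extremum-based count, not the paper's algorithm and not a full rainflow count; plateaus are not treated as turning points, and the names `turning_points` and `count_cycles` as well as the sample trace are hypothetical.

```python
from collections import Counter

def turning_points(temps):
    """Return the first sample, every local maximum or minimum, and the last sample."""
    points = [temps[0]]
    for prev, cur, nxt in zip(temps, temps[1:], temps[2:]):
        if (cur - prev) * (nxt - cur) < 0:  # direction of the trace changes at cur
            points.append(cur)
    points.append(temps[-1])
    return points

def count_cycles(temps, min_delta=1.0):
    """Histogram {rounded delta T: count} of swings between consecutive turning points."""
    points = turning_points(temps)
    histogram = Counter()
    for a, b in zip(points, points[1:]):
        delta = abs(b - a)
        if delta >= min_delta:
            histogram[round(delta)] += 1
    return histogram

if __name__ == "__main__":
    # Hypothetical junction temperature samples in degrees Celsius.
    trace = [20, 30, 25, 40, 20]
    print(dict(count_cycles(trace)))
```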
∆T   n(reldata)   N   N(Total) = N*n(reldata)   1/N(Total)
3 22 8.6071e+006 1.89E+08 5.28E-09
4 20 8.1873e+006 1.64E+08 6.11E-09
5 27 7.7880e+006 2.10E+08 4.76E-09
6 7 7.4082e+006 5.19E+07 1.93E-08
7 4 7.0469e+006 2.82E+07 3.55E-08
8 4 6.7032e+006 2.68E+07 3.73E-08
9 9 6.3763e+006 5.74E+07 1.74E-08
10 5 6.0653e+006 3.03E+07 3.30E-08
11 5 5.7695e+006 2.88E+07 3.47E-08
12 5 5.4881e+006 2.74E+07 3.64E-08
13 5 5.2205e+006 2.61E+07 3.83E-08
14 3 4.9659e+006 1.49E+07 6.71E-08
15 3 4.7237e+006 1.42E+07 7.06E-08
16 3 4.4933e+006 1.35E+07 7.42E-08
17 5 4.2741e+006 2.14E+07 4.68E-08
18 3 4.0657e+006 1.22E+07 8.20E-08
19 3 3.8674e+006 1.16E+07 8.62E-08
20 5 3.6788e+006 1.84E+07 5.44E-08
21 4 3.4994e+006 1.40E+07 7.14E-08
22 4 3.3287e+006 1.33E+07 7.51E-08
23 4 3.1664e+006 1.27E+07 7.90E-08
24 4 3.0119e+006 1.20E+07 8.30E-08
25 4 2.8650e+006 1.15E+07 8.73E-08
26 4 2.7253e+006 1.09E+07 9.17E-08
27 4 2.5924e+006 1.04E+07 9.64E-08
28 4 2.4660e+006 9.86E+06 1.01E-07
29 4 2.3457e+006 9.38E+06 1.07E-07
30 4 2.2313e+006 8.93E+06 1.12E-07
31 4 2.1225e+006 8.49E+06 1.18E-07
32 4 2.0190e+006 8.08E+06 1.24E-07
33 4 1.9205e+006 7.68E+06 1.30E-07
34 4 1.8268e+006 7.31E+06 1.37E-07
35 4 1.7377e+006 6.95E+06 1.44E-07
36 4 1.6530e+006 6.61E+06 1.51E-07
37 4 1.5724e+006 6.29E+06 1.59E-07
38 4 1.4957e+006 5.98E+06 1.67E-07
F(t) = 2.78e-6
R(t) = 1-F(t) = 0.99999722
Table 5.4 Number of Temperature Cycles and Reliability
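The arithmetic behind the table can be retraced with a short script. The relation N = 1e7 * exp(-0.05 * ∆T) used below is not stated in this section; it is inferred from the tabulated N column (for example 8.6071e+006 at ∆T = 3 and 3.6788e+006 at ∆T = 20) and should be read as an assumption. F(t) is taken as the sum of the last column, which is how the tabulated value is obtained. The names are hypothetical.

```python
import math

# Number of detected cycles n(reldata) for each temperature difference (six-step case).
CYCLES_SIX_STEP = {
    3: 22, 4: 20, 5: 27, 6: 7, 7: 4, 8: 4, 9: 9, 10: 5, 11: 5, 12: 5, 13: 5,
    14: 3, 15: 3, 16: 3, 17: 5, 18: 3, 19: 3, 20: 5,
}
CYCLES_SIX_STEP.update({dt: 4 for dt in range(21, 39)})  # 4 cycles each for 21..38

def cycles_to_failure(delta_t):
    """N column; relation inferred from the tabulated values, not stated in the text."""
    return 1.0e7 * math.exp(-0.05 * delta_t)

def failure_probability(cycle_counts):
    """F(t) as the sum of 1 / (N * n) over all rows, as in the last table column."""
    return sum(1.0 / (cycles_to_failure(dt) * n) for dt, n in cycle_counts.items())

if __name__ == "__main__":
    f_t = failure_probability(CYCLES_SIX_STEP)
    print("cycles:", sum(CYCLES_SIX_STEP.values()))
    print("F(t) =", f_t)
    print("R(t) =", 1.0 - f_t)
```

With the six-step counts this gives 210 cycles in total and an F(t) of roughly 2.78e-6, in line with the tabulated entries.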
Comparing the reliabilities obtained with the datasheet values
used directly and with the values from the curve fitting technique
shows that the curve fitting technique gives the better reliability.
5.6.2 SPACE VECTOR PWM TECHNIQUE
[Figure: three panels plotted against Time (sec): Stator current (A), Rotor speed (rpm), and Electromagnetic Torque (Nm)]
Fig. 5.8 Stator Current, Speed and Torque
At time t = 0 s, the speed set point is 1800 rpm. The speed
follows the acceleration ramp precisely and reaches a steady
state at t = 1 s.
At t = 1.5 s, the full load torque is applied to the motor shaft
while the motor speed is still ramping to its final value. This forces
the electromagnetic torque to increase to a high value and then to
stabilize at 20 Nm once the speed ramping is completed and the
motor has reached 1200 rpm.
At t = 2.5 s, the speed set point is changed to 1500 rpm and the
electromagnetic torque again reaches a high value so that the speed
ramps precisely at 1800 rpm/s up to 1500 rpm under full load.
At t = 4 s, the mechanical load changes from 0 Nm to 15 Nm,
which causes the electromagnetic torque to stabilize at approximately
20 Nm shortly after. Note that the DC bus voltage increases since
the motor is in the braking mode. This increase is limited by the
action of the braking chopper.
[Figure: three panels plotted against Time (sec): Power Loss (W), Temp (deg cel), and Temp (deg cel)]
Fig. 5.9 Power Loss and Temperature Curves
[Figure: temperature curve (temperature in degrees Celsius versus time in seconds) and number of detected temperature cycles (number of times versus difference in temperature)]
Fig. 5.10 Detected temperature Cycles for IGBT
[Figure: temperature curve (temperature in degrees Celsius versus time in seconds) and number of detected temperature cycles (number of times versus difference in temperature)]
Fig. 5.11 Detected temperature Cycles for Diode
Table 5.5 Number of Temperature Cycles and Reliability
5.6.3 COMPARISON OF RELIABILITIES
TABLE 5.6 CALCULATED MTTFs
In Table 5.6 the calculated MTTFs for the two approaches
are compared. Even though the same simulation data were used, the
two approaches give component MTTFs that differ by
orders of magnitude. The MTTFs for IGBTs and diodes in both
applications come out nearly the same, because both experiments
are done under nearly the same operating conditions. The reliability
calculated by the Military Handbook does not consider the effect of
temperature cycling; hence the MTTFs from this method are the same
for all three cases. In all cases the IGBTs come out to be the
least reliable component.
∆T   n(reldata2)   N   N(Total) = N*n(reldata2)   1/N(Total)
3 3 8.6071e+006 2.58E+07 3.87E-08
4 3 8.1873e+006 2.46E+07 4.07E-08
5 3 7.7880e+006 2.34E+07 4.28E-08
6 4 7.4082e+006 2.96E+07 3.37E-08
7 4 7.0469e+006 2.82E+07 3.55E-08
8 4 6.7032e+006 2.68E+07 3.73E-08
9 3 6.3763e+006 1.91E+07 5.23E-08
10 3 6.0653e+006 1.82E+07 5.50E-08
11 3 5.7695e+006 1.73E+07 5.78E-08
12 3 5.4881e+006 1.65E+07 6.07E-08
13 3 5.2205e+006 1.57E+07 6.39E-08
14 3 4.9659e+006 1.49E+07 6.71E-08
15 3 4.7237e+006 1.42E+07 7.06E-08
16 3 4.4933e+006 1.35E+07 7.42E-08
17 3 4.2741e+006 1.28E+07 7.80E-08
18 3 4.0657e+006 1.22E+07 8.20E-08
19 3 3.8674e+006 1.16E+07 8.62E-08
20 3 3.6788e+006 1.10E+07 9.06E-08
21 3 3.4994e+006 1.05E+07 9.53E-08
22 3 3.3287e+006 9.99E+06 1.00E-07
23 3 3.1664e+006 9.50E+06 1.05E-07
24 3 3.0119e+006 9.04E+06 1.11E-07
25 3 2.8650e+006 8.60E+06 1.16E-07
26 3 2.7253e+006 8.18E+06 1.22E-07
27 3 2.5924e+006 7.78E+06 1.29E-07
28 3 2.4660e+006 7.40E+06 1.35E-07
29 3 2.3457e+006 7.04E+06 1.42E-07
30 3 2.2313e+006 6.69E+06 1.49E-07
31 3 2.1225e+006 6.37E+06 1.57E-07
32 3 2.0190e+006 6.06E+06 1.65E-07
33 3 1.9205e+006 5.76E+06 1.74E-07
34 3 1.8268e+006 5.48E+06 1.82E-07
F(t) = 2.95E-06
R(t) = 1-F(t) = 0.99999705
MTTF (hrs)       Six-step            SVPWM               SVPWM (Ts=50sec)
                 IGBTs    Diodes     IGBTs    Diodes     IGBTs    Diodes
MIL-HDBK-217
Coffin-Manson
The results for the SVPWM technique for a simulation time of
50 sec are given in Appendix II.
REFERENCES
[1] D. Hirschmann, D. Tissen, S. Schroder, and R. De Doncker,
― Reliability Prediction for Inverters in Hybrid Electrical
Vehicles‖, IEEE transactions on power
electronics,vol.22,n0.6,nov 2007
[2] D. Hirschmann, D. Tissen, S. Schroder, and R. De Doncker,
―Inverter design for hybrid electrical vehicles considering
mission profiles,‖ in Proc. IEEE Vehicle Power Propulsion
Conf., Sep. 2005.
[3] “Military Handbook (MIL-HDBK-217F),” Dept. Defense, Dec.
1991, Ed.
[4] L.K. Mestha, P.D. Evans, ―Analysis of on-state losses in PWM
inverters‖. IEE Proceedings, Vol. 136 pp.189-195, July 1989.
[5] A.D. Rajapakse, A.M. Gole, and PL. Wilson. ―Electromagnetic
transient simulation models for accurate representation of
switching losses and thermal performance in power electronic
systems‖. IEEE Trans. Power Delivery, 20(1):319-327,
January 2005.
[6] A. Goel and R. J. Graves, ―Electronic system reliability:
Collating prediction models,‖ IEEE Trans. Device Mater.
Rel., vol. 6, no. 2, pp. 258–265, Jun. 2006.
[7] P.Nance,M.Marz ―Thermal Modeling of Power Electronics
System‖ PCIM Europe Power Electronic Systems, No.
2/2000 pp.20-27.
[8] W. Engelmaier, ―Fatigue life of leadless chip carrier solder
joints during power cycling,‖ IEEE Trans. Comp. Hybrids
Manufact. Technol., vol. CHMT-6, no. 3, pp. 232–237, Sep.
1983.
[9] M. Ciappa, F. Carbognani, and W. Fichtner, ―Lifetime
prediction and design of reliability tests for high-power
devices in automotive applications,‖ IEEE Trans. Device
Mater. Rel., vol. 3, no. 4, pp. 191–196, Dec. 2003.
[10] A. Morozumi, K. Yamada, T. Miyasaka, S. Sumi, and Y. Seki,
―Reliability of power cycling for IGBT power semiconductor
modules,‖ IEEE Trans. Ind. Appl., vol. 39, no. 3, pp. 665–671,
May. 2003.
[11] Mitsubishi Semiconductors Power Modules ―General
considerations for IGBT and intelligent power modules‖.
[12] Z. Zhou, M. S. Khanniche, P. Igic, S. T. Kong, M. Towers, and
P. A. Mawby, ―A fast power loss calculation method for long
real time thermal simulation of IGBT modules for a three-
phase inverter system,‖ in Power Electron. Applications,
2005 Eur. Conf., Sep. 2005.
[13] T. Kojima, Y. Nishibe, Y. Yamada, T. Ueta, K. Torii, S. Sasaki,
and K. Hamada, ―Novel electro-thermal coupling simulation
technique for dynamic analysis of HV (hybrid vehicle)
inverter,‖ in Proc. 37th IEEE Power Electron. Specialists
Conf., 2006, PESC ’06, Jun. 2006, pp. 1–5.
[14] Semikron Application Handbook. Berlin, Germany: ISLE
Verlag, 1998. ISBN 3-932633-24-5.
[15] Z. Zhou,M. S. Khanniche,P. Igic,S. M. Towers ,P. A. Mawby,
―Power loss calculation and thermal modeling for a three
phase phase inverter drive system‖, J. Electrical Systems 1-4
(2005): 33-46.
[16] Takashi Kojima, Yuji Nishibe, Yasushi Yamada,Takashi Ueta,
Kaoru Torii, Shoichi Sasaki, Kimimori Hamada. ―Novel
Electro-Thermal Coupling Simulation Technique for
Dynamic Analysis of HV (Hybrid Vehicle) Inverter‖ 37th
IEEE Power Electronics Specialists Conference / June 18 - 22,
2006, Jeju, Korea.
[17] K & K Associates, Ed., Thermal Network Modeling Handbook
10141 Nelson St.. Westminster, CO, 80021, K & K
Associates, Developers of Thermal Analysis Kit (TAK), 2000.
[18] A.R. Hefner. ―A dynamic electro-thermal model for the IGBT‖.
IEEE Trans. Industry Applications, 30(2):394-405, March
1994.
[19] L.K. Mestha, P.D. Evans, ―Analysis of on-state losses in PWM
inverters‖, IEE PROCEEDINGS, Vol. 136, Pt. B, No. 4, JULY
1989.
[20] C.-S. Yun, P. Malberti, M. Ciappa, and W. Fichtner, ―Thermal
component model for electromechanical analysis of IGBT
module systems,‖ IEEE Trans. Adv. Packag., vol. 24, no. 3,
pp. 401–405, Aug. 2001.
[21] M. Ciappa and W. Fichtner, ―Lifetime prediction of IGBT
modules for traction applications,‖ in Proc. IEEE Int.
Reliability Physics Symp.,
San Jose, CA, 2000, pp. 210–216.
[22] A.T. Bryant, A. Walker, and P.A. Mawby, ―Fast Inverter loss
simulation for Hybrid electrical vehicle drives.‖, Hybrid
Vehicle Conference, IET The Institution of Engineering and
Technology, 2006.
[23] IXYS Semiconductor GmbH, IXER 35N120D1 Product
Specification Sheet, Lampertheim, Germany, 2003.
[24] Eupec IGBT modules , BSM 100 GD 60 DLC datasheet,
2000-02-08.
[25] International rectifier,IGBT, IRG4PC40KD datasheet,2000.
[26] TOSHIBA, GTR Module silicon n-channel IGBT,
MG300J2YS50 datasheet.
[27] Dustin A. Murdock, Jose E. Ramos Torres, Jeffrey J. Connors,
and Robert D. Lorenz. “Active Thermal Control of Power
Electronic Modules‖, IEEE Transactions on industry
Applications, VOL. 42, NO. 2, March/April 2006.
CONFERENCE ON “SIGNAL PROCESSING AND REAL TIME OPERATING SYSTEM (SPRTOS)” MARCH 26-27 2011
VLP0103-10
[28] D. Xu, H. Lu, L. Hang, S. Azuma, M. Kimata and R. Uchida,
Power Loss and Junction Temperature Analysis of Power
Semiconductor Devices, IEEE Transaction on Industry
Applications, Vol..38, No.5, pp, 1426-1431,
September/October 2002.
Carry Ripple Adder based on Charge Recycling
for Lower Energy MTCMOS
Arvind Kumar, Member, IEEE, Sanjeev Rai, Sarad Shrestha, ECED, MNNIT, Allahabad

Abstract— Multi-threshold CMOS (MTCMOS) technology
uses MOSFETs with a low threshold voltage (for speed
enhancement) and MOSFETs with a high threshold voltage (for
suppressing standby leakage current during the sleep period). In
such a design, frequent mode transitions, i.e. active to sleep and
sleep to active, may occur and consume a significant amount of
energy. This paper presents a charge recycling concept between the
virtual supply and the virtual ground to reduce the dynamic energy
consumption during mode transitions. The simulation of a two-bit
carry ripple adder, as used in a 2-bit accumulator, shows a 75%
reduction in dynamic energy consumption during mode
transition compared to a ripple adder with conventional
MTCMOS.
Index Terms— Charge recycling, Gated ground, Gated-power,
Multi-threshold voltage, Virtual power node

I. INTRODUCTION
Low power design is one of the most significant challenges
in designing today's advanced VLSI circuits. Currently,
portable devices consume a lot of energy during idle periods due
to leakage current, which shortens the battery lifetime. A
popular low-leakage circuit technique is multi-threshold voltage
technology, which disconnects the low threshold
voltage (low Vt) logic gates from the power supply and/or the
ground line by turning off a high-Vt sleep transistor (Fig. 1)
during the standby mode [1].
However, during the mode transitions from active to sleep and
sleep to active, a significant amount of energy is consumed. If
mode transitions are frequent, the energy overhead of turning
the power gating structure off and on becomes more significant.
As shown in Fig. 1, the virtual power node and the virtual ground
node have high parasitic capacitance due to the diffusion
capacitances of the transistors connected to the virtual line and
the wire capacitances.
This paper applies a new charge recycling technique to
minimize the energy consumption during mode transitions from
active to sleep and sleep to active. The charge stored on the
parasitic capacitances of the virtual power node (VP) and the
virtual ground node (VG) is recycled during mode transition.
The remainder of the paper is organized as follows. The
conventional MTCMOS and the virtual node voltages are
described in section II, the charge recycling technique during
sleep to active and active to sleep transitions and the parasitic
capacitance calculation in section III, the simulation results in
section IV and the conclusion in section V.

II. CONVENTIONAL MTCMOS
The conventional MTCMOS circuit shown in Fig. 1 consists of
two blocks. The first block is power gated by an NMOS sleep
transistor, creating a virtual ground node (VG) between the
block and the sleep transistor, and the second block is power
gated by a PMOS sleep transistor, creating a virtual power
node (VP) between the sleep transistor and the block.
[Figure: two CMOS full adders linked by Carry, with Virtual Gnd (VG), Virtual Vdd (VP) and Vdd]
Fig. 1. Power gating structure using NMOS and PMOS sleep
transistors. High Vt transistor is represented with thick line in channel
region
A. Virtual Ground and virtual power voltages
In active mode, the NMOS and PMOS sleep transistors are turned
on (linear region). During active mode, the voltage at the virtual
ground node (VG) is zero and the voltage at the virtual power node
(VP) is Vdd [2]. In sleep mode, both the NMOS and the PMOS are in
cut-off. The virtual ground node (VG) and the virtual power node
(VP) are then charged to steady-state values of high voltage (≈
1.4 V) and low voltage (≈ 0 V), respectively, for a supply of 1.8 V,
as shown in Fig. 2. A large portion of the total energy drawn from
the power supply is stored in the parasitic capacitance (shown
in Fig. 1 as lumped capacitance) associated with the virtual nodes.
The remaining portion of the energy is dissipated in the parasitic
impedances of the low Vt circuitry, i.e. a full adder in Fig. 1. In
order to calculate the total dynamic energy, i.e. the energy consumed
during sleep to active and active to sleep mode transitions, we
assume that the sleep period is long enough to charge the virtual
ground node (VG) to VDD and the virtual power node (VP) to zero.
Let CG-virtual and CP-virtual represent the total parasitic
capacitances at the virtual ground node (VG) and the virtual power
node (VP), respectively. Then the energy consumed during the sleep to
active mode transition is as follows:
$E_{\text{sleep-active}} = C_{\text{P-virtual}} \, V_{DD}^{2}$ (1)
Similarly, during the active to sleep transition, we assume the
virtual power node (VP) is at VDD and the virtual ground node
(VG) is at zero. For the active to sleep mode transition, the energy
consumed is as follows:
$E_{\text{active-sleep}} = C_{\text{G-virtual}} \, V_{DD}^{2}$ (2)
The total energy consumed during one cycle of active to
sleep and sleep to active is as follows:
$E_{TOTAL} = C_{\text{G-virtual}} \, V_{DD}^{2} + C_{\text{P-virtual}} \, V_{DD}^{2} = V_{DD}^{2} \, (C_{\text{G-virtual}} + C_{\text{P-virtual}})$ (3)
Fig. 2. Virtual Ground Voltage VG =1.3V and Virtual supply voltage VP =
0V during sleep mode

III. CHARGE RECYCLING MTCMOS TECHNIQUE
In the charge recycling technique, the charge stored at the
virtual ground node (VG) and the virtual power node (VP) is
recycled through a transmission gate [3, 4], as shown in Fig. 3.
[Figure: CMOS full adders linked by Carry, Virtual Vdd (VP), Virtual Gnd (VG), Sleep, VCR, VDD]
Fig. 3. Charge recycling MTCMOS circuit with transmission gate between
virtual ground node (VG) and virtual power node (VP)
A. Charge recycling during sleep to active mode transition
As mentioned in section II, during sleep mode the virtual
ground node (VG) is charged to almost VDD and the virtual
power node to almost zero. Before the sleep transistors are turned
on to enter the active state, the transmission gate is turned on for a
short period [3]. This allows charge sharing between the
virtual ground node (VG) and the virtual power node (VP) until the
parasitic capacitances on the nodes share a common voltage
(Vf) [5], as shown in Fig. 5. Here we assume that the parasitic
capacitances on the virtual nodes are almost equal. After
complete charge sharing, i.e. once the voltages on the
virtual nodes are equal, the transmission gate is switched off and
the sleep transistors are turned on for the sleep to active
transition. The total energy drawn from the power supply to charge
the parasitic capacitance at the virtual power node (VP) during the
mode transition from sleep to active is as follows:
$E_{\text{sleep-active}} = V_{DD} \, (V_{DD} - V_f) \, C_{\text{P-virtual}}$ (4)
[Figure: signal phases Sleep, Charge Recycling, Active, Sleep]
Fig. 4. Charge recycling Signal (VCR)
B. Charge recycling during active to sleep mode transition
During the active state, the virtual power node (VP) is at
VDD and the virtual ground node (VG) is at almost zero.
Before the sleep transistors are turned off while going from active
to sleep state, the transmission gate is switched on briefly for
charge recycling between the virtual ground node (VG) and the
virtual power node (VP). Charge sharing occurs between the two
nodes until both reach the common voltage (Vf), and the
transmission gate is then switched off. Now the sleep transistors are
turned off. The charge recycling process is shown in Fig. 7.
The parasitic capacitance at the virtual ground node (VG) draws
the energy Eactive-sleep from the supply during the active to sleep
transition, which is as follows:
$E_{\text{active-sleep}} = V_{DD} \, (V_{DD} - V_f) \, C_{\text{G-virtual}}$ (5)
Hence total energy drawn from the power supply during
active to sleep and sleep to active mode transition is given as
follows:
$E_{Total(CR)} = E_{\text{sleep-active}} + E_{\text{active-sleep}} = V_{DD} \, (V_{DD} - V_f) \, C_{\text{P-virtual}} + V_{DD} \, (V_{DD} - V_f) \, C_{\text{G-virtual}} = V_{DD} \, (V_{DD} - V_f) \, (C_{\text{P-virtual}} + C_{\text{G-virtual}})$ (6)
C. Capacitance calculation
For the capacitance calculation on a virtual node, all the
parasitic capacitances of the transistors connected to the virtual
node are summed up. The MOSFET intrinsic capacitance, Fig. 6, mainly
includes structural capacitance, channel capacitance and diffusion
capacitances [6] [7].
Structural capacitance includes the overlap capacitances
(gate to source overlap capacitance (CGSO) and gate to drain
overlap capacitance (CGDO)). Channel capacitance depends on
the operating region. For a digital circuit, we can take the average
value over the three operating regions. Likewise, diffusion
capacitance includes source to body (CSB) and drain to body
(CDB) capacitance, which is calculated by the following equation:
$C_{DIFF} = C_{BP} + C_{SW} = C_J \cdot \text{Area} + C_{JSW} \cdot \text{Perimeter}$ (7)
where CJ is the zero bias bulk capacitance per square meter and
CJSW is the zero bias perimeter capacitance per meter.
[Figure: MOS transistor with capacitances CGD, CGS, CGB, CDB, CSB]
Fig. 6. Capacitances of MOS transistor
IV. SIMULATION RESULT
We used the Cadence Spectre simulator and a 180 nm
technology (Vtnlow=|Vtplow| =0.156 V and Vtnhigh=|Vtphigh|=0.386V )
for the simulation of the circuit. Two-bit static carry-ripple
adders (using 28-transistor full adders) are designed. The
carry-ripple adder with conventional MTCMOS shown in Fig. 1 and the
adder with the charge recycling technique shown in Fig. 3 are
compared in terms of dynamic energy during mode transitions.
All possible input vectors are applied to the circuit, and
almost the same energy overhead is found for each. By using charge
recycling, the energy overhead during mode transition of the
charge recycling ripple adder is 75% lower than that of the
adder with conventional MTCMOS. Fig. 8 shows the total
energy overheads for a full cycle of mode transition, i.e. from
active to sleep and sleep to active.
TABLE I.
Process parameter of TSMC 180 nm process for VDD =1.8 V
Parameters       NMOS    PMOS
CGDO (fF/µm)     0.37    0.33
CJ (fF/µm2)      0.77    0.85
CJSW (fF/µm)     0.18    0.33
Fig. 5. Charge recycling waveform of the two bit carry ripple adder during
mode transition from sleep mode to active mode
Fig.7. Charge recycling waveform of the two bit carry ripple adder during
mode transition from active mode to sleep mode
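The energy expressions (3) and (6) and the diffusion capacitance relation (7) can be put into numbers with a short sketch. The transistor geometry, the virtual-node capacitances and the common voltage Vf = VDD/2 used below are hypothetical illustration values, not results from the paper; only the NMOS CJ and CJSW values are taken from Table I. The function names are likewise hypothetical.

```python
def diffusion_capacitance(area_um2, perimeter_um, cj, cjsw):
    """Relation (7): C_DIFF = CJ * Area + CJSW * Perimeter (fF for fF/um^2 and fF/um)."""
    return cj * area_um2 + cjsw * perimeter_um

def conventional_energy(c_g, c_p, vdd):
    """Expression (3): E_TOTAL = VDD^2 * (C_G-virtual + C_P-virtual)."""
    return vdd ** 2 * (c_g + c_p)

def charge_recycling_energy(c_g, c_p, vdd, vf):
    """Expression (6): E_Total(CR) = VDD * (VDD - Vf) * (C_P-virtual + C_G-virtual)."""
    return vdd * (vdd - vf) * (c_p + c_g)

if __name__ == "__main__":
    # NMOS CJ and CJSW from Table I; area and perimeter are hypothetical.
    c_diff = diffusion_capacitance(area_um2=0.5, perimeter_um=3.0, cj=0.77, cjsw=0.18)
    print("C_DIFF (fF):", c_diff)

    # Hypothetical, equal virtual-node capacitances in fF; fF * V^2 gives fJ.
    e_conv = conventional_energy(c_g=10.0, c_p=10.0, vdd=1.8)
    e_cr = charge_recycling_energy(c_g=10.0, c_p=10.0, vdd=1.8, vf=0.9)
    print("E_TOTAL (fJ):", e_conv)
    print("E_Total(CR) (fJ):", e_cr)
    print("ratio:", e_cr / e_conv)
```

In these closed-form expressions the ratio of the two energies is (VDD - Vf)/VDD, independent of the capacitance values. The energy overheads reported for the adders come from circuit simulation rather than from this idealized estimate.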
V. CONCLUSION
In this paper, a charge recycling MTCMOS technique for a
two-bit ripple adder is proposed to reduce the dynamic energy
overhead during the mode transitions from sleep to active and
from active to sleep. A transmission gate is used for charge
recycling between the virtual rails. We have shown a 75% reduction
in the energy overhead during mode transitions, i.e. active
to sleep and sleep to active, with the charge recycling technique
compared to the conventional one. In the standby mode, the
circuit loses its data. In future work, a data-retentive
circuit can therefore be proposed for this design.
REFERENCES
[1] S. Mutoh, T. Douseki, Y. Matsuya, T. Aoki, S. Shigematsu, and J.
Yamada, “1 V power supply high-speed digital circuit technology with
multi-threshold-voltage CMOS,” IEEE J. Solid-State Circuits, vol. 30,
no. 8, pp. 847-854, Aug. 1995.
[2] A. Abdollahi, F. Fallah, M. Pedram, “A Robust Power Gating Structure
and Power Mode Transition Strategy for MTCMOS Design,” IEEE
Trans. Very Large Scale Integration (VLSI) Systems, vol. 15, Jan. 2007.
[3] E. Pakbaznia, F. Fallah, and M. Pedram, “Charge recycling in
MTCMOS circuits: concept and analysis,” in Proc. ACM/IEEE Des.
Autom. Conf., 2006, pp. 97-102.
[4] Z. Liu and V. Kursun, “Charge Recycling between Virtual Power and
Ground Lines for Low Energy MTCMOS,” Proceedings of the
IEEE/ACM International Symposium on Quality Electronic Design, pp.
239-244, March 2007.
[5] J. P. Uyemura, Introduction to VLSI Circuits and Systems,
Wiley student edition.
[6] S.-M. Kang, Y. Leblebici, CMOS Digital Integrated Circuits:
Analysis and Design, 3rd ed. Pearson Education.
[7] N. H. E. Weste, D. Harris, A. Banerjee, CMOS VLSI Design: A Circuits
and Systems Perspective, 3rd ed. Pearson Education.
[8] J. M. Rabaey, A. P. Chandrakasan, B. Nikolic, Digital Integrated
Circuits: A Design Perspective, 2nd ed.
Fig.8. The energy overheads of the MTCMOS 2-bit ripple adders (vertical axis: energy in fJ, 0 to 50; bars: conventional gated MTCMOS and charge recycling MTCMOS)
CONFERENCE ON “SIGNAL PROCESSING AND REAL TIME OPERATING SYSTEM (SPRTOS)” MARCH 26-27 2011
VLP0105-1
Forthcoming CMOS Technology in Nanoscale Era
Shashank Mishra#1, Kshitij Bhargava#2, Rohit Tripathi#3, Piyush Jain#4
Electronics and Communication Engineering (Microelectronics and Embedded Technology) Department
Jaypee Institute of Information Technology, Noida-201307, U.P., India
[email protected] [email protected]
Abstract— CMOS technology has reached the sub-45 nm range.
Nano-CMOS technology is expected to govern IC manufacturing for
at least another couple of decades. Although many challenges lie
ahead, further downsizing of the device to a few nanometers is
still on the schedule of the International Technology Roadmap
for Semiconductors (ITRS). Several technological options for
manufacturing nano-CMOS microchips are available or will be
available very soon. This paper reviews the challenges of
nano-CMOS downsizing and focuses on recent developments in the
key technologies for nano-CMOS in the years to come.
I. INTRODUCTION
Among the numerous great inventions of the 20th century,
electronics is the most important. Almost everything related to
human activity, such as power generation, transportation,
entertainment, and medical care, is now provided and controlled
by electronics. Semiconductors are a strategically important
technological area for all nations. Electronic circuit
development has been accomplished through the downscaling of
component size since the replacement of vacuum tubes with
transistors 40 years ago. Circuit characteristics have benefited
greatly from this downsizing. We are now able to integrate
millions of nanoscale CMOS transistors on a silicon chip
occupying only a few square centimetres. The operating speed of
recently developed microprocessors has already reached 5 GHz and
is expected to increase further, although recent trends indicate
that the increase in clock frequency may gradually saturate.
CMOS integrated circuits and their core device technology are
expected to evolve for at least a couple of decades, and their
importance will grow further in future intelligent systems.
Device dimensions have been reduced to a millionth at the
production level in the past 100 years. A hundred years ago, no
one could have imagined that we would be able to make circuits
consisting of billions of electronic components, each smaller
than a bacterium, or that those circuits would fulfil so many
different needs of society. Future scaling trends have been
predicted by the International Technology Roadmap for
Semiconductors (ITRS) for the 30 years up to 2040, when the
physical gate length is expected to be 1 nm (as shown in
Figure 1) [2]. It is believed that CMOS device downsizing will
then approach its physical limit.
Figure 1: Feature size versus time in silicon ICs.
II. CHALLENGES IN SCALING
Device downsizing from 10 μm to the sub-45 nm range has
brought many benefits in terms of speed, power, and cost. Apart
from these improvements, however, one of the major causes of
performance degradation in ultra-large-scale circuits is
interconnect delay, which arises from the increased parasitic
resistance and capacitance of narrow, dense metal interconnect
lines. Furthermore, the performance improvement is also
questionable for the ultra-small MOSFET itself. According to
scaling theory, the drain current per unit gate width should
remain constant. However, a significant reduction in the drain
current per unit gate width was recently reported for MOSFETs
with sub-45 nm gate lengths (as in Fig. 2) [2]. This phenomenon
is due to non-optimized MOSFET structure and process. In
addition, the small drain current (several tens of microamperes
per micrometer) at the scaled supply voltage becomes a major
concern. Besides, the fringing capacitance of the gate electrode
and the inversion layer capacitance will also degrade the
performance of ultra-small MOSFETs (as in Fig. 3) [2]. It is
still doubtful at this moment whether such a small MOSFET can be
used for high-speed devices. Hence, without
any new technology support, further downscaling may only
result in performance degradation.
Figure 2: Significant reductions of the unit drain currents
Figure 3: Challenging issues in further downsizing of the MOS transistor
III. IMPROVEMENTS IN CMOS
There have been proposals to change the structure of the
transistor itself. Here we discuss the two most prominent
structural changes: Silicon on Insulator (SOI) and Double-Gate
CMOS (DGCMOS). The basic concept of Silicon on Insulator is
fairly simple. Rather than fabricating a transistor whose body
is connected to the substrate (Fig. 4.a), which is the normal
method, an insulating oxide is first deposited on the substrate
and the transistor is then fabricated on top of it (Fig. 4.b).
In this way the body is electrically isolated from its
surroundings, so the bulk-to-source voltage Vbs is floating.
This design provides a number of performance benefits. Vbs is
now greater than or equal to zero, which lowers the threshold
voltage Vt and increases performance. Also, there is no junction
area capacitance. Finally, stacked circuits do not suffer from
the reverse body effect. The new structure also lends itself to
some new uses, such as using the insulating layer for a
high-resistance element.
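The claim that a non-negative Vbs lowers Vt can be illustrated with the standard textbook body-effect expression for an NMOS device, Vt = Vt0 + γ(√(2φF − Vbs) − √(2φF)). The sketch below is only an illustration: the expression is the generic long-channel model rather than one given in this paper, and the values of Vt0, γ, and 2φF are hypothetical.

```python
import math


def threshold_voltage(vbs, vt0=0.45, gamma=0.4, two_phi_f=0.8):
    """Textbook NMOS body-effect model (all parameter values hypothetical).

    vbs is the bulk-to-source voltage in volts; vbs > 0 is a forward body
    bias. The model is only meaningful while vbs stays below two_phi_f.
    """
    if vbs >= two_phi_f:
        raise ValueError("vbs must stay below 2*phi_F for this simple model")
    return vt0 + gamma * (math.sqrt(two_phi_f - vbs) - math.sqrt(two_phi_f))


if __name__ == "__main__":
    for vbs in (0.0, 0.2, 0.4):
        print(f"Vbs = {vbs:.1f} V -> Vt = {threshold_voltage(vbs):.3f} V")
```

In this model Vt equals Vt0 at Vbs = 0 and falls monotonically as Vbs rises, which is the direction of the effect described for the floating SOI body.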
Figure 4.a: Bulk CMOS Gate
Figure 4.b: SOI Gate
There are of course some disadvantages associated with the
new structure as well. While the floating Vbs provides many
benefits, its variability can also become problematic. The
value of Vbs is a function of the present current level in the
gate as well as the history of previous states of the gate.
This means that the threshold of a gate may vary significantly
during operation. Also, if Vbs climbs too high it can cause
pass-gate leakage. Techniques have been developed to address
some of these issues. To test this technology, IBM redesigned
some of its PowerPC chips using SOI and demonstrated a 22-33%
performance increase over the bulk CMOS versions of these
chips. IBM also found that, although implementing SOI
structures requires a proper understanding of the unique
problems associated with this technology, it was possible to
redesign existing designs in a reasonable amount of time. The
second structure is more experimental but promises great
benefits in the future: the Double-Gate CMOS (DGCMOS). The
basic idea of this structure is to add an extra gate (or more)
to increase the coupling between the gate and the channel. Some
have called this the “ideal structure for scalability”. Most
people agree that it is the design of the future, but some
difficulties must be overcome first. The difficulties arise in
how to implement the DGCMOS structure. Using traditional
fabrication processes, a second gate could be added below the
body. However, aligning such a gate is troublesome. The
proposed solution is known as the FinFET. This structure builds
the drain, source, and gate up vertically (as in Fig. 5).
Figure 5: FinFET structure
This may solve the alignment issue, but there is one other
challenge to overcome. In order to control short-channel
effects (SCE), the body thickness must be ¼ of the gate length.
This is a daunting challenge because the gate length is usually
the smallest dimension that can be fabricated. Some
technologies may address this, but more work needs to be done
in this area.
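To make the ¼ rule concrete, the short sketch below computes the body thickness it implies for a few gate lengths. The function name is hypothetical; the gate lengths of 45 nm, 10 nm, and 5 nm are the ones discussed elsewhere in this paper.

```python
def max_body_thickness_nm(gate_length_nm):
    """Body thickness implied by the rule of thumb: one quarter of the gate length."""
    return gate_length_nm / 4.0


if __name__ == "__main__":
    for gate_length in (45, 10, 5):
        print(f"Lg = {gate_length} nm -> body thickness = {max_body_thickness_nm(gate_length)} nm")
```

At a 5 nm gate length the rule asks for a body only 1.25 nm thick, which shows why the fin must be patterned below the nominally smallest feature size.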
Beyond silicon MOSFETs, the most popular idea is to use carbon
nanotubes (CNTs) as transistors (an example configuration is
shown in Fig. 6). This concept is very appealing because the
device is still a transistor and could make use of all the
architectural knowledge developed for CMOS. Carbon nanotubes,
however, have a long way to go before they can start replacing
silicon-based MOS transistors. First of all, the nanotube
transistors developed to date have shown very poor performance
characteristics. Many of the problems they exhibit are similar
to the challenges CMOS currently faces, such as high off-state
leakage and source-to-drain tunneling. Also, despite hopes for
chemical self-assembly some day, it is still very difficult to
produce nanotube transistors.
Figure 6: Basic carbon nanotube transistor
IV. CONCLUSIONS
Silicon MOSFETs have been the smallest electronic devices
for several decades. The gate length used for high-performance
logic is 45 nm in production and 5 nm in research. Note that a
5-nm gate length spans only 18 atoms, and a 0.8-nm oxide
thickness is only two atomic layers. Si technology is without
doubt the most successful nano-device technology, and we do not
see any realistic replacement for silicon devices. Even when Si
devices reach the downsizing limit, whether at 10 nm, 5 nm, or
1 nm, other emerging devices such as molecular transistors will
reach their downsizing limit at similar dimensions. This decade
is a critical period for moving from 45-nm to 10-nm technology.
Most of the materials and manufacturing processes used in the
deep-submicron era are now being pushed to their physical
limits. New materials and technologies are required for further
downscaling of the device to 10-nm technology and below.
Immersion lithography for ultra-fine patterning, strained
channels, nickel salicide, high-k gate dielectrics, low-k
interlayers for interconnect, plasma doping, flash and laser
annealing for source and drain doping, and elevated source and
drain and three-dimensional MOSFETs for controlling
short-channel effects would help to overcome the material and
technological constraints and improve device performance at the
ultra-small scale. The final remark concerns a non-technical
issue, which we anticipate will be one of the most important
for nano-CMOS technology development in the next 15 years. Most
of the new mega-fabs being planned or under construction are in
East and Southeast Asia, particularly Mainland China. In 10 to
15 years' time, the share of semiconductor manufacturing sites
located in Asia (including Japan) will be quite substantial.
Currently, Korea and Taiwan hold first place in semiconductor
memory manufacturing and semiconductor foundry, respectively.
They also lead technology development in the Asian region.
Mainland China appears set to become another superpower in
semiconductor manufacturing. China's share of semiconductor
manufacturing will keep growing fast, supported by booming IC
design houses, the construction of new fabs, and a remarkable
increase in industrial investment, and China will be the most
important large and rapidly expanding market. As in many other
industries and other sectors of electronic products, Mainland
China will eventually become “the factory of the world” in
semiconductor manufacturing in 15 years or longer and will have
a great impact on future nano-CMOS technology.
REFERENCES
[1] G. E. Moore, “Cramming more components onto integrated circuits,” Electronics, vol. 38, no. 8, 1965.
[2] International Technology Roadmap for Semiconductors, 2003 Edition, Semiconductor Industry Association (SIA), Austin, Texas: SEMATECH, USA.
[3] H. Iwai, Future semiconductor manufacturing-challenges and opportunities, IEDM Tech. Dig., 2004, pp. 1-16.
[4] H. Iwai, CMOS downsizing toward sub-100 nm, Solid-State Electron., vol. 48, 2003, pp. 497-503.
[5] W. Zhao, Y. Cao, New generation of Predictive Technology Model for sub-45 nm early design exploration, IEEE Trans. Electron Devices 2006; 11:2816-23.
[6] T. Morimoto, H. S. Momose, T. Iinuma, et al, A NiSi salicide technology for advanced logic devices, IEDM Tech. Dig., 1991, pp. 653-656.
[7] T. Iizima, A. Nishiyama, Y. Ushiku, et al, A novel selective Ni3Si contact plug technique for deep-submicron ULSIs, Symp. VLSI Technology, 1992, pp. 70-71.
[8] R. Tsuchiya, M. Horiuchi, S. Kimura, et al, Silicon on thin BOX: A new paradigm of the CMOSFET for low-power and high-performance application featuring wide-range back-bias control, IEDM Tech. Dig., 2004, pp. 631-634.
[9] T. Ghani, et al., “Scaling challenges and device design requirements for high performance sub-50 nm gate length planar CMOS transistors,” Symp. VLSI Technology, 2000, pp. 174-175.
[10] B. Yu, “Scaling towards 35 nm gate length CMOS,” in Proc. VLSI Symp., Kyoto, AMD, June 12-14, 2001, pp. 9-10.
[11] D. Connelly, C. Faulkner, and D. E. Grupp, “Performance advantage of Schottky source/drain in ultrathin-body silicon-on-insulator and dual gate CMOS,” IEEE Trans. Electron Devices, vol. 50, no. 5, pp. 1340-1345, May 2003.
[12] J. Knickerbocker et al., IEEE Custom Integrated Circuits Conference (CICC), p. 659 (2005).
[13] G. Anelli, Design and characterization of radiation tolerant integrated circuits in deep submicron CMOS technologies for the LHC experiments, Ph.D. Thesis, Institute National Poly-technique de Grenoble, France, December 2000, also available at http://www.cern.ch/ RD49.
[14] D. Frank et al., “CMOS device and circuit limits,” Proc. IEEE, vol. 89, Mar. 2001.
[15] B. Davari, R. H. Dennard, and G. G. Shahidi, “CMOS scaling, the next ten years,” Proc. IEEE, vol. 83, p. 595, 1995.
C. Mead, “Scaling of MOS technology to submicrometer feature sizes,” J. VLSI Signal Processing, pp. 9-25, 1994.
[16] Y. Taur and E. Nowak, “CMOS devices below 0.1 µm: How high will performance go?” in Proc. Int. Electron Devices Meeting, 1997, p. 215.
CONFERENCE ON “SIGNAL PROCESSING AND REAL TIME OPERATING SYSTEM (SPRTOS)” MARCH 26-27 2011
VLP0106-1
On Demand Simulation of Input and Output Resistances of MOSFET Amplifier
Mrs. Meena Singh, Lecturer, Deptt. of ECE, University Polytechnic, B.I.T. Mesra, Ranchi, +91-9279265054
Arun Kumar Singh, Deptt. of ECE, Madan Mohan Malaviya Engg. College, Gorakhpur, +91-9312801316
Dr. B. P. Singh, Professor, Deptt. of ECE & EEE, Mody Institute of Technology & Science, Lakshmangarh, ([email protected]) +91-9468688102
Abstract—A very simple MOSFET amplifier circuit that realizes
both very high positive and very high negative resistances at
its input and output terminals is presented. A mathematical
model is a representation of a device or system that predicts
its response under different types of excitation. The floating
admittance matrix (FAM) approach is one of the neat methods for
the mathematical modeling of electronic devices and their use
in circuits. The zero-sum property of the floating admittance
matrix lets the worker check the first equation before
proceeding further. All transfer functions are expressed in
terms of cofactors of the floating admittance matrix of the
circuit.
Keywords: Amplifier, Common Source FET, Floating
Admittance Matrix, Zero Sum property, Cofactors, Plots
INTRODUCTION
The input resistance of a MOSFET is supposed to be very
high, yet a single-stage MOSFET amplifier is sometimes
unsuitable for certain applications, especially when high gain
is required along with a change in resistance level from
positive to negative or from very high to very low. This
requirement is met by cascading, cascoding, or a combination of
both in different sections of the amplifier stages. Fig. 1
shows a two-stage MOSFET amplifier with RF connected from the
output of the second stage to the input of the first stage.
With proper adjustment of the feedback resistance RF, one may
realize extreme values of input and output resistance, both
positive and negative. The common-source (CS) amplifier is the
most versatile MOSFET amplifier configuration. It may be viewed
as a transconductance amplifier or as a voltage amplifier. As a
transconductance amplifier, the input voltage modulates the
current going to the load. As a voltage amplifier, the input
voltage modulates the current flowing through the MOSFET,
changing the voltage across the output resistance accordingly.
The input resistance of a conventional emitter follower,
cathode follower, or source follower is limited by the finite
value of the passive emitter/cathode/source resistance as well
as the input bias resistance. In fact, the input bias
resistance shunts the input resistance of the amplifier, so the
effective input resistance is limited to at most the value of
the input bias resistance. A number of papers in the literature
describe separate circuits for the realization of positive and
negative resistances. The single, simple set-up presented here
realizes both positive and negative input and output
resistances and saves a large number of active and passive
components. Negative resistance is important in the design of
oscillators, multivibrators, and filters, and in the synthesis
of driving-point functions. The line loss in telephone lines
can be controlled to any extent by introducing a resistance
that covers a very large range of values into the
impedance-boosting network. The realization of very high
positive as well as negative resistances in an amplifier is all
the more important for instrumentation.
This paper develops the mathematical model of the
common-source amplifier in the form of a floating admittance
matrix. The floating admittance matrix of the MOSFET is used to
derive the voltage gain, input resistance, and output
resistance of the common-source configuration.
MATHEMATICAL MODEL OF FET
The two-stage common-source MOSFET amplifier can be
represented as in Fig. 1, with feedback through RF from the
output of the second stage to the input of the first stage.
Fig.1 Two-stage Common Source Amplifier
The a.c. equivalent circuit of Fig.1 is shown in Fig. 2. The
matrix representation of the MOSFET as a two-port network (four
terminals) is written as
$$\begin{bmatrix} i_1 \\ i_2 \\ i_3 \end{bmatrix} = \begin{bmatrix} i_g \\ i_d \\ i_s \end{bmatrix} = \begin{bmatrix} g_g & 0 & -g_g \\ g_m & g_d & -(g_m+g_d) \\ -(g_g+g_m) & -g_d & g_g+g_m+g_d \end{bmatrix} \begin{bmatrix} v_g \\ v_d \\ v_s \end{bmatrix}, \qquad \begin{bmatrix} v_g \\ v_d \\ v_s \end{bmatrix} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} \tag{1}$$
Fig.2 ac circuit of two-stage Common Source Amplifier
The admittance matrix of the MOSFET as a device is
expressed in [1]-[3]. Its coefficient matrix, with rows and
columns ordered as terminals 1 (gate), 2 (drain), 3 (source), is
expressed as
$$Y = \begin{bmatrix} g_g & 0 & -g_g \\ g_m & g_d & -(g_m+g_d) \\ -(g_g+g_m) & -g_d & g_g+g_m+g_d \end{bmatrix}$$
The gate-to-source resistance of the MOSFET is assumed to be
very large (ideally infinite) because the gate is insulated from
the channel, hence $g_g = 0$ S. The coefficient matrix of the
MOSFET in (1) then reduces to (2).
$$Y = \begin{bmatrix} 0 & 0 & 0 \\ g_m & g_d & -(g_m+g_d) \\ -g_m & -g_d & g_m+g_d \end{bmatrix} \tag{2}$$
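Thus the floating admittance matrices of the two MOSFETs
(device 1 and device 2) connected as in Fig.2 can be written,
with rows and columns ordered as nodes 1, 2, 3 for device 1 and
nodes 2, 4, 3 for device 2, as
$$Y_{device1} = \begin{bmatrix} 0 & 0 & 0 \\ g_{m1} & g_{d1} & -(g_{m1}+g_{d1}) \\ -g_{m1} & -g_{d1} & g_{m1}+g_{d1} \end{bmatrix} \tag{3}$$
$$Y_{device2} = \begin{bmatrix} 0 & 0 & 0 \\ g_{m2} & g_{d2} & -(g_{m2}+g_{d2}) \\ -g_{m2} & -g_{d2} & g_{m2}+g_{d2} \end{bmatrix} \tag{4}$$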
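Now the composite matrix of the two devices (device 1 and
device 2), with rows and columns ordered as nodes 1, 2, 3, 4, is
written as
$$Y_{devices} = \begin{bmatrix} 0 & 0 & 0 & 0 \\ g_{m1} & g_{d1} & -(g_{m1}+g_{d1}) & 0 \\ -g_{m1} & -(g_{d1}+g_{m2}) & g_{m1}+g_{d1}+g_{m2}+g_{d2} & -g_{d2} \\ 0 & g_{m2} & -(g_{m2}+g_{d2}) & g_{d2} \end{bmatrix} \tag{5}$$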
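The overall admittance matrix for Fig.2 is written as
$$Y = \begin{bmatrix}
g_s+G_{G1}+G_F & 0 & -(g_s+G_{G1}) & -G_F \\
g_{m1} & g_{d1}+G_{D1}+G_{G2} & -(g_{m1}+g_{d1}+G_{D1}+G_{G2}) & 0 \\
-(g_{m1}+g_s+G_{G1}) & -(g_{d1}+g_{m2}+G_{D1}+G_{G2}) & g_{m1}+g_{d1}+g_{m2}+g_{d2}+g_s+G_{G1}+G_{D1}+G_{G2}+G_L & -(g_{d2}+G_L) \\
-G_F & g_{m2} & -(g_{m2}+g_{d2}+G_L) & g_{d2}+G_L+G_F
\end{bmatrix} \tag{6}$$
Equation (6) represents the floating admittance matrix [3],
[4], [5] of the two-stage common-source amplifier.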
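The zero-sum property mentioned in the abstract can be checked numerically on the overall matrix. The sketch below assembles a 4x4 floating admittance matrix by "stamping" each element onto nodes 1 to 4 (indices 0 to 3) and then verifies that every row and every column sums to zero. The helper names, the stamping approach, and the numerical values are assumptions made for illustration; the node assignment (node 3 common, feedback between nodes 4 and 1) follows the reading of (6) given above.

```python
def stamp_conductance(y, i, j, g):
    """Add a two-terminal conductance g between nodes i and j."""
    y[i][i] += g
    y[j][j] += g
    y[i][j] -= g
    y[j][i] -= g


def stamp_mosfet(y, gate, drain, source, gm, gd):
    """Add a MOSFET with zero gate conductance (gate row stays zero)."""
    y[drain][gate] += gm
    y[drain][drain] += gd
    y[drain][source] -= gm + gd
    y[source][gate] -= gm
    y[source][drain] -= gd
    y[source][source] += gm + gd


def build_fam(gm1, gd1, gm2, gd2, gs, gg1, gd1_load, gg2, gl, gf):
    """Floating admittance matrix of the two-stage amplifier (units: mS)."""
    n1, n2, n3, n4 = 0, 1, 2, 3
    y = [[0.0] * 4 for _ in range(4)]
    stamp_mosfet(y, n1, n2, n3, gm1, gd1)   # device 1 on nodes 1, 2, 3
    stamp_mosfet(y, n2, n4, n3, gm2, gd2)   # device 2 on nodes 2, 4, 3
    for a, b, g in ((n1, n3, gs), (n1, n3, gg1), (n2, n3, gd1_load),
                    (n2, n3, gg2), (n4, n3, gl), (n4, n1, gf)):
        stamp_conductance(y, a, b, g)
    return y


def zero_sum_ok(y, tol=1e-12):
    """True when every row and every column of y sums to (almost) zero."""
    rows = all(abs(sum(row)) < tol for row in y)
    cols = all(abs(sum(col)) < tol for col in zip(*y))
    return rows and cols


if __name__ == "__main__":
    fam = build_fam(gm1=5.0, gd1=0.1, gm2=5.0, gd2=0.1, gs=1.0,
                    gg1=0.001, gd1_load=1.0, gg2=0.001, gl=1.0, gf=0.05)
    print(zero_sum_ok(fam))
```

If a sign or a term is mistyped while writing the matrix by hand, at least one row or column sum becomes non-zero, which is exactly the check the zero-sum property offers.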
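Now from (6) the input impedance of the circuit in Fig.2 can be
expressed as [1]-[3]
$$Z_{in} = \frac{(g_{d1}+g_{g2}+G_D+G_G)(g_{d2}+G_L+G_F)}{(g_{g1}+G_G+G_F)(g_{d1}+g_{g2}+g_{m2}+G_D+G_G)(g_{d2}+G_L+G_F) - G_F\left[g_{m1}g_{m2} + (g_{d1}+g_{g2}+g_{m2}+G_D+G_G)\,G_F\right]} \tag{7}$$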
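Similarly, its output impedance and voltage gain can be
expressed as [1]-[3]
$$Z_{out} = \frac{(g_{d1}+g_{g2}+G_D+G_G)(g_{g1}+G_G+G_F)}{(g_{g1}+g_s+G_G+G_F)(g_{d1}+g_{g2}+g_{m2}+G_D+G_G)(g_{d2}+G_F) - G_F\left[g_{m1}g_{m2} + (g_{d1}+g_{g2}+g_{m2}+G_D+G_G)\,G_F\right]} \tag{8}$$
$$A_V\Big|^{43}_{13} = \operatorname{sgn}(4-3)\,\operatorname{sgn}(1-3)\,(-1)^{11}\,\frac{Y^{43}_{13}}{Y^{13}_{13}}$$
$$A_V = \frac{g_{m1}g_{m2} + G_F\,(g_{d1}+G_D+G_G)}{(g_{d1}+g_{g2}+G_D+G_G)(g_{d2}+G_L+G_F)} \tag{9}$$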
VERIFICATION ON MATLAB
The values of $Z_{in}$, $Z_{out}$, and $A_V|^{43}_{13}$ for different
values of source conductance and load conductance (0 mS, 1 mS,
and 2 mS) have been computed in MATLAB, and the results have
been plotted with respect to the feedback conductance $G_F$.
If we assume that the two MOSFETs of Fig. 2 are biased to yield
the same internal parameters ($g_{d1} = g_{d2}$ and
$g_{m1} = g_{m2}$), then typical values of the external and
internal parameters for plotting the on-demand simulated input
and output resistances are:
$g_{d1} = g_{d2} = 0.1$ mS, $g_{m1} = g_{m2} = 5$ mS, $G_L = G_D = 1$ mS,
$G_{G1} = G_{G2} = G_G = 0.001$ mS, $g_{g1} = g_{g2} = 0.0001$ mS,
$G_F$ = variable (0 mS to 0.15 mS).
The plots show that the simulated input and output
resistances can take on-demand values, both negative and
positive, controlled by the feedback conductance connected
between the two stages of the amplifier.
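The behaviour described here can be reproduced with a short script. The paper used MATLAB; the sketch below uses Python instead and should be read as an illustration under stated assumptions, not as the authors' program. It encodes one reading of (7) and (8): the numerator uses B = gd1 + gg2 + GD + GG, and the denominator uses B_M = B + gm2 in the form a·B_M·c − GF·(gm1·gm2 + B_M·GF), where a and c are the input-side and output-side conductance sums. The function names are hypothetical. With conductances in mS, the returned values are in 1/mS, i.e. kΩ.

```python
# Typical parameter values from the text, all in mS.
GD1 = GD2 = 0.1       # drain conductances g_d1, g_d2
GM1 = GM2 = 5.0       # transconductances g_m1, g_m2
G_D = 1.0             # drain load conductance
G_G = 0.001           # gate bias conductance
GG1 = GG2 = 0.0001    # gate conductances g_g1, g_g2

B = GD1 + GG2 + G_D + G_G   # four-term group appearing in the numerators
B_M = B + GM2               # five-term group appearing in the denominators


def _denominator(a, c, gf):
    """Common denominator form assumed for (7) and (8)."""
    return a * B_M * c - gf * (GM1 * GM2 + B_M * gf)


def z_in(gf, gl):
    """Input resistance per the assumed form of (7), in kOhm."""
    c = GD2 + gl + gf
    return B * c / _denominator(GG1 + G_G + gf, c, gf)


def z_out(gf, gs):
    """Output resistance per the assumed form of (8), in kOhm."""
    return B * (GG1 + G_G + gf) / _denominator(GG1 + gs + G_G + gf, GD2 + gf, gf)


def crossover_gf(gl, lo=0.0, hi=0.15):
    """Bisect for the G_F (mS) at which the denominator of z_in changes sign."""
    def f(gf):
        return _denominator(GG1 + G_G + gf, GD2 + gl + gf, gf)
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for gl in (0.0, 1.0, 2.0):
        print(f"GL = {gl} mS: sign change near GF = {crossover_gf(gl):.6g} mS")
    print(z_out(0.09, 2.0), z_out(0.1, 2.0))
```

Under this reading the sign change of the input resistance falls near GF = 2.7524e-05 mS for GL = 0 and near 0.0004038 mS for GL = 1 mS, which agrees with the transition points reported in the observations, and the resistance is positive just below the crossover and negative just above it.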
The plot of input resistance as a function of feedback
conductance is shown in Figs.3, 4, and 5 for 0 S, 1 mS and 2
mS of load conductance respectively as per (7).
The following observations are recorded from the plots in
Figs. 3, 4 and 5:
Fig.3 Input resistance as a function of feedback conductance for
GL= 0 S
a) For GL = 0 S, the input resistance is almost constant
(1.148e+06 Ω) from the initial values of GF until GF reaches
2.7520e-05 mS; thereafter it rises exponentially (from
1.148e+06 Ω to 4.837e+06 Ω) as GF varies from 2.7520e-05 mS to
2.7523e-05 mS. It is interesting to note that Ri then jumps
abruptly (from 4.837e+06 Ω to -6.828e+07 Ω) as GF varies from
2.7523e-05 mS to 2.7524e-05 mS. Ri then rises sharply to
-4.237e+06 Ω as GF approaches 2.7525e-05 mS, increases linearly
(from -4.237e+06 Ω to -1.473e+06 Ω) between GF = 2.7525e-05 mS
and GF = 2.7527e-05 mS, and remains constant thereafter at
-1.473e+06 Ω for higher values of GF.
b) For GL = 1 mS, the input resistance is almost constant at
3.289e+05 Ω from the initial values of GF until GF reaches
0.0004036 mS; thereafter Ri increases linearly (from
3.289e+05 Ω to 4.393e+07 Ω) between GF = 0.0004036 mS and
GF = 0.0004038 mS and then jumps abruptly (to -7.805e+06 Ω) as
GF reaches 0.00040381 mS. Ri then rises again (from
-7.805e+06 Ω to -6.729e+05 Ω) between GF = 0.00040381 mS and
GF = 0.0004039 mS, and remains constant thereafter at
-6.729e+05 Ω for higher values of GF.
c) For GL = 2 mS, the input resistance rises exponentially
(from 216.5 Ω to 3331 Ω) between GF = 0.0001 mS and
GF = 0.0011 mS, then jumps abruptly to Ri = -4418 Ω at
GF = 0.0012 mS, rises exponentially again (to -225.4 Ω) until
GF = 0.002 mS, and remains constant thereafter at -225.4 Ω for
higher values of GF.
Fig.4 Input resistance as a function of feedback conductance for
GL= 1 mS
Fig.5 Input resistance as a function of feedback conductance for
GL = 2 mS
The plot of output resistance as a function of feedback
conductance (GF) is shown in Figs.6, 7, and 8 for 0 S, 1 mS
and 2 mS of source conductance respectively as per (8).
The following observations are recorded from the plots in
Figs. 6, 7 and 8:
a) For gs = 0 S, the output resistance is almost constant
(1.735e+04 Ω) from the initial values of GF until GF reaches
2.752e-05 mS; thereafter it rises exponentially (from
1.735e+04 Ω to 5.452e+04 Ω) as GF varies from 2.7520e-05 mS to
2.7522e-05 mS. It is interesting to note that Ro then jumps
abruptly (from 5.452e+04 Ω to -7.697e+05 Ω) as GF varies from
2.7522e-05 mS to 2.75242e-05 mS. Ro then rises sharply to
-4.776e+05 Ω as GF reaches 2.75262e-05 mS, increases
exponentially (from -4.776e+05 Ω to -1.252e+04 Ω) between
GF = 2.75262e-05 mS and GF = 2.753e-05 mS, and remains constant
thereafter at -1.252e+04 Ω for higher values of GF.
Fig.6 Output resistance as a function of feedback conductance for
GS = 0 S
Fig.7 Output resistance as a function of feedback conductance for
GS = 1 mS
b) For gs = 1 mS, the output resistance is almost constant at
237.9 Ω from the initial values of GF until GF reaches
0.03340 mS; thereafter Ro increases exponentially (from 237.9 Ω
to 2829 Ω) between GF = 0.03340 mS and GF = 0.03341 mS and then
jumps abruptly (to -7836 Ω) as GF reaches 0.033411 mS. Ro then
rises again (from -7836 Ω to -22.83 Ω) between
GF = 0.033411 mS and GF = 0.0335 mS, and remains constant
thereafter at -22.83 Ω for higher values of GF.
c) For gs = 2 mS, the output resistance rises exponentially
(from 0.805 Ω to 39.85 Ω) between GF = 0.09 mS and GF = 0.1 mS,
then jumps abruptly to Ro = -1.028 Ω at GF = 0.11 mS and remains
constant thereafter at -1.028 Ω for higher values of GF.
The plot of voltage gain as a function of feedback
conductance is shown in Figs. 9 and 10 for 0 S, 1 mS and 2 mS
of load conductance respectively as per (9).
The plots in Figs. 9 and 10 reveal that the voltage gain (AV) is
an inverse function of the feedback conductance (GF); further,
the voltage gain decreases as the source conductance (gs)
increases, owing to the inverse relationship given by (9).
Fig.8 Output resistance as a function of feedback conductance for
GS = 2 mS
Fig.9 Voltage gain as a function of feedback conductance for GL
= 0 S
Fig.10 Voltage gain as a function of feedback conductance for
GL = 1 mS and 2 mS
CONCLUSION
The plots in Figs. 3 to 8 reveal a region of very sudden
change in the input and output resistances, from very high
positive values to large negative values, for a very small
change, of the order of $10^{-5}$, in the feedback conductance
GF. This zone of very high variation in input and output
resistances, both negative and positive, can be used for
compensation of resistances to obtain a very high Q-factor in
lossy networks.
REFERENCES
[1] Wai-Kai Chen, On second order cofactors and null return difference in feedback amplifier theory, International Journal of Circuit Theory and Applications, Vol. 6, Issue 3, pp. 305-312, Dec. 2006.
[2] Otso Juntunen, A two port S-parameter data transformation, Circuit Theory Laboratory report series, CT-35, Helsinki University of Technology, Finland, Espoo 1998.
[3] B.P. Singh, Unified Approach to electronics circuit analysis, IJEEE, pp. 276-285, July 1978.
[4] B.P. Singh, Active bridge for measurement of admittance parameters of the transistors, Indian Journal of Pure and Applied Physics, Vol. 15, pp. 783-786, Nov. 1976.
[5] B.P. Singh, A new active bridge for measuring FET parameters, J Phys. E. Scientific Instrument, Vol. II, pp. 667-670, 1978.
[6] Jacob Millman and Christos C. Halkias, Integrated Electronics, Analog and Digital Circuits and Systems, TATA McGRAW-HILL publication, pp. 471-475, 2004.
[7] B.P. Singh, Meena Singh, Sanjay Kumar Roy and S.N. Shukla, Mathematical Modeling of Electronic Devices and its integration; Proceedings of National Seminar on Recent Advances on Information Technology, Allied Publishers Pvt. Ltd., Indian School of Mines Dhanbad University, pp. 494-502, Feb. 6-7, 2009.
[8] B.P. Singh, Arun Kumar Singh, Verification of transfer functions of BJT obtained by using MATLAB, Proceedings of IEEE National Symposium on Innovative Development in Electronics Arena, Arya College of Engineering, pp. 92-96, Dec. 12, 2009.
CONFERENCE ON “SIGNAL PROCESSING AND REAL TIME OPERATING SYSTEM (SPRTOS)” MARCH 26-27 2011
VLP0107-1
Performance Analysis and Comparison of PFSCL and
MCML
Kirti Gupta , Ranjana Sridhar, Jaya Chaudhary
DTU (formerly Delhi College of Engineering)
ABSTRACT:
CML, or current mode logic, is a differential logic style which
offers high noise immunity and high speed of operation. In this
paper we compare the performance of PFSCL (positive feedback
source coupled logic) with MCML (MOS current mode logic), both
of which are derivatives of the CML style. We show through
simulations in OrCAD PSpice using 0.18 µm technology that PFSCL
offers significant advantages over MCML in terms of power
consumption, area occupied, and propagation delay.
Due to the growing market for digital signal processing and
optical communication applications, commercial interest in
high-resolution mixed-signal ICs has been growing. In
mixed-signal ICs the analog and digital blocks are integrated on
the same substrate, and hence the resolution of the analog block
is limited by the dynamic switching noise produced by the
digital block. The CMOS logic style is therefore not suitable,
as it suffers from dynamic switching noise. Also, CMOS loses its
advantage of zero static power consumption when it is used at
frequencies of hundreds of MHz to GHz. Several other logic
styles have been proposed to reduce the dynamic switching noise
in mixed-signal ICs, such as those in [2], [3] and [4]. The CML
style is more robust to switching noise than the CMOS logic
style [1]. Also, at high frequencies (hundreds of MHz to the GHz
range) the CML style is more power efficient than CMOS logic
[7]. This type of logic was first implemented using bipolar
transistors [5] and was later extended to MOS transistors. It
consumes less power than ECL but is slower than ECL.
MCML is an extension of current mode logic in which a MOSFET is
used as the transistor instead of a BJT. A constant current
source biases the differential pair of transistors, which
switches the current from one transistor of the pair to the
other depending on the applied input. The differential operation
suppresses the noise coupled to the signal inputs.
PFSCL is a new logic style which introduces positive feedback
into single-ended MCML gates [7]. This eliminates the need for a
complementary second input signal while still maintaining the
differential mode of operation.
In the following, the operation of MCML gates is explained in
section II. The architecture of PFSCL and its operation are
addressed in section III. In section IV, the performance of
PFSCL and MCML is compared and the simulation results are
presented.
MCML GATES :
To understand the operation and the
unique properties of MCML we consider
the simple case of an inverter and will see
different configurations for its
construction.[8]
Inverters can be implemented using
transistors operating as voltage controlled
switches. The simplest configuration is as
shown in the figure below:
[from ref 9]
When vi is low the switch is open and
vo = vdd, since no current flows through
the resistance R. When vi is high the switch
is closed and vo = 0.
We can modify the above configuration by
using a pair of complementary switches
called PU (pull-up) and PD (pull-down).
[from ref 9]
The PU switch connects the output node to vdd
and the PD switch connects the output to
ground. When vi is low, the PU switch is
closed and the PD switch is open,
establishing vo = vdd. If vi is then raised to
logic high, the PU switch opens
while the PD switch closes, thus
establishing vo = 0. This circuit constitutes
the basis of the CMOS inverter.
The third type of configuration can be
implemented using a double-throw switch
as shown below:
[from ref 9]
The switch is used to steer the constant
current IEE into one of the two resistors
connected to the positive supply VCC. If
a logic high applied at vi connects the
switch to Rc1, then a
logic inversion function is realized at v01.
This current steering is the basis for
current mode logic circuits.
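The three inverter configurations described above can be sketched as idealized switch-level models. The following Python sketch is illustrative only: the function names, the default supply, bias current and resistor values, and the ideal-switch behaviour are assumptions and are not taken from the paper.

```python
def resistive_load_inverter(vi_high, vdd=1.8):
    """Single switch to ground with a pull-up resistor R (ideal switch assumed)."""
    # Switch open (vi low): no current flows in R, so vo = vdd.
    # Switch closed (vi high): the output is tied to ground.
    return 0.0 if vi_high else vdd


def complementary_inverter(vi_high, vdd=1.8):
    """PU/PD pair: exactly one of the two switches is closed at a time."""
    pu_closed = not vi_high
    return vdd if pu_closed else 0.0


def current_steering_inverter(vi_high, vcc=1.8, iee=100e-6, rc=4e3):
    """Double-throw switch steering IEE into Rc1 (vi high) or Rc2 (vi low)."""
    drop = iee * rc  # voltage drop across whichever resistor carries IEE
    v01 = vcc - drop if vi_high else vcc
    v02 = vcc if vi_high else vcc - drop
    return v01, v02
```

In the third model both outputs are available at once, so the gate provides the inverted and non-inverted signal without an extra stage.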
PRINCIPLE OF OPERATION AND STATIC MODEL OF MCML:
MCML is a dual-rail logic circuit which
uses both the applied input and its
complement as an input pair. The
schematic is made up of an NMOS
source-coupled pair whose transistors work in
saturation or cutoff. Here we
consider a resistive load; however,
other types of load can be used, such
as an active PMOS load. The total current IT is
steered into either of the two branches and is
converted to a differential output voltage by
the two resistors RD1 and RD2. M1 and
M2 constitute a differential pair.
If VGS (M2) is higher than VGS (M1),
then the current ID2 exceeds the current
ID1. Therefore, the output voltage Vo2
begins to drop until it reaches steady state.
The output voltage swing Vswing is
defined as the voltage difference between Vo1
and Vo2 at steady state.
Since Vo1 = VDD – RD iD1 and Vo2 = VDD – RD iD2,
the differential output voltage Vo is equal to
Vo = Vo1 – Vo2 = RD (iD2 – iD1)
The voltage swing is defined as the difference
in the output voltage between the cutoff and
saturation conditions and is given by
Vswing = RD IT
The small signal gain Av of an MCML gate with
matched gm, for a single-ended output, is
given by:
Av = gm RD / 2
The noise margin is given by:
NM = (Vswing/2)(1 – √2/Av)
where Av >> 1/√2 was assumed [9].
The delay associated with an SCL gate is
given by:
τ = 0.69 RD Cout
where Cout is the net parasitic capacitance
at the output.
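The static expressions above can be evaluated together to see how one choice of bias current and load sets swing, gain, noise margin and delay. This is a minimal Python sketch; the function name and the numeric values in the usage note are illustrative assumptions, not design values from the paper.

```python
import math


def mcml_static(i_t, r_d, g_m, c_out):
    """Evaluate the MCML static-model expressions quoted in the text.

    i_t   : tail (bias) current in A
    r_d   : load resistance in ohm
    g_m   : transconductance of each transistor in S
    c_out : net parasitic capacitance at the output in F
    """
    v_swing = r_d * i_t                      # Vswing = RD * IT
    a_v = g_m * r_d / 2                      # single-ended small-signal gain
    nm = (v_swing / 2) * (1 - math.sqrt(2) / a_v)  # noise margin
    delay = 0.69 * r_d * c_out               # tau = 0.69 * RD * Cout
    return {"v_swing": v_swing, "a_v": a_v, "nm": nm, "delay": delay}
```

For example, `mcml_static(100e-6, 4e3, 2e-3, 0.1e-12)` corresponds to a 0.4 V swing and a gain of 4; raising `r_d` increases both swing and gain but lengthens the delay in proportion.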
III PFSCL
In PFSCL, the MCML logic style is
modified to include positive feedback from
the drain of M1 (vo1) to the gate of M2, the
second transistor of the differential
pair [10].
STATIC BEHAVIOUR OF PFSCL GATE:
The bias current Iss is steered through
either M1 or M2 depending on the input
signal vin. The transistors M1 and M2
operate in the cutoff or in the saturation
region depending on vin. The logic high
voltage level is Vdd and the logic low
level is Vdd – IssRd. Hence, PFSCL has
the same Vswing as MCML.
The small-signal circuit around a given
bias point can be represented as in the
figure above, where the source voltage vx is
calculated by superposing the input
voltages vin and vout applied at the gates of
M1 and M2, and observing that the voltage
gain between the gate and the source of
M1 is equal to gm1/(gm1+gm2)
and that of M2
is gm2/(gm1+gm2).
For gm1 = gm2 = gmn,
calculating Av from the small signal
equivalent circuit of PFSCL gives us:
Av = (gmnRd/2) / (1 – gmnRd/2)
From this expression we see that a very high
value of closed loop gain is achieved for
gmnRd/2 tending to 1.
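The effect of the positive feedback on the gain can be made concrete by comparing the gm·Rd product each style needs for a target gain. The Python sketch below uses only the two gain expressions given in the text; the function names are illustrative assumptions.

```python
def mcml_gain(gm, rd):
    """MCML single-ended gain, Av = gm*Rd/2."""
    return gm * rd / 2


def pfscl_gain(gmn, rd):
    """PFSCL closed-loop gain, Av = T/(1 - T) with T = gmn*Rd/2."""
    t = gmn * rd / 2
    if t >= 1:
        raise ValueError("gmn*Rd/2 must stay below 1 for a finite gain")
    return t / (1 - t)


def mcml_required_product(av):
    """gm*Rd needed by MCML for a target gain Av (inverting Av = gm*Rd/2)."""
    return 2 * av


def pfscl_required_product(av):
    """gmn*Rd needed by PFSCL for a target gain Av (inverting Av = T/(1-T))."""
    return 2 * av / (1 + av)
```

For a target gain of 6, the MCML expression requires gm·Rd = 12 whereas the PFSCL expression requires only gmn·Rd = 12/7, which is why the same gain can be reached with a smaller transconductance or load.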
Piecewise linear approximation of DC
transfer characteristics of PFSCL gates
We see that the factor
gm1gm2/(gm1+gm2) reduces significantly
outside the transition region around VLT.
Due to the high sensitivity of Av to this
factor, the closed loop gain rapidly reduces
to zero outside the transition region.
Hence, the DC transfer characteristic has a
slope that sharply tends to zero outside the
transition region and can be approximated
by a three-segment piecewise linear function
with slope –Av around VLT and zero
slope in the other two segments.
Due to positive feedback the expression
for the noise margin NM
changes to
NM = (Vswing/2)(1 – 1/Av)
From the expression for NM we see that the
noise margin is lower than half of
Vswing and tends to it for high values of
Av (i.e. gmnRd/2 → 1).
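The two noise margin expressions (the MCML one from section II and the PFSCL one above) can be compared directly for the same swing and gain. This is a small Python sketch with illustrative function names; the values Vswing = 0.4 V and Av = 2 or 6 mentioned in the usage note are the ones the paper uses in its simulations.

```python
import math


def nm_mcml(v_swing, av):
    """MCML noise margin, NM = (Vswing/2) * (1 - sqrt(2)/Av)."""
    return (v_swing / 2) * (1 - math.sqrt(2) / av)


def nm_pfscl(v_swing, av):
    """PFSCL noise margin, NM = (Vswing/2) * (1 - 1/Av)."""
    return (v_swing / 2) * (1 - 1 / av)
```

With `v_swing = 0.4` the PFSCL expression gives 0.1 V at Av = 2 and about 0.167 V at Av = 6, in both cases above the MCML value and below the limit Vswing/2 = 0.2 V.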
SUMMARY FROM THE ABOVE SECTIONS ON PFSCL AND MCML
From the expressions for Av, voltage
swing and NM (noise margin) we see that the
PFSCL topology offers advantages with
respect to MCML:
1) Keeping all design parameters
constant (voltage swing,
biasing voltages and noise margin),
PFSCL achieves the same gain for
lower values of gmn and RD.
2) A smaller Rd implies an area saving.
3) From the expression for Av for
MCML, we see that increasing
Av requires a proportional
increase in gmn (in other words,
the width of the transistors M1 and
M2) or in the value of Rd.
4) As gmn depends directly on Iss,
an increase in gmn requires an increase
in the gate voltage of the transistors
implementing the constant current
source or an increase in (W/L).
5) That is, an increase in power or area
is required.
6) For PFSCL, in contrast, we only have to
satisfy the relation gmnRd/2 → 1,
which can be easily achieved.
7) The reduction in area of the NMOS
transistors for a particular value of
Vswing and Av leads to a decrease
in the associated parasitic
capacitances.
a. This gives PFSCL a speed
advantage over MCML
circuits.
8) This increase in speed can be
utilised in certain applications.
It can also be traded off for a
corresponding decrease in
power supply voltage, which is
required in low power design.
IV COMPARISON OF MCML AND PFSCL LOGIC GATES PERFORMANCE
In this section we present the results of
simulations carried out on PFSCL and
MCML gates. The simulations were
carried out in Orcad PSPICE using a 180 nm
BSIMv3 MOS model.
The values of the circuit parameters (voltage
gain, gate bias voltage and voltage swing)
were taken within the range used in
practical applications. pMOS loads were
used in both PFSCL and MCML.
For simulation purposes, the voltage swing
was taken to be 400 mV, which is within
the practical range of 350 mV-650 mV. The
value of the voltage gain Av is generally
between 2 and 10. Simulations have been
performed using Av = 2 and Av = 6. The
Cload value is taken as 0.1 pF. All the results
are presented for an input signal
frequency of 500 MHz, with an input swing of 1.4
to 1.8 V.
[Figure: Area required (W1+W2+W3) vs bias current Iss for Av = 2 and Vswing = 0.4 V, plotted for MCML and PFSCL]
This graph shows that as the bias current
increases, the area occupied by
MCML increases at a faster rate than the area
occupied by PFSCL. The area advantage
also leads to a decrease in the associated
parasitic capacitance, which in turn causes
the PFSCL gate to be faster than an MCML gate.
[Figure: t_delay vs Iss for PFSCL and MCML]
This graph shows the speed advantage of the PFSCL gate over the MCML gate.
This enables the extension of the CML architecture into the GHz frequency range.
(For Av = 6, Vswing = 0.4 V and Cload = 0.1 pF.)
Monte Carlo simulations were also carried
out on the PFSCL and MCML gates to
determine the robustness of each logic style
to process variations (e.g. tox) and
variations in Vth (the threshold voltage of
the MOS transistor).
From the simulation results it was found
that PFSCL was more robust and that its
robustness increases as the bias current
increases.
REFERENCES :
1) D. Allstot, S. Chee, S. Kiaei, and
M. Shristawa, “Folded source-coupled
logic vs. CMOS static
logic for low-noise mixed-signal
ICs,” IEEE Trans. Circuits Syst. I,
vol. 40, pp. 553–563, Sept. 1993.
2) S. Kiaei, S. Chee, and D. Allstot,
“CMOS source-coupled logic for
mixed-mode VLSI,” in Proc. Int.
Symp. Circuits Systems, 1990,
pp.1608–1611.
3) J. Kundan and S. Hasan,
“Enhanced folded source-coupled
logic technique for low-voltage
mixed-signal integrated
circuits,” IEEE Trans. Circuits Syst.
II, vol. 47, pp. 810–817, Aug.
2000.
4) J. Kundan and S. Hasan, “Current
mode BiCMOS folded source-coupled
logic circuits,” in Proc.
ISCAS, June 1997, pp. 1880–1883.
5) P. Gray, P. Hurst, S. Lewis, and
R. Meyer, Analysis and Design of
Analog Integrated Circuits, 4th
ed. New York: John Wiley &
Sons, 2000.
6) M. Alioto, “Design of nanometer MOS
Current Mode Logic (MCML):
basic concepts and
perspectives” (lecture), 2007.
7) M. Alioto, L. Pancioni, S. Rocchi, and
V. Vignoli, “Modeling and evaluation of
positive-feedback source-coupled
logic,” IEEE Trans. Circuits Syst. I,
vol. 51, no. 12, Dec. 2004.
8) A. Sedra and K. Smith,
Microelectronic Circuits, Oxford.
9) M. Alioto and G. Palumbo,
“Design strategies for source
coupled logic gates,” IEEE Trans.
Circuits Syst. I, vol. 50, pp. 640–
654, May 2003.
10) M. Alioto, L. Pancioni, S. Rocchi, and
V. Vignoli, “Modeling and evaluation of
positive-feedback source-coupled
logic,” IEEE Trans. Circuits Syst. I,
vol. 51, no. 12, Dec. 2004.
Comparative Study of Fast Adders using VHDL and FPGA
Nishi Chandra, ET Deptt., H.B.T.I., Kanpur
Rajani Bisht, Associate Professor, ET Deptt., H.B.T.I., Kanpur
Abstract: Adders are among the most widely used components in integrated circuits and appear in many electronic applications. The major challenge for the VLSI designer is to reduce the chip area, and the next is to increase the speed of operation.
Therefore, various adders such as the ripple carry adder, carry lookahead adder, carry select adder etc. are compared using VHDL. The comparative study uses Xilinx 9.2i as the synthesis tool, the Xilinx ISE Simulator as the simulation tool and an FPGA Spartan-II kit for the implementation of these adders. Area and delay reports are generated for these adders, and the VHDL codes can also be implemented on the FPGA Spartan-II kit.
Introduction
Adders are among the most widely used components in integrated circuits, so designing efficient adders has long been a goal of research in VLSI design. Addition is a crucial arithmetic function for most digital systems. Various adder structures, such as serial and parallel structures, can be used to execute addition. Adders are used not only for addition, but also for other operations such as subtraction, multiplication, division and address computation. They are common in electronic applications such as digital signal processing, where adders are used to implement algorithms like FIR and IIR filters [1]. In the past, the major challenge for the VLSI designer was to reduce the chip area by using efficient optimization techniques. Apart from aiding a designer in selecting an adder with favorable characteristics, the aim here is to provide insight into design tradeoffs that can save power and enhance performance. The adders studied include linear time ripple carry and Manchester carry chain adders, carry skip and carry select adders, the carry lookahead adder and its variations, and carry save adders. Several researchers have worked on the performance analysis of adders and others on the performance analysis of multipliers. A lot of research is also going on to reduce power consumption. There are therefore three performance parameters on which a VLSI designer has to optimize a design: area, speed and power [2]. It is very difficult to satisfy all constraints for a particular design, so depending on the demand or application some compromise between the constraints has to be made. VHDL codes have been written for these fast adders, and Xilinx 9.2i is used as the synthesis tool to obtain the area and delay reports. In addition, the Xilinx ISE Simulator is used for simulation and an FPGA Spartan-II kit is used for implementation.
Fast Parallel Adders
Ripple Carry Adder (RCA)
A logic circuit that adds N-bit numbers can be built from multiple full adders. Each full adder takes as its Cin the Cout of the previous adder. This kind of adder is called a ripple carry adder, since each carry bit "ripples" to the next full adder. The full adder is the basic building block of the ripple carry adder. Its major limitation is that the delay increases with the bit length, so it is not suitable when a large number of bits are to be added. The two Boolean functions for the sum and carry are:
SUM = Ai ⊕ Bi ⊕ Ci (i)
Cout = Ci+1 = Ai · Bi + (Ai ⊕ Bi) · Ci (ii)
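The ripple carry scheme can be illustrated with a bit-level software model. The Python sketch below is not the paper's VHDL; it only mirrors the full adder equations for the sum and carry, and the function names and the default width of 16 bits are illustrative assumptions.

```python
def full_adder(a, b, c):
    """One-bit full adder: sum = a xor b xor c, cout = a.b + (a xor b).c."""
    s = a ^ b ^ c
    cout = (a & b) | ((a ^ b) & c)
    return s, cout


def ripple_carry_add(x, y, n=16, cin=0):
    """Add two n-bit unsigned numbers by chaining n full adders."""
    result = 0
    carry = cin
    for i in range(n):
        a = (x >> i) & 1
        b = (y >> i) & 1
        s, carry = full_adder(a, b, carry)  # carry ripples to the next stage
        result |= s << i
    return result, carry
```

The loop makes the delay problem visible: stage i cannot finish until stage i-1 has produced its carry, so the critical path grows linearly with n.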
Fig 1. Ripple carry adder
Condition Carry Adder (CCA)
This adder computes the sum and carry depending on the status of the previous carry:
1. If ci = 0 then Sout = ai xor bi and ci+1 = ai and bi (iii)
2. If ci = 1 then Sout = ai xnor bi and ci+1 = ai or bi (iv)
The adder does not compute the sum and carry directly with a full adder.
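A one-bit cell following the two conditions above can be modelled and checked against ordinary addition. This Python sketch is only an illustration of equations (iii) and (iv); the function name is an assumption.

```python
def condition_carry_cell(a, b, c):
    """One-bit cell that picks its outputs according to the incoming carry c."""
    # Candidate outputs for c = 0: sum = a xor b, carry = a and b.
    s0, c0 = a ^ b, a & b
    # Candidate outputs for c = 1: sum = a xnor b, carry = a or b.
    s1, c1 = 1 - (a ^ b), a | b
    return (s1, c1) if c else (s0, c0)
```

Both candidate pairs depend only on a and b, so they can be prepared before the incoming carry is known.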
Fig 2. Condition carry adder
Carry Lookahead Adder (CLA)
In the carry lookahead algorithm, the carry for the next stages is calculated in advance from the input signals. As a result, the addition is speeded up. If "X" and "Y" are the two inputs, "ci" is the initial carry, and "sout" and "cout" are the output sum and carry respectively, then the Boolean expressions for calculating the next carry and the sum are [3]:
Pi = xi xor yi (Carry Propagate) (v)
Gi = xi and yi (Carry Generate) (vi)
Ci+1 = Gi or (Pi and Ci) (Next Carry) (vii)
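The lookahead idea is that every carry can be written purely in terms of the propagate and generate signals and the initial carry. The Python sketch below expands recurrence (vii) in that way for each bit; it is an illustrative model rather than the paper's VHDL, and the function name and 16-bit default are assumptions.

```python
def cla_add(x, y, n=16, cin=0):
    """Carry lookahead addition using expanded propagate/generate terms."""
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(n)]  # Pi = xi xor yi
    g = [((x >> i) & (y >> i)) & 1 for i in range(n)]  # Gi = xi and yi
    carries = [cin]
    for i in range(n):
        # C[i+1] = G[i] + P[i]G[i-1] + ... + P[i]..P[1]G[0] + P[i]..P[0]C0,
        # i.e. recurrence (vii) unrolled so it depends only on P, G and C0.
        term = g[i]
        prod = p[i]
        for j in range(i - 1, -1, -1):
            term |= prod & g[j]
            prod &= p[j]
        term |= prod & cin
        carries.append(term)
    total = 0
    for i in range(n):
        total |= (p[i] ^ carries[i]) << i  # sum bit = Pi xor Ci
    return total, carries[n]
```

In hardware each unrolled term is a separate AND gate feeding one OR gate, which is why practical designs limit the lookahead to small blocks such as 4 bits.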
Fig 3. Carry lookahead adder
The two Boolean functions for the sum and carry are as follows [1]:
SUM = Ai ⊕ Bi ⊕ Ci (viii)
Cout = Ci+1 = Ai · Bi + (Ai ⊕ Bi) · Ci (ix)
Manchester Carry Chain Adder
The Manchester adder is also a type of carry lookahead adder. It differs slightly in how the next carry is calculated: instead of the Boolean expression Ci+1 = Gi + Ci.Pi, the Manchester carry adder uses:
Ci+1 = Gi + Ci.ti (x)
ti = Xi + Yi (xi)
Thus the carry recurrence can be written in terms of ti instead of Pi, which leads to a slightly faster adder because, in binary addition, ti is easier to produce than Pi (OR instead of XOR).
Conventional Carry Skip Adder (CSKA)
In an N-bit ripple carry adder the carry has to propagate through all N stages, which results in a large delay in performing binary addition. In a carry skip adder, on the other hand, the carry can skip over a group of n bits. This results in less delay than the ripple carry adder. The logic used for the carry skip is shown in the figure below and is also evident from the equation:
P(0:3)<= ((x(0) or y(0)) and (x(1) or y(1)) and
(x(2) or y(2)) and (x(3) or y(3))); (xii)
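A behavioural model can show how the block skip signal of equation (xii) fits into the addition. The Python sketch below is an assumption-laden illustration, not the paper's VHDL: each block is modelled as a ripple chain, the skip signal is the AND of (x or y) over the block as in (xii), and the block carry out is taken as the ripple carry ORed with (skip AND block carry in).

```python
def carry_skip_add(x, y, n=16, block=4, cin=0):
    """Carry skip addition with fixed-size blocks (behavioural model)."""
    result = 0
    carry = cin
    for base in range(0, n, block):
        block_cin = carry
        skip = 1
        for i in range(base, min(base + block, n)):
            a = (x >> i) & 1
            b = (y >> i) & 1
            skip &= a | b                      # P(block) = AND of (x(i) or y(i))
            result |= (a ^ b ^ carry) << i     # ripple sum inside the block
            carry = (a & b) | ((a ^ b) & carry)
        # Skip path: when every bit position can pass a carry, the block's
        # carry in is forwarded without waiting for the ripple chain.
        carry |= skip & block_cin
    return result, carry
```

In software the skip path does not change the result; in hardware it shortens the critical path because the forwarded carry is available after one gate delay per block.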
Fig 4. Conventional carry skip adder
Modified Carry Skip Adder (CLSKA)
In the conventional carry skip adder, each block consists of a ripple carry adder, and skip logic is used after each block to generate the carry for the next block. The speed of operation is affected by the method of carry propagation from one block to the next [4]. In CLSKAs, a carry lookahead scheme is used in each block to generate the carry for the next block. As a result there is better performance in terms of speed, as the carry lookahead adder is faster than the ripple carry adder [5]. The figure shows a modified CLSKA with a fixed block size of 4 bits.
Fig 5. Modified carry skip adder
Carry Select Adder (CSA)
In the carry select adder, the sum is calculated by assuming a value for the input carry from the previous stage. One adder calculates the sum assuming an input carry of 0 while the other calculates the sum assuming an input carry of 1 [6]. The actual carry then drives a multiplexer that selects the appropriate sum. The figure shows the schematic block diagram of a 16-bit carry select adder consisting of four blocks, each a 4-bit carry lookahead adder. The carry output of each block is fed into the next block as its input carry.
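The select principle can be sketched in a few lines. In the Python model below, ordinary integer addition stands in for the 4-bit block adders (an assumption made to keep the example short); the function name and defaults are also illustrative.

```python
def carry_select_add(x, y, n=16, block=4, cin=0):
    """Carry select addition: each block precomputes both possible results."""
    mask = (1 << block) - 1
    result = 0
    carry = cin
    for base in range(0, n, block):
        a = (x >> base) & mask
        b = (y >> base) & mask
        with_cin0 = a + b        # block result assuming carry in = 0
        with_cin1 = a + b + 1    # block result assuming carry in = 1
        chosen = with_cin1 if carry else with_cin0  # multiplexer
        result |= (chosen & mask) << base
        carry = chosen >> block  # carry out feeds the next block's select
    return result, carry
```

The two candidate results of every block depend only on the operands, so all blocks can compute them in parallel; only the multiplexer selection waits for the carry.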
Fig 6. Carry select adder
Carry Save Adder (CSA)
In the carry save adder, if the sum of two 16-bit binary numbers is to be computed, 16 half adders are used in the first stage instead of 16 full adders. The carry save unit therefore consists of 16 half adders, each of which computes a single sum and carry bit based only on the corresponding bits of the two input numbers. The carry save adder is used to compute the sum of three or more n-bit binary numbers. This adder is the same as a full adder. Let x and y be two 16-bit numbers that produce the partial sum and carry s and c:
Si = xi xor yi (xiii)
Ci = xi and yi (xiv)
The final addition is then computed by:
1. Shifting the carry sequence C left by one place.
2. Placing a 0 at the front (MSB) of the partial sum sequence S.
3. Adding the two with a ripple carry adder to obtain the resulting sum.
Fig 7. Carry save adder
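The carry save steps for two operands reduce to a bitwise XOR, a bitwise AND, a shift and one final addition. In this Python sketch the final ripple carry adder is replaced by integer addition for brevity (an assumption), and the function name is illustrative.

```python
def carry_save_add(x, y, n=16):
    """Two-operand carry save addition for n-bit unsigned numbers."""
    s = x ^ y            # partial sum bits, Si = xi xor yi
    c = x & y            # carry bits, Ci = xi and yi
    shifted_c = c << 1   # step 1: shift the carry sequence left by one place
    # Steps 2 and 3: S is treated as n+1 bits with a leading 0 and added to
    # the shifted carries (integer addition stands in for the ripple adder).
    total = shifted_c + s
    mask = (1 << n) - 1
    return total & mask, total >> n
```

The identity behind the method is x + y = (x xor y) + 2·(x and y), so the first stage has no carry propagation at all.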
RESULTS AND ANALYSIS
The ripple carry adder, carry lookahead adder, Manchester carry chain adder, carry select adder, carry save adder, condition carry adder, conventional carry skip adder and modified carry skip adder have been designed using VHDL (Very High Speed Integrated Circuit Hardware Description Language) for 16-bit unsigned data. To demonstrate their performance, the adders are compared on the basis of their delay and the area they occupy. The delay and area reports are generated for these adders using the following tools:
1. Xilinx 9.2i is used as the synthesis tool.
2. Xilinx ISE Simulator is used for simulation.
3. FPGA Spartan-II is used for implementation.
The delay and area reports of the adders are generated with the synthesis tool, Xilinx 9.2i. The VHDL codes written for these adders are first simulated using the Xilinx ISE Simulator and then synthesized. After synthesis, the tool generates a synthesis report that gives the propagation delay and the number of 4-input LUTs used by the design out of the total number of LUTs. After simulation and synthesis, the VHDL codes can be implemented on the FPGA kit by converting them into the design format (i.e. the programming file) and downloading it to the kit. The delay and area reports generated for these adders are given in Table 1.
ADDERS (with fixed block size = 4 bit)    DELAY (ns)    LUTs (out of 1536)
Ripple Carry Adder                        32.997        32
Carry Lookahead Adder                     22.792        17
Manchester Carry Chain Adder              31.744        32
Carry Select Adder                        26.056        45
Carry Save Adder                          35.424        46
Condition Carry Adder                     33.378        32
Conventional Carry Skip Adder             17.636        43
Modified Carry Skip Adder                 27.163        69
Table 1. Delay and area report of 16-bit fast adders
CONCLUSION
The delay and area reports generated by simulating and synthesizing the VHDL codes of the adders provide the performance analysis of these 16-bit adders. Comparing the delays shows that the conventional carry skip adder has the minimum propagation delay (17.636 ns) while occupying 43 LUTs out of the total 1536 LUTs on the Spartan II -XC3S50-5-TQ144 FPGA kit. The carry lookahead adder has the next lowest propagation delay (22.792 ns) and occupies the fewest LUTs on the FPGA kit (17 LUTs out of 1536).
From the area and delay reports of these adders, it is observed that there are trade-offs between the performance parameters area and delay. To design a delay-efficient adder, the conventional carry skip adder can be used, in which the carry can skip over a group of n bits. This results in less delay than the ripple carry adder in generating the output sum and the carry bit for the next block. The result is fast operation, but at the cost of a few more LUTs due to the carry skip logic.
References
1. R.P.P.Singh,Parveen Kumar and Balwinder Singh, “Performance Analysis of fast adders using VHDL”,2009 International Conference on advances in Recent Technologies in Communication and Computing.
2. Nagendra, C.; Irwin, M.J.; Owens, R.M.,“Area-time-power tradeoffs in parallel adders”, Circuits and Systems II: Analog and Digital Signal Processing, IEEE Transactions on Volume 43, Issue 10, Page(s): 689 – 702, 1996.
3. Hasan Krad and Aws Yousif Al-Taie, “Performance Analysis of a 32-Bit Multiplier with a Carry-Look-Ahead Adder and a 32-bit Multiplier with a Ripple Adder using VHDL”, Journal of Computer Science 4 (4): 305-308, 2008.
4. Wang, Y.; Pai, C.; Song, X., “The design of hybrid carry lookahead/carry-select adders”, Circuits and Systems II: Analog and Digital Signal Processing, IEEE Transactions on Volume 49, Page(s): 16-24, 2002.
5. Min Cha and Earl E. Swartzlander, Jr, “Modified Carry Skip Adder for reducing first block delay”, Proc. 43rd IEEE Midwest Symp. on Circuits and Systems, Lansing MI, Page(s): 346-348, 2000.
6. Behnam Amelifard, Farzan Fallah, Massoud Pedram, “Closing the gap between Carry Select Adder and Ripple Carry Adder: A new class of Low-power and High-performance Adders”, Proceedings of the Sixth International Symposium on Quality Electronic Design (ISQED’05), 2005.
Organic Thin Film Transistor: Materials, Structures and Operational Parameters
Poornima Mittal1, Brijesh Kumar2, B. K. Kaushik3, Y. S. Negi4 and Krishna Raj5
1Electronics and Communication Engineering, Graphic Era University, Dehradun, INDIA
3Electronics and Computer Engineering, Indian Institute of Technology, Roorkee, INDIA
2,4Polymer Science and Technology Group, DPT, Indian Institute of Technology, Roorkee, INDIA
5Department of Electronics Engineering, H.B.T.I., Kanpur, INDIA
ABSTRACT: Organic Thin Film Transistors (OTFTs)
have improved markedly in performance over the past few
years and are becoming very attractive for a wide range of
applications such as oscillators, flexible display devices,
both small and large scale, and even integrated optoelectronic
devices. A transistor that uses an organic semiconductor as
the active layer to control the flow of electric current is known as
an organic thin film transistor. Over the last decade
organic/polymeric materials have been extensively
investigated for the substrate, conducting semiconductor
layer, dielectric and contact electrodes of thin film
transistor (TFT) devices. In an organic thin film transistor,
the type of semiconductor, processing, doping and
structure can affect the electrical characteristics. This
paper presents new insight into the structure, organic
materials, conduction mechanism and performance
characteristics of OTFTs. Pentacene based
bottom and top contact structures have been modelled to
characterise the adopted structures for organic transistors.
The paper explores the current status of OTFTs in terms of
various parameters such as contact resistance, effect of
channel length, active layer thickness and on/off current
ratio. Organic electronic products are lighter, more
flexible and less expensive than their inorganic
counterparts. They are also biodegradable, being made
from carbon. This opens the door to many exciting
applications that would be impossible using silicon.
Since OTFTs provide simple and low cost processes, their
application to displays is also discussed.
Keywords: Bottom and Top Contact Structures of
OTFTs, Contact Resistance, Mobility, Organic
Materials, Organic Thin Film Transistors.
1. INTRODUCTION
Organic electronics has the potential to create
a new range of devices, circuits and
applications. Some important applications are
display drivers, advertising boards, smart cards,
wall sized televisions, identification tags and portable
products such as modern cell phones and video
games [1]. Organic material based devices like the
Organic Thin Film Transistor (OTFT), Organic
Field Effect Transistor (OFET), Organic Light
Emitting Diode (OLED) and solar cell have
the advantages of low cost, flexibility and
light weight over their inorganic counterparts.
Organic semiconductors can be processed at low
temperatures compatible with plastic substrates,
whereas higher temperatures are required for
alternative Si based devices [2, 3]. Organic
transistors can usually be manufactured at or near
room temperature, unlike silicon based
transistors, which typically require fairly high
process temperatures (>800ºC for a crystalline Si
transistor).
For simulation of OTFTs certain structures
have been proposed. In order to enhance the
device speed, considerable research effort has
been devoted to increasing the mobility of organic
materials by improving deposition conditions [4,
5]. As a result of this effort, mobilities
exceeding 1 cm2/V·s for pentacene [6],
comparable to that of hydrogenated amorphous
silicon (a-Si:H), and of 0.1 cm2/V·s
for poly(3-hexylthiophene) (P3HT) [7] have been reported. In addition
to mobility, other ways of improving the performance
of OTFTs, such as channel length scaling and
active layer thickness, have also attracted
considerable attention [8]. This paper first
describes the different structures of organic thin
film transistors in section 2, the various organic and
polymer materials used for the active semiconductor
and dielectric layers in section 3, and operation and
characteristics in section 4. Finally, parameters
and the display application are discussed in
sections 5 and 6 respectively.
2. OTFT STRUCTURES
OTFTs adopt the architecture of the thin film
transistor (TFT), which has proven its
adaptability to low conductivity materials. It
contains three electrodes (source, drain and gate), a
dielectric layer and an active organic
semiconducting (OSC) layer. The structure can be
top gate or bottom gate, and both
architectures can be further divided into top contact and
bottom contact alternatives, as depicted in fig.1 (a)
and (b). Depositing the organic semiconductor
on the insulator is much easier than the reverse
due to the fragile nature of organic semiconductors;
hence the bottom gate architecture is used in the majority
of current OTFTs.
The well known structure for standard silicon
MOSFETs is top-gate-top-contact (TGTC);
however, for simulation of OTFTs the
bottom-gate-top-contact (BGTC) and
bottom-gate-bottom-contact (BGBC) architectures have mostly
been modeled. Certain advantages and disadvantages are
associated with each OTFT structure. In terms
of field effect mobility, the
BGTC structure shows better performance
than the BGBC structure. The better
field effect mobility of the top contact OTFT is due
to its lower contact resistance compared with the bottom
contact one [9]. The performance of OTFTs in a
BGBC device structure is
generally observed to be lower by two orders of
magnitude than in the top contact device
configuration [10-13].
Fig.1 Schematic cross-section of OTFT structures with
pentacene as active semiconductor, Al2O3 as dielectric and
gold contact electrodes. (a) Bottom Gate Top Contact
(BGTC) (b) Bottom Gate Bottom Contact (BGBC).
3. ORGANIC MATERIALS
The performance of OTFTs depends on
their constituent organic semiconductors and
insulator materials. The materials used for the
different layers of OTFTs are described below.
3.1. SUBSTRATE
For substrates, quartz, polycarbonate,
polyethylene naphthalate (PEN), glass, silicon
wafer and polyimide materials can be used [14,
15]. Inorganic substrates have a high melting point
and good flatness, whereas polymer substrates
have high toughness, flexibility and light weight.
3.2. CONTACT ELECTRODE
To improve electrical characteristics, an ohmic
contact can be formed between gold (Au) and the
organic semiconductor, because the work function
of Au is 5.0 eV and the HOMO of most organic
semiconducting materials is around this level.
Adding nickel on gold improves adhesion of the
gold on the oxide. Platinum electrodes are
inferior to gold electrodes. Aluminum shows
slightly higher electron mobility (2.2 cm2/V·s) at
room temperature in single crystals [15, 16].
3.3. P-TYPE ORGANIC SEMICONDUCTORS
Organic thin film transistors fabricated on
light weight flexible substrates are expected to
replace hydrogenated amorphous silicon TFTs
in applications on glass substrates. Table-1 shows
the mobility and on/off current ratio measured
for OTFTs using p-type organic molecules
deposited by different techniques. Among all
investigated oligomeric and polymeric materials,
pentacene thin films have demonstrated the best
electrical performance. Pentacene exhibits typical
p-channel semiconductor characteristics.
TABLE-1 MOBILITY (µ) AND CURRENT ON/OFF
RATIO FOR SOME P-TYPE SEMICONDUCTORS [17].
Material                     Mobility (cm2/V s)    Ion/Ioff
Pentacene                    3.2                   10^9
Copper phthalocyanine        0.01-0.02             NR
Polythiophene                10^-5                 >10^2
αω-dihexyl-hexathiophene     0.13                  >10^4
P3HT                         0.1                   10^6
3.4. N-TYPE ORGANIC SEMICONDUCTORS
It is notable that most of the work
to date has focused on p-type organic materials,
and only recently has some effort been directed towards the
preparation of novel n-type semiconductor
materials. In designing n-type
devices, a semiconductor must be used which
allows the injection of electrons into its
LUMO. Gold has been optimized for source and
drain electrodes [10]; it has a work function
of 5.0 eV, whereas most n-type materials have
solid state electron affinity levels of about 4.0 eV.
Thus charge injection into the semiconductor
would be limited by an energy barrier of
approximately 1 eV, which is another issue adding
to the complexity of n-type devices. Substantial
effort has gone into the development of
n-channel OTFTs because they allow the
implementation of complementary circuits with
low static power consumption [9, 18]. Table-2
gives the mobility and current on/off ratio for some
n-type semiconductors.
TABLE-2 MOBILITY (µ) AND CURRENT ON/OFF
RATIO FOR SOME N-TYPE SEMICONDUCTORS [17].
Material                               Mobility (cm2 V^-1 s^-1)    Ion/Ioff
Pc2Lu (Lutetium bisphthalocyanine)     2×10^-4                     NR
TCNQ (tetracyanoquinodimethane)        3×10^-5                     4-450
C60                                    0.08                        10^6
F16CuPc                                0.03                        5×10^4
3.5. MATERIALS FOR DIELECTRIC
Organic polymers with good processability
and dielectric properties, such as poly methyl
methacrylate (PMMA), poly vinyl phenol (PVP),
polyimide (PI), and poly vinyl alcohol (PVA),
have been extensively employed as the gate
insulator. The switching voltage of OTFTs increases
as the dielectric constant of the insulator decreases. Some
important dielectric materials with their dielectric
constants are polyimide - 2.6, PMMA - 2.65,
Al2O3 - 9, and SOG (spin on glass) - 3.9 [17].
4. OPERATION AND CHARACTERISTICS
4.1. OPERATION
TFTs cannot accommodate band bending due to the absence of
a bulk region [19]. In MOSFETs the conducting channel is
formed by an inversion layer, while in TFTs it is formed by
accumulation. Depending upon the polarity of the gate
voltage, TFTs can operate in unipolar carrier (electron or
hole) accumulation modes. In a thin film FET or accumulation
type FET, the charge-voltage relation is simply given as:
$\rho(x) = [V(x) - V_g]\, C_{ox}$ (1)
where ρ and V are the local charge per area and the local
voltage in the channel, respectively. An organic material
such as pentacene acts as a p-type semiconductor, with holes
as majority carriers.
Fig.2 Top contact OTFT operation with pentacene as active
semiconductor layer.
When a negative gate voltage is applied, an electric field
is formed across the dielectric, causing an accumulation
region of holes at the dielectric-semiconductor boundary, as
shown in Fig. 2. Applying a voltage across the source and
drain terminals then allows a current to flow through this
accumulation layer between the contacts. Basically, an OTFT
operates like a capacitor: when a voltage is applied to the
gate, an equal charge (of opposite sign) is induced on both
sides of the insulator [20].
4.2. CHARACTERISTICS
Despite the fact that the transport physics in
organic/polymeric TFTs is different from that in silicon
MOSFETs, the current-voltage characteristics can, to first
order, be described with the same formalism:
$I_D = \frac{\mu W C_i}{L}\left[(V_{GS} - V_{th})\,V_{DS} - \frac{V_{DS}^2}{2}\right]$ (2)
for $V_{GS} - V_{th} > V_{DS}$ (linear regime), and
$I_D = \frac{\mu W C_i}{2L}\,(V_{GS} - V_{th})^2$ (3)
for $V_{DS} > V_{GS} - V_{th} > 0$ (saturation regime),
where W is the channel width, L is the channel length, Ci is
the gate dielectric capacitance per unit area and µ is the
carrier mobility in the semiconductor. The current-voltage
(ID-VDS) characteristics of an OTFT are similar to those of
inorganic FETs at gate bias voltages VGS above the threshold
voltage Vth, as illustrated in Fig. 3.
Fig. 3 Output characteristics of organic thin film transistor
with Pentacene as semiconductor layer, Al2O3 dielectric
material and gold as contacts.
The characteristics show a linear (ohmic) region, in which
ID depends on VDS, at low drain-source bias (VDS << VGS);
ID saturates at high drain voltages (VDS > VGS – Vth). The
polarities of the bias voltages and currents follow the
device type, as for NMOS or PMOS.
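The first-order model of Section 4.2 can be sketched as a small function that switches between the linear expression and the standard square-law saturation expression. This is only an illustration: it works with voltage magnitudes (so a p-type device such as a pentacene OTFT is handled by passing absolute values), and the numerical parameters in the usage lines are hypothetical, not measured values from the text.

```python
def drain_current(vgs, vds, vth, mu, ci, w, l):
    """First-order FET drain current in amperes.

    Voltages are magnitudes in volts, mu is in m^2/(V s),
    ci is in F/m^2, w and l are in metres.
    """
    overdrive = vgs - vth
    if overdrive <= 0:
        return 0.0  # below threshold: the first-order model gives no current
    k = mu * ci * w / l
    if vds < overdrive:
        # linear regime
        return k * (overdrive * vds - vds ** 2 / 2)
    # saturation regime: current no longer depends on vds
    return 0.5 * k * overdrive ** 2


# Hypothetical device parameters, chosen only for illustration.
MU = 0.4e-4   # 0.4 cm^2/(V s) expressed in m^2/(V s)
CI = 8.0e-4   # gate capacitance per unit area in F/m^2
W, L = 1.0e-3, 1.0e-5

for vds in (1.0, 3.0, 5.0, 10.0):
    print(vds, drain_current(6.0, vds, 1.0, MU, CI, W, L))
```

The two branches meet at VDS = VGS – Vth, which is why the output curves flatten smoothly into saturation.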
5. PARAMETERS
5.1. MOBILITY
Field effect mobility is a key parameter that determines the
processing speed of organic devices. Carrier mobility can be
modulated by the gate voltage; it tends to increase as the
gate bias increases [20]. Quoted values of the effective
mobility of organic transistors span many decades, from
10^-5 to 10 cm2/V s. Mobility also depends on factors such
as gate biasing, the method of fabrication and the method
used to evaluate the mobility from simulation and
experiment [21]. The bias dependent mobility, expressed as a
power law for polymer based field effect transistors, is
given by:
$\mu(V_{GS}) = \mu_0\,(V_{GS} - V_T)^{\gamma}$ (4)
The parameter γ is usually estimated to be in the range
0.2 – 0.5 for different OTFTs/PFETs [22, 23]. TFTs exhibit
mobility up to 0.4 cm2 V^-1 s^-1 at low operating voltages
(5 V) [24, 25]. The mobility increases from very low values
of about 0.02 cm2 V^-1 s^-1 at VG = -14 V to
1.26 cm2 V^-1 s^-1 at -146 V [26].
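The power law of equation (4) can be evaluated with a few lines of code. In this sketch the overdrive is taken as a magnitude so that negative gate voltages of p-type devices can be passed directly; that convention, the function name and the values of µ0, VT and γ in the usage lines are assumptions for illustration only.

```python
def bias_dependent_mobility(vgs, vt, mu0, gamma):
    """Mobility from the power law mu0 * |VGS - VT| ** gamma.

    The result has the units of mu0 (taken per volt**gamma).
    """
    overdrive = abs(vgs - vt)
    return mu0 * overdrive ** gamma


# Hypothetical parameters; gamma is picked from the 0.2 to 0.5 range quoted above.
MU0 = 0.01
VT = -2.0
GAMMA = 0.5

for vgs in (-5.0, -10.0, -20.0, -40.0):
    print(vgs, bias_dependent_mobility(vgs, VT, MU0, GAMMA))
```

With γ below 1 the mobility grows sub-linearly: doubling the overdrive raises the mobility by a factor of 2^γ.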
5.2 ON/OFF CURRENT RATIO
The ratio of the current in accumulation mode to the current
in depletion mode is called Ion/Ioff. This ratio depends on
factors such as the materials, the channel length and the
thickness of the semiconductor. Short channel devices show a
higher on/off current ratio than devices with a long
conducting channel [6]. The ratio increases as the thickness
of the semiconducting layer decreases. For memory and
display applications, a high on/off current ratio is a more
important requirement than high mobility. An on/off current
ratio of 10^8 has been reported for a BGBC thin film
transistor structure with pentacene as the organic
semiconductor, cross linked PVP as the insulator and gold
source and drain contacts [29]. A ratio of around 10^9 has
been observed with pentacene as the active organic
semiconductor [31].
5.3. THRESHOLD VOLTAGE
Threshold voltage and subthreshold current are commonly used
as device evaluation parameters to extract information about
impurity concentrations, interface states and traps. In
MOSFETs, the subthreshold current depends exponentially on
the gate bias as well as the drain-source bias, because
below threshold the free carrier density depends
exponentially on the local bias. The threshold voltage (Vt)
of OTFTs varies with the gate insulator capacitance [27] and
with the thickness of the organic film [20].
5.4. CONTACT RESISTANCE
Ideally the contact should be ohmic and its resistance
small, so that the whole voltage applied to the device
contributes to the transport current. For top contact
devices the contact resistance depends strongly on the gate
bias and increases sharply at low gate-source voltage, while
in bottom contact structures it appears to be almost
independent of the gate bias. Necliudov et al. measured a
contact resistance of 1.3×10^8 Ohm µm with a mobility of
approximately 0.9 cm2/V s for a bottom contact pentacene
OTFT, consistent with an injection barrier of between 0.2
and 0.3 eV in the simulation. It has also been reported that
the top contact resistance depends strongly on the gate
voltage [22] and is much lower than the bottom contact
resistance at high gate bias.
5.5. EFFECT OF CHANNEL LENGTH
The drain current depends strongly on the semiconductor used
for the channel, and it can be modulated by the length of
the conducting channel. M. Austin et al. reported the
dependence of drain current on channel length for P3HT
(poly(3-hexylthiophene)) OTFTs with channel lengths of
1000 nm and 70 nm. A saturation region is present for the
long channel (1000 nm) device, but no saturation region
appears for the short channel (70 nm) device. Long channel
devices are relatively immune to high contact resistance;
when devices are scaled to smaller channel lengths,
performance may degrade [28]. The on/off current ratio is
higher for short channel devices than for long channel
devices.
5.6 EFFECT OF ACTIVE LAYER THICKNESS
The electrical parameters of an OTFT do not depend solely on
the gate capacitance; they can also be modulated by the film
thickness and by charge injection from the source electrode.
Some trends can be expressed as a function of the product of
the thickness of the polymeric film and the gate capacitance
per unit area. It has been observed that the mobility of
OTFTs decreases with increasing gate insulator permittivity
and organic layer thickness [21].
6. OTFT FOR DISPLAY DEVICES
Companies have developed a very diverse set of substrate,
drive element and display mode technologies in order to
realize flexible displays. The E-paper display market is
expected to show an average annual growth rate of 46.9%,
from US$ 260 million in 2010 to US$ 2.1 billion in 2015 and
US$ 7 billion in 2020. OTFTs can be used to make good LCD or
E-paper displays, as these need a high on/off current ratio
[29].
TABLE-3 DISPLAY APPLICATIONS WITH OTFTS
WITH PENTACENE AS OSC [29]
App.   Specification                     Organization
OLED   4*4 pixel on PC                   NHK (Japan)
OLED   8*8 pixel on glass                Pioneer (Japan)
LCD    64*128 on plastic                 ERSO (Taiwan)
LCD    15 in. full color XGA on glass    Samsung (Korea)
LCD    1.4 in. 80*80 RGB on glass        Hitachi (Japan)
Table-3 summarizes display prototypes using OTFTs: LCDs
(made with an OTFT matrix array) and active matrix organic
light emitting diode (AMOLED) displays with dot matrix
patterns. Organic/polymer LED displays have the potential to
replace LCDs and become the next dominant force in flat
panel displays, because they require fewer fabrication steps
and have lower material costs than LCDs [30].
7. CONCLUSION
Organic/polymer electronics is a very promising alternative
to crystalline, polycrystalline and amorphous silicon
processes. Moreover, there are no restrictions as to the
dimensions of the device. It has been observed that the
mobility of OTFTs decreases with increasing gate insulator
permittivity and organic layer thickness. The effect of
channel length has been discussed; long channel devices are
relatively immune to high contact resistance. A top contact
OTFT shows better field effect mobility than a bottom
contact one because of its lower contact resistance.
The on/off current ratio is reported to be higher for short
channel devices than for long channel devices. For memory
and display applications, a high on/off current ratio is a
more important requirement than high mobility, and this
ratio should be more than 10^8. In spite of numerous
advantages such as large area coverage, structural
flexibility and especially low cost, limitations associated
with organic material based devices, such as instability,
lower carrier mobility and shorter lifetimes, need to be
resolved in order to commercialize OTFT based applications.
REFERENCES
[1] M. Jamal Deen, “Plastic microelectronics with organic
and polymeric thin film transistors,” Proc. 26th
international conference on microelectronics, MIEL,
2008.
[2] Yoshiro Yamashita, “Organic semiconductors for
organic field effect transistor,” Sci. Technol. Adv.
Mater. vol.10, pp-024313, 2009.
[3] H. Klauk, D. J. Gundlach, and T. N. Jackson, “Fast
organic thin-film transistor circuits,” IEEE Electron
Device Lett., vol. 20, pp. 289-291, 1999.
[4] A. R. Brown, A. Pomp, C. M. Hart, and D. M. De
Leeuw, “Logic gates made from polymer transistors
and their use in ring oscillators,” Science, vol. 270, pp.
972-974, 1995.
[5] Y. Sun, Y. Liu and D. Zhu, “Advances in organic field-
effect transistors, ” J. mater. chem. , vol. 15, pp. 53-
65, 2005.
[6] Y. Y. Lin, D. J. Gundlach, S. F. Nelson, and T. N.
Jackson, “Stacked pentacene layer organic thin film
transistors,” IEEE Electron Device Lett., vol. 18, pp.
606–608, Dec. 1997.
[7] Z. Xie, M. Abdou, A. Lu, M. J. Deen, S. Holdcroft,
“Electrical Characteristics of Poly (3-Hexylthiophene)
Thin Film MISFETs,” Canadian J. of Physics, vol. 70,
no. 10–11, pp. 1171-1177, 1992.
[8] O. Marinov, M. J. Deen, and R. Datars, “Compact
modeling of charge mobility in organic thin-film
transistors,” J. Appl. Phys. , vol. 106, no. 6, pp.
064501-1–064501-13, Sep. 2009.
[9] H. Klauk, “Organic thin film transistor,” Chem. Soc.
Rev., 39, pp. 2643-2666, 2010.
[10] O. Marinov, M. J. Deen, and B. Iniguez, “Charge
transport in organic and polymer thin-film transistors:
Recent issues,” Proc. Inst. Elect. Eng. Circuits Devices
Syst., vol. 152, no. 3, pp. 189–209, Jun. 2005.
[11] N. Karl, “Charge Carrier Transport in Organic
Semiconductors,” Synth. Met. , vol. 649, pp . 133-
134, 2002.
[12] R. A. Street and A. Salleo, “Contact effects in polymer
transistors,” Appl. Phys. Lett. vol. 81, no. 15, pp. 2887,
2002.
[13] S. F. Nelson, Y. Y. Lin, D. J. Gundlach and T. N.
Jackson, “Temperature independent transport in high
mobility pentacene transistors,” Appl. Phys. Lett. vol.
72, no.15, pp.1854, 1998.
[14] F. Garnier, “Thin-Film Transistors Based on Organic
Conjugated Semiconductors,” Chem. Phys., 227, 253,
1998.
[15] H. Klauk, D. J. Gundlach, M. Bonse, C. C. Kuo, and
T. N. Jackson, “A reduced complexity process for
organic thin film transistors,” Appl. Phys. Lett., 76,
1692, 2000.
[16] J. H. Schon, Ch. Kloc, and B. Batlogg, “On the
intrinsic limits of pentacene field-effect transistors,”
Organic Electronics., vol.1, no. 57, 2000.
[17] C. Shekar, T. Lee and S. W. Rhee, “Organic thin film
transistors, material, processes and devices,” Korean J.
Chem. Engg., vol. 21, no. 1, pp. 267-287, 2004.
[18] G. Horowitz, “Organic field-effect transistors,” Adv.
Mater. vol. 5, pp. 365-377, 1998.
[19] P. Stallinga, and H. L. Gomes, “Modelling electrical
characteristics of thin-film-field-effect transistor, I.
Trap-free materials,” Synthetic Metals, 156, pp. 1305-
1315, 2006.
[20] G. Horowitz, “Organic thin film transistors: From
theory to real devices”, J. Mater. Res., vol. 19, no. 7,
pp. 1946-1962, Jul 2004.
[21] O. Marinov, M. J. Deen, and B. Iniguez, “Performance
of organic thin film transistors,” J. Vac. Sci. Technol.,
vol. 24, no. 4, pp. 1728–1733, 2006.
[22] P. Necliudov, M. Shur, D. Gundlach, and T. Jackson,
“Modeling of organic thin film transistors of different
designs,” J. Appl. Phys., vol. 88, no. 11, pp. 6594–
6597, Dec. 2000.
[23] De Leeuw, D. Gelinck, G. Geuns, T. Van Veenendaal,
E. Cantatore, E. and B. Huisman, “Polymeric
integrated circuits: fabrication and first
characterization,” IEEE-IEDM, 2002, pp. 293–296.
[24] C. D. Dimitrakopoulos, S. Purushothaman, J. Kymissis,
A. Calleggari and J. M. Shaw, “Low-Voltage Organic
Transistors on Plastic Comprising High-Dielectric
Constant Gate Insulators,” Science, vol. 283 no. 5403
pp. 822-824, February 5, 1999.
[25] C. D. Dimitrakopoulos, I. J. Kymissis, S.
Purushothaman, D. A. Neumayer, P. R. Duncombe,
and R. B. Laibowitz, "Low-Voltage, High-Mobility
Pentacene Transistors with Solution-Processed High
Dielectric Constant Insulators," Adv. Mater. 11, 1372,
1999.
[26] C. D. Dimitrakopoulos and P. R. L. Malenfant,
“Organic Thin Film Transistors for Large Area
Electronics,” Adv. Mater. vol. 14, pp. 99-117, 2002.
[27] M. J. Deen, O. Marinov, Jianfei Yu, S. Holdcroft and
W. Woods, “Low-frequency noise in polymer
transistors,” IEEE Trans. on Electron Devices, vol. 48,
no. 8, pp. 1688-1694, 2001.
[28] I. G. Hill, “Numerical simulations of contact resistance
in organic thin-film transistors,” Appl. Phys. Lett. Vol.
87, pp. 163505-1-163505-3, 2005.
[29] Jin Jang and S. H. Han, ”High performance OTFT and
its application,” Current Applied Physics, 6S1, pp.
e17-e21, 2006.
[30] A. Afzali, C. D. Dimitrakopoulos and T. L., Breen,
“High-performance, solution-processed organic thin
film transistors from a novel pentacene Precursor,” J.
Am. Chem. Soc., vol. 124, pp. 8812, 2002.
[31] J. H. Schon, S. Berg, Ch. Kloc, and B. Batlogg,
“Ambipolar pentacene field-effect transistors and
inverters,” Science, vol. 287, pp. 1022, 2000.
CHARACTERIZATION OF 4T SRAM CELL
Setu Garg1, Prof.S.N.Sharan2, Garima Chandel3 Member IEEE, Hridesh Verma4
1 GCET, Greater Noida,2GNIT, Greater Noida, 3,4ABES IT Ghaziabad, India. [email protected], [email protected], [email protected],
ABSTRACT: The Static Random Access Memory discussed in this paper is based on a Four-Transistor SRAM cell. This paper focuses on two important parameters used to characterize the Four-Transistor SRAM cell: Static Noise Margin and Bit Line Leakage current. The maximum allowable SNM needs to be investigated for efficient operation of the SRAM cell. The purpose of the SNM analysis is to measure the noise that the bit cell tolerates without flipping its contents. Bit line leakage current analysis is also done; it covers the bit cell contribution to column leakage and the margin available for the sum of total cell leakage current in a long column. The performance and results have been validated through simulations using the ELDO tool from Mentor Graphics Corporation.
Index Terms – SRAM, Bit Line, Static Noise Margin, DC Source, Word Line
I. INTRODUCTION
Static random-access memory (SRAM) is a critical component across a wide range of microelectronics applications from consumer appliances to high-end workstation and microprocessor applications. For almost all fields of applications, semiconductor memory has been a key enabling technology. It is forecasted that embedded memory in SoC designs will cover up to 90% of the total chip area. A representative example is the use of cache memory in microprocessors. The operational speed could be significantly improved by the application of on-chip cache memory that temporarily stored a fraction of the data and instruction content of the main memory.
The SRAM consists of an array of static memory cells which are connected by horizontal word lines and vertical bit lines. To select one word line out of 2^h, an h-bit address has to be applied. The output data is usually organized as a word of b bits. From the architectural point of view the output word represents a b-bit input/output port (I/O-port). The I/O-port consists of b I/O-blocks, i.e. one block per bit of the output word. Each bit of the I/O-port can be connected to one out of 2^w bit lines by a 2^w-to-1 column or bit line multiplexer. Any SRAM cell can be accessed by an address word which is (h + w) bits long. This address is applied to the control logic block, which controls all the memory operations, e.g. write, read, enable, data-in, data-out.
II. BASIC SRAM ARCHITECTURE
A typical static random access memory (SRAM) architecture is shown in Figure 1. It consists of a matrix of memory cells arranged in an array of 2^N rows by 2^M columns. The total size of the memory array is 2^M x 2^N bits. During a read operation, one of the 2^N rows (word lines) is selected by the row address decoder, which decodes the row address. All the memory cells on the selected word line are enabled. The column decoder selects one of the 2^M columns, and the value of the selected memory cell is read out by the sense amplifier. The data flow into and out of the memory array is controlled by the Read-Write control circuit.
Figure 1. Static Random Access Memory Architecture
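The row and column selection described above can be sketched as a split of an (N + M)-bit address into a row index and a column index. This is a minimal illustration only: placing the row bits in the upper part of the address, and the function name, are assumptions, since the text does not specify the bit ordering.

```python
def decode_address(address, row_bits, col_bits):
    """Split an address into (row, column) for a 2^row_bits by 2^col_bits array.

    Assumes the upper row_bits select the word line and the
    lower col_bits select the column (an illustrative convention).
    """
    total_bits = row_bits + col_bits
    if not 0 <= address < (1 << total_bits):
        raise ValueError("address out of range for this array")
    row = address >> col_bits                  # driven into the row decoder
    column = address & ((1 << col_bits) - 1)   # driven into the column decoder
    return row, column


# Example: an array with 2^3 rows and 2^2 columns (32 cells).
print(decode_address(0b10111, row_bits=3, col_bits=2))
```

The row index enables every cell on one word line; the column index then picks the single bit line pair that reaches the sense amplifier.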
III. SCHEMATIC AND READ/WRITE OPERATION OF 4T SRAM CELL
In the four-transistor SRAM cell, two NMOS transistors are used as pass transistors to access the cell and two PMOS transistors are used as drivers of the cell.
Figure 2. Schematic of the cell
A. WRITE OPERATION
In order to store a logic ‘1’ in the cell, BL is charged to Vdd and BL’ is discharged to ground, and vice versa for storing a ‘0’. Then the word line is switched to Vdd to turn on the NMOS access transistors. When the access transistors are turned on, the values of the bit lines are written into Q and Q’. The node that is storing logic ‘1’ will not go to full Vdd because of the voltage drop across the NMOS access transistor. After the write operation the word line voltage is reset to ground to turn off the NMOS access transistors. The node storing logic ‘1’ is then pulled up to full Vdd through the PMOS driver transistor.
B. READ OPERATION
The read operation of the cell is different from that of the 6T cell. To read from the cell, the bit lines are set to ground instead of Vdd and the word line voltage is set to Vdd to turn on the NMOS access transistors. The node storing logic ‘1’ will pull the voltage on the corresponding bit line up to a high level (not Vdd, because of the voltage drop across the NMOS access transistor). The other bit line stays at ground. The sense amplifier detects which bit line is at the high voltage and which bit line is at ground.
If the cell was storing a logic ‘0’, the voltage level of BL will be lower than that of BL’, so the sense amplifier will output a logic ‘0’. If the cell was storing a logic ‘1’, the voltage level of BL will be higher than that of BL’, and the sense amplifier will output a logic ‘1’.
IV. EXPERIMENTS AND RESULT
A. Static Noise Margin Analysis
SNM quantifies the maximum level of voltage noise which can be present at the internal nodes of a bit cell without flipping the cell contents. Figure 3 shows the locations Q and Q´ of the noise margin sources in the 4T SRAM cell schematic. The purpose of this analysis is to measure the SNM of the bit cell. An SRAM cell should be designed such that under all conditions some SNM is reserved to cope with dynamic disturbances caused by a particle, cross talk, voltage supply ripple and thermal noise. SNM analysis has been done for the 6T and 4T cells at the 0.18 µm technology node. The method and results are shown below.
Figure 3. 4T SRAM cell simulated structure
B. Method to calculate Static Noise Margin
To analyze the Static Noise Margin, a DC noise source VX is introduced inside the SRAM cell to find where the cell flips. The WL (Word Line) is set to Vdd. Bit Line and Bit Line’ (BL and BL’) are connected to ground. Q’ is initialized to Vdd and Q to 0. VX is then slowly increased from 0 while nodes Q and Q’ are monitored to find where the cell flips. The Static Noise Margin is measured to be 362.3279 mV.
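The sweep procedure above (raise a DC noise source until the stored state flips) can be mimicked on a toy model. The sketch below is not the ELDO circuit simulation used in the paper: it replaces the 4T cell with two idealized cross-coupled inverters whose transfer curve is an assumed smooth tanh step, so the number it produces is illustrative and is not expected to match the measured 362.3279 mV. All function names, the gain and the 1.8 V supply are assumptions.

```python
import math


def inverter(v_in, vdd, gain=10.0):
    """Idealized inverter transfer curve: a smooth step centred at vdd / 2."""
    return 0.5 * vdd * (1.0 - math.tanh(gain * (v_in - 0.5 * vdd) / vdd))


def settle(vx, vdd, iterations=500):
    """Relax the cross-coupled pair with noise vx added in series with node Q."""
    q, q_bar = 0.0, vdd  # initial state: Q = 0, Q' = Vdd
    for _ in range(iterations):
        q_bar = inverter(q + vx, vdd)
        q = inverter(q_bar, vdd)
    return q, q_bar


def estimate_snm(vdd=1.8, step=0.001):
    """Smallest swept noise voltage (in volts) at which the stored state flips."""
    steps = int(round(vdd / step))
    for i in range(steps + 1):
        vx = i * step
        q, q_bar = settle(vx, vdd)
        if q > q_bar:  # the cell has flipped
            return vx
    return None  # no flip found within the sweep range


print("toy-model SNM estimate (V):", estimate_snm())
```

The same structure (initialize, step the noise source, re-settle, check for a flip) carries over to a circuit-level DC sweep, where the inverter function is replaced by the simulator.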
C. Bit Line Leakage Current Analysis
The purpose of this analysis is to characterize the bit cell contribution to column leakage. The main purpose of this test is to see the margin available for the sum of total cell leakage currents in a long column (from unselected WLs) during a read operation. This simulation should be used as guidelines for designing the maximum number of physical rows in a SRAM array.
Figure 4. Bit Line Leakage Current Calculation
D. Method to calculate Bit Line Leakage Current
For the Bit Line Leakage Current Analysis, the output Q is initialized to ‘0’ and Q’ to Vdd. The Word Line (WL) is off and therefore set to 0. BL and BL’ are connected to ground. The leakage current is then measured as the current through MN4 (the pass transistor facing the ‘1’). The Bit Line Leakage Current for the 4T SRAM cell is measured to be 7.1441 pA.
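The way a per-cell leakage figure bounds the column height can be sketched with simple arithmetic: the summed leakage of the unselected cells must stay below whatever margin the read circuit tolerates. The 7.1441 pA value is the measurement quoted above; the 1 µA leakage budget and the function name are hypothetical, since the text does not state the available margin.

```python
import math


def max_unselected_cells(leak_per_cell_a, leakage_budget_a):
    """Largest number of unselected cells whose summed leakage fits the budget."""
    if leak_per_cell_a <= 0:
        raise ValueError("leakage per cell must be positive")
    return math.floor(leakage_budget_a / leak_per_cell_a)


LEAK_PER_CELL_A = 7.1441e-12   # measured bit line leakage per cell (from the text)
ASSUMED_BUDGET_A = 1.0e-6      # hypothetical margin on total column leakage

cells = max_unselected_cells(LEAK_PER_CELL_A, ASSUMED_BUDGET_A)
print("unselected cells per column within budget:", cells)
```

In practice the budget would come from the sense amplifier's required differential and the worst-case process corner, which this sketch does not model.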
V. CONCLUSION
The two basic parameters, static noise margin and bit line leakage current, are successfully measured. All the simulations are done with the ELDO tool from Mentor Graphics Corporation. Both parameters are very important in the characterization of the 4T SRAM cell. An SRAM cell is designed such that under all conditions some SNM is reserved to cope with dynamic disturbances caused by a particle, cross talk, voltage supply ripple and thermal noise. The Static Noise Margin is measured to be 362.3279 mV. The bit line leakage analysis also shows the margin available for the sum of total cell leakage currents in a long column during a read operation. The Bit Line Leakage Current for the 4T SRAM cell is measured to be 7.1441 pA. The objective is to keep the Bit Line Leakage Current as low as possible.
Quantitative Analysis and Optimization Techniques
for On-Chip Cache Leakage Power
Vikas Tiwari, M.Tech (VLSI Design), ITM, Gwalior, India
Shyam Akashe, Associate Professor, ITM, Gwalior
Rajkumar Rajoriya, Assistant Professor, ITM, Gwalior
e-mail: [email protected], e-mail: [email protected]
Abstract: On-chip L1 and L2 caches represent a sizeable fraction of the total power consumption of microprocessors. In nanometer-scale technology, the subthreshold leakage power is becoming one of the dominant total power consumption components of those caches. In this study, we present optimization techniques to reduce the subthreshold leakage power of on-chip caches assuming that there are multiple threshold voltages (Vth's) available. First, we show a cache leakage optimization technique that examines the tradeoff between access time and subthreshold leakage power by assigning distinct Vth's to each of the four main cache components: address bus drivers, data bus drivers, decoders, and static random access memory (SRAM) cell arrays with sense amplifiers. Second, we show optimization techniques to reduce the leakage power of L1 and L2 on-chip caches without affecting the average memory access time. The key results are: 1) two additional high Vth's are enough to minimize leakage in a single cache (three Vth's if we include a nominal low Vth for microprocessor core logic); 2) if L1 size is fixed, increasing L2 size can result in much lower leakage without reducing average memory access time; 3) if L2 size is fixed, reducing L1 size may result in lower leakage without loss of the average memory access time for the SPEC2K benchmarks; and 4) smaller L1 and larger L2 caches than are typical in today’s processors result in significant leakage and dynamic power reduction without affecting the average memory access time.
Keywords—Microprocessor memory hierarchy, multiple threshold voltage, on-chip caches, SRAM, sub threshold leakage power.
I. INTRODUCTION
Until very recently, only dynamic power has been a
significant source of power consumption, and Moore’s law has
helped to control it. Shrinking processor technology below
100 nm has allowed, and actually required, reducing the
supply voltage to reduce dynamic power consumption. However,
smaller geometries with a low threshold voltage exacerbate
leakage, so static power is beginning to dominate the power
consumption equation [1]. For example, a 90-nm Pentium 4
consumes 110 W, and roughly 40% of the total power
dissipation is consumed by leakage power [2]. The excessive
heat dissipation by the leakage power in the high-end 90-nm
Pentium 4 processor forced Intel Corporation to adopt more
expensive power delivery, cooling, and packaging systems.
A potentially important source of this power dissipation is
on-chip caches, because larger on-chip caches are being
integrated onto the chip. For example, an Intel processor
for server applications has 1 and 6 MB on-chip L2 and L3
caches, respectively (see footnote 1); subthreshold leakage
power is dissipated by all of the subbanks even if they are
not accessed, while dynamic power is dissipated when a cache
subbank is accessed. To alleviate this problem, transistors
in caches could be designed for low subthreshold leakage,
for example, by assigning them a higher threshold voltage
(Vth) or by controlling the Vth with adaptive body biasing
or, if a better balance of speed and power is required, by
employing dual Vth [3]–[7]. Traditionally, at most two Vth's
(one low and one high) have been available in
high-performance process technologies, allowing cache
designers only limited flexibility for suppressing
subthreshold leakage current. To further reduce the
subthreshold leakage, several circuit and microarchitectural
techniques [8]–[13] have therefore been proposed, targeted
at the subthreshold leakage power reduction of L1 caches.
One consequence of the increasing importance of subthreshold
leakage current is that the number of available Vth's in
future process technologies will increase. Next-generation
65-nm processes are expected to support three Vth's (one low
and two high Vth's), and future processes are likely to
provide designers with even more choices. This increase
provides new flexibility for subthreshold leakage power
reduction methods, allowing new tradeoffs between the Vth's
of different parts of a cache and between different levels
in the cache hierarchy. The availability of additional Vth's
suggests a new examination of the tradeoff between cache
size and Vth to reduce power loss from subthreshold leakage
current.
In this study, we present systematic techniques for
assigning multiple Vth's to memory hierarchies to minimize
power dissipation, in particular subthreshold leakage [14].
Based on our techniques, we provide a detailed quantitative
tradeoff analysis between access time and subthreshold
leakage power of on-chip caches as a function of the number
and the strength of Vth's. Although the qualitative trends
of the subthreshold leakage power versus access time
tradeoff are well known, this paper provides a detailed
quantitative analysis to determine the optimal number of
Vth's for given design constraints and to justify the cost
of extra Vth's. First, we examine optimal leakage power
dissipation for various access times in on-chip SRAM caches,
when more than one high Vth is available. Then, we show how
many high Vth's are needed, in addition to a nominal Vth
required for the processor’s general logic circuits, and how
much Vth should be increased for effective leakage power
reduction for various cache access time points. Second, we
present how cache leakage power can be reduced while
maintaining the same average memory access time of a
processor memory system using L1 and L2 cache access
statistics for SPEC2K workloads [15].
TABLE I CACHE ORGANIZATIONS FOR EACH CACHE SIZE
The remainder of this study is organized as follows.
Section II explains our on-chip cache subthreshold leakage
power and access time modeling methodologies. Section III
presents a subthreshold leakage power optimization technique
for a given access time constraint and provides a
quantitative tradeoff analysis of on-chip cache subthreshold
leakage power and access time. Section IV presents two-level
cache leakage power optimization techniques using cache
access statistics. Section V discusses future directions for
this line of work and adds some concluding remarks.
II. ON-CHIP CACHE LEAKAGE POWER AND ACCESS
TIME MODELS
To examine tradeoffs between subthreshold leakage power and
access time of a processor cache memory system, we need
circuit models to estimate the subthreshold leakage power
and access time of caches. Rather than starting from
scratch, we could have built on a widely used cache memory
model called “CACTI” [16]. This model estimates access time,
dynamic energy dissipation, and area of caches for given
cache configuration parameters such as total size, line
size, associativity, and number of ports. However, it is
based on an outdated 0.8-µm CMOS technology and it applies
linear scaling to obtain the figures for smaller process
technologies. Furthermore, it does not provide access time
and leakage power when multiple Vth's are available. To
address these shortcomings, we designed caches with the
70-nm Berkeley predictive technology model (BPTM) (see
footnote 2) in anticipation of the next generation of
process technology. Then, we derived our subthreshold
leakage power and access time models based on the HSPICE
simulations of the designed cache circuits.
The designed caches ranged from 16 to 1024 KB in size. The
bitlines and wordlines were segmented to improve access
time, and subbanks were employed to reduce dynamic power
dissipation [17] as well; see Table I for the cache subbank
organization used in this study. The caches were broken into
four components for the purposes of assigning distinct
Vth's: address bus drivers, data bus drivers, decoders, and
6T-SRAM cell arrays with sense amplifiers. Fig. 1
illustrates the cache subbank organization used in this
study.
Fig. 1. Cache subbank organization.
The circuit topology and the ratios of transistors in the
decoder circuits are based on the CACTI model but optimized
for the 70-nm technology. In addition, modern techniques for
lower voltage are employed for the bitline precharge and
sense-amplifier circuits. For the address and data bus
interconnects, we employed an H-tree topology and inserted
repeaters on each branch of the buses to optimize the
interconnect delay of cache buses. To obtain the
interconnect capacitance and resistance of long wires such
as bitlines, wordlines, address, and data buses, the lengths
of the interconnects are estimated using SRAM cell
dimensions of 1.42 µm × 0.72 µm and the cache organizations
in Table I. Then, for a given interconnect length, the
predictor provided in footnote 2 is used to estimate the
interconnect capacitance and resistance.
HSPICE simulations were run extensively to obtain leakage
power and access time (or delay) models for wide ranges of
cache sizes and Vth's for their four components. We considered
Vth's between 0.2 and 0.5 V in steps of 0.05 V at 1-V nominal
supply voltage. We measured the leakage power and the delay
of each cache component separately.
A. Leakage Power Models
Fig. 2 shows Vth versus leakage power of the 7 × 128,
8 × 256, and 9 × 512 row decoders that we designed. The
HSPICE simulation results shown in Fig. 2 agree with the
exponential decay in leakage power with a linear increase of
Vth that is characteristic of general CMOS circuits
(1)
To obtain an approximated analytic equation for leakage power
as a function of Vth, we measured the leakage power of the
decoders at each discrete Vth point, and we fitted an
exponentially decaying curve to the measured leakage
power as follows:
(2)
where the fitting coefficients are constants derived using Origin
6.1, a scientific graphing and analysis software package with
curve fitting (footnote 3); the chi-squared error is less than 0.001 for each
fitted curve.
TABLE II
CACHE COMPONENT LEAKAGE POWER MODEL COEFFICIENTS AT 70 °C DIE TEMPERATURE AND A TYPICAL CORNER FOR EACH CACHE SIZE
Fig. 2. Leakage power dissipation of the 7 × 128, 8 × 256, and 9 × 512 decoders.
The rest of the cache components—address driver, data
driver, and 6T SRAM cell array—show the same leakage
power trend characteristics as the decoder of Fig. 2; leakage
power decreases exponentially with the linear increase of Vth.
Hence, an identical curve-fitting method can be applied for
these components to derive leakage power models like (2). The
coefficients for all of the components in (2) can be found in
Table II.
Fig. 3. Delay time of 7 × 128, 8 × 256, and 9 × 512 decoders.
Once all of the approximated analytic leakage power models
for each component are derived for a cache size, the total
leakage power of the cache can be approximated as the sum
of the leakage power of all the components. Assuming that we
apply four distinct Vth's, the analytic approximated equation
for leakage power (LP) is
(3)
where the four Vth arguments represent the Vth's for address
bus drivers, data bus drivers, decoders, and 6T-SRAM cell arrays,
respectively. Each exponential term evaluates the leakage
power dissipation of one of the four components.
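As a rough illustration of the summed model described above, the following Python sketch evaluates a total leakage power built from one exponential term per component. The functional form (a constant offset plus `A1 * exp(a2 * Vth)` per component) is an assumption made for illustration, and every coefficient below is hypothetical; none of them is taken from Table II.

```python
import math

# Hypothetical (A1 in mW, a2 in 1/V) pairs per component; NOT the Table II values.
COMPONENTS = {
    "address_drivers": (2.0, -25.0),
    "data_drivers": (8.0, -25.0),
    "decoders": (5.0, -25.0),
    "sram_array": (120.0, -25.0),
}
OFFSET_MW = 0.5  # assumed constant term of the fitted model


def leakage_power(vth):
    """Total leakage: constant offset plus one exponential term per component.

    `vth` maps each component name to its threshold voltage in volts.
    """
    total = OFFSET_MW
    for name, (a1, a2) in COMPONENTS.items():
        total += a1 * math.exp(a2 * vth[name])
    return total


low = {name: 0.2 for name in COMPONENTS}      # every component at the low Vth
high_array = dict(low, sram_array=0.5)        # raise only the SRAM array Vth
print(leakage_power(low), leakage_power(high_array))
```

With these made-up coefficients the SRAM array term dominates, so raising only the array Vth removes most of the modeled leakage.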
B. Access Time Models
TABLE III
CACHE COMPONENT DELAY MODEL COEFFICIENTS AT 70 °C DIE TEMPERATURE AND A TYPICAL CORNER FOR EACH CACHE SIZE
Fig. 3 shows Vth versus delay time of the 7 × 128, 8 × 256,
and 9 × 512 row decoders that we designed. Basically, the
CMOS circuit delay of ultra deep submicrometer short-channel
transistors is
(4)
where the coefficients, including the exponent α (footnote 4), are constants depending on the technology
and transistor sizes. The measured delay time trends in Fig. 3
agree with (4). However, the circuit delay or access time also fits
very well to an exponential growth function with a very small
exponent over our range of interest. It was convenient for some
of our optimizations to approximate delay this way.
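The claim that an exponential approximates the delay well over the range of interest can be checked numerically. The sketch below assumes the common alpha-power-law form, delay = k · Vdd / (Vdd − Vth)^α, with Vdd = 1 V and the exponent of about 1.3 quoted in the paper's footnote. Both the form and the constant k = 1 are assumptions, not the authors' fitted model. It fits y = b · exp(c · Vth) by least squares in log space, which is a simpler technique than the curve-fitting package used in the study.

```python
import math

VDD = 1.0     # nominal supply voltage from the text
ALPHA = 1.3   # exponent of about 1.3 quoted in the footnote


def alpha_power_delay(vth, k=1.0):
    """Assumed alpha-power-law delay: k * VDD / (VDD - vth) ** ALPHA."""
    return k * VDD / (VDD - vth) ** ALPHA


def fit_exponential(xs, ys):
    """Least-squares fit of y = b * exp(c * x) using a straight line in log space."""
    n = len(xs)
    logs = [math.log(y) for y in ys]
    mean_x = sum(xs) / n
    mean_l = sum(logs) / n
    c = (sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, logs))
         / sum((x - mean_x) ** 2 for x in xs))
    b = math.exp(mean_l - c * mean_x)
    return b, c


vths = [0.2 + 0.05 * i for i in range(7)]          # 0.2 V to 0.5 V in 0.05-V steps
delays = [alpha_power_delay(v) for v in vths]
b, c = fit_exponential(vths, delays)
max_rel_err = max(abs(b * math.exp(c * v) - d) / d for v, d in zip(vths, delays))
print(f"b = {b:.3f}, c = {c:.3f}, max relative error = {max_rel_err:.2%}")
```

Under these assumptions the exponential stays within a few percent of the alpha-power-law curve across the 0.2 to 0.5 V range.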
To obtain an approximated analytic equation for delay time
as a function of Vth, we measured the delay time of the decoders
at each discrete Vth point, and we fit the following exponential
curve to the measured delay time:
(5)
where the fitting coefficients are constants derived using the same
technique as that used for the leakage power models.
The rest of the cache components show the same delay trend
characteristics as the decoder case of Fig. 3. Hence, the same
curve-fitting technique can be applied for those components to
derive approximated delay time models as functions of Vth like
(5). The coefficients for all the components in (5) can be found
in Table III.
Fig. 4. Access time and leakage power versus cache size of the baseline caches.
4 α was around 2 in submicrometer technology, but it has decreased to
about 1.3 in the current generation of deep-submicrometer technology.
Once all of the approximated delay time models for each
component are extracted for a specific cache size, the total delay
or access time of the cache can be approximated as the sum of the
delay times of all the cache components. Assuming that we can
apply four distinct Vth's, the analytic approximated equation for
the access time (AT) is
(6)
where the four Vth arguments represent the Vth's for address
bus drivers, data bus drivers, decoders, and 6T-SRAM cell arrays,
respectively. Each exponential term corresponds to the
delay time of one of the four components.
We also define baseline caches in which the Vth of all the
cache components is set to a low Vth (0.2 V). Fig. 4 shows the
access time and the leakage power of the baseline caches. The
cache access time grows logarithmically and the leakage power
increases linearly with the cache size. Those trends agree with
those of earlier studies on SRAM design. In Fig. 4, we assume a
direct-mapped cache organization and consider only the leakage
power of data arrays, disregarding the leakage of the tag com-
parators and other cache control logic.
III. CACHE LEAKAGE OPTIMIZATION WITH MULTIPLE Vth ASSIGNMENTS
A. Methodology
In this section, we present a leakage power optimization technique
assuming that we can assign multiple Vth's to a cache. To
find the minimum leakage power of caches using a maximum
of four distinct Vth's under a specified target access time
constraint, we formulate the problem as follows:
constraints
(7)
(8)
where the four Vth arguments represent the Vth's for address
bus drivers, data bus drivers, decoders, and 6T-SRAM arrays,
respectively.
Fig. 5. Normalized optimum LP and Vth versus normalized AT of 512-KB caches (schemes I and II).
There exist numerous combinations of the four Vth's
satisfying a specific target access time. Among those combinations,
we find the quadruple of Vth's producing the minimum leakage power
using a numerical optimization method (e.g., MATLAB's fmincon
function). We accepted combinations whose access time is within
5% of the specified target. We can repeat this procedure with modified
objective and constraint functions to find an optimal combination
for cache memories that have only two or three distinct Vth's.
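The formulation above can be sketched as a small constrained search. The Python example below replaces the numerical optimizer mentioned in the text with an exhaustive search over the 0.05-V grid, and it enforces the access time target as a simple upper bound rather than a 5% tolerance band. The exponential model forms and all coefficients in `LEAK` and `DELAY` are hypothetical and chosen only to make the sketch runnable; they are not the values of Tables II and III.

```python
import itertools
import math

NAMES = ("addr", "data", "dec", "array")
# Hypothetical (A1, a) leakage and (B1, b) delay coefficients per component.
LEAK = {"addr": (2.0, -25.0), "data": (8.0, -25.0),
        "dec": (5.0, -25.0), "array": (120.0, -25.0)}
DELAY = {"addr": (0.30, 2.0), "data": (0.30, 2.0),
         "dec": (0.25, 2.0), "array": (0.15, 2.0)}
VTH_GRID = [round(0.20 + 0.05 * i, 2) for i in range(7)]   # 0.20 V .. 0.50 V
BASELINE = (0.2,) * 4                                      # all components at low Vth


def leakage(vths):
    """Assumed leakage model: one decaying exponential per component."""
    return sum(LEAK[n][0] * math.exp(LEAK[n][1] * v) for n, v in zip(NAMES, vths))


def access_time(vths):
    """Assumed delay model: one growing exponential per component."""
    return sum(DELAY[n][0] * math.exp(DELAY[n][1] * v) for n, v in zip(NAMES, vths))


def best_assignment(target_ratio):
    """Return (leakage, vths) with minimum leakage and access time <= target.

    `target_ratio` is the allowed access time relative to the baseline cache.
    """
    limit = target_ratio * access_time(BASELINE)
    feasible = (v for v in itertools.product(VTH_GRID, repeat=4)
                if access_time(v) <= limit)
    return min((leakage(v), v) for v in feasible)


lp, vths = best_assignment(1.25)
print(dict(zip(NAMES, vths)), f"leakage = {lp / leakage(BASELINE):.1%} of baseline")
```

Restricting the search to fewer distinct values (for example, one Vth for the array and one shared by the peripheral components) would correspond to the two- and three-Vth variants mentioned above.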
Assuming that we can assign distinct Vth's to each component
of the cache, it is important to determine how many Vth's
are cost-effective because an extra mask and process step are
needed for each additional Vth. To examine the dependence of
the optimization results on access time, we sweep the target access
time from the fastest possible (assigning a low Vth of 0.2 V
to all the cache components) to the slowest possible (assigning
a high Vth of 0.5 V to all the cache components). We present
here the summary of the Vth assignment schemes we examined
in this study.
• Scheme I: Assigning a high Vth to all of the cache components,
including address bus drivers, data bus drivers, decoders,
and 6T-SRAM cell arrays. This requires two Vth's if
we include a nominal or low Vth for the processor's general
logic circuits.
• Scheme II: Assigning a high Vth only to the 6T-SRAM cell
arrays, which dominate leakage power but not the overall
cache delay, and assigning a default or low Vth (0.2 V) to
the rest of the transistors. This requires at least two Vth's if
we include a nominal or low Vth for the processor's logic.
• Scheme III: Assigning a high Vth to the 6T-SRAM cell arrays
and assigning another high Vth to the peripheral components
(address bus drivers, data bus drivers, and decoders)
of the cache. This requires at least three Vth's if
we include a nominal or low Vth for the processor.
• Scheme IV: Assigning four distinct high Vth's to the four
cache components. This requires at least five Vth's if we
include a nominal or low Vth for the processor logic.
B. Leakage Power Optimization and Quantitative Tradeoff
Analysis
In Fig. 5, we plot the normalized optimum leakage power and
Vth at different target access times (125%, 150%, 175%, and so
forth) of 512-KB caches employing schemes I and II. The optimum
leakage power and the Vth are obtained using (7) and (8)
of Section III-A. The parenthesized I and II in Fig. 5 represent
schemes I and II, respectively. In the graph, the normalized
leakage power and access time of 100% correspond to the
leakage power and the access time of a 512-KB
baseline cache designed with a low Vth (0.2 V) for all four cache
components, which is the fastest but leakiest cache. The 125% access
time on the x axis means that the cache is 25% slower than the
baseline cache.
According to the trends shown in Fig. 5, the leakage power
decreases exponentially as the Vth increases linearly; note that
the y axis is a logarithmic scale. The optimization results for the
different cache sizes show almost the same normalized optimum
leakage power and Vth trends as those of the 512-KB caches in
Fig. 5 as long as the same Vth assignment scheme is applied; see
Table IV for the normalized leakage power of all the cache sizes.
Comparing schemes I and II, the 512-KB cache
with scheme II dissipates less leakage power than the one with
scheme I at the same access time point when the normalized access
time constraint is less than 155%. For example, at the 125%
access time point, scheme II dissipates 6% of the leakage of
the baseline 512-KB cache and scheme I dissipates 13%,
a 2× difference. However, scheme I shows better
leakage power reduction beyond the 155% normalized access time
point.
Fig. 6 shows the normalized optimum leakage power and Vth versus
normalized access time trends for a 512-KB cache of
scheme III. The optimum leakage power and the Vth's are obtained
using (7) and (8) of Section III-A. In Fig. 6, the Vth of
the SRAM cell array, denoted as array in the graph, starts to
increase first. This implies that the SRAM cell array is responsible
for the most significant fraction of total cache leakage power,
but it has the least impact on increasing the total cache access
time. After the Vth of the SRAM cell arrays saturates at the
maximum allowed point (0.5 V), the Vth of the peripheral
components, labeled as peri in the graph, is increased to reduce
the leakage power in the peripheral components. However,
this just increases the access time without much further cache
leakage reduction. For example, the leakage power does not
decrease beyond the 215% access time point, where the Vth of the
peripheral circuits has not reached the maximum value (0.5 V)
in this 512-KB cache case.
TABLE IV
PERCENTAGE LEAKAGE POWER OF SCHEMES I–IV NORMALIZED TO LEAKAGE POWER OF EACH CACHE SIZE AT THE 100% AT POINT
Fig. 6. Normalized optimum LP and Vth versus normalized AT of 512-KB caches (schemes I and III).
These leakage power and Vth versus access time trends also
explain the leakage optimization results shown in Fig. 5: scheme II
shows a better optimization result than scheme I when the
normalized access time is less than 155%, but not beyond
the 155% access time point. Recall that scheme I assigns a high Vth
to all the cache components. It sacrifices access time
unnecessarily by increasing the Vth of the peripheral components
with little leakage reduction at the same access time. In contrast,
scheme II assigns the high Vth to just the SRAM cell arrays,
which are responsible for a greater fraction of total cache leakage
power but affect access time less. However, scheme II cannot
reduce leakage power beyond the 155% access time point, because
the leakage power of the peripheral components, where a
low Vth is used, becomes substantial beyond this point.
Fig. 7. Normalized optimum LP and Vth versus normalized AT (scheme IV).
Fig. 7 shows the normalized optimum leakage power and Vth versus
normalized access time trends for a 512-KB cache of
scheme IV. The optimum leakage power and the Vth's are again
obtained using (7) and (8) of Section III-A. In scheme IV,
we can assign up to four distinct Vth's for leakage power
optimization. According to the results shown in Fig. 7, the Vth of
the 6T-SRAM cell arrays starts to increase first, similar to the
scheme III case. Among the peripheral components, the Vth for
the data bus starts to increase first. This implies that the data
bus, consisting of 128 b (the assumed bus width between the L2
and L1 caches), has the second most significant impact on the
leakage power. Even though the address bus has the same structure,
the number of bits in the address bus is much smaller than
in the data bus. Hence, the leakage power impact of the address
bus is much less than that of the data bus. However, in the case of smaller
caches (e.g., 16–64 KB caches) where the data bus width is 32 b,
both the data and address buses have almost the same impact on
the leakage power. Therefore, the Vth trends for both the data and
address buses will be the same. These Vth trends suggest the
direction of optimizations that reduce cache leakage power.
Table IV summarizes the normalized cache leakage power of
schemes I–IV. As expected, we can reduce more leakage power
while achieving the same access time by having more Vth's to
control. If the access time is fixed, the caches of schemes III and
IV always show 38%–72% better leakage optimization results
than those of scheme I. There are a few things we should note
from this comparison study. First, as the target access time is
increased to more than the 150% point in scheme II, caches
dissipate more leakage power than those employing scheme I. This
implies that the cache peripheral components consume nonnegligible
leakage power. The leakage power of those components
becomes substantial when we cut down the leakage power of the
6T-SRAM arrays significantly. Second, the slowest cache access
time of scheme II ends around 150% in small-size caches.
This means that the peripheral components also play important
roles in both cache leakage power and access time. In other
words, increasing the Vth of 6T-SRAM cell arrays alone gives us
diminishing returns at some point without reducing the leakage
power further. This is why the caches of scheme I give even
better results than those of scheme II as Vth increases. Third,
there is a negligible difference between caches of schemes III
and IV in terms of leakage power reduction. This implies that
scheme III employing two distinct high Vth's (three Vth's if
we include a nominal or low Vth for the processor) is enough
to minimize leakage. Finally, as illustrated in Figs. 5–7 and
Table IV, each cache shows a wide range of optimal leakage
power consumption depending on target access time. Hence, the
right tradeoff point between the leakage power and the access
time of the caches will be determined by either system design
specifications or constraints.
TABLE V
CACHE DYNAMIC ENERGY CONSUMPTION PER ACCESS AND LEAKAGE POWER
DISSIPATION AT 70 °C DIE TEMPERATURE AND A TYPICAL CORNER FOR EACH CACHE SIZE
IV. LEAKAGE OPTIMIZATION TECHNIQUES
FOR TWO-LEVEL CACHES
A. Methodology
In a processor memory system, the average memory access
time (AMAT) [18] is a key metric for measuring the overall
memory system performance. To evaluate the performance or
AMAT, it is essential to examine the cache miss characteristics
of realistic applications, because the performance or AMAT is a
function of L1 and L2 cache miss rates and cache access times.
In our study, we assume that the memory system hierarchy
consists of separate L1 instruction and data caches with a unified L2
cache. Then, the average performance of the processor memory
system can be measured or compared with the AMAT
represented by
$$\mathrm{AMAT} = \mathrm{HitTime}_{L1} + \mathrm{MissRate}_{L1} \times \left(\mathrm{HitTime}_{L2} + \mathrm{MissRate}_{L2} \times \mathrm{MissPenalty}_{L2}\right) \qquad (9)$$
where HitTime_L1 and HitTime_L2 are the access times of the L1 and
L2 caches, MissRate_L1 and MissRate_L2 are the miss rates of the L1
and L2 caches, and MissPenalty_L2 is the external memory
access and data transfer time. Note that the local miss rate (footnote 5) is
used as MissRate_L2.
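The two-level AMAT relation can be expressed as a one-line function. The sketch below uses the standard textbook form for a two-level hierarchy with a local L2 miss rate; the numeric arguments are hypothetical and are not taken from Table VI.

```python
def amat(hit_l1, miss_l1, hit_l2, miss_l2_local, miss_penalty_l2):
    """Two-level average memory access time.

    Times are in nanoseconds, miss rates are fractions, and the L2 miss
    rate is the local one (L2 misses divided by accesses that reach L2).
    """
    return hit_l1 + miss_l1 * (hit_l2 + miss_l2_local * miss_penalty_l2)


# Hypothetical values for illustration only.
example = amat(hit_l1=1.0, miss_l1=0.05, hit_l2=5.0,
               miss_l2_local=0.2, miss_penalty_l2=40.0)
print(f"AMAT = {example:.3f} ns")
```

Because the L1 miss rate multiplies the whole L2 term, a slower L2 hit time is attenuated by the (small) fraction of accesses that miss in L1.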
Similarly, we measure the average memory access energy
(AMAE) to compare the dynamic energy dissipation of each
memory system configuration. Assuming that the L1 cache is
accessed every cycle, the AMAE represents the average energy
dissipation per access in the entire microprocessor memory
system that includes L1, L2, and main memory. We can estimate
the average memory access energy as follows:
(10)
where Hit Energy is the average energy dissipation per access
given in Table V. We assume a two-channel 1066-MHz
256-MB RAMBUS DRAM RIMM whose sustained transfer
rate is 4.2 GB/s [19] to derive the main memory access time
and dynamic energy dissipation per access. Though the sustained
transfer rate is quite high, we should also consider the
RAS/CAS latency of the memory, which is about 20 ns. For the
energy dissipation per access, we used the number given in [20],
which is 3.57 nJ per access. The dynamic energy dissipation
per access can vary depending on the number of RIMMs. We
assume that one RIMM is installed. See Section IV-B and note
that more RIMMs are favorable for our optimization technique,
because our technique prefers a larger L2 cache to a smaller
one for leakage power reduction. The larger L2 cache accesses
DRAM less frequently than the smaller one, resulting in less
energy consumption for accessing the external DRAM. Hence,
if more RIMM modules are installed, implying more energy
dissipation per DRAM access, a larger L2 cache will allow
even more energy to be saved.
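The argument that a larger L2 cache can lower the average access energy can be illustrated numerically. The sketch below assumes an AMAE expression built by direct analogy with the AMAT relation (L1 hit energy, plus the L1 miss rate times the L2 hit energy, plus the fraction of accesses reaching main memory times the main memory energy). This form is an assumption, not a transcription of (10). Only the 3.57-nJ main memory figure comes from the text; the cache energies and miss rates are hypothetical.

```python
MAIN_MEMORY_NJ = 3.57  # energy per main memory access quoted in the text [20]


def amae(e_l1, miss_l1, e_l2, miss_l2_local, e_mem=MAIN_MEMORY_NJ):
    """Assumed average memory access energy in nJ, by analogy with AMAT."""
    return e_l1 + miss_l1 * (e_l2 + miss_l2_local * e_mem)


# Hypothetical comparison: the larger L2 costs more per hit but misses less often.
small_l2 = amae(e_l1=0.1, miss_l1=0.05, e_l2=0.5, miss_l2_local=0.30)
large_l2 = amae(e_l1=0.1, miss_l1=0.05, e_l2=0.8, miss_l2_local=0.15)
print(f"small L2: {small_l2:.4f} nJ, large L2: {large_l2:.4f} nJ")
```

In this made-up case the saving in main memory energy outweighs the higher L2 hit energy, which is the effect described in the paragraph above.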
TABLE VI
AVERAGE L1 AND L2 CACHE MISS RATES
FROM THE ENTIRE SPEC2K BENCHMARKS
5 This rate is simply the number of misses in a cache divided by the total number of memory accesses to this cache.
To obtain L1 and L2 cache miss rates, we use the
SimpleScalar/Alpha 3.0 tool set [21], which is a suite of functional and
timing simulation tools for the Alpha AXP ISA. In addition, we
collected the results from all 25 of the SPEC2K benchmarks [15]
to perform our evaluation. All SPEC programs were compiled
for a Compaq Alpha AXP-21264 processor using the Compaq
C and Fortran compilers under the OSF/1 V4.0 operating system
using full compiler optimizations. We ran each benchmark
application to completion to get reliable L2 cache
miss rates, because L2 cache accesses are far less frequent than
L1 cache accesses; an insufficient number of L2 accesses may
result in unrepresentatively high L2 cache miss rates.
Table VI shows the average L1 and L2 cache miss rates from
the entire SPEC2K benchmarks for 16-, 32-, and 64-KB L1
caches, respectively. We used direct-mapped L1 instruction
caches and four-way set associative L1 data caches. Also, we
used eight-way set associative L2 caches. For simplicity, each
L1 cache miss rate is obtained by taking the sum of the number
of total instruction and data cache misses and dividing by the
sum of total instruction and data cache accesses; a 16-KB L1
means instruction and data caches are each 16 KB in size. Since
an L2 miss rate is a function of the L1 cache miss rate, we
measure the separate L2 cache miss rates for each L1 cache size
configuration. Those cache miss characteristics directly
affect the leakage optimization direction of two-level cache
memory systems.
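The miss rate definitions used above amount to simple ratios of counters. The following sketch computes the combined L1 miss rate and the local L2 miss rate from simulator-style counts. The counter values are hypothetical, and the sketch assumes that every L1 miss results in exactly one L2 access.

```python
def combined_l1_miss_rate(i_misses, i_accesses, d_misses, d_accesses):
    """L1 miss rate over instruction and data caches taken together."""
    return (i_misses + d_misses) / (i_accesses + d_accesses)


def local_miss_rate(misses, accesses_to_this_cache):
    """Misses in a cache divided by the accesses that reach that cache."""
    return misses / accesses_to_this_cache


# Hypothetical counters for one benchmark run.
i_misses, i_accesses = 1_000, 1_000_000
d_misses, d_accesses = 9_000, 400_000
l2_misses = 2_000

l1_rate = combined_l1_miss_rate(i_misses, i_accesses, d_misses, d_accesses)
l2_accesses = i_misses + d_misses        # assumption: each L1 miss accesses L2 once
l2_local = local_miss_rate(l2_misses, l2_accesses)
print(f"L1 miss rate = {l1_rate:.4%}, local L2 miss rate = {l2_local:.1%}")
```

The local L2 rate is much larger than the fraction of all memory accesses that miss in L2, which is why the text specifies which definition enters the AMAT.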
B. L2 Cache Leakage Power Optimization
Since the L2 cache's contribution to leakage power dominates
because of its size, we will examine the leakage power optimization
of the L2 cache first. Consider caches designed with low-Vth
(0.2 V) devices and a baseline cache memory system consisting
of 16 and 128 KB for L1 and L2 caches, respectively. Then,
we have the leakage power consumption and AMAT corresponding
to this configuration. Increasing the Vth of the 128-KB L2 cache
will reduce the leakage power of the L2 cache, but it will
increase the AMAT of the cache memory system because of the
increased access or hit time. However, there is a way to reduce the
leakage power of the cache memory system without increasing
the AMAT, which significantly impacts the execution time of
the system.
The key to reducing leakage power without increasing AMAT
is to compensate for the increased L2 access time by reducing
the cache miss rate of the cache memory system. To reduce the
miss rate, we can increase the L2 cache size. The main memory
access penalty is quite significant in terms of both time and
energy. Hence, even a slight reduction of L2 cache miss rates
results in a significant improvement in the AMAT. We note that
although area was one of the most important design constraints
in the past, this trend is changing and power is becoming an
equally important constraint in many situations [22]. In this
argument, we assume that the same AMAT will approximately
give us the same execution time for a fixed processor core, L1
cache size, and benchmark program, so that we can fairly
compare the total leakage energy consumption as well.
Fig. 8. L2 leakage power optimization at a fixed L1 size (16 KB). (1) and (2) are the leakage power consumption of the 256- and 512-KB caches at the same AMAT as the baseline 128-KB cache, respectively.
Fig. 8 shows the leakage power versus AMAT of L2 caches
with a fixed L1 cache size of 16 KB. The leakage power
optimization for individual caches is based on scheme III, which
requires two additional distinct high Vth's for L2. Taking the
AMAT of the fastest 128-KB L2 cache designed with low Vth
(0.2 V) as a baseline, we compare the leakage power of other
caches at the same AMAT point; see the (1) and (2) points in
Fig. 8. The (1) and (2) points are the leakage power consumption
of the cache system with the 256- and 512-KB caches at
the same AMAT as the baseline 128-KB cache system. As can
be seen from the plots, the AMAT can be maintained while the
leakage power is reduced by replacing the baseline 128-KB
L2 cache with a 256-KB L2 cache that is intentionally slowed
down by increasing its Vth's to reduce leakage.
This replacement with the double-sized L2 cache reduces
the leakage power by 70% compared to the fastest but leakiest
128-KB L2 cache with the same AMAT. Similarly, the use of a
512-KB L2 cache can further reduce leakage compared to the
256-KB cache; see the vertical line in Fig. 8.
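The tradeoff described above can be made concrete by asking how much slower a larger L2 cache may become before the AMAT exceeds that of the baseline. Setting the two-level AMAT of both configurations equal, with the same L1 cache and miss penalty, gives a hit time budget of HitTime_L2,base + (MissRate_L2,base − MissRate_L2,new) × MissPenalty_L2. The sketch below evaluates this budget; all numeric values are hypothetical.

```python
def amat(hit_l1, miss_l1, hit_l2, miss_l2_local, miss_penalty_l2):
    """Two-level average memory access time (ns), local L2 miss rate."""
    return hit_l1 + miss_l1 * (hit_l2 + miss_l2_local * miss_penalty_l2)


def l2_hit_time_budget(hit_l2_base, miss_l2_base, miss_l2_new, miss_penalty_l2):
    """Largest L2 hit time that keeps AMAT equal to the baseline AMAT.

    Assumes the L1 cache and the miss penalty are unchanged.
    """
    return hit_l2_base + (miss_l2_base - miss_l2_new) * miss_penalty_l2


# Hypothetical baseline (smaller L2) and replacement (larger L2) parameters.
HIT_L1, MISS_L1, PENALTY = 1.0, 0.05, 40.0
budget = l2_hit_time_budget(hit_l2_base=5.0, miss_l2_base=0.20,
                            miss_l2_new=0.15, miss_penalty_l2=PENALTY)
base_amat = amat(HIT_L1, MISS_L1, 5.0, 0.20, PENALTY)
new_amat = amat(HIT_L1, MISS_L1, budget, 0.15, PENALTY)
print(f"L2 hit time budget = {budget:.2f} ns, AMAT {base_amat:.3f} vs {new_amat:.3f} ns")
```

The extra hit time is what the larger cache can spend on higher Vth's; the resulting leakage saving then follows from the leakage and delay models of Section II.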
Finally, the employment of larger L2 caches also reduces
the average dynamic power of the memory system, because
the larger L2 caches reduce the number of external memory
accesses that consume a significant amount of dynamic energy.
Table VII summarizes the results for the normalized leakage
power and normalized average memory access energy for each
L1 cache size designed using scheme III at a fixed AMAT. To
compare leakage power and AMAE, the following standard
cache configurations were used: 128-KB L2 with 16-KB L1,
256-KB L2 with 32-KB L1, and 512-KB L2 with 64-KB L1.
The shaded numbers represent the baseline L2 configuration,
leakage power, and AMAE. Table VII shows the counterintuitive
result that we can reduce both leakage power and AMAE
by employing larger L2 caches while maintaining the same
AMAT.
TABLE VII
L2 CACHE NORMALIZED LEAKAGE AND AMAE AT THE FIXED L1 SIZE (16 KB) AND AMAT
Fig. 9. L1 leakage power optimization at a fixed L2 size (512 KB). (1) and (2) are the leakage power consumption of the 32- and 16-KB caches at the same AMAT as the baseline 64-KB cache, respectively.
C. L1 Cache Leakage Power Optimization
It is rather difficult to improve the L1 cache miss rates further,
because they are already very low for 16-, 32-, and 64-KB
caches when the SPEC2K benchmarks are run. Hence,
the access time of the caches becomes the dominant factor in
determining the AMAT. For example, the access time of a 64-KB
L1 cache is 48% longer than that of the fastest 16-KB L1
cache, because the access time is very sensitive to size in small
caches. Essentially, cache access time increases logarithmically
with size, but has a steeper slope for smaller caches than for
larger caches. This observation explains why the AMAT of a
cache hierarchy with a smaller L1 cache can be lower than that of one
with a larger L1 cache for a certain range of cache sizes (e.g.,
16 or 64 KB).
Fig. 9 shows the leakage power versus the AMAT of 16-, 32-,
and 64-KB L1 caches using scheme III, each with a fixed L2
cache of size 512 KB. Like the comparison performed in Section
IV-B, the leakage power of different caches is compared at the
same AMAT point. The plots show that leakage power can be
reduced by replacing the fastest 64-KB L1 cache with a 32-KB
L1 cache that is intentionally slowed down by increasing its
Vth's to reduce the leakage power; the resulting cache memory
system still has the same AMAT. Similarly, a slowed 16-KB
cache with increased Vth's can replace a 32-KB cache without
changing the AMAT of the L1/L2 hierarchy. The new system
consumes much less leakage power; see points (1) and (2) in
Fig. 9, which are the leakage power consumption of the cache
system with the 32- and 16-KB caches at the same AMAT as
the baseline cache system.
TABLE VIII
L1 CACHE NORMALIZED LEAKAGE AND AMAE AT THE FIXED L2 SIZE (512 KB) AND AMAT
Table VIII shows the results for normalized leakage power
and AMAE as a percentage of each fast but leaky L1 cache
size using scheme III with fixed AMATs. The comparisons were
performed in the same manner as Table VII. The shaded num-
bers represent the baseline L1 configuration, leakage power,
and AMAE. According to the comparisons, we can reduce both
leakage power and AMAE by employing smaller L1 caches.
This is the inverse of the case for L2 caches, where the leakage
of the overall memory system can be reduced by increasing their
size. However, it should be noted that these results are only valid
within the specific set of sizes and simulation environment given
in this discussion. First, a 4-KB L1 cache will have a cache
miss rate that is much higher than a 16-KB cache, but its access
time will not be sufficiently smaller to make the tradeoff worth-
while. Also, the normalized AMAE is rather high because the
total power fraction of L1 caches is relatively small compared to
L2 caches. Second, many SPEC2K benchmark programs have
very high locality compared to real-world larger size applica-
tions. This results in quite low cache miss rates for small-size
L1 caches as shown in Table VI. Third, the operating system
(OS) context switching was not modeled due to our limited sim-
ulation environment. The context switching typically increases
cache miss rates, because cache flushing increases cold start
misses. These factors must be considered if one is to perform
realistic cache leakage power optimizations with the proposed
techniques.
V. CONCLUSION
In this study, we examined the leakage power and access time
tradeoff for caches where multiple Vth's are allowed. We used
curve fitting techniques to model subthreshold leakage power
and access time. Our results show that two extra distinct high
Vth's for caches (three Vth's including the Vth for the microprocessor
core logic) are sufficient to yield a significant reduction
in leakage power. Such an arrangement can reduce the leakage
power by as much as 91%. We also show that smaller L1 and
larger L2 caches than are typical in today's processors result
in significant leakage and dynamic power reduction without
affecting the average memory access time. Given that the processor
core may need a distinct Vth, and each of the caches may
need up to two Vth's (scheme III), we could require up to five
distinct Vth's for the leakage power optimization of two-level
cache memory systems.
Even though the modeling and optimization techniques pre-
sented in this study have been performed using continuous-do-
main functions, the actual cache latencies are integer numbers of
processor clock cycles. Cache designers or architects can choose
an appropriate discrete point from the continuous-domain re-
sults depending on their target processor core clock frequency.
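As a small illustration of moving from the continuous-domain results to a discrete design point, the sketch below rounds a cache access time up to a whole number of clock cycles. The access times and clock frequency are hypothetical.

```python
import math


def latency_cycles(access_time_ns, clock_ghz):
    """Smallest whole number of clock cycles that covers the access time."""
    return math.ceil(access_time_ns * clock_ghz)


def quantized_access_time_ns(access_time_ns, clock_ghz):
    """Access time after rounding up to an integer number of cycles."""
    return latency_cycles(access_time_ns, clock_ghz) / clock_ghz


# Hypothetical example: a 1.7-ns cache in a 2-GHz core.
print(latency_cycles(1.7, 2.0), quantized_access_time_ns(1.7, 2.0))
```

Any slack between the continuous access time and the quantized one can be spent on a higher Vth without changing the cache latency in cycles.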
Furthermore, the circuit techniques combined with microar-
chitectural level controls exemplified by drowsy caches [10] are
designed to reduce the leakage power of L1 caches when sac-
rificing access time is not an option. Such an approach is less
attractive for L2 caches. The same effect can be obtained more
simply by using high-Vth circuits.
REFERENCES
[1] N. S. Kim et al., "Leakage current: Moore's law meets static power," IEEE Computer, vol. 36, no. 12, pp. 68–75, Dec. 2003.
[2] G. Sery, S. Borkar, and V. De, "Life is CMOS: Why chase life after?," in Proc. IEEE Design Automation Conf., 2002, pp. 78–83.
[3] S. Mutoh et al., "1-V power supply high-speed digital circuit technology with multithreshold-voltage CMOS," IEEE J. Solid-State Circuits, vol. 30, no. 8, pp. 847–854, Aug. 1995.
[4] T. Douseki, N. Shibata, and J. Yamada, "A 0.5–1 V MTCMOS/SIMOX SRAM macro with multi-Vth memory cells," in Proc. IEEE Int. SOI Conf., 2000, pp. 24–25.
[5] K. Nii et al., "A low power SRAM using auto-backgate-controlled MT-CMOS," in Proc. IEEE Int. Symp. Low Power Electronic Device, 1998, pp. 293–298.
[6] H. Mizuno et al., "An 18-μA standby current 1.8-V, 200-MHz microprocessor with self-substrate-biased data-retention mode," IEEE J. Solid-State Circuits, vol. 34, no. 11, pp. 1492–1500, Nov. 1999.
[7] F. Hamzaoglu et al., "Analysis of dual-Vt SRAM cells with full-swing single-ended bit line sensing for on-chip cache," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 10, no. 2, pp. 91–95, Apr. 2002.
[8] M. Powell et al., "Gated-Vdd: A circuit technique to reduce leakage in deep-submicron cache memories," in Proc. IEEE Int. Symp. Low Power Electronics & Design, 2000, pp. 90–95.
[9] A. Agarwal, L. Hai, and K. Roy, "A single-Vt low-leakage gated-ground cache for deep submicron," IEEE J. Solid-State Circuits, vol. 38, no. 2, pp. 319–328, Feb. 2003.
[10] N. S. Kim et al., "Drowsy instruction caches," in Proc. IEEE Int. Symp. Microarchitecture, 2002, pp. 219–230.
[11] S. Yang et al., "An integrated circuit/architecture approach to reducing leakage in deep-submicron high-performance I-caches," in Proc. IEEE Int. Symp. High-Performance Computer Architecture, 2001, pp. 147–157.
[12] S. Kaxiras et al., "Cache decay: Exploiting generational behavior to reduce cache leakage power," in Proc. IEEE Int. Symp. Computer Architecture, 2001, pp. 240–251.
[13] H. Zhou et al., "Adaptive mode-control: A static-power-efficient cache design," in Proc. IEEE Parallel Architecture and Compilation Tech., 2001, pp. 61–70.
[14] N. S. Kim et al., "Leakage power optimization techniques for ultra deep sub-micron multi-level caches," in Proc. IEEE Int. Conf. Computer Aided Design, 2003, pp. 627–632.
[15] Standard Performance Evaluation Corporation [Online]. Available: http://www.specbench.org
[16] S. Wilton et al., "An Enhanced Access and Cycle Time Model for On-Chip Caches," Western Res. Lab. Res. Rep. 93/5, 1993.
[17] K. Ghose and M. Kamble, "Reducing power in superscalar processor caches using subbanking, multiple line buffers and bit-line segmentation," in Proc. IEEE Int. Symp. Low Power Electronic and Design, 1999, pp. 70–75.
[18] J. Hennessy et al., Computer Architecture: A Quantitative Approach, 3rd ed. San Mateo, CA: Morgan Kaufmann, 2003, pp. 406–408.
[19] 800/1066 MHz RDRAM Advanced Information (2002). [Online]. Available: http://www.rambus.com
[20] V. Delaluz et al., "Compiler-directed array interleaving for reducing energy in multi-bank memories," in Proc. IEEE Asia South Pacific Design Automation Conf., 2002, pp. 288–293.
[21] T. Austin et al., "SimpleScalar: An infrastructure for computer system modeling," IEEE Computer, vol. 35, no. 2, pp. 59–67, Feb. 2002.
[22] T. Mudge, "Power: A first class design constraint," IEEE Computer, vol. 34, no. 4, pp. 52–57, Apr. 2001.
VLP0112-1
Abstract: Reliability is important in many applications, and it has
become practical to achieve as the size and cost of chips have fallen
drastically. Reliability in electronic circuits is achieved
through fault tolerance, where the system itself is able to tolerate
the fault and mask the error. Fault tolerance in circuits is achieved
by various redundancy methods (hardware, software,
information, and time), but these methods differ
for analog and digital systems. In this paper we
discuss the important methods for making analog and digital circuits
fault tolerant. Digital fault-tolerant
design is explained with majority and minority voting, together with
how faults are injected into the circuits for testing using VHDL.
Analog fault-tolerant design is explained with the help of
fuzzification. The platform used for digital circuits is Xilinx ISE 12.4i,
and for analog circuits it is MATLAB.
Index Terms: Triple modular redundancy, majority and minority
voter, fuzzification.
I. INTRODUCTION
There are various methods to make a system fault tolerable but
the most basic is TMR method where the module which has
to be made reliable, is made redundant by taking three
identical modules in parallel in both hardware and software
and so the reliability of system increases as it can give the
right output even on failure of one module.
The basic block diagram of TMR system has been shown in
figure 1 as follows:
Figure 1
Here the most important part is voting unit which plays an
important role in reliability of system, as the results in analog
systems and digital systems are different so this voting unit
plays a distinguished part in both these systems another thing is
that the voting unit is not redundant here so what happens if it
fails? So these are parts of discussion of this paper.
The distribution of this paper is as follows. In Section II,
we make a short review of the most common fault tolerant
technique with its mathematical expression that how reliability
is increased as this is the basic method for both digital and
analog systems Section III describes the fault tolerance in
digital circuit’s environment and how faults are injected in
FPGA circuits for testing. In Section IV, the fault tolerant
technique for analog systems has been discussed with the help
of fuzzy logic. The discussion of the results for both analog
and digital circuits is provided in Section V. And finally the
future work and scope have been explained in Section VI.
II. TRIPLE MODULAR REDUNDANCY
The basic block diagram of TMR system has been shown
above let the reliability of a single module is . Now the
above TMR system will give the correct output if either two or
three modules will perform correct operation so if the
reliability of above system is then
Department of Electronics Engineering
Institute Of Technology, BHU
Varanasi,India
Email-pathak.akhilesh, agarwaltarang07,[email protected],
Fault Tolerant Design for Analog and digital Circuits
Dr.Anand Mohan ,Akhilesh Pathak, Tarang Agarwal, Trailokya Nath Sasamal
2 1
1,2,3,4
4 3
During lifetime of a system it is tested and diagnosed on
numerous occasions. For the system to perform its intended
mission with high availability, testing and diagnosis must be
quick and effective. A sensible way to ensure this is to specify
testing as one of the system functions– in other words, self-test.
Reliability, availability, and safety (RAS) are the major factors
for consideration in system design to provide continuous
correct operation [1]. Since faults cannot be completely
eliminated, critical systems always employ fault tolerance
techniques to guarantee high reliability and availability. Fault
tolerance (FT) techniques try to keep the system operational
despite the presence of faults [2]. FT can be achieved through
hiding the occurrence of faults and preventing it from
generating errors (fault-masking), or through fault detection
and fault repairing.
RR
RRRRR
mm
mmmmS
23
1132
033
3
23
2
With $R_m$ the reliability of a single module and $R_S$ that of the TMR system, the reliability of the TMR system is greater than that of a single module if
$$3R_m^{2} - 2R_m^{3} > R_m$$
which, for $0 < R_m < 1$, is equivalent to $(2R_m - 1)(1 - R_m) > 0$. So the reliability of the overall system is greater than that of a single module if $R_m > 0.5$. There is a single voter unit in the above circuit, so if this voter unit fails the complete circuit fails. It is therefore important to consider the reliability of the voting unit as well.
Let the reliability of the voter unit be $R_v$. The triplicated TMR is then more reliable than the TMR system if:
Or 2
2 > 3 - 2
These are the mathematical conditions for a triplicated system
to be more reliable as compared to TMR system.
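The 2-out-of-3 argument of this section can be checked numerically. The following is a minimal Python sketch, assuming independent module failures; the function names are hypothetical, and the series model for a single non-redundant voter (`tmr_with_voter`) is an illustrative assumption rather than a formula taken from the paper.

```python
def tmr_reliability(r_m: float) -> float:
    """Reliability of a 2-out-of-3 system with an ideal voter.

    Sum of the probabilities that all three modules work (r_m**3)
    and that exactly two work (3 * r_m**2 * (1 - r_m)).
    """
    return 3 * r_m**2 - 2 * r_m**3


def tmr_with_voter(r_m: float, r_v: float) -> float:
    """Assumed series model: the single voter must also work."""
    return r_v * tmr_reliability(r_m)


if __name__ == "__main__":
    # TMR only helps when a single module is better than a coin flip.
    for r_m in (0.3, 0.5, 0.7, 0.9, 0.99):
        gain = tmr_reliability(r_m) - r_m
        print(f"R_m={r_m:.2f}  R_S={tmr_reliability(r_m):.4f}  gain={gain:+.4f}")
```

Under these assumptions the gain changes sign at a module reliability of 0.5, which is the crossover stated in the text.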
III. FAULT TOLERANT DESIGN IN DIGITAL CIRCUITS
Figure 2
The truth table and circuit diagram above (Figure 2) show the basic TMR system. Here the voting unit is not redundant, so if it fails the circuit cannot give the correct output. To make the circuit more reliable, triplicated TMR with minority voting is used; its basic circuit diagram is shown below:
Figure 3
The method of using TMR with only one majority voter circuit is still flawed, because a single event upset (SEU) can affect not only the redundant modules but also the voting circuit itself. To alleviate this issue the majority voting circuit must also be redundant. These redundant majority voters must be compared using minority voter circuits (Figure 3). Each minority voter also takes three inputs: the primary path and the two other redundant paths. If the primary path is in the majority with one other redundant path, the output is low. If the primary path is in the minority in comparison with the two other redundant paths, the output is high. Figure 4 below shows a schematic and truth table of the minority voting circuit.
Figure 4
The minority voter output is fed into the control signal of a tri-state buffer with an inverted control input. If the path in question is in the minority, the tri-state buffer is placed into high impedance. If the path in question is in the majority, its corresponding tri-state buffer allows the path to pass through to the output. The three outputs are connected together outside the FPGA in a wired-OR fashion. The figure shows the minority voters controlling the tri-state buffers, which feed the wired-OR gate outside the FPGA.
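The voting scheme described in this section can be modelled at the bit level. The sketch below is a behavioural Python illustration, not the VHDL used in the paper; all function names are hypothetical, and high impedance is represented by `None`.

```python
from typing import Optional, Sequence


def majority(a: int, b: int, c: int) -> int:
    """Exact 2-out-of-3 majority voter for single-bit inputs."""
    return (a & b) | (b & c) | (a & c)


def minority(primary: int, r1: int, r2: int) -> int:
    """High (1) when the primary path disagrees with both redundant paths."""
    return int(primary != r1 and primary != r2)


def tri_state(value: int, disable: int) -> Optional[int]:
    """Buffer with inverted control: disable=1 gives high impedance (None)."""
    return None if disable else value


def wired_or(lines: Sequence[Optional[int]]) -> int:
    """OR of all lines that are actually driven."""
    return int(any(v for v in lines if v is not None))


def triplicated_tmr_output(v1: int, v2: int, v3: int) -> int:
    """Combine three majority-voter outputs through minority voters."""
    voters = (v1, v2, v3)
    outputs = []
    for i, primary in enumerate(voters):
        others = voters[:i] + voters[i + 1:]
        outputs.append(tri_state(primary, minority(primary, *others)))
    return wired_or(outputs)
```

For example, `triplicated_tmr_output(1, 1, 0)` models a third voter that has been upset: its buffer goes to high impedance and the two healthy voters drive the output.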
IV. FAULT TOLERANT DESIGN IN ANALOG CIRCUITS
Voting on the results of redundant modules with discrete values is straightforward and is referred to as exact voting. The 3-input exact majority voter, for example, produces a correct output when 2 out of 3 of its inputs are equal. However, exact voting on the results of redundant modules with real-number outputs is not appropriate. For data derived directly from noisy sources, for outputs that are read by digital computers, for the output of replicated remote sensors in fault tolerant data acquisition systems, or for the output of diversely implemented software programs that handle floating point arithmetic, an exact match is generally impossible. The same holds for analog signals from redundant modules. Various solutions have been proposed. The most common is the median-selector algorithm, which selects the middle value of the voter inputs and uses it directly as the voter output. Another solution for handling approximate redundant values is the use of inexact (threshold) voters. In this technique, two module outputs are in agreement if the difference between them is less than a threshold value, and in disagreement otherwise. To make this more reliable, a dynamic threshold method is used, where the threshold value is not fixed but varies according to the module outputs; fuzzy voting falls into this category.
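The two fixed-rule voters mentioned above can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and returning the mean of the first agreeing pair in the threshold voter is an assumption, since the text does not say how the output is formed once agreement is found.

```python
import statistics
from typing import Optional


def median_voter(x1: float, x2: float, x3: float) -> float:
    """Median selector: the middle value is used directly as the output."""
    return statistics.median([x1, x2, x3])


def threshold_voter(x1: float, x2: float, x3: float,
                    threshold: float) -> Optional[float]:
    """Inexact voter with a fixed threshold.

    Two inputs agree when they differ by less than the threshold.
    Returns the mean of the first agreeing pair (assumed output rule),
    or None when no pair agrees.
    """
    for a, b in ((x1, x2), (x1, x3), (x2, x3)):
        if abs(a - b) < threshold:
            return (a + b) / 2
    return None
```

With inputs 1.02, 0.98 and 5.0 both voters discard the outlier; with inputs 1.0, 2.0 and 3.0 and a threshold of 0.1 the threshold voter reports no agreement, whereas the median voter still returns 2.0.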
A. Fuzzy voter
The fuzzy voter described here uses fuzzy logic to calculate the weights of the modules and the final output [10].
The basic block diagram of fuzzy voting is shown in Figure 5:
Figure 5
The final output y is calculated on the basis of the weights of the voter inputs as:
Here the weights lie in the range [0, 1], where 0 means that the particular module is in complete disagreement with the other modules and 1 means that it is in complete agreement with them. The membership of the difference of input pairs [8] has been defined as:
The membership of output w has been defined as:
So the weight will be calculated with the fuzzy rules as:
This fuzzy voting mechanism shows better availability and safety than previous methods for small and medium errors, but it is less effective for large errors. If the parameters [p, q, r] are made variable instead of constant, the system can show better performance even with larger errors.
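The idea of weighting each module by its agreement with the others can be illustrated with a simplified Python sketch. It is not the rule base of [10] and not the authors' MATLAB model: the single piecewise-linear membership function, the use of only two parameters `p` and `r`, the "agrees with at least one other module" rule and the weighted-average output are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def agreement(d: float, p: float, r: float) -> float:
    """Degree in [0, 1] to which a distance d counts as 'small'.

    Assumed shape: 1 up to p, falling linearly to 0 at r.
    """
    if d <= p:
        return 1.0
    if d >= r:
        return 0.0
    return (r - d) / (r - p)


def fuzzy_weighted_vote(x: Sequence[float], p: float = 0.1,
                        r: float = 0.5) -> Tuple[Optional[float], List[float]]:
    """Weighted-average voter; returns (output, weights)."""
    weights = []
    for i, xi in enumerate(x):
        degrees = [agreement(abs(xi - xj), p, r)
                   for j, xj in enumerate(x) if j != i]
        # A module is trusted as far as it agrees with at least one other.
        weights.append(max(degrees))
    total = sum(weights)
    if total == 0:
        return None, weights  # no module agrees with any other
    y = sum(w * xi for w, xi in zip(weights, x)) / total
    return y, weights
```

For inputs `[1.0, 1.02, 3.0]` the third module receives weight 0 and the output is the average of the first two. Making `p` and `r` depend on the inputs would correspond to the variable-parameter idea mentioned above.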
V. RESULTS
The results for digital circuits are as follows:
Implementation Results:
The basic circuit used to study reliability is an ALU. A single ALU module, a triple-module (TMR) ALU, and a triplicated TMR ALU have been implemented. The tables below show how much area is utilized on the FPGA board in terms of slices and LUTs.
Faults are injected into the circuit by adding an extra component to the actual circuit so that the logic of the circuit is changed; this is known as the saboteur method.
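The saboteur idea can be illustrated in software. The sketch below is a Python behavioural model rather than the VHDL used in the paper; the 8-bit adder standing in for the ALU, the fault modes and all names are hypothetical.

```python
def alu_add(a: int, b: int) -> int:
    """Stand-in for the module under test (hypothetical 8-bit adder)."""
    return (a + b) & 0xFF


def saboteur(value: int, mode: str = "none", bit: int = 0) -> int:
    """Extra component on a signal path; alters one bit when activated."""
    mask = 1 << bit
    if mode == "stuck_at_0":
        return value & ~mask
    if mode == "stuck_at_1":
        return value | mask
    if mode == "flip":
        return value ^ mask
    return value  # inactive saboteur is transparent


def bitwise_majority(a: int, b: int, c: int) -> int:
    """2-out-of-3 vote applied to every bit position."""
    return (a & b) | (b & c) | (a & c)


def tmr_alu(a: int, b: int, fault_mode: str = "none", bit: int = 0) -> int:
    """Three ALU copies; the saboteur sits on the output of the first."""
    m1 = saboteur(alu_add(a, b), fault_mode, bit)
    m2 = alu_add(a, b)
    m3 = alu_add(a, b)
    return bitwise_majority(m1, m2, m3)
```

Calling `tmr_alu(20, 22, "flip", 3)` still yields 42, showing a single injected fault being masked by the voter.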
Circuit Implementation without TMR:
XUPV5-LX110T, Speed Grade -3
Resource                     Used   Available   Utilization
Number of Slice LUTs         46     69,120      1%
Number of BUFG/BUFGCTRLs     1      32          3%
Number of occupied Slices    22     17,280      1%
Number of bonded IOBs        53     640         8%
Circuit Implementation with TMR:
XUPV5-LX110T, Speed Grade -3
Resource                     Used   Available   Utilization
Number of Slice LUTs         49     69,120      1%
Number of BUFG/BUFGCTRLs     1      32          3%
Number of occupied Slices    27     17,280      1%
Number of bonded IOBs        53     640         8%
Circuit Implementation with Triplicated TMR:
XUPV5-LX110T, Speed Grade -3
Resource                     Used   Available   Utilization
Number of Slice LUTs         49     69,120      1%
Number of BUFG/BUFGCTRLs     1      32          3%
Number of occupied Slices    28     17,280      1%
Number of bonded IOBs        107    640         16%
Comparison of maximum path delay:
XUPV5-LX110T, Speed Grade -3   Without TMR   With TMR   With Triplicated TMR
Utilized area (slices)         22            27         28
Maximum path delay (ns)        5.143         5.618      5.150
The results for analog circuits are as follows:
A comparison of the existing fuzzy voter and the improved fuzzy voter is shown here, using the basic formula:
Performance= (1 - )
The first graph has been plotted for a correct output of 1 with errors injected into the TMR modules; the maximum allowed error was 0.1 and errors were injected into the modules in the range [-0.5, 0.5].
The second graph has been plotted for a correct output of 5 with a maximum allowed error of 0.5; errors were injected into the modules in the range [-1, +1].
Both graphs show that the improved fuzzy logic gives better results than the existing fuzzy logic, even in the presence of larger errors, as shown in the second graph.
VI. FUTURE WORK
The demand for reliability is increasing day by day, even in less critical systems. In future, a survivability approach will dominate efforts to make systems more reliable: even if the system fails, its critical part should not go down. The next step for digital circuits in this project will therefore be survivability, while for analog circuits fuzzy logic and genetic algorithms may be combined to make a system more reliable.
REFERENCES
[1] Xilinx, "Two Flows for Partial Reconfiguration: Module Based or Difference Based."
[2] J. C. Baraza, J. Gracia, D. Gil, and P. J. Gil, "A prototype of a VHDL-based fault injection tool: description and application."
[3] Tobias Becker, Wayne Luk, and Peter Y. K. Cheung, "Enhancing Relocatability of Partial Bitstreams for Run-Time Reconfiguration."
[4] F. Lima, C. Carmichael, J. Fabula, R. Padovani, and R. Reis, "A Fault Injection Analysis of Virtex FPGA TMR Design Methodology."
[5] C. Carmichael, "Triple Modular Redundancy Design Techniques for Virtex FPGAs," Xilinx, XAPP197 (v1.0), 2001.
[6] Khaled Elshafey and Ahmed Elhosiny, "On-line testing and diagnosis of microcontrollers."
[7] Fabian Vargas, Alexandre Amory, Raoul, "Estimating Circuit Fault-Tolerance by Means of Transient-Fault Injection in VHDL."
[8] Timothy J. Ross, Fuzzy Logic with Engineering Applications.
[9] George J. Klir and Bo Yuan, Fuzzy Sets and Fuzzy Logic: Theory and Applications.
[10] G. Latif-Shabgahi and A. J. Hirst, "A fuzzy voting scheme for hardware and software fault tolerant systems," Fuzzy Sets and Systems, vol. 150, 2005, pp. 579–598.
[11] "Fuzzy logic tutorial," MATLAB.
VLP0113-1
Floating Point Arithmetic Operations Using VHDL
S.C. Yadav, S. S. Chauhan, A. R. Khan
Electronics & Communication Engg.
Graphic Era University
566/6 Bell Road, Dehradun (India)
[email protected], [email protected], [email protected]
Abstract- In this paper, an asynchronous programmable chip capable of performing floating point arithmetic operations has been designed. An asynchronous chip is one in which the operations performed are not clock dependent and hence are faster. The developed chip is operated by loading the proper values into the control and status registers. The result is obtained by reading the result register.
I. INTRODUCTION
A OBJECTIVE
The objective is to design an asynchronous programmable chip capable of performing floating point arithmetic operations based on the IEEE 754-1985 standard.
The complete design of the chip comprises the individual modules developed for floating point addition/subtraction, multiplication and division.
The language of choice is VHDL.
B FLOATING POINT
Floating point systems were developed to provide high resolution over a large dynamic range. They can often provide a solution when fixed point systems, with their limited precision and dynamic range, fail. Floating point systems comply with the published single or double precision IEEE floating point standard. There are two basic types of IEEE floating point representation:
(1) Single Precision
(2) Double Precision
Single Precision
The IEEE single precision floating point standard representation requires a 32 bit word, whose bits may be numbered from 31 down to 0, left to right. The first bit is the sign bit, 'S', the next eight bits are the exponent bits, 'E', and the final 23 bits are the fraction, 'F':
S   EEEEEEEE   FFFFFFFFFFFFFFFFFFFFFFF
31  30    23   22                    0
A standard floating point word consists of
(1) Sign bit (s)
(2) Exponent (e)
(3) Normalized mantissa (m)
1) Sign Bit
The sign bit is as simple as it gets. 0 denotes a positive
number; 1 denotes a negative number. Flipping the
value of this bit flips the sign of the number.
2) Exponent
The exponent field needs to represent both positive and negative exponents. To do this, a bias is added to the actual exponent in order to get the stored exponent. For IEEE single-precision floating point, this value is 127. Thus, an exponent of zero means that 127 is stored in the exponent field. A stored value of 200 indicates an exponent of (200 - 127), or 73. For reasons discussed under Special Values, exponents of -127 (all 0s) and +128 (all 1s) are reserved for special numbers. For double precision, the exponent field is 11 bits and has a bias of 1023.
3) Mantissa
The mantissa, also known as the significand, represents the precision bits of the number. It is composed of an implicit leading bit and the fraction bits.
To find out the value of the implicit leading bit, consider that any number can be expressed in scientific notation in many different ways. In order to maximize the quantity of representable numbers, floating-point numbers are typically stored in normalized form. This puts the radix point after the first non-zero digit. In normalized form, five is represented as $5.0 \times 10^{0}$.
A useful optimization is available in base two, since the only possible non-zero digit is 1. Thus, a leading digit of 1 can be assumed and need not be represented explicitly. As a result, the mantissa has effectively 24 bits of resolution, by way of 23 fraction bits.
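The three fields can be inspected directly. The following Python sketch uses the standard `struct` module to reinterpret a number as a single precision word; the function name `float_to_fields` is hypothetical.

```python
import struct
from typing import Tuple


def float_to_fields(x: float) -> Tuple[int, int, int]:
    """Return (sign, stored exponent, fraction) of x as a 32 bit word."""
    # Pack as big-endian single precision, then reread the 4 bytes as an integer.
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    sign = word >> 31                 # bit 31
    exponent = (word >> 23) & 0xFF    # bits 30..23, biased by 127
    fraction = word & 0x7FFFFF        # bits 22..0
    return sign, exponent, fraction


if __name__ == "__main__":
    s, e, f = float_to_fields(5.0)
    # 5.0 = 1.25 x 2^2: stored exponent 2 + 127, fraction 0.25 x 2^23
    significand = 1 + f / 2**23       # the implicit leading 1 is restored here
    print(s, e, hex(f), significand)
```

For 5.0 the stored exponent is 129 and the fraction field is 0x200000, so the significand with its hidden bit is 1.25.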
Special Values
IEEE reserves exponent field values of all 0s and all
1s to denote special values in the floating-point
scheme.
i) Zero
Zero is not directly representable in the straight format, due to the assumption of a leading 1 (a true zero mantissa would be needed to yield a value of zero). Zero is a special value denoted with an exponent field of zero and a fraction field of zero. Note that -0 and +0 are distinct values, though they both compare as equal.
In particular,
0 00000000 00000000000000000000000 = 0
1 00000000 00000000000000000000000 = -0
ii) Denormalized
If the exponent is all 0s, but the fraction is non-zero (else it would be interpreted as zero), then the value is a denormalized number, which does not have an assumed leading 1 before the binary point. Thus, this represents a number $(-1)^{s} \times 0.f \times 2^{-126}$, where s is the sign bit and f is the fraction. For double precision, denormalized numbers are of the form $(-1)^{s} \times 0.f \times 2^{-1022}$. From this, zero can be interpreted as a special type of denormalized number.
If 0<E<255, then $V = (-1)^{S} \times 2^{E-127} \times (1.F)$, where "1.F" is intended to represent the binary number created by prefixing F with an implicit leading 1 and a binary point.
If E=0 and F is nonzero, then $V = (-1)^{S} \times 2^{-126} \times (0.F)$. These are "unnormalized" values.
iii) Infinity
The values +∞ and -∞ are denoted with an exponent of all 1s and a fraction of all 0s. The sign bit distinguishes between negative infinity and positive infinity. Being able to denote infinity as a specific value is useful because it allows operations to continue past overflow situations. Operations with infinite values are well defined in IEEE floating point.
0 11111111 00000000000000000000000 = Infinity
1 11111111 00000000000000000000000 = -Infinity
iv) Not A Number
The value NaN (Not a Number) is used to represent a
value that does not represent a real number. NaN's are
represented by a bit pattern with an exponent of all 1s
and a non-zero fraction.
0 11111111 00000100000000000000000 = NaN
1 11111111 00100010001001010101010 = NaN
There are two categories of NaN: QNaN (Quiet NaN)
and SNaN (Signalling NaN).
a) QNaN is a NaN with the most significant
fraction bit set. QNaN's propagate freely
through most arithmetic operations. These
values pop out of an operation when the result
is not mathematically defined.
b) SNaN is a NaN with the most significant
fraction bit clear. It is used to signal an
exception when used in operations. SNaN's can
be handy to assign to uninitialized variables to
trap premature usage.
Semantically, QNaN's
denote indeterminate operations, while SNaN's denote
invalid operations.
Summary:
The value V represented by the word may be determined as follows:
If E=255 and F is nonzero, then V=NaN ("Not a number")
If E=255 and F is zero and S is 1, then V=-Infinity
If E=255 and F is zero and S is 0, then V=Infinity
If 0<E<255, then $V = (-1)^{S} \times 2^{E-127} \times (1.F)$, where "1.F" is intended to represent the binary number created by prefixing F with an implicit leading 1 and a binary point.
If E=0 and F is nonzero, then $V = (-1)^{S} \times 2^{-126} \times (0.F)$. These are "unnormalized" values.
If E=0 and F is zero and S is 1, then V=-0
If E=0 and F is zero and S is 0, then V=0
0 00000000 00000000000000000000000 = 0
1 00000000 00000000000000000000000 = -0
0 11111111 00000000000000000000000 = Infinity
1 11111111 00000000000000000000000 = -Infinity
0 11111111 00000100000000000000000 = NaN
1 11111111 00100010001001010101010 = NaN
0 10000000 00000000000000000000000 = $+1 \times 2^{128-127} \times 1.0 = 2$
0 00000001 00000000000000000000000 = $+1 \times 2^{1-127} \times 1.0 = 2^{-126}$
0 00000000 10000000000000000000000 = $+1 \times 2^{-126} \times 0.1_2 = 2^{-127}$
0 00000000 00000000000000000000001 = $+1 \times 2^{-126} \times 0.00000000000000000000001_2 = 2^{-149}$ (smallest positive value)
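The summary rules translate almost line for line into code. The following Python sketch decodes a 32 bit word according to those rules; the function name `decode_single` is hypothetical, and the result is returned as a Python float, which is wide enough to hold every single precision value exactly.

```python
import math


def decode_single(word: int) -> float:
    """Value V of a 32 bit IEEE single precision word, per the summary rules."""
    s = (word >> 31) & 1
    e = (word >> 23) & 0xFF
    f = word & 0x7FFFFF
    sign = -1.0 if s else 1.0
    if e == 255:
        # All-ones exponent: NaN if the fraction is nonzero, else infinity.
        return math.nan if f else sign * math.inf
    if e == 0:
        # Zero and denormalized values: no implicit leading 1.
        return sign * (f / 2**23) * 2.0**-126
    # Normalized values: implicit leading 1, exponent bias 127.
    return sign * (1 + f / 2**23) * 2.0**(e - 127)


if __name__ == "__main__":
    for w in (0x40000000, 0x00800000, 0x00400000, 0x00000001, 0xFF800000):
        print(f"{w:08X} -> {decode_single(w)!r}")
```

The words used in the loop correspond to the bit patterns listed above for 2, $2^{-126}$, $2^{-127}$, $2^{-149}$ and negative infinity.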
Special Operations
Operations on special numbers are well-defined by IEEE. In the simplest case, any operation with a NaN yields a NaN result. Other operations are as follows:
Table 1
Special Operations in floating point
Operation                 Result
n ÷ ±Infinity             0
±Infinity × ±Infinity     ±Infinity
±nonzero ÷ 0              ±Infinity
Infinity + Infinity       Infinity
±0 ÷ ±0                   NaN
Infinity - Infinity       NaN
±Infinity ÷ ±Infinity     NaN
±Infinity × 0             NaN
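Most rows of Table 1 can be observed with ordinary Python floats, which are IEEE double precision values. One caveat: Python raises `ZeroDivisionError` for float division by zero instead of returning an infinity or NaN, so the hypothetical helper `ieee_div` below fills in those two rows by hand.

```python
import math

inf = math.inf

# Rows of Table 1 that Python floats reproduce directly.
results = {
    "n / inf": 1.0 / inf,
    "inf * inf": inf * inf,
    "inf + inf": inf + inf,
    "inf - inf": inf - inf,
    "inf / inf": inf / inf,
    "inf * 0": inf * 0.0,
}


def ieee_div(a: float, b: float) -> float:
    """Division following Table 1 when the divisor is zero."""
    try:
        return a / b
    except ZeroDivisionError:
        if a == 0 or math.isnan(a):
            return math.nan                      # 0 / 0 -> NaN
        # nonzero / 0 -> infinity with the product of the two signs
        return math.copysign(inf, a) * math.copysign(1.0, b)


if __name__ == "__main__":
    for name, value in results.items():
        print(f"{name:10s} -> {value}")
    print("1 / 0      ->", ieee_div(1.0, 0.0))
    print("0 / 0      ->", ieee_div(0.0, 0.0))
```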
Double Precision
The IEEE double precision floating point standard representation requires a 64 bit word, whose bits may be numbered from 63 down to 0, left to right. The first bit is the sign bit, 'S', the next eleven bits are the exponent bits, 'E', and the final 52 bits are the fraction, 'F'.
The value V represented by the word may be determined as follows:
If E=2047 and F is nonzero, then V=NaN ("Not a number")
If E=2047 and F is zero and S is 1, then V=-Infinity
If E=2047 and F is zero and S is 0, then V=Infinity
If 0<E<2047, then $V = (-1)^{S} \times 2^{E-1023} \times (1.F)$, where "1.F" is intended to represent the binary number created by prefixing F with an implicit leading 1 and a binary point.
If E=0 and F is nonzero, then $V = (-1)^{S} \times 2^{-1022} \times (0.F)$. These are "unnormalized" values.
If E=0 and F is zero and S is 1, then V=-0
If E=0 and F is zero and S is 0, then V=0
II WORKING PRINCIPLE
A MY CHIP
The chip has two unidirectional buses, each 32 bits wide, to accommodate the input and the output. It has a 2 bit address bus for selecting the desired register in the chip.
Signal description:
1) r/w (read/write): signal to perform the read or write operation. A high indicates a read operation and a low indicates a write operation.
2) rst (Reset): signal to reset the chip contents.
3) Int (Interrupt): signal to interrupt the processor about some abnormality in the functioning of the chip.
Fig. 1 Block Diagram of My Chip
Control Register:
IE | X | X | Mode | X | Op2 | Op1 | Op0
Fig. 2 Control Register format.
The flags of the control register are defined as:
IE stands for Interrupt Enable. When this flag is low (0) no interrupt is generated, and when it is high (1) an interrupt is generated under certain conditions.
The Mode flag is low when the chip is used for signed operation and high when it is used for floating point operation.
X represents a don't care condition.
The operation to be performed by the chip is selected using the last three bits of the control register.
Table 2
Opcodes for various operations.
Op2   Op1   Op0   Operation selected
0     0     0     Addition
0     0     1     Subtraction
0     1     0     Multiplication
0     1     1     Division
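A control word for the chip can be assembled from these fields. The Python sketch below assumes that Fig. 2 lists the bits from most significant (IE, bit 7) to least significant (Op0, bit 0) and that don't care bits are written as 0; both are assumptions, and the names are hypothetical.

```python
# Opcodes from Table 2 (Op2 Op1 Op0).
OPCODES = {"add": 0b000, "sub": 0b001, "mul": 0b010, "div": 0b011}

IE_BIT = 7     # assumed position of the Interrupt Enable flag
MODE_BIT = 4   # assumed position of the Mode flag


def control_word(op: str, floating_point: bool = True,
                 interrupt_enable: bool = False) -> int:
    """Build an 8 bit control register value; don't care bits are left 0."""
    word = OPCODES[op]
    if floating_point:
        word |= 1 << MODE_BIT
    if interrupt_enable:
        word |= 1 << IE_BIT
    return word


if __name__ == "__main__":
    print(f"{control_word('mul'):08b}")              # floating point multiply
    print(f"{control_word('div', True, True):08b}")  # divide with interrupts on
```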
Status Register
F1F | F2F | RF | NAN | OF | UF | DE | Z
Fig. 3 Status Register Format
The flags of the status register are defined as:
The F1F flag is high when operand 1 is loaded on the chip.
The F2F flag is high when operand 2 is loaded on the chip.
The RF flag, when high, indicates the completion of the selected operation by the chip.
The NAN flag is high when the content of the result register is wrong, i.e. a NaN (not a number) condition has been encountered.
The OF flag is high when the content of the result register exceeds the upper bound limit.
The UF flag is high when the content of the result register crosses the lower bound limit or when a denormalized number is encountered.
The DE flag is high when a division by zero (0) error occurs.
The Z flag is high when the result of the operation is zero.
Register Mapping
Table 3
Access codes for registers in my_chip module
Read/write   Address Bus   Register
X            00            F1
X            01            F2
1            10            RES
0            10            Control Register
X            11            Status Register
When the address bus is loaded with 00, register F1 is port mapped for a read or write operation. Register F2 is mapped using address bus code 01.
The address bus is optimized for code 10, where the RES register is mapped only for read operations and the control register only for write operations. The status register is port mapped for address bus code 11.
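The decoding in Table 3 can be written as a small lookup. The following Python sketch models only the address decode, following the table as printed; the function name and the register labels returned are hypothetical.

```python
def select_register(read: int, address: int) -> str:
    """Register selected by the r/w line (1 = read) and the 2 bit address bus."""
    if address == 0b00:
        return "F1"
    if address == 0b01:
        return "F2"
    if address == 0b10:
        # Code 10 is shared: RES is read-only, the control register write-only.
        return "RES" if read else "CONTROL"
    if address == 0b11:
        return "STATUS"
    raise ValueError("address bus is only 2 bits wide")


if __name__ == "__main__":
    for rw in (0, 1):
        for addr in range(4):
            print(f"r/w={rw} addr={addr:02b} -> {select_register(rw, addr)}")
```

Sharing code 10 between a read-only and a write-only register is what lets five registers fit behind a 2 bit address bus.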
B FLOATING POINT ARITHMETIC OPERATION
B.1 Addition & Subtraction:
Addition and subtraction are performed using module fp_ads. The steps to perform the addition and subtraction operations are:
Step 1: Check which exponent is bigger and shift the mantissa of the smaller number right by the difference between the two exponents. If the exponents are equal, the mantissas of both numbers are compared to find the bigger one.
Step 2: Add the mantissas of the two aligned numbers if the sign bits of both numbers are the same; otherwise subtract them. The result takes the exponent of the bigger number.
Step 3: The result is normalized by shifting the mantissa by the required number of bits and adjusting the exponent, which also resolves the abnormality of negative exponents. Boundary conditions are then checked to see whether the result has encountered an overflow error.
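The three steps can be followed on 32 bit words in software. The sketch below is a Python illustration, not the fp_ads VHDL module: it assumes normalized, finite operands, truncates bits shifted out during alignment and normalization instead of rounding, and ignores the special values. The function names are hypothetical.

```python
def unpack(word: int):
    """Split a single precision word; the hidden leading 1 is restored."""
    sign = word >> 31
    exp = (word >> 23) & 0xFF
    man = (word & 0x7FFFFF) | 0x800000
    return sign, exp, man


def fp_add(a: int, b: int) -> int:
    sa, ea, ma = unpack(a)
    sb, eb, mb = unpack(b)
    # Step 1: make operand a the one with the larger magnitude, then align b.
    if (ea, ma) < (eb, mb):
        sa, ea, ma, sb, eb, mb = sb, eb, mb, sa, ea, ma
    mb >>= ea - eb                      # truncating alignment shift
    # Step 2: add or subtract the aligned mantissas.
    man = ma + mb if sa == sb else ma - mb
    sign, exp = sa, ea
    if man == 0:
        return 0                        # exact cancellation gives +0
    # Step 3: normalize so that bit 23 is the leading 1.
    if man >> 24:                       # carry out of the 24 bit mantissa
        man >>= 1
        exp += 1
    while not man & 0x800000:
        man <<= 1
        exp -= 1
    if exp >= 255:
        raise OverflowError("exponent overflow")
    if exp <= 0:
        raise ArithmeticError("exponent underflow")
    return (sign << 31) | (exp << 23) | (man & 0x7FFFFF)


def fp_sub(a: int, b: int) -> int:
    """Subtraction is addition with the sign bit of b inverted."""
    return fp_add(a, b ^ 0x80000000)
```

With the operand words 0x458AE385 and 0x45AD9C7A from the addition example in the results section, this truncating sketch returns 0x461C3FFF.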
B.2 Multiplication
Multiplication of floating point numbers is done using module fp_mul. The steps for the multiplication operation are as follows:
Step 1: When two numbers having the same base are multiplied, their powers are added. Similarly, the exponents of the two operands are added here.
Step 2: The Booth multiplication (shift and add) technique is employed to multiply the mantissas of the two numbers, including the 'hidden bit'. The mantissa multiplication result is saved in a 49 bit temporary register.
Step 3: The negative exponent abnormality is removed to get the mantissa and exponent of the resultant number.
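The multiplication steps can be sketched in the same style. This Python illustration is not the fp_mul VHDL module: it assumes normalized, finite operands, uses Python's integer multiply in place of the Booth multiplier, truncates the product instead of rounding, and subtracts one bias (127) after adding the stored exponents, a detail the steps above leave implicit. The function names are hypothetical.

```python
def unpack(word: int):
    """Split a single precision word; the hidden leading 1 is restored."""
    sign = word >> 31
    exp = (word >> 23) & 0xFF
    man = (word & 0x7FFFFF) | 0x800000
    return sign, exp, man


def fp_mul(a: int, b: int) -> int:
    sa, ea, ma = unpack(a)
    sb, eb, mb = unpack(b)
    sign = sa ^ sb
    # Step 1: add the exponents; both are biased, so remove one bias.
    exp = ea + eb - 127
    # Step 2: multiply the 24 bit mantissas into a product of up to 48 bits.
    prod = ma * mb
    # Step 3: normalize. The product lies in [2^46, 2^48).
    if prod >> 47:
        man = prod >> 24                # leading 1 at bit 47
        exp += 1
    else:
        man = prod >> 23                # leading 1 at bit 46
    if exp >= 255:
        raise OverflowError("exponent overflow")
    if exp <= 0:
        raise ArithmeticError("exponent underflow")
    return (sign << 31) | (exp << 23) | (man & 0x7FFFFF)
```

For the operand words 0x4314C000 (148.75) and 0x44889B85 from the multiplication example in the results section, the sketch returns 0x481EC0BB.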
B.3 Division
The division operation is performed using module fp_div, utilizing a fixed point division technique. The steps to divide p by q, both of n+1 bits, are as follows:
Step 1: Store the numbers p and q in temporary registers p_temp and q_temp respectively, each 2n+1 bits wide.
Step 2: Compare the values of p_temp and q_temp.
If p_temp ≥ q_temp, subtract q_temp from p_temp, store 1 in the quotient register and move to the next iteration.
If p_temp < q_temp, store 0 in the quotient register and move to the next iteration.
Step 3: After n+1 iterations, the quotient is saved in the quotient register and the remainder is saved in p_temp.
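A compare-and-subtract division of this kind can be sketched as follows. The steps above do not say how the operands are aligned between iterations; the Python sketch below assumes the usual arrangement in which q_temp starts shifted left by n bits (which is why 2n+1 bits are needed) and is shifted right by one bit after every iteration. The function name is hypothetical.

```python
def shift_subtract_divide(p: int, q: int, bits: int):
    """Divide p by q, both 'bits' (= n + 1) wide; returns (quotient, remainder)."""
    if q == 0:
        raise ZeroDivisionError("division by zero")
    n = bits - 1
    p_temp = p               # Step 1: working copies of the operands
    q_temp = q << n          # divisor aligned with the top quotient bit
    quotient = 0
    for _ in range(bits):    # Step 2, repeated n + 1 times
        quotient <<= 1
        if p_temp >= q_temp:
            p_temp -= q_temp
            quotient |= 1    # store 1 in the quotient register
        q_temp >>= 1         # move on to the next lower quotient bit
    return quotient, p_temp  # Step 3: remainder is left in p_temp


if __name__ == "__main__":
    print(shift_subtract_divide(13, 3, 4))
```

In the chip the same loop would run on the mantissas, with the exponents handled separately.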
There are three components used in this design:
i) fp_ads used for floating point addition and
subtraction operation.
ii) fp_mul used for floating point multiplication
operation.
iii) fp_div used for floating point division
operation.
III PROGRAMMING THE CHIP
Chip programming consists of a series of steps which must be followed for the efficient functioning of the chip:
Step 1: The chip is made available for floating point arithmetic operations by making the rst (reset) signal low. At this point, all the contents of the chip registers are erased and the chip is ready for a new computation.
Step 2: To load the first operand onto the chip, register mapping is required: the read/write signal is made low and the address bus is loaded with 00.
Step 3: To load the second operand onto the chip, the read/write signal is kept low while the address bus is changed to 01 for the required register mapping.
Step 4: To select the operation to be performed, the last three bits of the control register are taken into account while the address bus indicates 11 and the read/write signal is low. Refer to Table 2 for the opcodes of the various operations.
Step 5: A start signal is generated by checking the F1F and F2F flags of the status register, to commence the selected operation while the address bus shows 10 and the read/write signal is low.
Step 6: Completion of the operation is confirmed by the RF flag of the status register, which should be high on successful completion, while the read/write signal is high and the address bus indicates 10.
Step 7: The result of the arithmetic operation is viewed by checking the dataout signal while the read/write signal is high and the address bus indicates 11.
Step 8: The previously entered input values can be viewed by keeping the read/write signal high while setting the address bus to 00 for operand 1 and 01 for operand 2.
IV RESULTS AND DISCUSSIONS
A ADDITION
Fig. 4: Floating Point Addition Simulation
Table 5.
SIMULATION EXAMPLE FOR FP ADDITION.
       Base 10    Sign Bit   Exponent Bits   Mantissa Bits                  HEX Equivalent
F1     4444.44    0          1000 1011       0001 0101 1100 0111 0000 101   458AE385
F2     5555.56    0          1000 1011       0101 1011 0011 1000 1111 010   45AD9C7A
RES    10000      0          1000 1100       0011 1000 1000 0000 0000 000   461C3FFF
B SUBTRACTION
Fig. 5: Floating Point Subtraction Simulation
TABLE 6:
SIMULATION EXAMPLE FOR FP SUBTRACTION.
       Base 10    Sign Bit   Exponent Bits   Mantissa Bits                  HEX Equivalent
F1     85.73      0          1000 0101       0101 0111 0000 1010 0011 111   42AB8517
F2     49.96      1          1000 0100       1000 1111 1010 1110 0001 010   C247D70A
RES    35.80      0          1000 1100       0001 1110 0110 0110 0110 011   420F3334
C MULTIPLICATION
Fig. 6: Floating Point Multiplication Simulation
TABLE 7
SIMULATION EXAMPLE FOR FP MULTIPLICATION
       Base 10       Sign Bit   Exponent Bits   Mantissa Bits                  HEX Equivalent
F1     148.75        0          1000 0110       0010 1001 1000 0000 0000 000   4314C000
F2     1092.86       0          1000 1001       0001 0001 0011 0111 0000 101   44889B85
RES    162562.925    0          1001 0000       0011 1101 1000 0001 0111 011   481EC0BB
D DIVISION
Fig. 7: Floating Point Division Simulation
Fig.8. Simulation of My_chip
V CONCLUSION
Floating point operations are widely used in digital signal processing applications and can be implemented using PDPs (Programmable Digital Processors). However, a large amount of data processing is required because of the complex computations involved, which affects the cost, speed and flexibility of DSP systems. In this paper, floating point arithmetic operations have been successfully simulated using ModelSim.
Future Aspects of project
Future aspects should include the following:
1) Fast Fourier Transform computation.
2) Digital Signal Processing.
3) Infinite Impulse Response (IIR) and Finite
Impulse Response (FIR) filter design.
4) Digital Image Processing.
VLP0114-1
A Complete CMOS Based Low Power Supply Bandgap Voltage Reference Circuit Implemented On TSMC 0.35-μm Process
Kshitij Bhargava #1, Kirmender Singh *2
ECE Department (Microelectronics And Embedded Technology)
Jaypee Institute Of Information Technology University, Noida, India
[email protected]@gmail.com
Abstract— A complete CMOS based low power supply bandgap voltage reference circuit implemented on the TSMC 0.35 μm CMOS process is presented in this paper. The designed circuit employs a start-up circuit, a beta-multiplier circuit (PTAT circuit) and a MOS based differential amplifier. The circuit provides a nominal reference voltage of 323 mV at a 2 V supply voltage. Experimental results show that the temperature coefficient is 1.16 ppm/ºC in the temperature range from -20 ºC to +90 ºC. The PSRR achieved without any filtering capacitor is -21 dB at 10 kHz. The area occupied by the design is 0.027 mm² and the power consumption is 62.24 μW at room temperature (25 ºC).
Keywords— Bandgap voltage reference, PTAT, CMOS, PSRR.
1. INTRODUCTION
The high-precision voltage reference circuit is an
important component in mixed-mode applications.
A stable reference circuit provides a reliable
reference voltage, and low supply voltage makes
the integration with low voltage analog and digital
circuits possible. Such reference circuits should
exhibit little dependence on process, supply voltage,
and temperature variations (PVT). With steadily decreasing power supply voltages in deep submicron CMOS technologies, the design of any on-chip voltage or current reference becomes a non-trivial task. Numerous approaches to achieving a voltage reference with low supply-voltage drift as well as low temperature drift have been proposed to date. Most of them, however, use BJT devices implemented in a standard CMOS process [1-3], which occupy a large wafer area. Moreover, implementations that rely on a non-standard CMOS process incur higher cost owing to the extra process steps [4-5].
This paper presents a complete MOS based bandgap voltage reference circuit. It follows the same general working principle, in which voltages with positive and negative temperature coefficients cancel each other to give a reference voltage with a near-zero temperature coefficient, and adds a suitable technique to minimize the power supply dependence of this reference voltage [6].
The major parts of the circuit are a start-up circuit, a beta-multiplier circuit made up of NMOS and PMOS current mirror circuits, and a differential amplifier that enhances the power supply rejection capability of the reference voltage.
Section 2 describes the proposed voltage reference circuit design along with a detailed description of its subparts, viz. the start-up circuit, the beta-multiplier circuit and the differential amplifier circuit.
Section 3 presents the experimental results.
Section 4 concludes the paper.
2. Proposed Reference Circuit
Figure 1: The Proposed Reference Circuit
2.1 Start-up Circuit
In any self-biased circuit, when the power supply
is just turned on, the current flowing in the circuit is
zero. In this circuit at this moment the gates of M1
and M2 are at ground while that of M3 and M4 are
at VDD. This forces the value of IPTAT current to
be zero initially. But since this voltage reference
can be used as precision power supply voltage in
many analog circuits, this unwanted state of the
reference circuit can lead to undesired operating
points of the transistors. Thus, a start-up circuit is
required to turn on the transistors M1 and M2 in the
initial moments of the circuit operation.
In the proposed circuit, the start-up circuit consists of transistors MU1, MU2 and MU3. When the supply voltage VDD is just turned on, the gate of MU1 is at zero potential, so MU1 is in the off state. At this moment the gate terminal of MU2 is somewhere between VDD and VDD – Vth,p. The transistor MU3 acts like an NMOS switch and leaks current from the gates of M3 and M4 into the gates of M1 and M2, producing the desired value of IPTAT right from the start of circuit operation. Once all the transistors have settled to stable operating points, the start-up circuit automatically stops functioning, because MU1 starts conducting and this turns MU3 off. This is very important, since the start-up circuit should not obstruct the normal operation of the beta-multiplier circuit (which is explained in the next subsection).
2.2 Beta-Multiplier Circuit (PTAT Circuit)
The basic building block of any bandgap voltage
reference circuit is a current mirror circuit. The
proposed circuit shows a NMOS current mirror
stacked just below a PMOS current mirror. The
purpose of using such a configuration is explained
below.
To obtain the desired value of the IPTAT current, it is essential to force the same current through M1 and M2. This can be achieved by using a PMOS current mirror. We can write,
$V_{GS1} = V_{GS2} + I_{PTAT} R_{out}$ (1)
And,
$I_{PTAT} = \frac{2}{R_{out}^{2}\,\beta_{1}} \left[ 1 - \sqrt{\beta_{1}/\beta_{2}} \right]^{2}$ (2)
Where,
$\beta = \mu_{n} C_{ox} (W/L)$ (3)
Equation (1) holds only if VGS1 > VGS2. To ensure this, we use a beta-multiplier circuit, which increases the transistor gain factor β of M2. This is generally achieved by simply increasing the width of transistor M2 such that W2 = K·W1. This helps in achieving the desired value of the IPTAT current even at a low gate-to-source voltage of M2.
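The relations (1) to (3) can be checked numerically. The sketch below is illustrative only: the W/L ratios of M1 and M2 and the value of Rout are taken from Table 1, but the process parameter `MU_N_COX` is an assumed placeholder (the paper does not state it), and ideal square-law devices with equal threshold voltages are assumed.

```python
import math

# Assumed process parameter (NOT given in the paper): mu_n * C_ox in A/V^2.
MU_N_COX = 170e-6
R_OUT = 8e3  # Rout from Table 1, in ohms


def beta(w, l, mu_cox=MU_N_COX):
    """Transistor gain factor, Eq. (3): beta = mu_n * C_ox * (W / L)."""
    return mu_cox * w / l


def iptat(r_out, beta1, beta2):
    """PTAT current from Eq. (2)."""
    return 2.0 / (r_out ** 2 * beta1) * (1.0 - math.sqrt(beta1 / beta2)) ** 2


def overdrive(current, b):
    """Square-law overdrive voltage VGS - Vth = sqrt(2 * I / beta)."""
    return math.sqrt(2.0 * current / b)


beta1 = beta(50, 2)    # M1: W/L = 50/2
beta2 = beta(210, 2)   # M2: W/L = 210/2, so K = W2/W1 = 4.2
i_ptat = iptat(R_OUT, beta1, beta2)

# Eq. (1): with equal thresholds, VGS1 - VGS2 must equal IPTAT * Rout.
delta_vgs = overdrive(i_ptat, beta1) - overdrive(i_ptat, beta2)

print(f"K = {beta2 / beta1:.2f}, IPTAT = {i_ptat * 1e6:.3f} uA")
print(f"VGS1 - VGS2 = {delta_vgs * 1e3:.3f} mV")
print(f"IPTAT * Rout = {i_ptat * R_OUT * 1e3:.3f} mV")
```

The two printed voltages agree, which shows that Eq. (2) is simply Eq. (1) solved for the current under the square-law assumption. The absolute current depends on the assumed `MU_N_COX` and should not be read as the paper's design value.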
2.3 Reference Voltage Generation Principle
The reference voltage is generated by adding two voltages, one with a positive temperature coefficient and the other with a negative temperature coefficient. The drop across resistor Rout, i.e. VPTAT, provides the positive temperature coefficient voltage, and the drain-to-source voltage (VDS5) of a diode-connected NMOS transistor M5 gives the negative temperature coefficient voltage. Together, these two voltages with opposite temperature coefficients give a reference voltage with a very small temperature coefficient. Mathematically, the reference voltage can be expressed as,
$V_{REF} = V_{PTAT} + V_{DS5}$ (4)
And,
$\frac{\partial V_{REF}}{\partial T} = \frac{\partial V_{PTAT}}{\partial T} + \frac{\partial V_{DS5}}{\partial T}$ (5)
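As a short worked step (added here for clarity, not taken from the paper), requiring the temperature derivative of the reference voltage to vanish gives the first-order cancellation condition that the sizing of Rout and M5 has to satisfy, with the PTAT voltage being the drop of IPTAT across Rout:

```latex
\frac{\partial V_{REF}}{\partial T} = 0
\quad\Longrightarrow\quad
\frac{\partial V_{PTAT}}{\partial T} = -\,\frac{\partial V_{DS5}}{\partial T},
\qquad V_{PTAT} = I_{PTAT}\,R_{out}
```

In practice the two slopes cancel exactly only at one temperature, so a small residual curvature remains over the operating range.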
2.4 Differential Amplifier
To reduce the sensitivity of the reference voltage to power supply variation (i.e. to improve PSRR), we need to reduce the variation in the drain-to-source voltages of devices M1 and M2 with changes in VDD. For this purpose, a MOS based differential amplifier is used whose output is connected to the common gate terminal of M3 and M4.
The differential amplifier compares the drain voltages of M1 and M2 and regulates them to be equal.
Figure 2 : Differential Amplifier Circuit
TABLE 1: Component Values Of Proposed Reference Circuit
Component | Values
MU1 | 50/2
MU2 | 10/20
MU3 | 10/1
M1 | 50/2
M2 | 210/2
M3 | 100/2
M4 | 100/2
M5 | 2.85/0.35
Rout | 8k
3. EXPERIMENTAL RESULTS
The proposed temperature insensitive voltage reference circuit shown in Figure 1 generates a voltage of 323 mV at room temperature (25 ºC). Figure 3 shows the variation of the reference voltage with temperature over the range -55 ºC to +125 ºC. Figure 4 shows the variation of the reference voltage with temperature at three corner conditions, viz. the fast corner (FF), typical (TT) and the slow corner (SS). The circuit operates at a low supply voltage of 2 V. The temperature coefficient of the reference voltage is only 1.16 ppm/ºC within the temperature range of -20 ºC to +90 ºC, and the PSRR is -21 dB at 10 kHz. The power consumption of the circuit is 62.24 μW. The area occupied by the design on the silicon wafer is 0.027 mm².
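To put the quoted temperature coefficient in perspective, the sketch below uses the common "box" definition of TC, (Vmax - Vmin) / (Vnom · ΔT) in ppm/ºC. That the authors used exactly this definition is an assumption; the function names are illustrative.

```python
def temp_coefficient_ppm(v_samples, t_min, t_max, v_nominal):
    """Box-method temperature coefficient in ppm/degC."""
    spread = max(v_samples) - min(v_samples)
    return spread / (v_nominal * (t_max - t_min)) * 1e6


def allowed_spread(tc_ppm, t_min, t_max, v_nominal):
    """Total reference-voltage variation (in volts) implied by a given TC."""
    return tc_ppm * 1e-6 * v_nominal * (t_max - t_min)


# Figures quoted in the text: 1.16 ppm/degC, -20 degC to +90 degC, 323 mV.
spread = allowed_spread(1.16, -20.0, 90.0, 0.323)
print(f"Implied VREF variation: {spread * 1e6:.1f} uV")

# Round trip: two samples separated by that spread give back the same TC.
tc = temp_coefficient_ppm([0.323, 0.323 + spread], -20.0, 90.0, 0.323)
print(f"Recovered TC: {tc:.2f} ppm/degC")
```

Under this definition, the reported figures correspond to a total reference-voltage variation of roughly 41 μV across the 110 ºC span.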
Figure 3: Reference Voltage Versus Temperature
Curve
Table 2: Performance Summary of the Proposed Design
Parameter | [7] | [8] | This Work
Technology (μm) | 0.6 | 0.5 | 0.35
Supply Voltage | 1.4 V | 2.6 V | 2 V
Reference Voltage | 0.309 V | 1.21 V | 0.323 V
Temperature Coefficient (ppm/ºC) | 36.9 | 613 | 1.16
PSRR | -47 dB at 100 Hz | -30 dB at 100 Hz | -21 dB at 10 kHz
Active Area (mm²) | 0.055 | 0.045 | 0.027
Figure 4 : Reference Voltage under the three
corner conditions
4. CONCLUSIONS
A high precision temperature insensitive voltage reference circuit has been presented in this paper. The circuit was designed using TSMC 0.35 μm CMOS technology and experimental results were illustrated. They show that the proposed circuit can provide a stable reference voltage of 323 mV within the temperature range -20 ºC to +90 ºC, with a power supply rejection of -21 dB at 10 kHz. The proposed reference circuit provides a stable reference voltage with very small temperature drift. Such a circuit can be used in applications which require a stable voltage reference, such as MEMS based temperature sensors and low dropout regulators.
REFERENCES
[1] Karel E. Kuijk, “A Precision Reference Voltage Source”, IEEE Journal Of Solid-State Circuits, Vol. SC-8, No. 3, June 1973, pp. 222-226.
[2] Allen, P.E. & Holberg, D.R. (2002). “CMOS Analog Circuit Design”. New York: Oxford.
[3] Matthew C. Guyton and Hae-Seung Lee, MIT, “Bandgap Current Reference”, March 2003.
[4] Lee, I., Kim, G., & Kim, W. (1994). “Exponential curvature compensated BiCMOS bandgap reference”, IEEE Journal Of Solid-State Circuits, 29, 1396-1403.
[5] Malcovati, P., Maloberti, F., Fiocchi, C., Pruzzi, M. (2001). “Curvature-compensated BiCMOS bandgap with 1-V supply voltage”, IEEE Journal Of Solid-State Circuits, 36(7), 1076-1081.
[6] Allen-Holberg, “CMOS Analog Circuit Design”, Second Edition.
[7] Stair, R., Connelly, J.A., & Pulkin, M. (2000). “A Current Mode CMOS Voltage Reference”. In proceedings of Southwest Symposium on Mixed-Signal Design (pp. 23-26).
[8] Kimberly Jane S. Udy, Patricia Angela Reyes-Abu and Wen Yaw Chung, “A High Precision Temperature Insensitive Current And Voltage Reference Generator”. In proceedings of World Academy Of Science, Engineering and Technology 2009.
CONFERENCE ON “SIGNAL PROCESSING AND REAL TIME OPERATING SYSTEM (SPRTOS)” MARCH 26-27 2011
VLP0115-1
Performance Analysis of Carbon Nanotube FET
Harish Kr. Mishra1, S.P. Gangwar2, Dr. Harsh V. Singh3
1M.Tech. Student, Department of Electronics Engineering, KNIT, Sultanpur-228 118
2,3Assistant Professor, Department of Electronics Engineering, KNIT, Sultanpur-228 118
Phone No. (+91)9415177465, (+91)9515763939 Email: [email protected], [email protected]
Abstract: We study field-effect transistors based on individual single- and multi-wall carbon nanotubes and analyze their performance. Transport through the nanotubes is dominated by holes, and by varying the gate voltage we successfully modulated the conductance of a single-wall device by more than 5 orders of magnitude. Multi-wall nanotubes typically show no gate effect.
Keywords: Carbon nanotubes, Semiconductor, Singlewall Nanotube, Multiwall Nanotube, FET
Carbon nanotubes (NTs) are a new form of carbon with unique electrical and mechanical properties [1]. They can be considered as the result of folding graphite layers into carbon cylinders and may be composed of a single shell (single-wall nanotubes, SWNTs) or of several shells (multi-wall nanotubes, MWNTs). Depending on the folding angle and the diameter, nanotubes can be metallic or semiconducting.
The band gap of semiconducting NTs decreases with increasing diameter. In this paper we study the fabrication and performance of a SWNT-based FET and explore whether MWNTs can be utilized as the active element of carbon-based FETs. Despite their large diameter, we find, based on the output and transfer characteristics of our NT devices, that structurally deformed MWNTs may well be employed in NT-FETs.
The SWNTs used in our study were produced by laser ablation of graphite doped with cobalt and nickel catalysts [7]. For cleaning, the SWNTs were ultrasonically treated in an H2SO4/H2O2 solution. MWNTs were produced by an arc-discharge evaporation technique [8] and used without further treatment. The NTs were dispersed by sonication in dichloroethane and then spread on a substrate with predefined electrodes. A schematic cross section of a NT device is shown in Fig. 1.
The devices consist of either an individual SWNT or MWNT bridging two electrodes deposited on a 140 nm thick gate oxide film on a doped Si wafer, which is used as a back gate. The 30 nm thick Au electrodes were defined using electron beam lithography. For imaging, we used an atomic force microscope operating in the noncontact mode.
The source–drain current I through the NTs was measured at room temperature as a function of the bias voltage VSD and the gate voltage VG. Figure 2(a) shows the output characteristics I–VSD of a device consisting of a single SWNT with a diameter of 1.6 nm for several values of the gate voltage. At VG = 0 V, the I–VSD curve is linear with a resistance of R = 2.9 MΩ. For VG < 0 V, the I–VSD curves remain linear, whereas they become increasingly nonlinear for VG > 0 V, up to a point where the current becomes unmeasurably small, indicating a controllable transition between a quasi-metallic and an insulating state of the NT. Figure 2(b) shows the transfer characteristics I–VG of our NT device for different source–drain voltages.
The behavior is similar to that of a p-channel metal–oxide–semiconductor FET [9]. The source–drain current decreases strongly with increasing gate voltage, which demonstrates not only that the NT device operates as a field-effect transistor but also that transport through the semiconducting SWNT is dominated by positive carriers (holes).
The conductance modulation of our SWNT-FET exceeds 5 orders of magnitude. For VG < 0 V, the I–VG curves saturate, indicating that the contact resistance RC at the metal electrodes starts to dominate the total resistance R = RNT + 2RC of the device. Here, RNT denotes the gate-dependent resistance of the NT. The saturation value of the current corresponds to RC ≈ 1.1 MΩ. Similar contact resistances were previously found for metallic SWNTs [4]. The origin of the holes is an important question to address. One possibility is that the carrier concentration is inherent to the NT.
FIG.1. Schematic cross section of the FET devices. A single NT of either MW or SW type bridges the gap between two gold electrodes.
FIG. 2. Output and transfer characteristics of a SWNT-FET: (a) I–VSD curves measured for VG = −6, 0, 1, 2, 3, 4, 5, and 6 V. (b) I–VG curves for VSD = 10–100 mV in steps of 10 mV. The inset shows that the gate modulates the conductance by 5 orders of magnitude (VSD = 10 mV).
The higher work function of gold leads to the generation of holes in the NT by electron transfer from the NT to the gold electrodes [2]. Assume that the band-bending length in the SWNT is neither very short nor very long. At VG = 0 V, the device is “on” and the Fermi energy is close to the valence-band edge throughout the NT. If indeed the band-bending length is comparable to the length of the SWNT, a positive gate voltage would generate an energy barrier of an appreciable fraction of eVG in the center of the tube, since the gate/NT distance is shorter than the source/drain separation. The threshold voltage VG,T required to suppress hole conduction by depleting the tube center would be determined by the thermal energy available for overcoming this barrier. Thus, VG,T should be much lower than 6 V.
If the carrier concentration is instead inherent to the NT, we expect a fairly homogeneous hole distribution along the NT, independent of the gate voltage. An estimate of the hole density can then be obtained by writing the total charge on the NT as Q = C·VG,T, where C is the NT capacitance and VG,T the threshold voltage necessary to completely deplete the tube. The NT capacitance per unit length with respect to the back gate is C/L ≈ 2πεε0/ln(2h/r), with r and L being the NT radius and length, and h and ε the thickness and the average dielectric constant of the device [10]. Using L = 300 nm, r = 0.8 nm, h = 140 nm, and ε ≈ 2.5, we evaluate a one-dimensional hole density of p = Q/eL ≈ 9 × 10⁶ cm⁻¹ from VG,T = 6 V. This value corresponds to about 1 hole per 250 carbon atoms in the NT. For comparison, in graphite there is only 1 hole per 10⁴ atoms [11]. The large hole density suggests that the NT is degenerate and/or that it is doped with acceptors, for example, as a result of its processing [12].
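The hole-density estimate can be reproduced with a few lines of arithmetic. The sketch below uses only the values quoted in the text (r = 0.8 nm, h = 140 nm, ε ≈ 2.5, VG,T = 6 V); the function names and the rounded physical constants are illustrative choices, not part of the paper.

```python
import math

EPS0 = 8.854e-12      # vacuum permittivity, F/m
E_CHARGE = 1.602e-19  # elementary charge, C


def capacitance_per_length(eps_r, h, r):
    """C/L = 2*pi*eps*eps0 / ln(2h/r): wire of radius r above a plane at distance h (F/m)."""
    return 2.0 * math.pi * eps_r * EPS0 / math.log(2.0 * h / r)


def hole_density_per_cm(eps_r, h, r, v_threshold):
    """One-dimensional hole density p = Q/(e*L) = (C/L)*VG,T / e, in holes per cm."""
    per_metre = capacitance_per_length(eps_r, h, r) * v_threshold / E_CHARGE
    return per_metre / 100.0


c_per_l = capacitance_per_length(2.5, 140e-9, 0.8e-9)
p_swnt = hole_density_per_cm(2.5, 140e-9, 0.8e-9, 6.0)
print(f"C/L = {c_per_l:.3e} F/m")
print(f"p   = {p_swnt:.2e} holes/cm")
```

The tube length L cancels out of the density, so the result depends only on the geometry ratio h/r, the dielectric constant and the threshold voltage. The computed value comes out close to the 9 × 10⁶ cm⁻¹ quoted above.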
FIG. 3. I–VG curve of a typical MWNT device (curve A) in comparison with that of a collapsed MWNT of similar cross section (curve B).
We can estimate the mobility of the holes from the transconductance of the FET. In the linear region, it is given by dI/dVG = μh(C/L²)VSD. Subtracting the contact resistance, we obtain a NT transconductance of dI/dVG = 1.7 × 10⁻⁹ A/V at VSD = 10 mV, corresponding to a hole mobility of 20 cm²/V s. This value is close to the mobility in heavily p-doped silicon of comparable hole density [9], but considerably smaller than the 10⁴ cm²/V s observed in graphite [11]. The low value of the NT mobility is consistent with our initial assumption of diffusive transport and suggests that the SWNT contains a large number of scatterers, possibly related to defects in the NT or disorder at the NT/gate–oxide interface due to roughness. Such deformations can lead to local electronic structure changes [13], which may act as scattering centers.
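The mobility follows from inverting the linear-region relation dI/dVG = μh(C/L²)VSD. The sketch below uses the numbers given in the text for the SWNT device; the helper names are illustrative and the capacitance expression is the same wire-over-plane approximation used for the hole density.

```python
import math

EPS0 = 8.854e-12  # vacuum permittivity, F/m


def capacitance_per_length(eps_r, h, r):
    """C/L = 2*pi*eps*eps0 / ln(2h/r), in F/m."""
    return 2.0 * math.pi * eps_r * EPS0 / math.log(2.0 * h / r)


def hole_mobility_cm2(transconductance, length, v_sd, eps_r, h, r):
    """mu_h = (dI/dVG) * L^2 / (C * VSD) with C = (C/L) * L, returned in cm^2/(V s)."""
    c_total = capacitance_per_length(eps_r, h, r) * length
    mu_si = transconductance * length ** 2 / (c_total * v_sd)  # m^2/(V s)
    return mu_si * 1e4


# SWNT device: dI/dVG = 1.7e-9 A/V at VSD = 10 mV, L = 300 nm, r = 0.8 nm, h = 140 nm.
mu_swnt = hole_mobility_cm2(1.7e-9, 300e-9, 10e-3, 2.5, 140e-9, 0.8e-9)
print(f"SWNT hole mobility ~ {mu_swnt:.1f} cm^2/(V s)")
```

The result is of the order of the 20 cm²/V s quoted above. Small differences are expected from rounding of the constants.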
The low mobility is surprising in view of the coherence length of more than 1 μm reported on the basis of energy quantization along a metallic SWNT at low temperature [4]. However, we note that there have been no transport experiments on individual SWNTs that provide evidence for ballistic transport at room temperature, e.g., by observing conductance quantization [1].
Having demonstrated FET operation for a SWNT, we move on to explore whether transport through MWNTs can be controlled by a gate electrode. The band gap of NTs has been predicted to decrease with increasing tube diameter [1]. Therefore, MWNTs with diameters of 10 nm or more are expected to show metallic rather than semiconducting behavior at room temperature. We studied a number of MWNT devices with resistances of R ∼ 100 kΩ. Most of these devices showed no gate action, and a typical I–VG curve is plotted in Fig. 3 (curve A).
Structural deformations of NTs change their electronic properties. Curve B in Fig. 3 shows that this can lead to a significant gate effect in MWNTs. As is the case for the SWNT-FET, the source–drain current of this MWNT-FET decreases with increasing gate voltage, i.e. the dominant conduction process is hole transport. In contrast to the SWNT device, this MWNT-FET could not be completely depleted. The I–VSD curve remained linear independent of the gate voltage (not shown). Between VG = −35 and 25 V, the resistance increased only from R = 76 to 120 kΩ, corresponding to a conductance modulation by about a factor of 2.
FIG. 4. (a) Noncontact AFM image of the MWNT-FET. (b) and (c) Close-up views showing three twists in the collapsed nanotube.
The gate effect reaches a sharp maximum between VG = −15 and 0 V [13]. To explain this peculiar behavior, we consider the AFM image of the MWNT-FET in Fig. 4(a). The device consists of a collapsed MWNT, which bridges the gap between two Au electrodes separated by about 1 μm.
This nanostripe is 3 nm high, from which we conclude that it has four or five shells, and it exhibits a number of twists [Figs. 4(b) and 4(c)], which allow us to determine its width to be 12 nm. Based on the structural information summarized in Fig. 4d, we propose the following explanation for the behavior of the MWNT-FET.
Since the intershell interaction in MWNTs is weak, it is reasonable to assume that transport is confined to the outermost shell of the nanostripe [12]. The conductance modulation by a factor of about 2 indicates that the bottom “plate” of the outermost shell is depleted by the gate, whereas the top layer is less affected due to screening by the inner shells and by the bottom layer as long as it is conducting.
Our model implies that the bottom “plate” is decoupled from the top layer, which may be the consequence of lateral quantization effects perpendicular to the tube axis. Using R = RNT + 2RC for the “on” state (VG = −15 V) and R = 2RNT + 2RC for the “off” state of the MWNT-FET (VG = 0 V), we estimate a resistance of RNT = 32 kΩ for the outer shell of the NT and deduce a contact resistance of RC = 23 kΩ. Finally, we proceed analogously to the SWNT-FET analysis to evaluate the hole density and mobility of the collapsed MWNT. Numerical calculations show that the capacitance per unit length is reasonably well described by C/L = 2πεε0/ln(2h/r) despite the slab-shaped geometry of the collapsed tube. Using L = 1.1 μm, r = 5 nm, and a threshold voltage of VG,T ≈ 8 V to deplete the bottom layer, we obtain p ≈ 1.7 × 10⁷ cm⁻¹ for its hole density.
From the transconductance of dI/dVG = 3.5 × 10⁻⁸ A/V at VSD = 50 mV, we estimate a mobility of μh ≈ 220 cm²/V s. The hole density is similar to that of the SWNT but the mobility is higher, which suggests a reduced number of scatterers. This may arise from the fact that the MWNTs were not ultrasonically treated in acids. Furthermore, they do not deform as much as SWNTs in order to conform to roughness at the NT/gate–oxide interface.
Conclusion: Transport in the Nanotubes is dominated by holes and,
at room temperature, it appears to be diffusive. Using the gate
electrode, the conductance of a SWNT-FET could be modulated by
more than 5 orders of magnitude. An analysis of the transfer
characteristics of the FETs suggests that the NTs have a higher carrier
density than graphite and a hole mobility comparable to heavily p-
doped silicon. Large-diameter MWNTs show typically no gate effect,
but structural deformations can modify their electronic structure
sufficiently to allow FET behavior.
References :
1 - M. S. Dresselhaus, G. Dresselhaus, and P. C. Eklund, Science of Fullerenes and Carbon Nanotubes (Academic, San Diego, 1996).
2 - J. W. G. Wildöer, L. C. Venema, A. G. Rinzler, R. E. Smalley, and C. Dekker, Nature (London) 391, 59 (1998).
3 - T. W. Odom, J.-L. Huang, P. Kim, and C. M. Lieber, Nature (London) 391, 62 (1998).
4 - S. J. Tans, M. H. Devoret, H. Dai, A. Thess, R. E. Smalley, L. J. Geerligs, and C. Dekker, Nature (London) 386, 474 (1997).
5 - M. Bockrath, D. H. Cobden, P. L. McEuen, N. G. Chopra, A. Zettl, A. Thess, and R. E. Smalley, Science 275, 1922 (1997).
6 - S. J. Tans, A. R. M. Verschueren, and C. Dekker, Nature (London) 393, 49 (1998).
7 - T. Guo, P. Nikolaev, A. Thess, D. T. Colbert, and R. E. Smalley, Chem. Phys. Lett. 243, 49 (1995).
8 - D. T. Colbert, J. Zhang, S. M. McClure, P. Nikolaev, J. H. Hafner, D. W. Owens, P. G. Kotula, C. B. Carter, J. H. Weaver, A. G. Rinzler, and R. E. Smalley, Science 266, 1218 (1994).
9 - S. M. Sze, Physics of Semiconductor Devices (Wiley, New York, 1981).
10 - This expression was inferred from P. M. Morse and H. Feshbach, Methods of Theoretical Physics (McGraw–Hill, New York, 1953).
11 - N. B. Brandt, S. M. Chudinov, and Ya. G. Ponomarev, Semimetals, 1. Graphite and its Compounds (North-Holland, Amsterdam, 1988).
12 - H. He, J. Klinowski, M. Forster, and A. Lerf, Chem. Phys. Lett. 287, 53 (1998).
13 - J. E. Fischer, H. Dai, A. Thess, R. Lee, N. M. Hanjani, D. L. Dehaas, and R. E. Smalley, Phys. Rev. B 55, R4921 (1997).
CONFERENCE ON “SIGNAL PROCESSING AND REAL TIME OPERATING SYSTEM (SPRTOS)” MARCH 26-27 2011
VLP0201-1
POWER AWARE PHYSICAL MODEL FOR EMBEDDED
SYSTEMS
Asstt Prof Yasmeen Hasan
Mtech(Electronic Circuits &Systems (VLSI))
DEPT OF ECE, INTEGRAL UNIVERSITY, LUCKNOW
Email: [email protected]
Abstract- In this work we propose a geometric model that is used to devise a scheme for identifying the hotspots and hot zones in a chip. These spots or zones need to be guarded thermally to ensure the performance and reliability of the embedded system. The model, namely the continuous unit sphere model, is presented under the assumption that the 3D region of the system is uniform, thereby reflecting on the possible locations of heat sources and the target observation points. The experimental results for the continuous domain establish that a region which does not contain any heat source may become hotter than the regions containing the thermal sources. Thus a hotspot may appear away from the active sources, and placing heat sinks or cooling systems near the active thermal sources alone may not suffice to tackle thermal imbalance.
Keywords: Embedded systems, continuous model, floorplanning, Fine mesh (FM), Coarse mesh (CM), Hotspots etc.
2: Introduction
In recent years, power density in
microprocessors has doubled every three
years [1,2,3], and this rate is expected to
increase within one to two generations as
feature sizes and frequencies scale faster
than operating voltages [4,7]. Because
energy consumed by the microprocessor is
converted into heat, the corresponding
exponential rise in heat density is creating
vast difficulties in reliability and
manufacturing costs. At any power
dissipation level, heat being generated must
be removed from the surface of the
microprocessor die, and for all but the
lowest-power designs today, these cooling
solutions have become expensive. For high-performance processors, cooling solutions
are rising at $1–3 or more per watt of heat
dissipated [3, 8], meaning that cooling costs
are rising exponentially and threaten the
computer industry's ability to deploy new
systems.
Thermal-aware floorplanning [6] reduces on-chip hotspots by a significant amount through lateral spreading. In the traditional design methodology, worst-case assumptions are used to ensure that the system operates normally in all corner cases, which results in excessive design margins by imposing extreme design constraints. With the shift in design paradigm, worst-case assumptions and post-design solutions are no longer sufficient to address thermal and power issues. It has become important to take these issues into consideration right from the start and to address them at all levels of the design cycle.
In this paper we propose a geometric model which is used to devise a scheme for identifying the hot spots/zones in an embedded system such as a VLSI chip. In the continuous domain we use the concept of a unit sphere model to calculate the local thermal effect at a point due to the heat being dissipated from several point heat sources distributed over the chip plane. We establish that a point on a chip can become very hot due to the conduction effects of other heat sources, although it may not have a heat source in its immediate vicinity. In this model, the heat loss due to radiation has been ignored. If it is to be considered, an appropriate heat loss function has to be incorporated.
Fig1: SIDEVIEW OF A TYPICAL
PACKAGE[9]
2.1: Time Invariant Heat Sources
The study is made under the assumption that constantly active (i.e. always on) heat-generating sources are placed randomly throughout the chip. For continuous thermal sources, we also assume that the heat from the sources propagates through the 3D surface of the chip without being dissipated into the ambience. The objective is to identify the zones in the chip which have heat content greater than a certain threshold.
3.2: Continuous Spatial Domain
The position of a heat source may be any
point on the chip which is assumed to be an
embedded system. In the unit sphere model,
the contribution of a point heat source S at
any target point T is expressed as the
amount of heat from S received within the
unit sphere centered at the point T. This unit
is the same as that of the distance between S
and T, and may be related to the minimum
dimension of the chip. The cumulative heat
received at the point T is evaluated as the
linear superposition of the amounts received
at T from all heat-generating sources on
the chip.
As illustrated in Fig. 2, let a heat source at a point S generate an amount of heat Q, henceforth denoted as the strength of the source S. Let the target point T be at a Euclidean distance d from S. Let CS be the sphere of radius d centered at S and CT the unit sphere centered at T, and let CT and CS intersect at the two points A and B.
Then the area A cut out on the surface of the sphere CS is equal to the product of the solid angle Ω, with its vertex at the center of the sphere CS, and the square of the sphere's radius:
$A = \Omega d^{2}$ … (1)
Fig 2: Unit Sphere Model of Heat Received at a Point T
where Ω is the solid angle formed by the conical surface of the spherical sector and d is the radius of the source sphere.
(A complete sphere forms a solid angle of 4π steradians. If the solid angle is not formed by the entire sphere, but only by the conical surface of a spherical sector, the angle in this case is equal to the ratio of the sector's spherical surface to the square of the sphere's radius [5].) By denoting the plane angle at the vertex of the spherical sector as 2θ, it is possible to express its height h as
$h = r(1 - \cos\theta)$ … (2)
where r is the radius of the source sphere. Therefore the spherical area of the sector can be represented as
$A = 2\pi r h = 2\pi r^{2}(1 - \cos\theta)$ … (3)
Fig 3: Section of a cone and a spherical cap inside a sphere
By denoting the solid angle which subtends the spherical surface of the sector as Ω, we obtain
$\Omega = A / r^{2} = 2\pi(1 - \cos\theta)$ … (4)
Thus, with r = d, the contribution of heat from S at T is
$Q \cdot \frac{A}{4\pi d^{2}} = \frac{Q}{2}(1 - \cos\theta)$ … (5)
where $4\pi d^{2}$ is the surface area of the sphere CS.
Consider the triangle OCTB in the figure 10:
$(C_{T}B)^{2} = (OB)^{2} + (OC_{T})^{2}$
$1^{2} = (d(1 - \cos\theta))^{2} + d^{2}\sin^{2}\theta$
$1 = 2d^{2}(1 - \cos\theta)$ … (6)
Putting eqn (6) in eqn (5), we get the contribution of heat from S to T as
$\frac{Q}{4d^{2}}$ ... (7)
Our concerns are the hottest points on the chip.
Intuitively, the source points definitely belong to the
above class. But the more pertinent question is
whether these are the only points that need to be
considered. The question may be re-phrased as
follows: does there exist any non-source point on
the floor with heat content greater than that of any
of the source points?
The observations reported here answer in the affirmative. Before we proceed further, we point out two special cases of the unit sphere model based on the distance d between S and T: (1) 0.5 < d < 1 and (2) 0 < d < 0.5.
Case 1: 1/2 < d < 1    Case 2: 0 < d < 1/2
Fig 4: Special cases of the unit sphere model
In the boundary case when S lies on CT, θ is equal to π/3, as SAT becomes an equilateral triangle, and the heat received is
$\frac{Q}{2}(1 - \cos\theta) = \frac{Q}{2}\left(1 - \cos\frac{\pi}{3}\right) = \frac{Q}{4}$ … (8)
Hence in case (1) the angle 2θ as defined earlier will be greater than 2π/3, and consequently more than 1/4 of the heat emanating from S reaches the unit sphere centered at T. In case (2) T is nearer to S, and hence the sphere of radius d around S lies entirely within the unit sphere at T. Hence the unit sphere CT receives the entire heat of S in this case.
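The single-source contribution described above can be written as a small piecewise function. This is a minimal sketch of one reading of the model: for d ≤ 1/2 the whole of Q is received, and otherwise the fraction (1 − cos θ)/2 with 1 − cos θ = 1/(2d²), which gives Q/(4d²). The function names are illustrative and are not taken from the authors' implementation.

```python
import math


def half_angle(d):
    """Half-angle theta of the cap cut by the unit sphere, from 1 = 2 d^2 (1 - cos(theta)).

    Only meaningful for d >= 0.5, where the two spheres actually intersect.
    """
    return math.acos(1.0 - 1.0 / (2.0 * d * d))


def contribution(q, d):
    """Heat from a source of strength q received in the unit sphere at distance d."""
    if d <= 0.5:
        # The source sphere of radius d lies entirely inside the unit sphere.
        return q
    return q * 0.5 * (1.0 - math.cos(half_angle(d)))  # equals q / (4 d^2)


for d in (0.25, 0.5, 0.75, 1.0, 2.0):
    print(f"d = {d:4.2f}: fraction received = {contribution(1.0, d):.4f}")
```

At d = 1 the function returns a quarter of the source strength, matching the boundary case with θ = π/3, and at d = 1/2 it joins continuously with the "entire heat" case.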
Using the formula derived above, we calculated the cumulative heat received at each point along the diagonal joining two vertices of the 3D geometrical structure under consideration. In this work we took a regular cuboid and a regular octagonal prism, and applied the formula to these 3D structures at different dimensions. Throughout, we consider the medium of the geometrical structure to be isotropic.
An active source of unit strength (Q0 = 1) was placed at each of the vertices of the 3D structure (8 in the cuboid and 16 in the octagonal prism).
4: EXPERIMENTAL RESULTS
Using the continuous domain formula, we try to find the hottest spot along a given direction in a 3D structure. Here we took cuboids of different dimensions and placed an active source of unit strength at each of the 8 vertices. The target points are taken along the longest diagonal.
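The diagonal scan described here can be sketched as follows. This is not the authors' program: the cuboid dimensions, the number of sample points and all function names are assumptions chosen for illustration, and the single-source formula is the Q/(4d²) expression with the whole of Q received for d ≤ 1/2.

```python
import math
from itertools import product


def contribution(q, d):
    """Heat from a source of strength q received in the unit sphere at distance d."""
    return q if d <= 0.5 else q / (4.0 * d * d)


def cumulative_heat(point, sources):
    """Linear superposition of the contributions of all (position, strength) sources."""
    return sum(contribution(q, math.dist(point, pos)) for pos, q in sources)


def scan_diagonal(a, b, c, steps=100):
    """Place unit sources at the 8 vertices of an a x b x c cuboid and scan its diagonal.

    Returns ((t, heat) of the hottest sampled point, heat at the corner source).
    """
    sources = [(v, 1.0) for v in product((0.0, a), (0.0, b), (0.0, c))]
    best = None
    for i in range(steps + 1):
        t = i / steps
        heat = cumulative_heat((t * a, t * b, t * c), sources)
        if best is None or heat > best[1]:
            best = (t, heat)
    corner_heat = cumulative_heat((0.0, 0.0, 0.0), sources)
    return best, corner_heat


best, corner_heat = scan_diagonal(1.0, 1.0, 1.0)
print(f"heat at a vertex source : {corner_heat:.4f}")
print(f"hottest sampled point   : t = {best[0]:.2f}, heat = {best[1]:.4f}")
```

For a unit cube, the hottest sampled point lies in the interior of the diagonal and receives more heat than the vertex sources themselves. This is the qualitative behavior the paper reports for non-source points.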
RESULTS FOR THE CONTINUOUS
DOMAIN
We performed more experiments with the continuous domain model, implemented in C, to simulate the effect of active sources placed at random points on the 3D floor. Keeping the dimensions of the 3D structure the same, we varied the number of sources from 5 to 50. We studied five trial runs, keeping the number and the range of power strengths of the active sources fixed and allowing only the positions of the sources to vary.
We considered a fine grid around each source point and evaluated the cumulative power at each of those points along with the source points. Across the whole floor we also considered a relatively coarse grid and evaluated the power at all of its grid points.
The formula derived from the unit sphere model has been used for the calculation. Table 1 reports our results. The threshold value is the minimum of the total power at the active source points, including the contribution from all other sources.
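The threshold test on the coarse grid can be sketched as below. The paper's implementation was in C and its exact floor dimensions, grid spacing and source-strength range are not given, so the box size, grid resolution, strength range and seed used here are hypothetical placeholders. The fine grid around each source is omitted for brevity.

```python
import math
import random


def contribution(q, d):
    """Heat from a source of strength q received in the unit sphere at distance d."""
    return q if d <= 0.5 else q / (4.0 * d * d)


def cumulative_heat(point, sources):
    """Linear superposition over all (position, strength) sources."""
    return sum(contribution(q, math.dist(point, pos)) for pos, q in sources)


def coarse_grid_hotspots(n_sources, size=(4.0, 4.0, 1.0), cells=(20, 20, 5), seed=0):
    """Count coarse-grid points hotter than the coolest source point.

    size, cells, the strength range and seed are illustrative assumptions.
    Returns (threshold, number of hot grid points, total grid points).
    """
    rng = random.Random(seed)
    sources = [
        (tuple(rng.uniform(0.0, s) for s in size), rng.uniform(0.5, 1.0))
        for _ in range(n_sources)
    ]
    # Threshold: minimum total heat at a source point, counting all other sources.
    threshold = min(cumulative_heat(pos, sources) for pos, _ in sources)
    grid = [
        (i * size[0] / cells[0], j * size[1] / cells[1], k * size[2] / cells[2])
        for i in range(cells[0] + 1)
        for j in range(cells[1] + 1)
        for k in range(cells[2] + 1)
    ]
    hot = sum(1 for p in grid if cumulative_heat(p, sources) > threshold)
    return threshold, hot, len(grid)


threshold, hot, total = coarse_grid_hotspots(5)
print(f"threshold = {threshold:.4f}, hot coarse-grid points = {hot} of {total}")
```

Any grid point counted here exceeds the heat of at least one active source point, which is the sense in which the paper uses the term hotspot.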
TABLE 1: RESULTS FOR THE CONTINUOUS DOMAIN
No. of sources | Threshold value | Total probe points | Probe points in FM | Probe points in CM | Hotspots in FM | Hotspots in CM | % hotspots in FM | % hotspots in CM
5 | 1.24876 | 2029160 | 29160 | 2000000 | 9198 | 2562 | 0.45% | 0.19%
10 | 1.24338 | 2058320 | 58320 | 2000000 | 16798 | 8376 | 0.81% | 0.41%
20 | 1.26821 | 2116640 | 116640 | 2000000 | 42238 | 12048 | 1.72% | 0.68%
40 | 1.26441 | 2233280 | 233280 | 2000000 | 93972 | 19030 | 4.21% | 0.82%
50 | 1.20101 | 2291600 | 291600 | 2000000 | 118262 | 25969 | 5.16% | 1.13%
5: CONCLUSION
In this work we have proposed a model in the continuous domain to describe the thermal behavior of an embedded system. The hotspots were usually concentrated near the active source points, but some points away from the sources were found to be much hotter than the sources themselves. The randomness of the source positions did not affect the result much. One important observation across all the models is that there are zones in the chip which become much hotter even without containing a heat source. We conclude that it may not be enough to guard only the active regions to make the chip thermally robust. This also calls for more efficient power and thermal management techniques.
References
[1] S. Borkar. Design challenges of technology scaling. IEEE
Micro, pp. 23–29, Jul.–Aug. 1999.
[2] G. Roos, B. Hoefflinger, M. Schubert, and R. Zingg,
“Manufacturability of 3D-epitaxial-lateral-overgrowth CMOS
circuits with three stacked channels,” Microelectron, 1991.
[3] R. Mahajan. Thermal management of CPUs: A perspective on
trends, needs and opportunities, Oct. 2002. Keynote
presentation, THERMINIC-8.
[4] Y. K. Cheng and S. M. Kang, “An Efficient Method for Hot-spot
Identification in ULSI Circuits”, Proc. of IEEE Int. Conf. on
Computer Aided (ICCAD), pp. 124-127, 1999.
[5] “Solid Angle”, Wikipedia, the free encyclopedia website.
[6] T. Sherwood, E. Perelman, and B. Calder. Basic block
distribution analysis to find periodic behavior and simulation
points in applications. In Proc. PACT, Sept. 2001.
[7] SIA. International Technology Roadmap for Semiconductors,
2001.
[8] S. Gunther, F. Binns, D. M. Carmean, and J. C. Hall. Managing
the impact of increasing microprocessor power consumption. Intel
Tech. J., Q1 2001.
[9] Fig. 1 from: K. Skadron, S. Velusam, K. Sankaranarayanan and
D. Tarjan. “Temperature-Aware Microarchitecture”. Published in
the Proceedings of the 30th International Symposium on Computer
Architectures, June 9–11, 2003 in San Diego, California, USA.
Efficient Power Utilisation By Controlling
Industrial And Home Appliances Using GSM and
Microcontroller
Raj Singh Yadav* and Nidhi Mishra**
*B.Tech IIIrd Year, **Assistant Professor
Krishna Institute of Engineering and Technology
Electronics and Communication Department
Ghaziabad-201206, India
Abstract:- In the present IT age, there is a need for fully
automatic systems for remotely controlling and monitoring
appliances. This paper focuses on remotely controlling
industrial and home appliances and making efficient utilisation
of the power supply [1]. The system is SMS based, using GSM
(Global System for Mobile Communication), and is wireless. It
offers a solution to the problem faced by home owners who forget
to switch off their home appliances when leaving home. It is one
of the emerging applications of GSM technology, and it is of
great use for efficient utilisation of power in industry and for
cutting down the electricity bill. We present the design of a
stand-alone embedded system that can monitor and control
different appliances installed in industries and homes using
built-in input and output peripherals. The system allows the
home owner or industry owner to control and monitor appliances
remotely via mobile phone by sending commands in the form of SMS
messages and receiving the current status of the appliances. The
software used for simulation is Eclipse with a Java runtime
environment.
Keywords- GSM, SMS, Signal Processing and Embedded System.
I. INTRODUCTION
The objective of this paper is to control home appliances
remotely and to reduce power wastage by providing a
cost-effective solution. The motivation was to enable users to
automate their homes with universal access. The aim was to build
a home appliance control system at a reasonable cost that is
mobile and provides remote access to the appliances. Homes and
industry need to be automated so that users can take advantage
of this advancement, for example so that a person returning from
the office does not come home to the heat of a hot climate. This
paper proposes a system that allows the user to control home
appliances universally via SMS using GSM technology and to make
efficient use of the power supply. A design and implementation
of SMS-based control for monitoring systems is proposed in [2].
That system has three modules: a sensing unit for monitoring the
complex applications, a processing unit, which is a
microcontroller, and a communication module that uses a GSM
module or cell phone. Primary health-care (PHC) management for
the rural population is explored in [3], which proposes
providing PHC services to the rural population through mobile
web technologies. That system uses SMS and cell phone technology
for information management, transactional exchange and personal
communication. Internet and wireless communications have been
utilized in home automation [6-8].
In this paper, we have tried to implement a method in which an
acknowledgement from the receiver can be received without any
additional cost. Receiving feedback from the receiver is
beneficial for the user.
II. HOME APPLIANCE CONTROL
SYSTEM WITHOUT FEEDBACK
We propose a home and industrial appliance control system based
on GSM network technology for transmission of SMS from sender to
receiver. The GSM network provides a full-duplex link to support
the user requirement [4]. SMS sending and receiving is used for
universal access to appliances, allowing remote monitoring and
control of the appliances at home. The home appliance control
system consists mainly of three components: a microcontroller, a
GSM module and a mobile device. The microcontroller stores the
software program on which the system runs. The GSM module
receives the message from the user. The mobile device sends the
command which has to be performed by the microcontroller.
III. PROPOSED PAPER WITH
FEEDBACK SYSTEM
In the proposed system, feedback is given to the user about the
condition of the home appliances according to the user's needs
and requirements, so the current status of the appliances can be
checked. The working of the feedback system is shown in the
figure below [1].
Fig:- Diagram for Home Appliances control
system with feedback [1]
This system has basically two units, a transmitter unit and a
receiver unit with a feedback system. The message consists of a
set of commands to turn specific appliances ON/OFF [5]. The
working of the system can be explained in terms of its three
components: the microcontroller, the GSM module and the mobile
phone.
The microcontroller is the main component and has the home
appliance control software installed on it. Appliance control is
responsible for access to the appliances from anywhere. The
system works on GSM technology for transmission of commands from
sender to receiver.
The GSM module is a plug-and-play device attached to the
microcontroller through an RS232 port, over which the two
communicate. The GSM module acts as the link responsible for
enabling/disabling the SMS capability.
The mobile device, with a GSM SIM, communicates with the GSM
module via radio waves. Communication is wireless and works on
GSM technology. The cell phone has an authorised SIM card and a
GSM subscription. The sender transmits instructions via SMS and
the system acts on those instructions.
IV. CONSTRAINTS OF HOME APPLIANCES
CONTROL SYSTEM
The system functionality is based on GSM technology and a
microcontroller, and it needs a power supply, so these
technological constraints must be kept in mind. The system is
vulnerable to power failure, but this disruption can be avoided
by attaching a voltage source, thus allowing users to avail
themselves of the advantages of this system.
V. RESULTS AND SIMULATION
The operation of the system can be explained as follows. The
system runs various GSM hardware tests to check support for all
the hardware components. It then opens the RS232 serial port for
communication with the GSM module. On successful port opening
the system communicates with the GSM module; if opening the port
fails, there is no communication.
The system checks support for battery level, signal strength,
the GSM module and other components, as well as SMS sending and
receiving capability. If these tests succeed the system responds
with 'Ok'; if they fail, 'ERROR' is returned. The remote user
sent an SMS with the security code (as defined in the program
code) from a cell phone to the home appliance control system to
turn the specified appliance on/off, and the system performed
the respective function by simulating the appliance on/off as
directed by the user.
Appliance        SMS sent by user      System response                   Feedback message (current status)
Air conditioner  AC on / AC off        AC button simulated to on/off     AC on / AC off
Light            Light on / Light off  Light button simulated to on/off  Light on / Light off
Fan              Fan on / Fan off      Fan button simulated to on/off    Fan on / Fan off
Fig. Results of home appliances control
system with feedback response[1].
Achieved analytical results:-
• The system provided security such that it took no action on
  instructions received by SMS without the security code or from
  an unregistered number. The required task was performed only
  when an SMS with the correct security code instructed the
  system.
• The remote controlling capability of the system allowed the
  user to switch appliances on/off and check their status by
  simulating the appliance as directed by the incoming SMS.
• The system automatically performed tests and checked support
  for available features, hardware and SMS sending and receiving
  capability, and configured itself accordingly.
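The security check and feedback behaviour listed above can be sketched as follows. This is a minimal illustration in Python, not the firmware described in the paper; the message layout `<code> <appliance> <on|off>`, the phone number and the security code are all hypothetical.

```python
REGISTERED_NUMBERS = {"+910000000000"}  # hypothetical registered sender
SECURITY_CODE = "1234"                  # hypothetical security code

def handle_sms(sender, text, state):
    """Apply one SMS command to `state`; return the feedback text or None.

    None means the message was ignored (unregistered number or wrong
    or missing security code), mirroring the "no action" behaviour above.
    """
    if sender not in REGISTERED_NUMBERS:
        return None
    parts = text.strip().split()
    if len(parts) != 3 or parts[0] != SECURITY_CODE:
        return None
    name, action = parts[1].upper(), parts[2].lower()
    if name not in state or action not in ("on", "off"):
        return "ERROR"
    state[name] = (action == "on")   # simulate the appliance button
    return f"{name} {action}"        # feedback message (current status)

state = {"AC": False, "LIGHT": False, "FAN": False}
print(handle_sms("+910000000000", "1234 AC on", state))
```

Returning the new status as the reply text corresponds to the "Feedback message" column of the results table.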
The program code is written in a high-level language such as C
or C++; the compiler converts it into machine code, which is
stored in the microcontroller. The software used is Eclipse with
a Java runtime environment, in which the program code can be
edited and compiled. The code is transferred from the computer
to the microcontroller through the USB port with a USBtiny
programmer and an RS232 device, using AVRDUDE as the upload
tool. The sender and receiver GSM numbers and the security code
are defined in the program code.
VI. CONCLUSION
In this paper a low cost, secure, universally accessible,
remotely controlled solution with feedback for the automation of
homes has been introduced [1]. The target of controlling home
appliances remotely using an SMS-based system is achieved by
this system. The GSM-based solution has proved to be remotely
controllable, provides home automation and is cost-effective, as
it can reduce the electricity bill through efficient utilisation
of the home appliances: the appliances are used only when they
are required. It is of great use for industrial appliances also.
Hence we can conclude that the required objectives and goals of
the home appliance control system have been achieved.
VII. FUTURE DIRECTION
The basic level of home appliance control and remote monitoring
with feedback has been implemented. For remote monitoring, other
home appliances can also be monitored and controlled; for
example, if the temperature rises above a certain level the
system should generate an SMS, or sensors can be added that
detect gas, smoke or fire, so that in case of emergency the
system automatically generates an SMS.
In future the system will be a small box containing the
microcontroller and GSM module, with a reduced size.
REFERENCES
1) Tahmina Begum, Md. Shazzat Hossain,
Md. Bashir Uddin and Md. Shaheen
Hasan Chowdhury “Design and
Development of Activation and
Monitoring of Home Automation
System via SMS through
Microcontroller” in 2009 International
Conference on Computers and Devices
for Communication
2) B. Ciubotaru-Petrescu, D.Chiciudean,
R.Cioarga, D. Stanescu. “Wireless
Solutions for Telemetry in Civil
Equipment and Infrastructure
Monitoring” in 3rd Romanian
Hungarian Joint Symposium on Applied
Computational Intelligence (SACI) May
25-26, 2006.
3) Z. Alkar, U. Buhur, (2005). “An Internet
Based Wireless Home Automation
System for Multifunctional Devices” in
IEEE Consumer Electronics, 51(4),
1169-1174.
4) A.Alheraish, W. Alomar, and M. Abu-
Al-Ela “Programmable Logic
Controller System for Controlling and
Monitoring Home Application Using
Mobile Network” in IMTC 2006 -
Instrumentation and Measurement
Technology Conference Sorrento, Italy
24-27 April 2006 , pp. 469
5) A. R. Al-Ali, M. Al-Rousan and M. Mohandes,
“GSM-Based Wireless Home Appliances
Monitoring & Control System”, IEEE,
pp. 237.
6) Liang, Li-Chen Fu and Chao-Lin W, “An
integrated, flexible, and Internet-based
control architecture for home
automation system in the Internet era”,
The IEEE Proceedings of the
International Conference on Robotics
and Automation, Volume: 2,2002, pp:
1101 -1106.
7) W. Qinglong, F. Y. Wang and L. Yueton,
“A mobile-agent based distributed
intelligent control system architecture
for home automation”, The IEEE
International Conference on Systems,
Man, and Cybernetics, Volume: 3, 2001,
pp: 1599 - 1605.
8) R. Shepherd “Bluetooth wireless
technology in the home”, Electronics &
Communication Engineering Journal,
V. 13, I. 5, Oct 2001, pp: 195 -203.
Design and Implementation of Radix-2 & Radix-4
Booth Multipliers Using VHDL
S. S. Chauhan1, S. C. Yadav2, A. R. Khan3
Graphic Era University (E&CE Deptt.) 1, 2, 3
[email protected], [email protected], [email protected]
Abstract- Low power consumption and smaller area are some of the
most important criteria for the fabrication of DSP systems and
high performance systems. Optimizing the speed and area of the
multiplier is a major design issue. However, area and speed are
usually conflicting constraints, so that improving speed mostly
results in larger area. In this paper, we try to determine the
best solution to this problem by comparing a few multipliers.
This project presents an efficient implementation of high speed
multipliers using the shift and add method and the Radix_2 and
Radix_4 modified Booth multiplier algorithms. In this paper we
compare the working of the three multipliers by implementing
each of them separately in a transversal FIR filter.
Index Terms- Transversal FIR Filter, Booth algorithms,
VHDL, Xilinx.
1. INTRODUCTION
Multipliers are key components of many high performance systems
such as FIR filters, microprocessors and digital signal
processors. A system's performance is generally determined by
the performance of the multiplier, because the multiplier is
generally the slowest element in the system. Furthermore, it is
generally the most area consuming. Hence, optimizing the speed
and area of the multiplier is a major design issue. However,
area and speed are usually conflicting constraints, so that
improving speed mostly results in larger area. As a result, a
whole spectrum of multipliers with different area-speed
constraints has been designed, with fully parallel multipliers
at one end of the spectrum and fully serial multipliers at the
other end. In between are digit serial multipliers, where single
digits consisting of several bits are operated on. These
multipliers have moderate performance in both speed and area.
However, existing digit serial multipliers have been plagued by
complicated switching systems and/or irregularities in design.
Radix 2^n multipliers, which operate on digits in a parallel
fashion instead of bits, bring the pipelining to the digit level
and avoid most of the above problems. They were introduced by
M. K. Ibrahim. These structures are iterative and modular. The
pipelining done at the digit level brings the benefit of
constant operation speed irrespective of the size of the
multiplier. The clock speed is determined only by the digit
size, which is already fixed before the design is implemented.
2. THE BASIC TRANSVERSAL FILTER
An N-tap transversal filter was assumed as the basis for this
adaptive filter. The value of N is determined by practical
considerations. An FIR filter was chosen because of its
stability. The transversal structure allows relatively
straightforward construction of the filter, as shown in
figure 1.
As the input, coefficients and output of the filter are all
assumed to be complex valued, the natural choice for the
property measurement is the modulus, or instantaneous amplitude.
If y(k) is the complex-valued filter output, then |y(k)| denotes
the amplitude. The convergence error p(k) can be defined as
follows:
$$p(k) = |y(k)| - A$$
where A is the amplitude in the absence of signal degradations.
The error p(k) should be zero when the envelope has the proper
value, and non-zero otherwise. The error carries sign
information to indicate in which direction the envelope is in
error. The adaptive algorithm is defined by specifying a
performance/cost/fitness function based on the error p(k) and
then developing a procedure that adjusts the filter impulse
response so as to minimize or maximize that performance
function. The filter output is
$$y_k = \sum_{i=0}^{N-1} w_k(i)\, x_{k-i}$$
Figure 1: Transversal FIR Filter
The gradient search algorithm was selected to simplify the
filter design. The filter coefficient update equation is given
by:
$$W_{k+1} = W_k - \mu\, e_k X_k$$
where $X_k$ is the filter input at sample k, $e_k = p_k\, y_k$
is the error term at sample k, and $\mu$ is the step size for
updating the weight values.
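One update step of this scheme can be sketched in Python as follows. The sketch assumes real-valued signals for simplicity (the paper takes them to be complex), and the function and parameter names are ours, introduced only for illustration.

```python
def fir_output(weights, window):
    """y_k = sum_i w_k(i) * x_{k-i}; window[i] holds x_{k-i}."""
    return sum(w * x for w, x in zip(weights, window))

def update_weights(weights, window, amplitude, mu):
    """One gradient-search step: W_{k+1} = W_k - mu * e_k * X_k."""
    y = fir_output(weights, window)
    p = abs(y) - amplitude          # convergence error p(k) = |y(k)| - A
    e = p * y                       # error term e_k = p_k * y_k
    return [w - mu * e * x for w, x in zip(weights, window)]

# Two-tap example: y = 2, p = 1, e = 2, so the weights move by -0.2 * X_k.
print(update_weights([1.0, 0.0], [2.0, 1.0], amplitude=1.0, mu=0.1))
```

When |y(k)| already equals A the error term is zero and the weights are left unchanged, which matches the stated behaviour of p(k).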
3. MULTIPLIERS
3.1. BINARY Multiplier
A binary multiplier is an electronic hardware device used in
digital electronics, in a computer or in another electronic
device to perform rapid multiplication of two numbers in binary
representation. It is built using binary adders.
The rules for binary multiplication can be stated as follows:
(i) If the multiplier digit is a 1, the multiplicand is
simply copied down and represents the product.
(ii) If the multiplier digit is a 0, the product is also 0.
A multiplier circuit needs circuitry to do the following:
• It should be capable of identifying whether a bit is 0 or 1.
• It should be capable of shifting the partial products left.
• It should be able to add all the partial products to give the
  product as the sum of the partial products.
• It should examine the sign bits. If they are alike, the sign
  of the product is positive; if the sign bits are opposite, the
  product is negative. The sign bit of the product determined by
  the above criterion should be displayed along with the
  product.
From the above discussion we observe that it is not necessary to
wait until all the partial products have been formed before
summing them. In fact, the addition of a partial product can be
carried out as soon as it is formed.
Binary multiplication (e.g. n = 4)
p = a × b
a = a(n−1) a(n−2) … a1 a0
b = b(n−1) b(n−2) … b1 b0
p = p(2n−1) p(2n−2) … p1 p0
where a – multiplicand, b – multiplier, p – product
          x x x x    a
        × x x x x    b
        ---------
          x x x x    b0 · a · 2^0
        x x x x      b1 · a · 2^1
      x x x x        b2 · a · 2^2
    x x x x          b3 · a · 2^3
    ---------------
    x x x x x x x x  p
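The partial-product layout above corresponds to the shift and add method. A minimal Python sketch for unsigned operands (the function name is ours) adds each partial product as soon as it is formed:

```python
def shift_add_multiply(a, b, n=4):
    """Multiply two unsigned n-bit integers by summing shifted partial products."""
    product = 0
    for i in range(n):
        if (b >> i) & 1:          # multiplier bit b_i is 1: copy the multiplicand
            product += a << i     # partial product b_i * a * 2^i
        # multiplier bit 0: the partial product is 0, nothing to add
    return product

print(shift_add_multiply(0b1101, 0b1011))
```

Sign handling is left out here; in the scheme described above the sign bits would be examined separately.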
3.2. Multiply Accumulate Circuit
Multiplication followed by accumulation is a common operation in
many digital systems, particularly highly interconnected ones
such as digital filters, neural networks and data quantizers.
One typical MAC (multiply-accumulate) architecture is
illustrated in the figure. It consists of multiplying two
values, then adding the result to the previously accumulated
value, which must then be stored back in the registers for
future accumulations. Another feature of a MAC circuit is that
it must check for overflow, which might happen when the number
of MAC operations is large. This design can be built from
components, because we have already designed each of the units
shown in the figure. However, since it is a relatively simple
circuit, it can also be designed directly. In either case the
MAC circuit as a whole can be used as a component in
applications such as digital filters and neural networks.
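The multiply, accumulate and overflow-check sequence can be modelled behaviourally as below. This is a sketch under our own assumptions: the accumulator is a signed register of `width` bits and saturates on overflow, whereas the paper does not state which overflow policy its circuit uses.

```python
def mac(pairs, width=8):
    """Accumulate a*b for each pair in a signed `width`-bit register.

    Returns (accumulator, overflow_flag). Saturation on overflow is an
    assumed policy, chosen only to keep the example simple.
    """
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    acc = 0
    overflow = False
    for a, b in pairs:
        acc += a * b                      # multiply, then add to previous value
        if acc < lo or acc > hi:          # overflow check
            overflow = True
            acc = max(lo, min(hi, acc))   # saturate to the register range
    return acc, overflow

print(mac([(2, 3), (4, 5)]))
print(mac([(10, 10), (10, 10)]))
```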
3.3. Architecture of a Radix 2^n Multiplier
The architecture of a radix 2^n multiplier is given in Figure 2.
This block diagram shows the multiplication of two numbers with
four digits each. These numbers are denoted V and U, and the
digit size was chosen as four bits. The reason for this will
become apparent in the following sections. Each circle in the
figure corresponds to a radix cell, which is the heart of the
design. Every radix cell has four digit inputs and two digit
outputs. The input digits are also fed through to the
corresponding cells. The dots in the figure represent latches
for pipelining; every dot consists of four latches. The ellipses
represent adders, which are included to calculate the higher
order bits. They do not fit the regularity of the design, as
they are used to "terminate" the design at the boundary. The
outputs are again in terms of four-bit digits and are denoted by
W's. The 1's denote the clock period at which the data appear.
Figure 2: Radix 2^n multiplier architecture
3.4. BOOTH MULTIPLIER
A Radix-4 modified Booth algorithm is used rather than the
Radix-2 Booth algorithm because in Radix-4 the number of partial
products is reduced to n/2. Wallace tree multipliers could also
be used, but in that format the multiplier array becomes very
large and requires large numbers of logic gates and
interconnecting wires, which makes the chip design large and
slows down the operating speed.
3.5. BOOTH MULTIPLICATION ALGORITHM:
(a) Booth Multiplication Algorithm for radix-2
The Booth algorithm gives a procedure for multiplying binary
integers in signed 2's complement representation. We illustrate
the Booth algorithm with the following example:
Example: 2ten × (–4)ten = 0010two × 1100two
Step 1: Making the Booth table
I. From the two numbers, pick the one with the fewest changes
between consecutive bits and make it the multiplier. In 0010
there is no change from 0 to 0, one change from 0 to 1 and
another change from 1 to 0, so there are two changes. In 1100
there is no change from 1 to 1, one change from 1 to 0 and no
change from 0 to 0, so there is only one change. Therefore, in
the multiplication 2 × (–4), 2ten (0010two) is the multiplicand
and (–4)ten (1100two) is the multiplier.
II. Let X = 1100 (multiplier) and Y = 0010 (multiplicand).
Take the 2's complement of Y and call it –Y:
–Y = 1110
III. Load the X value in the table.
IV. Load 0 for the X-1 value; X-1 holds the previous least
significant bit of X.
V. Load 0 in the U and V columns, which will hold the product of
X and Y at the end of the operation.
VI. Make one row for each of the four cycles, because we are
multiplying four-bit numbers.
U     V     X     X-1
0000  0000  1100  0    Load the value
                       1st cycle
                       2nd cycle
                       3rd cycle
                       4th cycle
Step 2: Booth Algorithm
The Booth algorithm requires examination of the multiplier bits
and shifting of the partial product. Prior to the shifting, the
multiplicand may be added to the partial product, subtracted
from the partial product, or left unchanged according to the
following rules:
I. Look at the least significant bit of the multiplier "X" and
the previous least significant bit of the multiplier "X-1":
X  X-1
0  0    Shift only
1  1    Shift only
0  1    Add Y to U, and shift
1  0    Subtract Y from U (or add –Y to U), and shift
II. Take U and V together and perform an arithmetic right shift,
which preserves the sign bit of a 2's complement number. Thus a
positive number remains positive and a negative number remains
negative.
III. Shift X with a circular right shift, because this avoids
the need for two registers for the X value.
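The rules above can be sketched as a register-level model in Python. The register names follow the text (U, V, X, X-1), while the function names are ours. Because U is only n bits wide in this model, the multiplicand must not be the most negative n-bit value (for n = 4, Y = –8), since its negation does not fit in U.

```python
def booth_radix2(x, y, n=4):
    """Multiply signed n-bit x (multiplier) by y (multiplicand).

    Returns the 2n-bit two's complement pattern held in U:V.
    """
    mask = (1 << n) - 1
    u, v = 0, 0
    xr, x_prev = x & mask, 0
    yv = y & mask
    for _ in range(n):
        lsb = xr & 1
        if (lsb, x_prev) == (0, 1):
            u = (u + yv) & mask            # add Y to U
        elif (lsb, x_prev) == (1, 0):
            u = (u - yv) & mask            # subtract Y from U
        uv = (u << n) | v
        sign = uv >> (2 * n - 1)
        uv = (uv >> 1) | (sign << (2 * n - 1))   # arithmetic right shift of U:V
        u, v = uv >> n, uv & mask
        x_prev = lsb
        xr = (xr >> 1) | (lsb << (n - 1))        # circular right shift of X
    return (u << n) | v

def to_signed(value, bits):
    """Interpret a bit pattern as a two's complement integer."""
    return value - (1 << bits) if value >> (bits - 1) else value

result = booth_radix2(-4, 2)
print(format(result, "08b"), to_signed(result, 8))
```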
Repeat the same steps until the four cycles are completed.
U     V     X     X-1
0000  0000  1100  0    Load the value
0000  0000  0110  0    1st cycle: Shift only
0000  0000  0011  0    2nd cycle: Shift only
1110  0000  0011  0    3rd cycle: Add –Y (0000 + 1110) = 1110
1111  0000  1001  1               and shift
1111  1000  1100  1    4th cycle: Shift only
We have finished four cycles, so the answer is shown in the last
row of U and V, which is 11111000two, i.e. (–8)ten.
Note: By the fourth cycle, the two algorithms have the same
values in the product register.
(b) Booth Multiplication Algorithm for radix-4:
One of the solutions for realizing high speed multipliers is to
enhance parallelism, which helps to decrease the number of
subsequent calculation stages. The original version of the Booth
algorithm (Radix-2) has two drawbacks:
(i) The number of add/subtract operations and the number of
shift operations become variable, which is inconvenient in
designing parallel multipliers.
(ii) The algorithm becomes inefficient when there are isolated
1's.
These problems are overcome by using the modified Radix-4 Booth
algorithm, which scans strings of three bits with the algorithm
given below:
1) Extend the sign bit by one position if necessary to ensure
that n is even.
2) Append a 0 to the right of the LSB of the multiplier.
3) According to the value of each three-bit vector, each partial
product will be 0, +y, -y, +2y or -2y.
The negative values of y are made by taking the 2's complement,
and in this paper carry-look-ahead (CLA) fast adders are used.
The multiplication of y by 2 is done by shifting y one bit to
the left. Thus, in designing an n-bit parallel multiplier, only
n/2 partial products are generated.
Table 1: Radix-4 modified Booth algorithm scheme for odd values of i.
X(i)  X(i-1)  X(i-2)  Partial product
0     0       0       +0
0     0       1       +y
0     1       0       +y
0     1       1       +2y
1     0       0       -2y
1     0       1       -y
1     1       0       -y
1     1       1       +0
4. RESULTS & CONCLUSION
This paper gives a clear picture of different multipliers and
their implementation in a tap delay FIR filter. We found that
parallel multipliers are a much better option than serial
multipliers. We concluded this from the results for power
consumption and total area. The results for the radix-2 and
radix-4 multipliers are shown in Table 2 and Table 3
respectively.
Table 2: Results of Radix-2 multiplier
Number of Slices            130
Number of 4 input LUTs      249
Number of bounded input     16
Number of bounded output    17
CLB Logic Power             79 mW
Table 3: Results of Radix-4 multiplier
Number of Slices            229
Number of 4 input LUTs      300
Number of bounded input     16
Number of bounded output    16
CLB Logic Power             47 mW
Multiplier output
In the case of parallel multipliers, the total area is much less
than that of serial multipliers. Hence the power consumption is
also less. This is clearly depicted in our results. This speeds
up the calculation and makes the system faster. While comparing
the radix-2 and radix-4 Booth multipliers we found that radix-4
consumes less power than radix-2. This is because it uses almost
half the number of iterations and adders compared to radix-2.
When all three multipliers were compared we found that array
multipliers consume the most power and have the maximum area.
This is because they use a large number of adders. As a result
they slow down the system, because the system now has to do a
lot of calculation.
Multipliers are one of the most important components of many
systems, so we always need to find a better solution for
multipliers. Our multipliers should always consume less power
and cover less area. Through our project we tried to determine
which of the three algorithms works best. In the end we
determined that the radix-4 modified Booth algorithm works best.
REFERENCES
1. Y. C. Lim, “Single-Precision Multiplier with Reduced Circuit Complexity for Signal Processing Applications,” IEEE Trans.
Computers, vol. 41, no. 10, pp. 1333-1336, Oct. 1992.
2. J. Isoaho, J. Pasanen, O. Vainio, and H. Tenhunen, “DSP System Integration and Prototyping with FPGAs,” Journal of VLSI Signal
Processing, vol. 6, pp. 155-172, 1993.
3. S. S. Kidambi, F. El-Guibaly, and A. Antonious, “Area-Efficient Multipliers for Digital Signal Processing Applications,” IEEE Trans.
Circuits and Systems-II: Analog and Digital Signal Processing, vol. 43, no. 2, pp. 90-95, Feb. 1996.
4. J. E. Stine and O. M. Duverne, “Variations on Truncated Multiplication,” in Proc. Euromicro Symposium on Digital System
Design, 2003, pp. 112-119.
5. C. Ebeling, C. Fisher, G. Xing, M. Shen, and H. Liu, “Implementing an OFDM Receiver on the RaPiD Reconfigurable Architecture,”
IEEE Trans. on Computers, vol. 53, no. 11, pp. 1436-1448, 2004.
6. Xilinx Staff, “Celebrating 20 years of innovation,” Xcell Journal, no. 48, Spring 2004.
7. S. Knapp, “Using Programmable Logic to Accelerate DSP Functions,”
http://www.xilinx.com/appnotes/dspintro.pdf
A Novel Approach to Design of a Multiplier Using
Reversible Logic Gates
S. S. Chauhan1, S. C. Yadav2, A. R. Khan3
Graphic Era University (E&CE Deptt.) 1, 2, 3
[email protected], [email protected], [email protected]
Abstract Reversible logic gates are very much in demand for
the future computing technologies as they are known to
produce zero power dissipation under ideal conditions. This
paper proposes an improved design of a multiplier using
reversible logic gates. Multipliers are very essential for the
construction of various computational units of a quantum
computer. The quantum cost of a reversible logic circuit can
be minimized by reducing the number of reversible logic
gates. For this purpose, two 4*4 reversible logic gates, called the
DPG gate and the BVF gate, are used.
Index Terms- Reversible logic circuits; Quantum computing;
Nanotechnology.
1. INTRODUCTION
Reversible logic has received great attention in recent years
due to its ability to reduce power dissipation, which is the
main requirement in low power VLSI design. Quantum computers are
constructed using reversible logic circuits. Reversible logic
has wide applications in low power CMOS, optical information
processing, quantum computation and nanotechnology. R. Landauer
[1] demonstrated that in high technology circuits and systems
constructed using irreversible hardware, the loss of one bit of
information dissipates KT ln 2 joules of energy, where K is
Boltzmann's constant and T is the absolute temperature at which
the operation is performed. The heat generated due to the loss
of one bit of information is very small at room temperature, but
when the number of bits is large, as in high speed computational
work, the heat dissipated will be so large that it affects
performance and reduces the lifetime of the components.
Furthermore, Bennett [2] showed that reversible circuits do not
lose information, owing to the one-to-one mapping between inputs
and outputs; hence there is no extra energy loss.
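To give a feel for the magnitude of the KT ln 2 bound, the following Python sketch evaluates it. Taking room temperature as 300 K is our assumption for the example, and the function name is ours.

```python
import math

BOLTZMANN = 1.380649e-23  # Boltzmann's constant K in J/K (SI value)

def landauer_energy(temperature_kelvin, bits=1):
    """Energy in joules dissipated when `bits` bits of information are lost."""
    return bits * BOLTZMANN * temperature_kelvin * math.log(2)

# One bit at an assumed room temperature of 300 K: roughly 2.9e-21 J.
print(landauer_energy(300.0))
# The same bound scaled to 1e12 erased bits.
print(landauer_energy(300.0, bits=10**12))
```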
In the design of reversible circuits two restrictions should
be considered:
• Fan-out is not permitted
• Loops are not permitted
Due to these restrictions, synthesis of reversible circuits
can be carried out from the inputs towards the outputs and
vice versa.
2. BACKGROUND OF REVERSIBLE CIRCUITS
An n×n reversible circuit consists of n inputs and n outputs,
with each input assignment mapped to a unique output assignment
and vice versa. In the synthesis of reversible circuits direct
fan-out is not allowed, as the one-to-many concept is not
reversible. However, fan-out in reversible circuits is achieved
using additional gates. A reversible circuit should be designed
using the minimum number of reversible logic gates.
A. Reversible Gates and Circuits
There are two main types of reversible gates: Toffoli [3]
and Fredkin [4]. An n×n Toffoli gate passes the first (n-1)
inputs to the outputs unaltered (as control signals) and
inverts the nth input (the target signal) at the last output
if all the previous (n-1) signals are '1'. Assuming x_i as
input and y_i as output, then [3]:
$y_i = x_i, \quad 1 \le i \le n-1$
$y_n = x_n \oplus (x_1 x_2 \cdots x_{n-1})$
Toffoli Gate: A 3×3 Toffoli gate [3] is shown in figure 1.
The input vector is I (A, B, C) and the output vector is O (P,
Q, R). The outputs are defined by P=A, Q=B, R=AB xor C.
Quantum cost of a Toffoli gate is 5.
A Toffoli gate with one input is also known as the NOT
gate, and one with two inputs as the CNOT or Feynman gate.
Fredkin Gate: A 3×3 Fredkin gate [4] is shown in figure
2. The input vector is I (A, B, C) and the output vector is O
(P, Q, R). The outputs are defined by P=A, Q=A′B xor AC
and R=A′C xor AB. Quantum cost of a Fredkin gate is 5.
Fig.1 Toffoli Gate
BVF Gate: A 4×4 BVF gate is shown in figure 3. This is
a reversible double XOR gate and can be used for
duplication of the required inputs to meet the fan-out
requirements. The input vector is I (A, B, C, D), the output
vector is O (P, Q, R, S) and the outputs are defined by P = A,
Q = A xor B, R = C and S = C xor D. Quantum cost of a
BVF gate is 2. In the proposed design this gate is used to
copy the operand bits, and it is shown that the number of
gates required for copying is reduced by 50% with the same
quantum cost.
Peres Gate: A 3×3 Peres gate [10] is shown in figure 4. The input vector is I (A, B, C) and the output vector is O (P, Q, R). The outputs are defined by P = A, Q = A xor B and R = AB xor C. Quantum cost of a Peres gate is 4. In the proposed design the Peres gate is used because of its lowest quantum cost.
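The gate definitions above can be checked mechanically. The following Python sketch models the Toffoli, Fredkin, BVF and Peres gates as functions on bits, using exactly the output equations given in the text, and confirms that each one maps distinct inputs to distinct outputs. The function names and the helper is_reversible are illustrative choices, not part of the paper.

```python
from itertools import product


def toffoli(a, b, c):
    # P = A, Q = B, R = AB xor C
    return a, b, (a & b) ^ c


def fredkin(a, b, c):
    # P = A, Q = A'B xor AC, R = A'C xor AB
    not_a = a ^ 1
    return a, (not_a & b) ^ (a & c), (not_a & c) ^ (a & b)


def peres(a, b, c):
    # P = A, Q = A xor B, R = AB xor C
    return a, a ^ b, (a & b) ^ c


def bvf(a, b, c, d):
    # P = A, Q = A xor B, R = C, S = C xor D
    return a, a ^ b, c, c ^ d


def is_reversible(gate, width):
    """A gate is reversible if all 2**width input vectors give distinct outputs."""
    inputs = list(product((0, 1), repeat=width))
    outputs = {gate(*bits) for bits in inputs}
    return len(outputs) == len(inputs)


for gate, width in ((toffoli, 3), (fredkin, 3), (peres, 3), (bvf, 4)):
    print(gate.__name__, is_reversible(gate, width))

# With the constant inputs B = D = 0, the BVF gate duplicates A and C.
print(bvf(1, 0, 0, 0))  # (1, 1, 0, 0)
```

Setting C = 0 in the Peres gate gives R = AB, which is how a single gate can produce one partial product bit.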
Double Peres gate: A Double Peres Gate (DPG) is shown
in figure 5. The inputs and outputs are as shown in Table-1. The full adder using DPG is obtained with C=0 and D=Cin, and its quantum cost is calculated to be equal to 6 from its quantum realization [11] shown in figure 5.
Inputs Outputs
A B C D P Q R S
0 0 0 0 0 0 0 0
0 0 0 1 0 0 1 0
0 0 1 0 0 0 0 1
0 0 1 1 0 0 1 1
0 1 0 0 0 1 1 0
0 1 0 1 0 1 0 1
0 1 1 0 0 1 1 1
0 1 1 1 0 1 0 0
1 0 0 0 1 1 1 0
1 0 0 1 1 1 0 1
1 0 1 0 1 1 1 1
1 0 1 1 1 1 0 0
1 1 0 0 1 0 0 1
1 1 0 1 1 0 1 1
1 1 1 0 1 0 0 0
1 1 1 1 1 0 1 0
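As a check on the truth table above, the following Python sketch stores the sixteen rows exactly as listed, confirms that the mapping is one-to-one, and verifies the full adder behaviour with C = 0 and D = Cin. Reading output R as the sum bit and output S as the carry bit is an interpretation of the table, not something stated explicitly in the text.

```python
# Rows of the Double Peres Gate truth table: inputs ABCD, outputs PQRS.
ROWS = """
0000 0000  0001 0010  0010 0001  0011 0011
0100 0110  0101 0101  0110 0111  0111 0100
1000 1110  1001 1101  1010 1111  1011 1100
1100 1001  1101 1011  1110 1000  1111 1010
""".split()

DPG = {ROWS[i]: ROWS[i + 1] for i in range(0, len(ROWS), 2)}

# Reversibility: sixteen distinct inputs must give sixteen distinct outputs.
reversible = len(set(DPG.values())) == len(DPG) == 16


def dpg_full_adder(a, b, cin):
    """Full adder from the DPG table with C = 0 and D = Cin.

    Assumption: R (third output) is the sum and S (fourth output) is the carry.
    """
    outputs = DPG[f"{a}{b}0{cin}"]
    return int(outputs[2]), int(outputs[3])


adder_ok = all(
    a + b + cin == s + 2 * carry
    for a in (0, 1) for b in (0, 1) for cin in (0, 1)
    for s, carry in [dpg_full_adder(a, b, cin)]
)
print(reversible, adder_ok)
```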
B. Reversible Gates Implemented Using Elementary Quantum Gates
Reversible implementations of the 3×3 Toffoli, Peres and Fredkin gates using elementary quantum gates are shown in figure 6, figure 7 and figure 8 respectively.
3. PARALLEL MULTIPLIERS
There are two types of multipliers: sequential and
parallel. A sequential multiplier computes the final
product iteratively and needs feedback and loops to
implement the iteration. This design is slow and, because
loops are not permitted, it is not suitable for reversible
implementation. The second type (i.e., the parallel
multiplier) conventionally consists of two main steps:
Partial product generation
Multi-operand addition
Algorithm 1 (The n×n parallel multiplier):
Inputs: Two n-bit operands
$X = x_{n-1} \ldots x_1 x_0$, $Y = y_{n-1} \ldots y_1 y_0$
Fig.2 Fredkin Gate
Fig.3 BVF Gate
Fig.4 Peres Gate
Fig.6 Implementation of the 3×3 Toffoli gate [11]
Fig.7 Implementation of the 3×3 Peres gate [12]
Fig.8 Implementation of the 3×3 Fredkin gate [11, 13]
Fig.5 Double Peres Gate
Table 1 Truth Table of Double Peres Gate
Output: A 2n-bit product $Z = z_{2n-1} \ldots z_1 z_0$
I. Generate n partial products
$P_i = p_{i,n-1} \ldots p_{i,1} p_{i,0}$, where $0 \le i \le n-1$,
such that $p_{ij} = x_j \cdot y_i$
II. Produce the final product $Z = \sum_{i=0}^{n-1} P_i \, 2^i$
The operation of a 4×4 reversible multiplier is shown in
figure 9. It consists of 16 partial product bits of the X and
Y inputs to perform the 4×4 multiplication. However, it can
be extended to any other n×n reversible multiplier.
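The two steps of Algorithm 1 can be sketched in a few lines of Python. This is a behavioural illustration on ordinary integers, not a gate-level reversible model; the function names are hypothetical, and each partial product row is weighted by its position before the rows are summed.

```python
def partial_products(x, y, n):
    """Row i holds the bits p_ij = x_j AND y_i for j = 0..n-1 (least significant first)."""
    return [
        [((x >> j) & 1) & ((y >> i) & 1) for j in range(n)]
        for i in range(n)
    ]


def multiply(x, y, n=4):
    """n x n parallel multiplication: generate all partial products, then add them."""
    rows = partial_products(x, y, n)
    z = 0
    for i, row in enumerate(rows):
        row_value = sum(bit << j for j, bit in enumerate(row))
        z += row_value << i  # row i is shifted left by i positions
    return z


rows = partial_products(13, 11, 4)
print(sum(len(row) for row in rows))  # 16 partial product bits for n = 4
print(multiply(13, 11))               # 143
```

For n = 4 the sketch produces the 16 partial product bits mentioned above and an 8-bit result.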
3.1 Partial Product Generation
Partial products can be generated in parallel using 16
Peres gates as shown in figure 10.
An important point that should be considered is that in
an n×n parallel multiplier (in reversible logic), n copies
of each operand bit are needed to generate the partial
products in parallel. Therefore, some fan-out gates
are needed. The number of fan-out gates needed for the
reversible 4×4 multiplier is 24. The proposed design uses
4×4 BVF gates with two constant inputs, as shown in
figure 11.
3.2 Multi-operand Addition (MOA)
As discussed in the previous section, the next step is an
n-operand addition. To implement this part of the circuit,
we use a carry save adder (CSA). The CSA tree reduces the
four operands to two. Thereafter, a Carry Propagating Adder
(CPA) adds these two operands and produces the final
8-bit product. The proposed four-operand adder shown in
figure 12 uses the Double Peres Gate (DPG) as a
reversible full adder and the Peres gate as a half adder.
The proposed reversible multiplier circuit uses 8
reversible DPG gates and 4 Peres gates. The Peres gate
half adder has a quantum cost of 4 and the DPG adder has
a quantum cost of 6, so the total quantum cost of this
circuit is 64.
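The reduction of four operands to two by carry save addition, followed by one carry propagating addition, can be illustrated at word level as follows. This is a minimal Python sketch under the assumption that the four operands are the already shifted partial product rows; it does not model the DPG and Peres gate arrangement of figure 12.

```python
def carry_save_add(a, b, c):
    """3:2 compression of three words into a sum word and a carry word."""
    partial_sum = a ^ b ^ c                      # bitwise sum without carries
    carry = ((a & b) | (a & c) | (b & c)) << 1   # carries move one position left
    return partial_sum, carry


def add_four_operands(p0, p1, p2, p3):
    """Two CSA levels reduce four operands to two; a final CPA adds them."""
    s1, c1 = carry_save_add(p0, p1, p2)
    s2, c2 = carry_save_add(s1, c1, p3)
    return s2 + c2  # carry propagating addition


# Partial product rows of 13 x 11, each shifted to its weight.
x, y = 13, 11
rows = [(x if (y >> i) & 1 else 0) << i for i in range(4)]
print(add_four_operands(*rows))  # 143
```

The carries are only resolved in the last step, which is why the carry save levels contribute little to the critical path.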
4. RESULTS & DISCUSSION
We have encountered three different designs for
reversible multipliers in the literature, all of which, for
the sake of simplicity, implement a 4-bit multiplier.
Therefore, in this section, we compare our proposed
multiplier with its prior counterparts on the basis of the
4-bit reversible multiplier. In order to have a
reasonable comparison, we first examine the detailed
implementation of the previous works. Next, we compare
the proposed design with the previously mentioned
designs on the basis of the quantum cost and the
number of garbage outputs, as follows:
                x3  x2  x1  x0
                y3  y2  y1  y0
                p03 p02 p01 p00     (partial product generation)
            p13 p12 p11 p10
        p23 p22 p21 p20
    p33 p32 p31 p30
z7  z6  z5  z4  z3  z2  z1  z0      (multi-operand addition)
Fig.9 The operation of the 4×4 parallel multiplier
Fig.10 Partial product generator using Peres gates
Fig.11 Fan-out circuit to duplicate the operand bits
Fig.12 Four-operand Addition
A. Reversible 4-bit multiplier of [10]
For the partial product generation phase of their multiplier,
they used 24 2×2 Toffoli (TOF2) gates to prepare
the essential fan-outs. Moreover, 16 Fredkin gates are used
to generate the partial products. For the
multi-operand addition phase they used three 4-bit binary
adders, each composed of 4 TSG gates, plus
an extra TSG for the generation of the most significant bit
of the final product.
Overall, the gate consumption of their
reversible multiplier is equal to (24×TOF2) +
(16×Fredkin) + (13×TSG). The overall critical path of
their multiplier consists of two TOF2 gates, a Fredkin gate
and seven TSG gates. Unfortunately, there is no reference
for how the TSG can be implemented. Moreover, there is
nothing mentioned in [14] about how a TSG can be built
by means of elementary 2×2 reversible/quantum gates. For
the sake of a fair comparison we assume the quantum cost
(QC) and garbage outputs (GO) of a TSG gate to be equal to
those of a full adder (FA). Nevertheless, we believe that
the QC and GO of a TSG gate are much more than those of
an FA.
B. Reversible 4-bit multiplier of [11]
For the partial product generation phase of their
multiplier, like that of [10], they used 24 TOF2 gates to
prepare the necessary fan-outs. Moreover, 16 Peres
gates are used to generate the partial products.
For the multi-operand addition phase they used 12
MKG gates, where an MKG gate is a 4×4 reversible gate.
Therefore, the overall gate count of their reversible
multiplier is (24×TOF2) + (16×Peres) + (12×MKG). The
overall critical path of their multiplier consists of two
TOF2 gates, a Peres gate and seven MKG gates. As in the
case of the TSG, there is no reference for the
implementation of the MKG. Therefore, although we
believe that the QC, depth and GO of an MKG gate are much
more than those of an FA, we assume, for the sake of a fair
comparison, that the QC, depth and GO of an MKG gate are
the same as those of a full adder.
C. Reversible 4-bit multiplier of [12]
This multiplier and that of [11] are essentially the same
except for the multi-operand addition phase, which is
implemented in [12] by means of 8 HNG gates along with
four Peres gates. This modification leads to the following
critical path: (2×TOF2) + (2×Peres) + (6×HNG).
D. The proposed reversible 4-bit multiplier
In the proposed design, for the partial product generation
phase, like those of [11] and [12], we take advantage of
Peres gates to generate the partial products. For
the multi-operand addition phase, as shown in Fig. 15,
we use 8 full adders and 4 half adders. The critical path of
this new design consists of two TOF2 gates plus a Peres
gate for the partial product generation phase, and 5 full
adders plus a half adder for the multi-operand addition
phase. Table-2 gives the comparative study of the partial
product generation of the circuit.
Partial product   No. of gates   No. of garbage   Quantum cost
generation        (N)            outputs (GO)     (QC)
Proposed          20             32               88
TSG [10]          40             32               104
MKG [11]          40             32               88
HNG [12]          40             32               88
Table-3 gives the comparative study of multi-operand
addition of the proposed design with other existing
designs.
Multi-operand     No. of gates   No. of garbage   Quantum cost
addition (MOA)    (N)            outputs (GO)     (QC)
Proposed          12             20               62
TSG [10]          13             26               130
MKG [11]          12             24               120
HNG [12]          12             20               64
Table-4 gives the comparative study of the different
reversible multipliers.
Reversible        No. of gates   No. of garbage   Quantum cost
multipliers       (N)            outputs (GO)     (QC)
Proposed          40             50               150
TSG [10]          13             26               130
MKG [11]          52             56               208
HNG [12]          53             58               234
From the above study, in our opinion the proposed design
is better than the other existing designs, as
the total circuit cost is much lower than that of the other
designs.
5. CONCLUSION
The multiplier is a basic arithmetic cell in computer
arithmetic units. Furthermore, a reversible implementation
of this unit is necessary for quantum computers. For this
purpose, various designs can be found in the literature. In
this paper we proposed a novel reversible multiplier with no
increase in quantum cost or in the number of garbage outputs
with respect to its previous counterparts. In the proposed
design, partial products were generated using Peres gates.
Next, the final product was obtained using a multi-operand
adder comprising a CSA tree and a carry propagate addition.
TABLE.2 Partial product generation
TABLE.3 Multi-operand addition (MOA)
TABLE.4 Comparative study of different reversible multipliers
REFERENCES
[1] R. Landauer, "Irreversibility and heat generation in the computing process", IBM J. Res. Develop., Vol. 5, pp. 183-191, July 1961.
[2] C.H. Bennett, "Logical Reversibility of Computation", IBM Research and Development, pp. 525-532, November 1973.
[3] T. Toffoli, "Reversible computing", MIT, Tech. Rep., 1980.
[4] E. Fredkin and T. Toffoli, "Conservative logic", Int'l J. Theoretical Physics, Vol. 21, pp. 219-253, 1982.
[5] A. Peres, "Reversible logic and quantum computers", Physical Review A, Vol. 32, pp. 3266-3276, 1985.
[6] J. A. Smolin and D. P. DiVincenzo, "Five Two-Bit Quantum Gates are Sufficient to Implement the Quantum Fredkin Gate", Physical Review A (Atomic, Molecular, and Optical Physics), Vol. 53, No. 4, pp. 2855-2856, April 1996.
[7] D. Maslov, G. W. Dueck and D. M. Miller, "Simplification of Toffoli Networks via Templates", Proc. 16th Symposium on Integrated Circuits and Systems Design, pp. 53-58, September 2003.
[8] W. N. N. Hung, X. Song, G. Yang, J. Yang and M. A. Perkowski, "Quantum Logic Synthesis by Symbolic Reachability Analysis", Proc. 41st Annual Conference on Design Automation (DAC), pp. 838-841, January 2004.
[9] D. Maslov, C. Young, D. M. Miller and G. W. Dueck, "Quantum Circuit Simplification Using Templates", Proc. Design Automation and Test in Europe (DATE), Vol. 2, pp. 1208-1213, March 2005.
[10] H. Thapliyal and M.B. Srinivas, "Novel Reversible Multiplier Architecture Using Reversible TSG Gate", Proc. IEEE International Conference on Computer Systems and Applications, pp. 100-103, March 2006.
[11] M. Shams, M. Haghparast and K. Navi, "Novel Reversible Multiplier Circuit in Nanotechnology", World Applied Science Journal, Vol. 3, No. 5, pp. 806-810, 2008.
[12] M. Haghparast, S. Jafarali Jassbi, K. Navi and O. Hashemipour, "Design of a Novel Reversible Multiplier Circuit Using HNG Gate in Nanotechnology", World Applied Science Journal, Vol. 3, No. 6, pp. 974-978, 2008.
[13] M.S. Islam et al., "Low cost quantum realization of reversible multiplier circuit", Information Technology Journal, 8 (2009) 208.
VHDL environment for floating point Arithmetic Logic Unit - ALU design and simulation
1Rajit Ram Singh 2Vinay Kumar Singh 3poornima shrivastav 4Dr. GS [email protected] VINDHYAIndore- India
[email protected] Motors Ltd. Luck now -India
[email protected] Gwalior -India
ABSTRACT
A VHDL environment for the design of a floating point arithmetic and logic unit (ALU) using pipelining is introduced; the novelty lies in the ALU design. Pipelining, which is used to execute multiple instructions simultaneously, provides a high performance ALU. In a top-down design approach, four arithmetic modules (addition, subtraction, multiplication and division) are combined to form a floating point ALU. Each module is divided into sub-modules. Two selection bits are combined to select a particular operation. Each module is independent of the others. All modules in the ALU design are realized using VHDL, and the design functionalities are validated through VHDL simulation. All components and modules were successfully synthesized and simulated in the Xilinx 12.1i software.
Keywords: ALU (Arithmetic Logic Unit), Top-Down design, Validation, Floating point, Test-Vector
I. INTRODUCTION
Floating point describes a system for representing numbers that would be too large or too small to be represented as integers. Floating point representation is able to retain its resolution and accuracy compared to fixed point representation. Numbers are in general represented approximately to a fixed number of significant digits and scaled using an exponent. The base for the scaling is normally 2, 10 or 16. The typical number that can be represented exactly is of the form:
$\text{significant digits} \times \text{base}^{\text{exponent}}$, i.e. $S \times B^{e}$
The IEEE 754 standard for floating point representation was published in 1985. Under this standard, the floating point representation used by a digital system is platform-independent, so data can be interchanged freely among different digital systems.
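As a concrete instance of the form S × B^e with B = 2, the following Python sketch decodes a 16-bit word into its value. The paper does not state the field widths of its 16-bit format, so the layout used here (1 sign bit, 5 exponent bits with bias 15, 10 fraction bits) is an assumption borrowed from the half precision format of the later IEEE 754-2008 revision. The function names are illustrative.

```python
import struct


def decode_half(bits: int) -> float:
    """Decode a normal 16-bit floating point word as (-1)^s * 1.f * 2^(e - 15).

    Assumed layout: 1 sign bit, 5 exponent bits (bias 15), 10 fraction bits.
    """
    sign = (bits >> 15) & 0x1
    exponent = (bits >> 10) & 0x1F
    fraction = bits & 0x3FF
    if exponent in (0, 0x1F):
        raise ValueError("zero, subnormal, infinity and NaN are outside this sketch")
    significand = 1 + fraction / 1024  # the implied (hidden) bit is restored here
    return (-1) ** sign * significand * 2.0 ** (exponent - 15)


def reference(bits: int) -> float:
    """Cross-check using Python's built-in half precision support."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


for word in (0x3C00, 0xC000, 0x3555):
    print(hex(word), decode_half(word), reference(word))
```

The first two words decode to 1.0 and -2.0; the third is the closest representable value below one third.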
An arithmetic logic unit (ALU) is a digital circuit that performs arithmetic and logical operations. The ALU is a fundamental building block of the central processing unit (CPU) of a computer. The inputs to the ALU are the data to be operated on (called operands) and a code from the control unit indicating which operation to perform. Its output is the result of the computation.
In many designs the ALU also takes or generates, as inputs or outputs, a set of condition codes from or to a status register. These codes are used to indicate cases such as carry-in or carry-out, overflow, divide-by-zero, etc. A floating point unit (FPU) also performs arithmetic operations between two values, but it does so for numbers in floating point representation. An ALU with floating point operations is called an FPU.
The top-down approach (also known as step-wise design) is essentially the breaking down of a system to gain insight into its compositional sub-systems. In a top-down approach an overview of the system is formulated, specifying but not detailing any first-level subsystems. Each subsystem is then refined in yet greater detail, sometimes in many additional subsystem levels, until the entire specification is reduced to base elements. A top-down model is often specified with the assistance of "black boxes", which make it easier to manipulate. However, black boxes may fail to elucidate elementary mechanisms or be detailed enough to realistically validate the model.
In order to stimulate a device off board, a series of logical vectors must be applied to the device inputs. These vectors are called test vectors and are mostly used to stimulate the design inputs and check the outputs against the expected values.
A pipeline is a technique used in the design of computers and other digital electronic devices to increase their instruction throughput (the number of instructions that can be executed in a unit of time).
The fundamental idea is to split the processing of a computer instruction into a series of independent steps, with storage at the end of each step. This allows the computer's control circuitry to issue instructions at the processing rate of the slowest step, which is much faster than the time needed to perform all steps at once. The term pipeline refers to the fact that each step is carrying data at once (like water), and each step is connected to the next (like the links of a pipe). The origin of pipelining is thought to be the IBM Stretch project (1954). Implementing a pipeline requires that the various phases of floating point operations be separated and pipelined into sequential stages.
We propose a VHDL environment for floating point ALU design and simulation to ease the description, verification, simulation and hardware realization. VHDL is a widely adopted standard and has numerous capabilities that are suited to designs of this sort. The use of VHDL for modeling is especially appealing since it provides a formal description of the system and allows the use of specific description styles to cover the different abstraction levels (architectural, register transfer and logic level) employed in the design.
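The throughput argument for pipelining can be made concrete with a small timing model. The following Python sketch assumes an ideal pipeline clocked at the delay of its slowest stage, with no stalls; the four stage delays are hypothetical values chosen only for illustration.

```python
def unpipelined_time(stage_delays, n_ops):
    """Each operation passes through every stage before the next one starts."""
    return n_ops * sum(stage_delays)


def pipelined_time(stage_delays, n_ops):
    """Ideal pipeline: one clock per stage, clock period set by the slowest stage."""
    clock = max(stage_delays)
    fill_and_drain = len(stage_delays) + n_ops - 1
    return fill_and_drain * clock


stages = [4, 6, 5, 3]  # hypothetical stage delays in ns
print(unpipelined_time(stages, 100))  # 1800
print(pipelined_time(stages, 100))    # 618
```

For a single operation the pipelined version is slower (24 ns against 18 ns in this example), so the benefit lies in throughput rather than in the latency of one operation.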
II. MATERIAL AND METHODS
The main objective of this paper is to describe the implementation of pipelining in the design of a floating point ALU using VHDL. The first sub-objective is to design a 16-bit floating point ALU operating on the IEEE 754 standard floating point representation and supporting the four basic arithmetic operations: addition, subtraction, multiplication and division. The second sub-objective is to model the behavior of the ALU design using VHDL.
Specifications for the 16-bit floating point ALU design:
i. Inputs A and B and the output result are 16-bit binary floating point numbers.
ii. Operands A and B operate as follows: A (operation) B = result. The operation can be addition (+), subtraction (-), multiplication (*) or division (/).
iii. 'Selection' is a 2-bit input signal that selects the ALU operation, as shown in Table 1.
iv. 'Status' is a 4-bit output signal that works as a flag, as in a microprocessor.
v. The clock pulse is provided only to the module that is selected using the demux.
vi. Concurrent processes are used to allow processes to run in parallel, hence pipelining.
Table1: select ALU operation.
Fig:1 top level view of the ALU design
The ALU is separated into smaller modules: addition, subtraction, multiplication, division, demux and mux. Each arithmetic module is further divided into smaller modules. Fig. 1 shows the top level view of the ALU. It consists of four functional arithmetic modules, three demultiplexers and two multiplexers. The demuxes and muxes are used to route the input operands and the clock signal to the correct functional module. They also route the outputs and status signals based on the selector pins.
Output   Status
0000     Normal operation
0001     Overflow
0010     Underflow
0100     Result zero
1000     Divide by zero
Selection   Operation
00          Addition
01          Subtraction
10          Multiplication
11          Division
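The selection and status encodings above can be illustrated with a small behavioural model in Python. This is a sketch on ordinary Python floats, not the VHDL design: selection code 01 is taken to mean subtraction, in line with the four operations listed in the specification, and the overflow and underflow thresholds are assumed half precision limits that the paper does not state.

```python
STATUS_NORMAL = 0b0000
STATUS_OVERFLOW = 0b0001
STATUS_UNDERFLOW = 0b0010
STATUS_ZERO = 0b0100
STATUS_DIV_BY_ZERO = 0b1000

HALF_MAX = 65504.0            # assumption: largest finite half precision value
HALF_MIN_NORMAL = 2.0 ** -14  # assumption: smallest normal half precision value


def alu(a, b, selection):
    """Return (result, status) for the 2-bit selection code."""
    if selection == 0b11 and b == 0:
        return 0.0, STATUS_DIV_BY_ZERO
    operations = {
        0b00: lambda: a + b,  # addition
        0b01: lambda: a - b,  # subtraction
        0b10: lambda: a * b,  # multiplication
        0b11: lambda: a / b,  # division
    }
    result = operations[selection]()
    if result == 0:
        return 0.0, STATUS_ZERO
    if abs(result) > HALF_MAX:
        return result, STATUS_OVERFLOW
    if abs(result) < HALF_MIN_NORMAL:
        return result, STATUS_UNDERFLOW
    return result, STATUS_NORMAL


for sel in range(4):
    value, status = alu(6.0, 1.5, sel)
    print(format(sel, "02b"), value, format(status, "04b"))
```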
Fig: 2 view of selection of a add module
After a module completes its task, its outputs and status signals are sent to the muxes, where they are multiplexed with the outputs of the other modules to produce the output result. The selector pins are routed to these muxes so that only the output of the currently operating functional module is sent to the output port. The clock is specifically routed rather than tied permanently to each module, since only the selected functional module needs a clock signal. This saves power, since the clock is supplied to the required module only, and avoids invalid results at the output, since the clock is used as a trigger in every process.
Pipelining floating point addition module:
The addition module has two 16-bit inputs and one 16-bit output. The selection input is used to enable or disable the module. The module is further divided into four sub-modules: zero check, align, add_sub and normalize.
Fig: 3 pipeline floating point addition
Zero check modules:
This module detects zero operands early in the operation and, based on the detection result, sets two status signals. This eliminates the need for subsequent processes to check for the presence of zero operands. Table 1 summarizes the algorithm.
Tab:1 setting zero check bit
Align module
In this module, operations are performed based on the status signals from the previous stage. Zero operands are checked in the align module as well. This module introduces the implied bit into the operands, as shown in table 2.
Tab:2 setting of implied bit
Add_sub module
This module performs the actual addition and subtraction of the operands. First, the operands are checked via the status signals carried over from the previous stage. The result is obtained automatically if either operand is zero, as shown in table 3; no normalization is needed if no calculation is done here. Otherwise the operation is chosen based on the signs and the relative magnitudes of the mantissas, as summarized in table 4. A status signal is set to one to indicate the need for normalization by the next stage.
Entries of Tab:1 (setting zero check bit):
I/P a   I/P b   Zero_a1   Zero_b1
0       0       1         1
0       NZ      1         0
NZ      0       0         1
NZ      NZ      0         0
Entries of Tab:2 (setting of implied bit):
Zero_a1 xor zero_b1   a_sign           Implied bit for a   Implied bit for b
0                     X (don't care)   0                   0
1                     1                0                   1
1                     0                1                   0
Zero_a2 & zero_b2   Zero_a1 xor zero_b1   Zero_a2   Result
0                   0                     X         Perform add_sub
0                   1                     1         b stage2
0                   1                     0         a stage2
1                   X                     X         0
Tab:3 check for add_sub module
Operation   a_sign xor b_sign   a>b   Result   Sign
a+b         0                   X     a+b      +ve
(-a)+(-b)   0                   X     a+b      -ve
a+(-b)      1                   Yes   a-b      +ve
a+(-b)      1                   No    b-a      -ve
(-a)+b      1                   Yes   a-b      -ve
(-a)+b      1                   No    b-a      +ve
Tab: 4 add_sub operation
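The add_sub operation table reduces to a short decision rule: equal signs add the magnitudes and keep the common sign, while unequal signs subtract the smaller magnitude from the larger and take the sign of the larger operand. The following Python sketch expresses that rule; the function name and the convention of 0 for +ve and 1 for -ve are assumptions made for illustration.

```python
def add_sub(a_sign, a_mag, b_sign, b_mag):
    """Sign-magnitude addition following the add_sub operation table.

    Signs: 0 for +ve, 1 for -ve. Magnitudes are non-negative.
    Returns (result sign, result magnitude).
    """
    if a_sign ^ b_sign == 0:
        # Same signs: a+b or (-a)+(-b); magnitudes add, sign is unchanged.
        return a_sign, a_mag + b_mag
    if a_mag > b_mag:
        # Different signs and a > b: result is a-b with the sign of a.
        return a_sign, a_mag - b_mag
    # Different signs and a not greater than b: result is b-a with the sign of b.
    return b_sign, b_mag - a_mag


print(add_sub(0, 5, 1, 3))  # a+(-b), a>b: (0, 2)
print(add_sub(0, 3, 1, 5))  # a+(-b), a<b: (1, 2)
print(add_sub(1, 5, 0, 3))  # (-a)+b, a>b: (1, 2)
print(add_sub(1, 3, 0, 5))  # (-a)+b, a<b: (0, 2)
```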
Normalize module
The input is normalized and packed into the IEEE 754 floating point representation. If the normalize status signal is set, normalization is performed; otherwise the MSB is dropped.
Pipeline floating point subtraction module:
The subtraction module has two 16-bit inputs and one 16-bit output. The selection input is used to enable or disable the entity depending on the operation. This module is divided further into four sub-modules: zero check, align, add_sub and normalize. The subtraction algorithm differs only in the add_sub module, where the subtraction operator changes the sign of the result. The remaining three modules are similar to those in the addition module. Tab 5 and Tab 6 summarize the operation.
Tab: 5 checks for add_sub module
Operation   a_sign xor b_sign   a>b   Result   Sign
(-a)-b      1                   X     a+b      -ve
a-(-b)      1                   X     a+b      +ve
(-a)-(-b)   0                   Yes   a-b      -ve
(-a)-(-b)   0                   No    b-a      +ve
a-b         1                   Yes   a-b      +ve
a-b         1                   No    b-a      -ve
Tab: 6 add_sub operation and sign fixing
Pipelined floating point multiplication module
The multiplication entity has three 16-bit inputs and two 16-bit outputs. The selection input is used to enable or disable the entity. The multiplication module is divided into check-zero, check-sign, add-exponent and normalize-and-concatenate modules, all of which are executed concurrently. The status signal indicates special result cases such as overflow, underflow and result zero. In this project the pipelined floating point multiplication is divided into three stages (Fig. 4). Stage 1 checks whether an operand is zero and reports the result accordingly. Stage 2 determines the product sign, adds the exponents and multiplies the fractions. Stage 3 normalizes and concatenates the product.
Fig 4. Pipeline structure of multiplication module
Check-zero module
Initially the two operands are checked to determine whether they contain a zero. If one of the operands is zero, the zero_flag is set to 1 and the output result is zero. If neither of them is zero, the inputs in IEEE 754 format are unpacked and assigned to the check sign, add exponent and multiply mantissa modules. The mantissa is packed with the hidden bit 1.
Add exponent module
The module is activated if the zero flag is set; else zero is passed to the next stage and exp_flag is set to 0. Two extra bits are added to the exponent to indicate overflow and underflow.
Multiply mantissa module
In this stage the zero_flag is checked first. If the zero_flag is set to 0, then no calculation and normalization is performed and the mant_flag is set to 0. If both operands are nonzero, the multiplication is done and then mant_flag is set to 1 to indicate that this operation has been executed.
Check sign module
This module determines the product sign of the two operands. The product is positive when the two operands have the same sign; otherwise it is negative. The sign bits are compared using an XOR circuit. The sign_flag is set to 1.
Normalize and concatenate module
This module checks for overflow and underflow. Underflow occurs if the 9th bit is 1; overflow occurs if the 8th bit is 1. If exp_flag, sign_flag and mant_flag are set, the normalization is carried out. Otherwise, 16 zero bits are assigned to the result.
Entries of Tab: 5 (checks for add_sub module):
Zero_a2 & zero_b2   Zero_a2 xor zero_b2   Zero_a2   b_sign   Result            Sign
0                   0                     X         X        Perform add_sub   NA
0                   1                     1         0        b_stage2          b_sign=1
0                   1                     1         1        b_stage2          b_sign=0
0                   1                     0         X        a_stage2          a_sign
1                   X                     X         X        0                 NA
During the normalization operation, if the mantissa MSB is 1, no normalization is needed: the hidden bit is dropped and the remaining bits are packed and assigned to the output port. Otherwise the normalization module sets the mantissa MSB to 1: the current mantissa is shifted left until a 1 is encountered, and for each shift the exponent is decreased by 1. Once the mantissa MSB is 1, normalization is complete; the first bit, which is the implied bit, is dropped and the remaining bits are packed and assigned to the output port. The final normalized product with the correct biased exponent is concatenated with the product sign.
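The three multiplication stages described above (zero check; sign, exponent and mantissa; normalize and concatenate) can be sketched behaviourally in Python. This is an illustration under stated assumptions rather than a model of the VHDL entity: the 16-bit layout (1 sign bit, 5 exponent bits with bias 15, 10 fraction bits) is assumed, results are truncated rather than rounded, subnormals, infinities and NaNs are ignored, and the individual flag signals are not modelled. Status codes follow the output status table.

```python
import struct

BIAS = 15  # assumed exponent bias for the 16-bit format


def unpack(bits):
    """Split a 16-bit word into (sign, biased exponent, fraction)."""
    return (bits >> 15) & 1, (bits >> 10) & 0x1F, bits & 0x3FF


def multiply_half(a_bits, b_bits):
    """Return (result bits, 4-bit status) for normal or zero operands."""
    # Stage 1: zero check.
    if (a_bits & 0x7FFF) == 0 or (b_bits & 0x7FFF) == 0:
        return 0, 0b0100  # result zero
    a_sign, a_exp, a_frac = unpack(a_bits)
    b_sign, b_exp, b_frac = unpack(b_bits)
    # Stage 2: product sign, exponent addition, mantissa multiplication.
    sign = a_sign ^ b_sign
    exponent = a_exp + b_exp - BIAS                 # remove the doubled bias
    product = (0x400 | a_frac) * (0x400 | b_frac)   # hidden bits restored
    # Stage 3: normalize and concatenate.
    if product & (1 << 21):     # product of the significands is 2.0 or more
        product >>= 1
        exponent += 1
    fraction = (product >> 10) & 0x3FF  # drop hidden bit, truncate low bits
    if exponent >= 0x1F:
        return 0, 0b0001  # overflow
    if exponent <= 0:
        return 0, 0b0010  # underflow
    return (sign << 15) | (exponent << 10) | fraction, 0b0000


def to_float(bits):
    return struct.unpack("<e", struct.pack("<H", bits))[0]


result, status = multiply_half(0x3E00, 0x4000)  # 1.5 * 2.0
print(hex(result), to_float(result), format(status, "04b"))
```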
Pipelined floating point division module
The division entity has three 16-bit inputs and two 16-bit outputs. The selection input is used to enable or disable the entity. The division module is divided into six modules: check zero, align dividend, check sign, subtract exponent, divide mantissa and normalize concatenate. Each module is executed concurrently. The status signal indicates special cases such as overflow, underflow, result zero and divide by zero. Fig. 5 shows the pipeline structure of the division module.
Fig: 5 pipeline structure of the division module
Check-zero modules:
Initially the two operands are checked to determine whether they contain a zero. If one of the operands is zero, the zero_flag is set to 1 and the output result is zero. If neither of them is zero, the inputs in IEEE 754 format are unpacked and assigned to the check sign, add exponent and multiply mantissa modules. The mantissa is packed with the hidden bit 1.
Add exponent module:
The module is activated if the zero flag is set; else zero is passed to the next stage and exp_flag is set to 0. Two extra bits are added to the exponent to indicate overflow and underflow.
Multiply mantissa module:
In this stage the zero_flag is checked first. If the zero_flag is set to 0, then no calculation and normalization is performed and the mant_flag is set to 0. If both operands are nonzero, the multiplication is done and then mant_flag is set to 1 to indicate that this operation has been executed.
Check sign module:
This module determines the product sign of the two operands. The product is positive when the two operands have the same sign; otherwise it is negative. The sign bits are compared using an XOR circuit. The sign_flag is set to 1.
Align dividend module:
This module compares both mantissas. If mant_a is greater than or equal to mant_b, then mant_a must be aligned. For every bit of right shift of the mant_a mantissa, the mant_a exponent is increased by 1. This increase may result in an exponent overflow, in which case an overflow flag is set. Otherwise, the process continues with the parallel operation of exponent subtraction and mantissa division. Align_flag is set to 1.
Subtract exponent module
This module is activated if the zero flag is set. If not, a zero value is passed to the next stage and exp_flag is set to 0. Two extra bits are added to the exponent to indicate overflow. Here the two exponents are subtracted and the bias is added back. After this the exp_flag is set to 1.
Divide mantissa module
In this stage the align flag is checked first. If the align flag is 0, then no mantissa division is performed and mant_flag is set to 0. If both operands are not zero, mant_a is divided by mant_b. In the division algorithm, the comparison between the two mantissas is done by subtracting the two values and checking the sign of the output.
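The exponent and mantissa handling for division can be sketched in the same behavioural style. This Python sketch is a simplified alternative to the align dividend scheme described above: instead of right-shifting mant_a, it pre-scales the dividend by one bit when mant_a is smaller than mant_b so that the quotient is already normalized. The 16-bit layout (bias 15, 10 fraction bits), truncation of the quotient and the omission of subnormals are assumptions.

```python
BIAS = 15  # assumed exponent bias for the 16-bit format


def unpack(bits):
    """Split a 16-bit word into (sign, biased exponent, fraction)."""
    return (bits >> 15) & 1, (bits >> 10) & 0x1F, bits & 0x3FF


def divide_half(a_bits, b_bits):
    """Return (result bits, 4-bit status) for a / b with normal or zero operands."""
    if (b_bits & 0x7FFF) == 0:
        return 0, 0b1000  # divide by zero
    if (a_bits & 0x7FFF) == 0:
        return 0, 0b0100  # result zero
    a_sign, a_exp, a_frac = unpack(a_bits)
    b_sign, b_exp, b_frac = unpack(b_bits)
    sign = a_sign ^ b_sign
    exponent = a_exp - b_exp + BIAS   # subtract exponents, add the bias back
    mant_a = 0x400 | a_frac           # hidden bits restored
    mant_b = 0x400 | b_frac
    if mant_a < mant_b:
        # Quotient would fall below 1.0: scale the dividend by 2 and compensate.
        mant_a <<= 1
        exponent -= 1
    quotient = (mant_a << 10) // mant_b   # lies in [1024, 2048), truncated
    if exponent >= 0x1F:
        return 0, 0b0001  # overflow
    if exponent <= 0:
        return 0, 0b0010  # underflow
    return (sign << 15) | (exponent << 10) | (quotient & 0x3FF), 0b0000


print([hex(v) for v in divide_half(0x4200, 0x4000)])  # 3.0 / 2.0
```

Here 3.0 / 2.0 yields the word for 1.5, and dividing by a zero operand returns the divide by zero status from the output status table.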
III. SIMULATION AND DISCUSSION
The design is verified through simulation, which is done in a bottom-up fashion. Small modules are simulated in separate test benches before they are integrated and tested as a whole.
Align RTL1:
Simulation Result of Align:
RTL of Demux:
Demux wave:
Multiplexer:
Simulation result of Mux:
RTL of division:
RTL division:
IV. CONCLUSION
By simulating with various test vectors, the proposed pipelined floating point ALU design using VHDL has been successfully designed, tested and implemented. Currently, we are conducting further research that considers further reduction of the hardware complexity in terms of synthesis, and downloading the full code into an Altera FLEX10K (EPF10K10LC) FPGA chip in an LC84 package for hardware realization.
Reference:
[1] ANSI/IEEE Std 754-1985, IEEE Standard for Binary Floating-Point Arithmetic, IEEE, New York, 1985.
[2] M. Daumas, C. Finot, "Division of Floating Point Expansions with an Application to the Computation of a Determinant", Journal of Universal Computer Science, vol. 5, no. 6, pp. 323-338, June 1999.
[3] AMD Athlon Processor Technical Brief, Advanced Micro Devices Inc., Publication no. 22054, Rev. D, Dec. 1999.
[4] S. Chen, B. Mulgrew, and P. M. Grant, "A clustering technique for digital communications channel equalization using radial basis function networks," IEEE Trans. Neural Networks, vol. 4, pp. 570-578, July 1993.
[5] Mamun Bin Ibne Reaz, MIEEE, Md. Shabiul Islam, MIEEE, Mohd. S. Sulaiman, MIEEE. ICSE2002 Proc. 2002, Penang, Malaysia.
Simulation of division:
OTRA based Grounded Inductor and its application
Rajeshwari Pandey(member IEEE),Neeta Pandey(member IEEE),Ajay Singh,
B.Sriram, Kaushalendra Trivedi
Delhi Technological University, Delhi
Abstract - In this paper a lossless grounded
inductor is proposed using the
Operational Transresistance Amplifier
(OTRA). PSPICE simulation results are
included to demonstrate the
performance and verify the theoretical
analysis.
Index Terms— Inductor simulators,
OTRA , grounded inductor.
I. INTRODUCTION
The operational transresistance amplifier
(OTRA) is gaining considerable attention
amongst analog integrated circuit designers
as it inherits all the advantages offered by
current-mode techniques. The OTRA is a
high gain, current input, voltage output
device. The input terminals of the OTRA are
internally grounded, thereby eliminating
response limitations due to parasitic
capacitances and resistances at the input [1].
Although the OTRA is commercially
available from several sources under the
name of current differencing amplifier or
Norton amplifier, it has not gained attention
until recently. These commercial
realizations do not provide internal ground
at the input port and they allow the input
current to flow in one direction only. The
former disadvantage limits the functionality
of the OTRA, whereas the latter forces the use
of an external DC bias current, leading to
complex and unattractive designs [2]. Several
high performance CMOS OTRA topologies have
been proposed in the literature [1,2,3,4], leading
to growing interest in OTRA based analog
signal processing circuits. In the recent past
the OTRA has been extensively used as an
analog building block for realizing a number
of signal processing circuits such as
filters [5,6,7,8], oscillators [9,10,11],
multivibrators [12,13] and immittance
simulation circuits [9,14,15,16], the
application dealt with in this
paper. A simulation of a lossy grounded
inductor is presented in [14], whereas a
negative inductance has been proposed in
[15]. Lossless grounded inductor topologies
have been presented in [9,16]. In this paper
another lossless grounded inductor topology
is proposed, together with its applications,
which will give further flexibility to analog
circuit designers.
II. CIRCUIT DESCRIPTION
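The OTRA is a three-terminal device, shown symbolically in Fig. 1. Its port relations can be characterized by the matrix equation (1):
$$\begin{bmatrix} V_p \\ V_n \\ V_o \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ R_m & -R_m & 0 \end{bmatrix} \begin{bmatrix} I_p \\ I_n \\ I_o \end{bmatrix} \qquad (1)$$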
Fig.1 OTRA Circuit symbol
For ideal operations the transresistance gain
Rm approaches infinity and forces the input
currents to be equal. Thus OTRA must be
used in a negative feedback configuration.
The proposed circuit is shown in Fig. 2.
Fig. 2. Grounded Inductor
Routine analysis yields
(2)
subject to the condition
(3)
For simulation, the CMOS implementation of the OTRA proposed in [4] and reproduced in Fig. 3 was used. The aspect ratios of the transistors are the same as in [4] and are given in Table 1. Supply voltages of ±1.5 V were used for the SPICE simulation.
Fig. 3. CMOS Implementation of OTRA[4]
Table 1. Aspect ratios of the transistors in the OTRA circuit
Transistor    W (µm) / L (µm)
M1-M3         100/2.5
M4            10/2.5
M5, M6        30/2.5
M7            10/2.5
M8-M11        50/2.5
M12, M13      100/2.5
M14           50/0.5
III. APPLICATION
The proposed inductor is used to design (i) a high pass filter and (ii) an LC oscillator.
A. High Pass Filter
A high pass filter, as shown in Fig. 4(a), can be constructed using the proposed inductor. The transfer function for the high pass response is
(4)
where
(5)
Fig. 4(a) High Pass Filter
To verify the theoretical propositions, an HP filter with a cutoff frequency of 159 kHz is designed, for which the component values are computed as R = 1 kΩ, C = 1 nF and Leq = 1 mH. For this value of Leq, the resistances of the inductor circuit are chosen as 1 kΩ and its capacitance as 1 nF. The frequency response of the filter simulated using PSPICE is depicted in Fig. 4(b) and is found to be in close agreement with the theoretical response.
Fig. 4 (b) HP Response
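The quoted design values can be cross-checked with a few lines of Python. This is only a sketch: it assumes that the cutoff frequency of the filter is set by the usual LC relation f0 = 1/(2π√(Leq·C)), which is consistent with the stated 159 kHz but is not confirmed by the text, since equation (5) did not survive. The function name `lc_corner_frequency` is illustrative.

```python
import math

def lc_corner_frequency(l_eq, c):
    """Return 1 / (2*pi*sqrt(L*C)) in hertz (assumed form of the cutoff relation)."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_eq * c))

L_EQ = 1e-3   # simulated inductance Leq = 1 mH (value from the text)
C = 1e-9      # filter capacitor C = 1 nF (value from the text)

f0 = lc_corner_frequency(L_EQ, C)
print(f"f0 = {f0 / 1e3:.2f} kHz")
```

Under this assumption the computed value rounds to the 159 kHz used in the design.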
B. LC Oscillator
An LC oscillator employing the proposed inductor is designed as a signal generating application and is shown in Fig. 5(a). The condition of oscillation and the frequency of oscillation are given as
(6)
(7)
A typical simulation is shown in Fig. 5(b) for inductor-circuit component values of 1 kΩ and 10 pF, which result in Leq = 0.1 mH, and C = 1 nF. The simulated frequency of oscillation is 775 kHz, which is within about 2.6% of the theoretically calculated value of 795.77 kHz. Fig. 5(c) shows the output frequency spectrum. The total harmonic distortion is measured as 4.906%.
Fig.5 (a) Oscillator
Fig.5 (b) Oscillator Output.
Fig. 5(c) Frequency Spectrum
IV. CONCLUSION
A new OTRA based lossless grounded inductor topology is presented. A high pass filter and an oscillator are realized to illustrate the applications of the proposed topology. PSPICE simulation results are included to verify the theoretical propositions.
It is expected that the proposed circuits will be useful in the design of analog signal processing and generation applications and will provide further possibilities to designers in the field.
References
[1] J.-J. Chen, H.-W. Tsao and C.-C. Chen, "Operational transresistance amplifier using CMOS technology," Electronics Letters, vol. 28, no. 22, pp. 2087-2088, Oct. 1992.
[2] K. N. Salama and A. M. Soliman, "CMOS OTRA for analog signal processing applications," Microelectron. J., vol. 30, pp. 235-245, 1999.
[3] Hasan Mostafa and Ahmed M. Soliman, "A modified realization of the OTRA," Frequenz, vol. 60, pp. 70-76, 2006.
[4] Abedelrahman K. Kafrawy and Ahmed M. Soliman, "A modified CMOS differential OTRA," Int. J. Electron. Commun. (AEU), vol. 63, no. 12, pp. 1067-1071, Dec. 2009.
[5] Selcuk Kilinc and Ugur Cam, "Cascadable allpass and notch filters employing single operational transresistance amplifier," Computers and Electrical Engineering, vol. 31, pp. 391-401, 2005.
[6] Cem Cakir, Ugur Cam and Oguzhan Cicekoglu, "Novel allpass filter configuration employing single OTRA," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 52, no. 3, pp. 122-125, Mar. 2005.
[7] J.-J. Chen, H.-W. Tsao and S.-I. Liu, "Parasitic-capacitance-insensitive current-mode filters using OTRA," IEE Proc.-Circuits Devices Syst., vol. 142, no. 3, June 1995.
[8] Ahmet Gokcen and Ugur Cam, "MOS-C single amplifier biquads using the OTRA," Int. J. Electron. Commun. (AEU), vol. 63, pp. 660-664, 2009.
[9] K. N. Salama and A. M. Soliman, "Novel oscillators using operational transresistance amplifier," Microelectron. J., vol. 31, pp. 39-47, 2000.
[10] U. Cam, "A novel single-resistance-controlled sinusoidal oscillator employing single operational transresistance amplifier," Analog Integrated Circuits and Signal Processing, vol. 32, pp. 183-186, Aug. 2002.
[11] Rajeshwari Pandey and Mayank Bothra, "Multiphase sinusoidal oscillator using operational transresistance amplifier," IEEE Symposium on Industrial Electronics and Applications (ISIEA 2009), pp. 371-376, Oct. 2009.
[12] C. L. Hou, H. C. Chien and Y. K. Lo, "Squarewave generators employing OTRAs," IEE Proc.-Circuits Devices Syst., vol. 152, no. 6, Dec. 2005.
[13] Y. K. Lo, H. C. Chien and H. G. Chiu, "Switch controllable OTRA based bistable multivibrator," IET Circuits Devices Syst., vol. 2, no. 4, pp. 373-382, 2008.
[14] U. Cam, F. Kacar, O. Cicekoglu, H. Kuntman and A. Kuntman, "Novel grounded parallel immittance simulator topologies employing single OTRA," AEU - Int. J. Electronics and Communications, vol. 57, no. 4, pp. 287-290, 2003.
[15] Selcuk Kilinc, Khaled N. Salama and Ugur Cam, "Realization of fully controllable negative inductance with single operational transresistance amplifier," Circuits, Systems and Signal Processing, vol. 25, no. 1, pp. 47-57, 2006.
[16] U. Cam, F. Kacar, O. Cicekoglu, H. Kuntman and A. Kuntman, "Novel two OTRA-based grounded immittance simulator topologies," Analog Integrated Circuits and Signal Processing, vol. 29, pp. 233-235, 2001; Analog Integrated Circuits and Signal Processing, vol. 39, pp. 169-175, 2004.
VLP0402-1
OTRA based Precision Full Wave Rectifier
Rajeshwari Pandey (Member IEEE), Ajay Singh, B. Sriram, Kaushalendra Trivedi
Department of Electronics and Communication Engineering, Delhi Technological University, Delhi
Abstract: This paper presents an operational transresistance amplifier based precision full-wave rectifier that uses an all-pass filter as a 90° phase shifter. The circuit gives a DC output voltage that is almost the same as the peak input voltage over a frequency range of 50 Hz-30 MHz, with a very low ripple voltage and low harmonic distortion.
Index Terms: OTRA, all-pass filter, harmonic distortion, precision rectifier, ripple voltage.
I. INTRODUCTION
State-of-the-art analog integrated circuit design is receiving a tremendous boost from the development and application of current-mode processing [1]. The key performance features of the current-mode technique are well known: inherently wide bandwidth that is virtually independent of closed-loop gain, greater linearity and large dynamic range. Recently the operational transresistance amplifier (OTRA) has emerged as an effective alternative analog building block. It is a high-gain current-input, voltage-output amplifier [2]. Being a current processing building block, the OTRA inherits all the advantages of the current-mode technique. It is also free from parasitic input capacitances and resistances because its input terminals are virtually grounded, which eliminates response limitations due to parasitics. The OTRA is now being used as an analog building block for realizing a number of circuits with applications in signal processing and generation [2-6].
Precise rectification is one of the important requirements in instrumentation and measurement. It finds applications in AC voltmeters, ammeters, signal-polarity detectors, averaging circuits, sample-and-hold circuits, peak value detectors and amplitude-modulated signal detectors [7-10]. Diodes are generally used as rectifiers, but they have the drawback of a threshold voltage: rectification is not possible below about 0.7 V for a silicon diode and about 0.3 V for a germanium diode. Low-voltage rectification is required in applications such as amplitude-modulated signal detectors. In addition, slew rate limitation prevents fast turn-on of the diodes in the high frequency range and thus results in distortion. In view of the above, a precision rectifying circuit using OTRAs is proposed in this paper. The performance of the circuit has been verified over the frequency range 50 Hz-30 MHz using PSPICE.
II. PROPOSED RECTIFIER CIRCUIT
The OTRA is a three-terminal device whose circuit symbol is shown in Fig. 1. Its port relations can be characterized by the following matrix equation:
$$\begin{bmatrix} V_p \\ V_n \\ V_o \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ R_m & -R_m & 0 \end{bmatrix} \begin{bmatrix} I_p \\ I_n \\ I_o \end{bmatrix} \qquad (1)$$
Fig. 1 OTRA circuit symbol
For ideal operations the transresistance
gain Rm approaches infinity and forces the
input currents to be equal. Thus OTRA must
be used in a negative feedback
configuration.
Fig. 2(a) shows the block diagram of the proposed rectifier circuit. It consists of an all-pass filter that acts as a 90° phase shifter, two squaring circuits, one summer and one square rooter. The phase of the input sinusoidal signal Vin = A sin(2πft) is shifted by 90° by adjusting the resistance (R) and capacitance (C) of the RC network of the all-pass filter in accordance with equation (2). The amplitude of the phase-shifted output of the all-pass filter remains the same as that of the input signal.
φ = −2 tan⁻¹(2πfRC), with |φ| = 90° (2)
A phase shift of 90° therefore requires 2πfRC = 1. The output of the all-pass filter can be written as Vp = A cos(2πft). The squaring of Vin and Vp is done using analog multipliers. The squared signals are added by a summer circuit implemented with an OTRA. Since sin² + cos² = 1, the summed signal equals A², and after square rooting it becomes ~A, which provides the rectified output.
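The signal chain described above can be checked numerically. The following Python sketch models only the ideal mathematics of the block diagram (an exact 90° shift, ideal squarers, summer and square rooter); it does not model the OTRA or multiplier circuits. The capacitor value `C_AP` and the function names are hypothetical and chosen for illustration.

```python
import math

def allpass_resistance(f, c):
    """Resistance that makes 2*pi*f*R*C = 1, i.e. a 90-degree all-pass phase shift."""
    return 1.0 / (2.0 * math.pi * f * c)

def rectify(amplitude, f, n_samples=1000):
    """Return sqrt(Vin^2 + Vp^2) sampled over one period of the input."""
    out = []
    for k in range(n_samples):
        t = k / (n_samples * f)
        v_in = amplitude * math.sin(2 * math.pi * f * t)
        v_p = amplitude * math.cos(2 * math.pi * f * t)  # ideal 90-degree shifted copy
        out.append(math.sqrt(v_in ** 2 + v_p ** 2))
    return out

A = 10e-3     # 10 mV input amplitude, as in the low-voltage test of the paper
F = 100.0     # 100 Hz input frequency, as in the paper
C_AP = 1e-6   # hypothetical all-pass capacitor, not a value from the paper

r_ap = allpass_resistance(F, C_AP)
v_out = rectify(A, F)
print(f"R = {r_ap:.1f} ohm, output spread = {max(v_out) - min(v_out):.2e} V")
```

In this idealized model the output stays at the input amplitude A for every sample, which is why the approach does not depend on a diode threshold. Any ripple in the real circuit comes from non-ideal phase shift, gain mismatch and the analog blocks themselves.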
Fig.2 (a) block diagram of proposed circuit
Fig.2(b)Circuit diagram of proposed circuit
III. SIMULATION RESULTS
To verify the theoretical propositions, the rectifier circuit is simulated using PSPICE. For simulation, the CMOS implementation of the OTRA proposed in [11] and reproduced in Fig. 3 was used. The simulation was carried out over the frequency range 50 Hz-30 MHz, and the results are compared with those of a diode based full wave rectifier circuit.
Fig.3 CMOS implementation of OTRA[11]
A. Rectified Output
The waveform tests were performed for both the proposed circuit and previously reported circuits. Fig. 4(a) shows an input sinusoidal signal of frequency 100 Hz, and the rectified output of the proposed circuit is shown in Fig. 4(b).
Fig. 4(a) sinusoidal input
Fig.4 (b) Rectified output of proposed
circuit
The output of the proposed circuit contains less ripple than that of the previously reported circuit [7], in which one diode conducts for one half cycle and the other diode conducts for the other half cycle, as shown in Fig. 4(c).
Fig.4(c) Rectified output of diode based
Fig.6 (a)10mV,100KHz Input signal and 90
degrees phase shifted signal
In the proposed circuit, rectification is not performed by diodes, and therefore the output has less ripple.
Low-voltage rectification, i.e. below the threshold level of a diode, was also carried out. Fig. 5 shows a typical output of the proposed circuit at a frequency of 100 Hz.
Fig5(a) sinusoidal input of frequency 100
Hz and amplitude 10mV along with 90
degrees phase shifted signal
Fig 5(b) rectified output with Input signal
Similarly, a high frequency signal of frequency 100 kHz and amplitude 10 mV is analyzed, and the result is shown in Fig. 6: (a) shows the input signal with the 90° phase-shifted signal, and (b) shows the rectified output.
Fig 6(b) rectified output with Input signal.
B. Harmonic Distortion
Harmonics in the signal cause distortion in the output of the circuit, so the harmonic components need to be examined when analyzing circuit performance. Being periodic in nature, these harmonic components can be analyzed by Fourier series. The magnitude of each harmonic of a waveform is obtained with the fast Fourier transform (FFT) in PSPICE. Fig. 7(a) shows the FFT of the 100 Hz input signal along with that of the rectified output, whereas Fig. 7(b) shows the FFT for an input frequency of 100 kHz.
Fig7(a)
Fig 7(b)
Fig. 7(a) shows the frequency spectrum of the rectified output and the input at 100 Hz. The frequency spectrum of the rectified output and the input at 100 kHz is shown in Fig. 7(b).
C. Ripple Factor:
The ripple factor of the output has been computed. It is given by
$$r = \frac{V_{rms}}{V_{DC}}$$
where r is the ripple factor, $V_{rms}$ is the rms value of the AC component of the output and $V_{DC}$ is the DC component present in the output.
Fig. 8(a) shows the ripple factor for an input of 100 Hz; its maximum value is 0.316, while its average value is 0.03. The ripple factor for an input frequency of 100 kHz is shown in Fig. 8(b) and has an average value of 0.035.
Fig8 (a)
Fig8 (b)
The previously reported circuit gives a ripple factor of 0.483 [12].
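The 0.483 figure is the textbook ripple factor of an ideal full-wave rectified sine wave, and it can be reproduced from the definition r = Vrms/VDC. The short Python sketch below does this numerically and compares the result with the closed form √(π²/8 − 1). It models an ideal diode rectifier only, not the circuit of [12], and the function name `ripple_factor` is illustrative.

```python
import math

def ripple_factor(samples):
    """r = (rms of the AC component) / (DC component) for uniformly spaced samples."""
    n = len(samples)
    v_dc = sum(samples) / n
    v_ac_rms = math.sqrt(sum((v - v_dc) ** 2 for v in samples) / n)
    return v_ac_rms / v_dc

N = 100_000
# One period of an ideal full-wave rectified sine of unit amplitude.
full_wave = [abs(math.sin(2 * math.pi * k / N)) for k in range(N)]

r_diode = ripple_factor(full_wave)
r_closed_form = math.sqrt(math.pi ** 2 / 8 - 1)
print(round(r_diode, 3), round(r_closed_form, 3))
```

Both values round to 0.483, so the average ripple factors of about 0.03 reported for the proposed circuit are roughly an order of magnitude lower than the ideal diode case.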
IV. CONCLUSION
In this paper, a precision full wave rectifier is implemented using the operational transresistance amplifier (OTRA). The circuit provides an output voltage whose amplitude is almost equal to that of the input voltage. The circuit works well in the frequency range of 50 Hz-30 MHz. Using the OTRA allows it to work over a much higher frequency range than the previously reported circuit.
REFERENCES:
[1] C. Toumazou and F. J. Lidgey, Analog IC Design: The Current-Mode Approach. Peter Peregrinus Ltd., 1990.
[2] Khaled N. Salama and Ahmed M. Soliman, "CMOS operational transresistance amplifier for analog signal processing," Microelectronics Journal, vol. 30, no. 9, pp. 235-245, Mar. 1999.
[3] U. Cam, "A novel single-resistance-controlled sinusoidal oscillator employing single operational transresistance amplifier," Analog Integrated Circuits and Signal Processing, vol. 32, pp. 183-186, Aug. 2002.
[4] Rajeshwari Pandey and Mayank Bothra, "Multiphase sinusoidal oscillators using operational trans-resistance amplifier," IEEE Symposium on Industrial Electronics and Applications (ISIEA 2009), pp. 371-376, Oct. 4-6, 2009.
[5] U. Cam, F. Kacar, O. Cicekoglu, H. Kuntman and A. Kuntman, "Novel grounded parallel immittance simulator topologies employing single OTRA," AEU - Int. J. Electronics and Communications, vol. 57, no. 4, pp. 287-290, 2003.
[6] U. Cam, F. Kacar, O. Cicekoglu, H. Kuntman and A. Kuntman, "Novel two OTRA-based grounded immittance simulator topologies," Analog Integrated Circuits and Signal Processing, vol. 29, pp. 233-235, 2001; Analog Integrated Circuits and Signal Processing, vol. 39, pp. 169-175, 2004.
[7] S. J. G. Gift and B. Maundy, "Versatile precision full-wave rectifiers for instrumentation and measurement," IEEE Trans. Instrum. Meas., vol. 56, no. 5, pp. 1703-1710, Oct. 2007.
[8] S. R. Djukic, "Full-wave current conveyor precision rectifier," Serbian J. Elect. Eng., vol. 5, no. 2, pp. 263-271, Nov. 2008.
[9] P. Gray, P. J. Hurst, S. H. Lewis and R. G. Meyer, Analysis and Design of Analog Integrated Circuits. New York: Wiley, 2001.
[10] S. J. G. Gift, "A high-performance full-wave rectifier circuit," Int. J. Electron., vol. 87, no. 8, pp. 925-930, Aug. 2000.
[11] Hasan Mustafa and Ahmed M. Soliman, "A modified realization of the OTRA," Frequenz, vol. 60, pp. 70-76, 2006.
[12] R. A. Gayakwad, Op-Amps and Linear Integrated Circuits, 3rd ed. New Delhi, India: Prentice-Hall, 2007, pp. 316-318.
VLP0403-1
GaN-based HEMTs for Communication Circuits
T R Lenka and A K Panda
National Institute of Science and Technology
Palur Hills, Berhampur, Odisha, Pin-761008
E-mail: [email protected] and [email protected]
Abstract:
This paper discusses the role of GaN-based high electron mobility transistors (HEMTs) in microwave communication circuits. Owing to superior material properties, GaN-based devices have achieved a record maximum frequency of oscillation of around 300 GHz and a high cutoff frequency, and they have become one of the prime candidates for solid-state power amplifiers at frequencies up to 50 GHz. The unique properties of GaN include a peak saturation velocity of 2.5×10^7 cm/s, a high breakdown electric field of 3.3 MV/cm, and output power densities in excess of 10 W/mm at 40 GHz and more than 2 W/mm at 80.5 GHz. Recent widespread R&D to advance HEMT technology has led to high-speed, low-power LSI circuits and ultra-low-noise amplifiers. The microwave characteristics of the HEMT discussed in this paper include available gain (GA), maximum available gain (MAG/GMax), unilateral gain (GU), maximum stable gain (MSG), noise figure (NF) and minimum noise figure (NFmin). The potential use of the HEMT as an amplifier and as an oscillator is also discussed.
Key Words: GaN, HEMT, Microwave, MMIC, Gain
1. INTRODUCTION
GaN-based semiconductor devices are currently the focus of great interest in academia as well as industry because of their attractive material properties [1]. These semiconductor alloys have a wide bandgap (>3.4 eV), high temperature sustainability and high electric breakdown fields, which allow them to be used for the fabrication of short-wavelength (blue, UV) optical devices and high-frequency, high-power electronics [2].
Due to the conduction band discontinuity, a two-dimensional electron gas (2DEG) channel is created at the heterointerface between two undoped materials by piezoelectric and spontaneous polarization [3]. The 2DEG is the heart of the HEMT. The modeling of GaN-based HEMTs still presents many challenges to the worldwide research community. Owing to the lack of scattering, the mobility of the electrons in the 2DEG is very high, which makes the device suitable for microwave applications [4]. Advanced HEMT monolithic millimeter-wave integrated circuits (MMICs) for millimeter- and sub-millimeter-wave power sources and power amplifiers, with applications in heterodyne receivers, transmitters and communication circuits, are highly popular and are dominated by GaN-based devices [5]. In discrete device applications, low-noise HEMTs are commercially available and are in use in broadcast satellite and radio telescope systems. This paper reviews state-of-the-art HEMT technology for communication systems.
The commonly used HEMT structure is discussed in section 2. The microwave characteristics of GaN-based HEMTs are discussed in section 3, and the conclusion is drawn in section 4.
2. HEMT STRUCTURE
The AlGaN/GaN heterostructure is generally grown on a sapphire or SiC substrate by molecular beam epitaxy (MBE) or metal organic vapor phase epitaxy (MOVPE) [6]. For Schottky ohmic contacts, Ti/Al/Ni/Au is mostly used. The TCAD simulated structure of this device is shown in figure 1. Schrödinger's wave equation and the Poisson equation are solved self-consistently to obtain the two-dimensional electron gas (2DEG), which is created at the AlGaN/GaN heterointerface by the growth of wide-bandgap material over narrower-bandgap material and is the heart of any heterostructure device [7]-[8]. The electron concentration in the 2DEG depends on the conduction band discontinuity. To reduce scattering in the 2DEG formed at the heterointerface, a binary nanoscale AlN layer is epitaxially grown at the heterointerface of the AlGaN/GaN heterostructure [7]-[8].
Fig. 1 Simulated Structure of AlGaN/GaN-based
HEMT
The two dimensional electron gas (2DEG)
created at the heterointerface of AlGaN/GaN
with a mole fraction of 0.3 is shown in figure 2.
Fig. 2 Formation of 2DEG at the heterointerface
3. MICROWAVE CHARACTERISTICS
3.1 HEMT as an Amplifier
Fig. 3 Small Signal Model of HEMT
Two-port network analysis has been carried out with Microwave Office to understand the microwave characteristics of the HEMT [9]. When embarking on any amplifier design it is very important to understand the stability of the chosen device; otherwise the amplifier may well turn into an oscillator. The microwave parameters include available gain (GA), maximum available gain (MAG), unilateral gain (GU), maximum stable gain (MSG), noise figure (NF) and minimum noise figure (NFmin) [9]. The small signal model of the HEMT is shown in figure 3.
The main way of determining the stability of a device is to calculate Rollett's stability factor (K) from a set of S-parameters for the device at the frequency of operation. Two stability parameters, K and |Δ|, indicate whether a device is likely to oscillate and whether it is conditionally or unconditionally stable [9].
$$K = \frac{1 - |S_{11}|^2 - |S_{22}|^2 + |\Delta|^2}{2\,|S_{12} S_{21}|}, \quad \text{where } \Delta = S_{11} S_{22} - S_{12} S_{21} \qquad (1)$$
The parameters must satisfy K > 1 and |Δ| < 1
for a transistor to be unconditionally stable.
Once the K factor is calculated and we find that
the device is unconditionally stable then we can
calculate the Maximum available gain (MAG).
$$MAG = G_{Max} = \frac{|S_{21}|}{|S_{12}|}\left(K - \sqrt{K^2 - 1}\right) \qquad (2)$$
When K is at the limit of unity, the above equation reduces to
$$MSG = \frac{|S_{21}|}{|S_{12}|} \qquad (3)$$
In this case the MAG is known as the maximum
stable gain MSG and is shown in figure 4.
Fig. 4 MAG/GMax and MSG of HEMT
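The stability test and the gain expressions can be sketched in Python as follows. The S-parameter values are hypothetical placeholders at a single frequency, not data from the simulated HEMT, and the function names are illustrative. The sketch uses the K − √(K² − 1) form of the maximum available gain, which is the form that applies to an unconditionally stable device, and reports power gains in dB as 10·log10 of the ratio.

```python
import cmath
import math

def stability(s11, s12, s21, s22):
    """Return (K, |delta|) from complex S-parameters (Rollett stability factor)."""
    delta = s11 * s22 - s12 * s21
    k = (1 - abs(s11) ** 2 - abs(s22) ** 2 + abs(delta) ** 2) / (2 * abs(s12 * s21))
    return k, abs(delta)

def max_gain_db(s11, s12, s21, s22):
    """Return ('MAG', dB) if unconditionally stable, otherwise ('MSG', dB)."""
    k, delta_mag = stability(s11, s12, s21, s22)
    ratio = abs(s21) / abs(s12)
    if k > 1 and delta_mag < 1:
        return "MAG", 10 * math.log10(ratio * (k - math.sqrt(k * k - 1)))
    return "MSG", 10 * math.log10(ratio)

# Hypothetical S-parameters (magnitude, phase) for illustration only.
S11 = cmath.rect(0.60, math.radians(-60))
S12 = cmath.rect(0.05, math.radians(40))
S21 = cmath.rect(3.00, math.radians(120))
S22 = cmath.rect(0.50, math.radians(-30))

k, delta_mag = stability(S11, S12, S21, S22)
kind, gain_db = max_gain_db(S11, S12, S21, S22)
print(f"K = {k:.3f}, |delta| = {delta_mag:.3f}, {kind} = {gain_db:.2f} dB")
```

For any device with K > 1 the factor K − √(K² − 1) is below one, so the MAG lies below the MSG curve and the two meet only at K = 1.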
As the frequency increases from 1 to 50 GHz, the maximum available gain (GMax) and the maximum stable gain (MSG) both decrease, and the two curves coincide. This corresponds to K = 1, the limit of unconditional stability. The gains at different frequencies are marked in figure 4.
Fig. 5 Available Gain (GA) and Unilateral Gain (GU)
of HEMT
Mason's unilateral gain (MUG/GU) and the available gain are plotted in figure 5 over a frequency range of 1 to 50 GHz. It is seen from figure 5 that the available gain reaches a peak of 29.7 dB at 1 GHz and that the unilateral gain (GU) varies from 56 dB at 1 GHz to 24 dB at 50 GHz.
Fig. 6 S21 and S12 with respect to frequency
The two-port network is connected to a load impedance ZL and a source impedance ZS, and is characterized by a scattering matrix [S]. The S-parameters S21 and S12 are the forward voltage gain and reverse voltage gain, respectively, and are shown in figure 6. For the chosen values of the lumped elements of the small signal model, the forward gain of the device is 22.94 dB at 1 GHz and then decreases with frequency, whereas the reverse gain is negative in dB. By choosing suitable values for the lumped elements of the small signal circuit, the forward gain can be increased to the desired value.
Fig. 7 Noise Figure (NF) and NFMin of HEMT
The microwave noise figure (NF) and NFMin are shown in figure 7. It is seen from this figure that the NF increases with the frequency of operation and stays at its minimum up to 5.5 GHz, whereas NFMin remains negligibly small over the span from 1 to 50 GHz. The NF can be optimized to the required value by tuning the values of the lumped elements of the small signal circuit.
3.2 HEMT as an Oscillator
In spite of the great progress in performance achieved during the last few years, several important issues still need to be overcome to further increase the performance of GaN HEMTs at millimeter-wave frequencies (30-300 GHz). One of the key challenges in achieving high-gain millimeter-wave power amplification is to increase the maximum power-gain cutoff frequency ($f_{max}$), the maximum frequency at which the transistor still provides power gain. It can be expressed as [6]-[11]
$$f_{max} = \frac{f_T}{2\sqrt{(R_i + R_s + R_g)/R_{ds} + 2\pi f_T R_g C_{gd}}} \qquad (4)$$
where $f_T$ is the current-gain cutoff frequency and $C_{gd}$ is the gate-drain (depletion region) capacitance, while $R_i$, $R_s$, $R_g$ and $R_{ds}$ represent the gate-charging, source, gate and output resistances, respectively. To maximize $f_{max}$, each parameter needs to be carefully optimized. In FETs, short-channel effects play an important role in the high-frequency characteristics [6]. Gate-recess technology can suppress the short-channel effects and thereby improve the high-frequency characteristics.
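To show how the individual parameters enter equation (4), the following Python sketch evaluates it for one set of element values. All numbers are hypothetical round figures chosen for illustration; they are not extracted from the simulated device or from [6].

```python
import math

def f_max(f_t, r_i, r_s, r_g, r_ds, c_gd):
    """Eq. (4): f_T / (2 * sqrt((Ri + Rs + Rg) / Rds + 2*pi*f_T*Rg*Cgd))."""
    return f_t / (2.0 * math.sqrt((r_i + r_s + r_g) / r_ds
                                  + 2.0 * math.pi * f_t * r_g * c_gd))

# Hypothetical small-signal element values (illustration only).
F_T = 70e9      # current-gain cutoff frequency, Hz
R_I = 1.0       # gate-charging resistance, ohm
R_S = 2.0       # source resistance, ohm
R_G = 1.0       # gate resistance, ohm
R_DS = 400.0    # output resistance, ohm
C_GD = 20e-15   # gate-drain capacitance, F

fm = f_max(F_T, R_I, R_S, R_G, R_DS, C_GD)
fm_low_rg = f_max(F_T, R_I, R_S, R_G / 2, R_DS, C_GD)
print(f"f_max = {fm / 1e9:.1f} GHz, with halved Rg = {fm_low_rg / 1e9:.1f} GHz")
```

Because the gate resistance appears in both terms under the square root, reducing it raises f_max noticeably in this example, which illustrates why each parameter in (4) has to be optimized.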
The general figure of merit for comparing microwave circuits is the cut-off frequency ($F_c$) given in equation (5), which is defined by the on-resistance ($R_{on}$) and off-state capacitance ($C_{off}$) of the device [10]-[11]:
$$F_c = \frac{1}{2\pi R_{on} C_{off}} \qquad (5)$$
The on resistance of the HEMT is governed by
the total source-drain resistances at microwave
frequencies for voltages higher than threshold.
Below threshold voltage the 2DEG is
suppressed under the gate and the resistance
increases dramatically.
The general channel resistance $R_{DS}$ is composed of several resistance components and may be written as
$$R_{DS} = R_g + R_{sg} + R_{dg} \qquad (6)$$
where $R_g$ is the interface (or channel) resistance under the gate, and $R_{sg}$ and $R_{dg}$ are the source-gate and drain-gate channel resistances, respectively. The contribution of $R_{sg}$ and $R_{dg}$ to the total on-state resistance $R_{on}$ depends on the gate-drain and gate-source electrode spacing. This spacing also governs the breakdown voltage, with wider spacing yielding higher breakdown voltages. The resistances making up $R_{DS}$ are governed by the 2DEG that is induced at the heterointerface. Below the threshold voltage, the 2DEG carrier density goes to zero and $R_{DS}$ approaches its maximum $R_{max}$, which is due to carriers in the GaN material.
Since the 2DEG governs the resistance in the conductive channel, the resistance of each element may be estimated as [10]
$$R_i = R_s \frac{L_i}{W} \qquad (7)$$
where $R_s$ is the sheet resistance of the interface channel, W is the gate width of the HEMT and $L_i$ is the approximate geometrical length. The value of the sheet resistance depends on the density of the 2DEG and the mobility of the carriers in the channel and can be written as [11]
$$R_s = \frac{1}{q \mu_n n_s} \qquad (8)$$
where q is the electron charge, $\mu_n$ is the low-field mobility of the 2DEG and $n_s$ is the 2DEG density.
In estimating the resistance directly under the gate, $R_g$, the 2DEG is assumed to be under the influence of the gate voltage, making $n_s$ a function of the gate voltage $V_g$ [11]. The resistance elements $R_{sg}$ and $R_{dg}$ are assumed not to be controlled by the applied gate voltage, and thus $n_s$ is not a function of $V_g$ in the source-gate and drain-gate regions.
The capacitance model includes both voltage-dependent and parasitic capacitances. The voltage-dependent capacitances used in modeling the GaN HEMT are the source-gate and drain-gate capacitances $C_g$ and the capacitances between the gate and the inner side of the source and drain electrodes, $C_{ig}$. The total capacitance $C_{DS}$ can be written as [11]
$$C_{DS} = C_g + C_{ig} + C_{par} \qquad (9)$$
where $C_{par}$ is the total parasitic capacitance.
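Equations (5) to (9) can be chained into a single estimate, sketched below in Python. Every numerical value is a hypothetical placeholder (a 2DEG density of 10^13 cm^-2, a mobility of 1500 cm²/V·s, a 100 µm gate width and round-number capacitances). The sketch also assumes that the on-resistance equals $R_{DS}$ with the same $n_s$ in all three regions, and that the off-state capacitance equals $C_{DS}$; the text does not state either equality explicitly.

```python
import math

Q = 1.602176634e-19  # electron charge, C

def sheet_resistance(mu_n, n_s):
    """Eq. (8): R_s = 1 / (q * mu_n * n_s); mu_n in cm^2/(V s), n_s in cm^-2, result in ohm/sq."""
    return 1.0 / (Q * mu_n * n_s)

def segment_resistance(r_sheet, length, width):
    """Eq. (7): R_i = R_s * L_i / W (length and width in the same unit)."""
    return r_sheet * length / width

def cutoff_frequency(r_on, c_off):
    """Eq. (5): F_c = 1 / (2 * pi * R_on * C_off)."""
    return 1.0 / (2.0 * math.pi * r_on * c_off)

# Hypothetical device parameters (illustration only).
MU_N = 1500.0    # low-field 2DEG mobility, cm^2/(V s)
N_S = 1e13       # 2DEG sheet density, cm^-2
WIDTH = 100e-6   # gate width, m
LENGTHS = {"g": 0.25e-6, "sg": 1.0e-6, "dg": 2.0e-6}  # region lengths, m
C_G, C_IG, C_PAR = 50e-15, 20e-15, 30e-15             # capacitances, F

r_sheet = sheet_resistance(MU_N, N_S)
# Eq. (6): R_DS = R_g + R_sg + R_dg, each term estimated with Eq. (7).
r_ds = sum(segment_resistance(r_sheet, length, WIDTH) for length in LENGTHS.values())
# Eq. (9): C_DS = C_g + C_ig + C_par.
c_ds = C_G + C_IG + C_PAR
f_c = cutoff_frequency(r_ds, c_ds)  # assumes R_on = R_DS and C_off = C_DS
print(f"R_s = {r_sheet:.0f} ohm/sq, R_DS = {r_ds:.2f} ohm, F_c = {f_c / 1e9:.0f} GHz")
```

The sketch makes the scaling visible: a wider electrode spacing increases $R_{sg}$ and $R_{dg}$ linearly and therefore lowers the figure of merit, which is the trade-off against breakdown voltage mentioned above.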
4. CONCLUSION
The small signal model of the HEMT is designed for two-port network analysis using Microwave Office, and the corresponding GaN-based HEMT is simulated using a TCAD tool. Various microwave parameters such as MSG, MUG, MAG, NF and NFMin are discussed. The amplifier and oscillator behavior of the HEMT is also discussed.
ACKNOWLEDGEMENT
The authors acknowledge the DST-FIST and
DST-SERC fund received by National Institute
of Science and Technology from Department of
Science & Technology (DST), Government of
India.
REFERENCES
1. David F. Brown et al., "N-Polar InAlN/AlN/GaN MIS-HEMTs," IEEE Electron Device Letters, vol. 31, no. 8, Aug. 2010.
2. T R Lenka and A. K. Panda, "Role of Nanoscale AlN and InN for the Microwave Characteristics of AlGaN/(Al, In)N/GaN-based HEMT," accepted for publication in Fizika i Tehnika Poluprovodnikov / Semiconductors (Springer), 2011.
3. Haifeng Sun et al., "205-GHz (Al, In)N/GaN HEMTs," IEEE Electron Device Letters, vol. 31, no. 9, Sept. 2010.
4. T R Lenka and A. K. Panda, "Characteristics Study of 2DEG Transport Properties of AlGaN/GaN and AlGaAs/GaAs-based HEMT," Fizika i Tehnika Poluprovodnikov / Semiconductors (Springer), vol. 45, no. 5, pp. 660-665, 2011.
5. Haifeng Sun et al., "102 GHz AlInN/GaN HEMTs on Silicon With 2.5-W/mm Output Power at 10 GHz," IEEE Electron Device Letters, vol. 30, no. 8, Aug. 2009.
6. Jinwook W. Chung et al., "AlGaN/GaN HEMT with 300-GHz fmax," IEEE Electron Device Letters, vol. 31, no. 3, Mar. 2010.
7. T R Lenka and A. K. Panda, "Self-consistent Subband Calculations of AlxGa1-xN/(AlN)/GaN-based High Electron Mobility Transistor," Advanced Materials Research, vol. 159, pp. 342-347, 2011.
8. T R Lenka and A. K. Panda, "Effect of Nanoscale AlN layer for improving 2DEG Transport properties in AlGaN/AlN/GaN-based HEMT," International Journal of Pure and Applied Physics (IJPAP), vol. 6, no. 4, pp. 419-427, 2010.
9. Microwave Office Manuals.
10. Kelson D. Chabak et al., "Strained AlInN/GaN HEMTs on SiC with 2.1-A/mm Output Current and 104 GHz Cutoff Frequency," IEEE Electron Device Letters, vol. 31, no. 6, June 2010.
11. Nikolai V. Drozdovski et al., "GaN-Based High Electron-Mobility Transistors for Microwave and RF Control Applications," IEEE Trans. on Microwave Theory and Techniques, vol. 50, no. 1, Jan. 2002.
VLP0405-1
Abstract: This paper, "Digital Transceiver using Advance Ternary Technique", describes a digital transmitter and receiver built around a ternary line code. In this scheme computer data (bytes) are converted into base-3 data elements. Line codes are widely used in data transmission networks and in information recording and storage systems; the applications include local and wide area networks, both wireless and wired. A coding technique named advanced ternary line code can be derived from three popular line codes: NRZ-L, NRZ and polar RZ. In this scheme six signal patterns are required for eight binary data patterns.
I INTRODUCTION
This scheme focuses on the electrical signal and data processing. Implementing it provides a means for encoding a binary data word as a ternary code word; at decoding time the ternary code word is used to recover the binary data word. The main advantage of this scheme is that it maintains DC balance during transmission of the ternary data word. Another advantage is that a ternary digit carries more information than a binary digit: six binary digits can represent 64 different values (0-63), whereas six balanced-ternary digits can represent the 365 values from 0 to 364 (000000 to 111111), or 729 values in total when negative numbers are included.
Line Coding is the process of converting digital data
to digital signals. We assume that data, in the form
of text, numbers, graphical images, audio, or video
are stored in computer memory as sequences of bits.
Line coding converts a sequence of bits to a digital
signal. At the sender, digital data are encoded into a
digital signal; at the receiver, the digital data are
recreated by decoding the digital signal [1]
Fig. 1: Digital data to digital signal encoding
Line codes for data transmission fall into three categories. The first type is binary in nature. The second type consists of ternary codes, which operate on three signal levels (+, 0 and -). The third type consists of multilevel codes, which have more than three output levels. The encoder and decoder circuits can be simulated and implemented using simple combinational logic circuits.
Ternary logic in digital communication for high
speed and performance
Figure 2: Unipolar NRZ, Polar NRZ, Unipolar RZ
II TERNARY REPRESENTATION
(a) DECIMAL TO TERNARY CONVERSION
S.NO. Decimal Ternary
1. 0 0 0 0 0
2. 1 0 0 0 1
3. 2 0 0 1 -1
4. 3 0 0 1 0
5. 4 0 0 1 1
6. 5 0 1 -1 -1
7. 6 0 1 -1 0
8. 7 0 1 -1 1
9. 8 0 1 0 -1
10. 9 0 1 0 0
11. 10 0 1 0 1
12. 11 0 1 1 -1
13. 12 0 1 1 0
14. 13 0 1 1 1
15. 14 1 -1 -1 -1
16. 15 1 -1 -1 0
17. 16 1 -1 -1 1
18. 17 1 -1 0 -1
19. 18 1 -1 0 0
20. 19 1 -1 0 1
21. 20 1 -1 1 -1
22. 21 1 -1 1 0
23. 22 1 -1 1 1
24. 23 1 0 -1 -1
25. 24 1 0 -1 0
26. 25 1 0 -1 1
27. 26 1 0 0 -1
28. 27 1 0 0 0
29. 28 1 0 0 1
30. 29 1 0 1 -1
31. 30 1 0 1 0
32. 31 1 0 1 1
33. 32 1 1 -1 -1
34. 33 1 1 -1 0
35. 34 1 1 -1 1
36. 35 1 1 0 -1
37. 36 1 1 0 0
38. 37 1 1 0 1
39. 38 1 1 1 -1
40. 39 1 1 1 0
41. 40 1 1 1 1
Table 1: Decimal to ternary
(b) DECIMAL TO TERNARY CONVERSION
The decimal (base 10) numeral system has ten
possible values (0, 1, 2, 3, 4, 5, 6, 7, 8 or 9) for each
place value. In contrast, the ternary (base 3)
numeral system has three possible values for each
place value; in the balanced form used here they are
-1, 0 and 1. As with decimal-to-binary conversion,
the procedure is as follows:
Algorithm:
Step-1: Write the decimal number.
Step-2: Divide the decimal value by three (3) and
write down the quotient and remainder.
Step-3: If the remainder is 2, increase the quotient
by one and decrease the remainder by 3, so that the
remainder becomes -1.
Step-4: Repeat steps 2 and 3 on the quotient until
the quotient becomes zero.
Step-5: Write all remainder digits in reverse
order (last remainder first) to form the final result.
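Example: (25)10 = (X)3
25 / 3 = 8, remainder 1
8 / 3 = 2, remainder 2, adjusted to quotient 3, remainder -1
3 / 3 = 1, remainder 0
1 / 3 = 0, remainder 1
Reading the remainders in reverse order:
(25)10 = (1 0 -1 1)3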
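The conversion steps above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation; the function name to_balanced_ternary is hypothetical.

```python
def to_balanced_ternary(n):
    """Return the balanced-ternary digits of an integer, most significant first."""
    if n == 0:
        return [0]
    digits = []
    while n != 0:
        n, r = divmod(n, 3)      # Step 2: quotient and remainder
        if r == 2:               # Step 3: remainder 2 becomes -1, quotient + 1
            r = -1
            n += 1
        digits.append(r)         # remainders are produced least significant first
    return digits[::-1]          # Step 5: reverse the remainders


print(to_balanced_ternary(25))
```

For the example above, the call returns the digits 1, 0, -1, 1.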
(c) TERNARY TO DECIMAL CONVERSION
(1 0 -1 1)3 = (X)10
1*3^3 + 0*3^2 + (-1)*3^1 + 1*3^0
= 27 + 0 + (-3) + 1 = 25
(1 0 -1 1)3 = (25)10
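The weighted sum above can be evaluated digit by digit. The sketch below uses Horner's rule, which is an implementation choice made here and not something the paper specifies; the name from_balanced_ternary is hypothetical.

```python
def from_balanced_ternary(digits):
    """Evaluate balanced-ternary digits (most significant first) as an integer."""
    value = 0
    for d in digits:
        if d not in (-1, 0, 1):
            raise ValueError("digits must be -1, 0 or 1")
        value = 3 * value + d    # shift one ternary place, then add the digit
    return value


print(from_balanced_ternary([1, 0, -1, 1]))
```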
(d) TERNARY ADDITION
Ternary addition can be performed by the
following rules:
A    B    C    Carry  Sum
0    0    0    0      0
0    1    0    0      1
0    -1   0    0      -1
1    0    0    0      1
1    1    0    1      -1
1    -1   0    0      0
-1   0    0    0      -1
-1   1    0    0      0
-1   -1   0    -1     1
-1   -1   -1   -1     0
1    1    1    1      0
Table 2: Rules of ternary addition (C is the carry-in)
Example:
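The addition rules of Table 2 can be sketched in Python as a single-digit adder plus a ripple-carry loop. This is an illustrative sketch under the assumption that numbers are held as lists of digits, most significant first; the names add_digit and add_ternary are hypothetical.

```python
def add_digit(a, b, c=0):
    """Add three balanced-ternary digits; return (carry, sum) as in Table 2."""
    total = a + b + c            # ranges over -3..3
    if total > 1:
        return 1, total - 3
    if total < -1:
        return -1, total + 3
    return 0, total


def add_ternary(x, y):
    """Add two balanced-ternary numbers given as digit lists."""
    width = max(len(x), len(y))
    x = [0] * (width - len(x)) + list(x)   # pad to a common width
    y = [0] * (width - len(y)) + list(y)
    result, carry = [], 0
    for a, b in zip(reversed(x), reversed(y)):
        carry, s = add_digit(a, b, carry)
        result.append(s)
    result.append(carry)                   # final carry is the leading digit
    return result[::-1]


# 25 + 14 = 39, i.e. (1 0 -1 1) + (1 -1 -1 -1) = (0 1 1 1 0)
print(add_ternary([1, 0, -1, 1], [1, -1, -1, -1]))
```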
(e) TERNARY SUBTRACTION
In balanced ternary, changing all +1's to -1's and
vice versa, while leaving all zeroes unchanged,
gives the negative of the corresponding number.
It follows that addition and subtraction can be
performed with the same hardware by changing the
sign of the subtrahend.
(i) A - B = X
(ii) X = A + B', where B' is B with every +1
changed to -1 and vice versa
There is no need for a separate representation of
negative magnitudes: for example, (-28) is
represented as
(0 0 -1 0 0 -1)
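A minimal Python sketch of subtraction by negation follows. It assumes digit lists with the most significant digit first; the names negate and subtract are hypothetical.

```python
def negate(digits):
    """Swap +1 and -1 in every position; zeroes are unchanged."""
    return [-d for d in digits]


def subtract(a, b):
    """Compute a - b as a + negate(b), rippling the carry from the last digit."""
    b = negate(b)
    width = max(len(a), len(b))
    a = [0] * (width - len(a)) + list(a)
    b = [0] * (width - len(b)) + b
    out, carry = [], 0
    for x, y in zip(reversed(a), reversed(b)):
        total = x + y + carry
        carry = (total > 1) - (total < -1)   # +1, 0 or -1
        out.append(total - 3 * carry)
    out.append(carry)
    return out[::-1]


# 28 = (0 0 1 0 0 1), so -28 = (0 0 -1 0 0 -1)
print(negate([0, 0, 1, 0, 0, 1]))
# 25 - 14 = 11
print(subtract([1, 0, -1, 1], [1, -1, -1, -1]))
```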
(f) TERNARY MULTIPLICATION
Ternary multiplication is performed in the same way
as binary multiplication: each digit of the multiplier
produces a shifted partial product, and the partial
products are added. The basic digit rules are:
S. No. A B A x B
1 0 0 0
2 0 1 0
3 0 -1 0
4 1 0 0
5 1 1 1
6 1 -1 -1
7 -1 0 0
8 -1 1 -1
9 -1 -1 1
Table 3: Rules for Ternary Multiplication
Example:
(i) (37)10 x (4)10 = (148)10
(1 1 0 1)3 x (0 0 1 1)3 = (X)3
           1  1  0  1
  x        0  0  1  1
---------------------
           1  1  0  1
        1  1  0  1  x
     0  0  0  0  x  x
  0  0  0  0  x  x  x
---------------------
  0  1 -1 -1  1  1  1
---------------------
(0 1 -1 -1 1 1 1)3 = (148)10
(ii) (14)10 x (15)10 = (210)10
(1 -1 -1 -1)3 x (1 -1 -1 0)3 = (X)3
           1 -1 -1 -1
  x        1 -1 -1  0
---------------------
           0  0  0  0
       -1  1  1  1  x
    -1  1  1  1  x  x
  1 -1 -1 -1  x  x  x
---------------------
  0  1  0 -1 -1  1  0
---------------------
(0 1 0 -1 -1 1 0)3 = (210)10
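The shift-and-add procedure in the two examples can be sketched in Python. The sketch accumulates the column sums of all partial products first and then normalises each column to -1, 0 or 1; that ordering is a choice made here for brevity, and the names multiply_ternary and to_int are hypothetical.

```python
def to_int(digits):
    """Evaluate balanced-ternary digits (most significant first)."""
    value = 0
    for d in digits:
        value = 3 * value + d
    return value


def multiply_ternary(x, y):
    """Multiply two balanced-ternary digit lists (most significant first)."""
    acc = [0] * (len(x) + len(y))          # column sums, least significant first
    for i, a in enumerate(reversed(x)):
        for j, b in enumerate(reversed(y)):
            acc[i + j] += a * b            # digit rule of Table 3, shifted by i + j
    carry = 0
    for k in range(len(acc)):              # normalise each column to -1, 0 or 1
        total = acc[k] + carry
        carry = (total + 1) // 3
        acc[k] = total - 3 * carry
    return acc[::-1]


product = multiply_ternary([1, 1, 0, 1], [0, 0, 1, 1])
print(product, to_int(product))
```

The result has one more leading zero than the hand-worked examples because the sketch always returns len(x) + len(y) digits.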
III PRINCIPLES OF TERNARY DATA
PATTERNS ENCODING
The method for transmitting an 8-bit binary data
word as a 6-digit ternary code word uses an encoder
and a decoder. Each 8-bit binary data word has a
unique 6-ternary code word that is optimized for
communication. Ternary data patterns are encoded
using three signal levels [3], represented as
-, 0 and +. For example, the 8-bit binary pattern
10100011 is first converted to (163)10, which is
encoded as the 6-ternary word (1 -1 0 0 0 1)3.
This data pattern is then transmitted as the signal
pattern (+ - 0 0 0 +).
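The worked example can be reproduced with a short Python sketch. It follows the example directly (byte value to six balanced-ternary digits to signal levels) and makes no attempt at any further code-word optimization; the function name encode_byte and the space-separated output format are assumptions made for illustration.

```python
def encode_byte(byte):
    """Map an 8-bit value (0-255) to six balanced-ternary digits and a signal string."""
    if not 0 <= byte <= 255:
        raise ValueError("expected an 8-bit value")
    digits = []
    n = byte
    for _ in range(6):                 # always emit six ternary digits
        n, r = divmod(n, 3)
        if r == 2:                     # remainder 2 becomes -1 with a carry
            r, n = -1, n + 1
        digits.append(r)
    digits.reverse()                   # most significant digit first
    symbols = {1: "+", 0: "0", -1: "-"}
    return digits, " ".join(symbols[d] for d in digits)


print(encode_byte(0b10100011))
```

Six digits are sufficient because the largest byte value, 255, is below the six-digit maximum of 364.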
Figure 3: Binary Data Communication
Figure 4: Ternary Data Communication
The logic circuitry of this method is optimized to
accomplish the translation using a small number of
combinational logic gates. Implementing ternary
communication increases speed and performance
relative to 8-bit binary data word communication,
and also decreases the size of the encoder.
IV PRINCIPLES OF TERNARY DATA
PATTERNS DECODING
The decoding system is simply the reverse of the
encoding system. It receives the 6-ternary pattern,
and the decoder circuit converts it into the 8-bit
binary pattern, a format that the receiver can
understand.
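The reverse mapping can be sketched in Python as follows. The sketch assumes the signal arrives as six space-separated level symbols (+, 0, -), which is a representation chosen here for illustration; decode_signal is a hypothetical name.

```python
def decode_signal(signal):
    """Convert a six-symbol ternary signal pattern back to an 8-bit binary string."""
    levels = {"+": 1, "0": 0, "-": -1}
    symbols = signal.split()
    if len(symbols) != 6:
        raise ValueError("expected six ternary symbols")
    value = 0
    for s in symbols:
        value = 3 * value + levels[s]      # weighted sum, most significant first
    if not 0 <= value <= 255:
        raise ValueError("pattern does not correspond to an 8-bit word")
    return format(value, "08b")


print(decode_signal("+ - 0 0 0 +"))
```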
V ADVANTAGES OF TERNARY CODES
The concept of 6-ternary data communication
between two devices makes the system faster and
higher in performance, and also reduces the size of
the overall circuitry. This concept relates generally
to electric signal and data processing [5].
The encoder converts the 8-bit binary data word
into a 6-ternary data word, and the decoder converts
the 6-ternary data word back into an 8-bit binary
data word. Ternary data transmission can be used in
high-speed networks, and it maintains DC balance
in transmission. Converting binary data words to
ternary is beneficial for placing data on an
electromagnetic channel. Because ternary data
communication increases the data-carrying capacity,
it can be used to increase the speed of data
transmission. In future, it could also increase the
data capacity of storage media [8].
VI CONCLUSION
In this paper we have discussed ternary logic in
digital communication as a means of providing high
speed and performance. We have discussed the
encoding of binary data words into ternary data
words to improve data word transmission time and,
correspondingly, to achieve higher-speed
communication than the binary logic generally used
in digital communication. For the implementation of
this scheme, we have discussed the main algorithms
and conversion methods that are useful for
understanding this logic. We have also tried to shed
light on the principle of ternary data pattern
encoding, which underlies ternary data
communication.
VII REFERENCES
[1] Glass, A., Ali, B. and Bastaki, E., “Design and
modeling of H-Ternary line encoder for digital data
transmission,” International Conference on Info-Tech
& Info-Net, Beijing, China, 2001, pp. 503-507.
[2] Mahadevan, A., “Digital Transceiver using H
Ternary Line Coding Technique,” Proceedings of the
World Congress on Engineering 2007, Vol. I.
[3] Srivastava, A. and Venkatapathy, K., “Design and
Implementation of a Low Power Ternary Full
Adder,” 1996, OPA (Overseas Publishers
Association) Amsterdam B.V., published in The
Netherlands under license by Gordon and Breach
Science Publishers SA.
[4] Abdullatif Glass, Bahman Ali and Nidhal
Abdulaziz, “H-Ternary Line Decoder for Digital
Data Transmission: Circuit Design and Modelling”.
[5] Bylanski, P. and Ingram, D., “Digital
transmission systems,” Peter Peregrinus, 1976, pp.
216-246.
[6] Lathi, P., “Modern digital and analog
communication systems (3rd Ed),” Oxford
University Press, 1998, pp. 294-353.
[7] Takasaki, Y., “Digital transmission design and
jitter analysis,” Artech House, 1991, pp. 35-60.
[8] Sandeep Patel and Howard W. Johnson, “Methods
and apparatus for implementing a type 8B6T
Encoder and decoder,” 1996, patent no. 5,525,983.
“Embedded Implementation of Space Vector PWM using FPGA”
Ashish Gupta
Assistant Professor, Department of Electronics Engineering,
MPEC, Kanpur [email protected]
Abstract - This paper introduces the working principle of space vector pulse width modulation (SVPWM) and presents a new circuit realization of an SVPWM generator based on a flexible, fast and cost-effective field programmable gate array (FPGA) embedded technique. Control of machines using vector control techniques is becoming increasingly popular. The need for extensive computation is no longer an objection to implementing vector control, owing to the wide availability of high-speed digital processors. The method of decoupling the variables and controlling them independently is known as vector control. To relieve the controller of the time-consuming computational task of PWM signal generation, a new method of space vector PWM signal generation is implemented in an FPGA using the hardware description language VHDL. The space vector PWM pulses are first designed in the MATLAB/SIMULINK environment, where the relevant code is written to generate the pulses; the M-files are then converted into VHDL code using a software conversion tool. The resulting triggering pulses are applied to the inverter circuit, and the switching pattern generated reduces the harmonic content and switching losses.
Keywords: FPGA (Field Programmable Gate Array), SVM, Space Vector PWM, VHDL, Induction motor drive
1. Introduction
The pulse width modulation (PWM) technique called “vector modulation”, which is based on space vector theory, is the most important development of the last few years [1]. Although several PWM methods have been created in the past, the vector modulation technique appears to be the best alternative. FPGA development has reached a level of maturity that makes FPGAs a good choice of implementation in many fields [2]. An FPGA-based embedded implementation of SVPWM combines the computing power of a processor with the logic processing power of a hardware circuit, so that both the processing efficiency of the CPU and the utilization of the logic units can be improved. Figure 1 shows an SVPWM control system based on the FPGA-embedded technique.
Figure 1: SVPWM control system based on FPGA-embedded technique
Recent applications of FPGAs in industrial electronics include mobile-robot path planning and intelligent transportation [3], current control applied to power converters, real-time hardware-in-the-loop testing for control design, controller implementation, separation and recovery of independent source signals, and neural computation. Since the concept of the multilevel PWM converter was introduced, various modulation strategies have been developed and studied in detail, such as multilevel sinusoidal PWM, multilevel selective harmonic elimination and space vector modulation. Among these strategies, space vector PWM (SVPWM) [4] stands out because it offers significant flexibility to optimize switching waveforms and is well suited to digital implementation. The complexity and computational cost of traditional SVPWM techniques increase with the number of levels of the converter, and most of them use trigonometric functions or pre-computed tables. A symmetrical space vector modulation PWM pattern is proposed in this paper; it offers the advantage of lower THD without increasing the switching losses. The paper thus demonstrates that a field programmable gate array (FPGA) is a more efficient and faster solution, and it investigates how to generate a variable PWM waveform on a Xilinx FPGA [5].
The rest of the paper is organized as follows. Section 2 introduces the principle of the symmetrical space vector PWM method. Section 3 gives details of the FPGA. Section 4 shows the M-file code and Simulink blocks required to generate the space vector pulses. Section 5 explains the experimental results and Section 6 is the conclusion.
2. Principle of Space Vector PWM
In vector coordinates, the combinations of three-phase inverter output voltages form the eight space vectors shown in Figure 2. Six nonzero space vectors (V1 to V6) form an origin-centered hexagon, and two zero space vectors (V0 and V7) are located at the origin. The hexagon is the maximum boundary of the space vector, and the circle is the maximum trajectory of the regular sinusoidal outputs in linear modulation. The figure also shows the PWM output patterns in the six regions (denoted sectors I-VI) separately. In accordance with the three-phase to two-phase transformation, the three-phase inputs (Va, Vb, Vc) are transformed into (Vα, Vβ) as the reference vector.
Figure 2: Basic eight switching vectors and vector representation of Sector 1.
As shown in Figure 3, there are eight possible combinations of on and off patterns for the three upper power switches. The on and off states of the lower power devices are opposite to those of the upper ones, and so are determined as soon as the states of the upper power transistors are known. The eight switching vectors (V0 to V7), the output line-to-neutral (phase) voltages and the output line-to-line voltages, in terms of the DC-link voltage Vdc, are given in Table 1.
Figure 3: Circuit model of PWM inverter with center-tapped grounded DC bus.
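The relationship between the upper-switch states and the inverter voltages can be sketched in Python. The expressions used are the standard ones for a balanced three-phase load, Van = Vdc(2a - b - c)/3 and Vab = Vdc(a - b); they are not written out in the paper, but they reproduce the entries of Table 1. The amplitude-invariant scaling of the three-phase to two-phase transformation is likewise an assumption, and the function names are hypothetical.

```python
from fractions import Fraction
import math


def inverter_voltages(a, b, c):
    """Phase and line voltages, in units of Vdc, for upper-switch states a, b, c (0 or 1)."""
    van = Fraction(2 * a - b - c, 3)
    vbn = Fraction(2 * b - a - c, 3)
    vcn = Fraction(2 * c - a - b, 3)
    return (van, vbn, vcn), (a - b, b - c, c - a)


def clarke(va, vb, vc):
    """Three-phase to two-phase transformation (amplitude-invariant scaling assumed)."""
    v_alpha = (2 * va - vb - vc) / 3
    v_beta = (vb - vc) / math.sqrt(3)
    return v_alpha, v_beta


# Switching vector V1 = (1, 0, 0)
phase, line = inverter_voltages(1, 0, 0)
print([str(v) for v in phase], line)
```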
Table 1: Details of the different phase and line voltages for the eight states.
(All voltages are expressed as multiples of Vdc.)
Voltage   Switching vectors   Line-to-neutral voltage   Line-to-line voltage
vector    a    b    c         Van    Vbn    Vcn         Vab   Vbc   Vca
V0        0    0    0         0      0      0           0     0     0
V1        1    0    0         2/3    -1/3   -1/3        1     0     -1
V2        1    1    0         1/3    1/3    -2/3        0     1     -1
V3        0    1    0         -1/3   2/3    -1/3        -1    1     0
V4        0    1    1         -2/3   1/3    1/3         -1    0     1
V5        0    0    1         -1/3   -1/3   2/3         0     -1    1
V6        1    0    1         1/3    -2/3   1/3         1     -1    0
V7        1    1    1         0      0      0           0     0     0
3. Field Programmable Gate Array
A Field-Programmable Gate Array (FPGA) is a silicon chip containing an array of configurable logic blocks (CLBs). Unlike an Application Specific Integrated Circuit (ASIC), which performs a single specific function for the lifetime of the chip, an FPGA can be reprogrammed to perform a different function in a matter of microseconds. The design uses Xilinx development tools and is realized in a single FPGA chip with no external memory. The benefits of this design are as follows:
- The whole system is implemented in a single chip; consequently the circuit is very compact.
- FPGA-based systems are more reliable because they do not need any control software.
- Faster design and verification time, and design changes without penalty.
In this work the FPGA is programmed using a hardware description language, and the code generates the space vector modulation for the inverter circuit. The point to remember is that, instead of writing the VHDL code directly, the M-file code is first written to generate the SVPWM pulses, and the VHDL code is then generated from it using the software converter. The work therefore requires less time and gives fast operation. The MATLAB/SIMULINK environment is familiar to a large number of software programmers, and since M-file coding is common to most programmers it is easier to work in this software. A high-level design and simulation tool is provided by the FPGA vendor, Xilinx. It is a very flexible design tool, which allows a high-level structural description of the design to be tested and makes quick changes and corrections possible. The circuit description structure is very similar to the way the design could be implemented later. A mapping tool that converts such a structure into VHDL code therefore saves the designer's time, which would otherwise be spent rewriting the same structure in VHDL and probably making mistakes that would need debugging.
4. Simulation Steps:
(1) Initialize system parameters in MATLAB/SIMULINK.
(2) Perform M-file coding to
    (i) Determine the sector.
    (ii) Determine the time durations T1, T2, T0.
    (iii) Determine the switching time (Ta, Tb and Tc) of each transistor (S1 to S6).
    (iv) Generate the inverter output voltages (VAB, VBC, VCA).
    (v) Generate VHDL code through the software conversion tool.
    (vi) Burn the program into the FPGA kit.
(3) View the SVPWM waveform using XILINX.
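Steps (2)(i) and (2)(ii) can be sketched as follows. The paper's M-file code is not reproduced in the text, so this Python sketch uses the standard textbook dwell-time expressions for two-level SVPWM in the linear range, T1 = (sqrt(3) Ts |Vref| / Vdc) sin(pi/3 - alpha), T2 = (sqrt(3) Ts |Vref| / Vdc) sin(alpha) and T0 = Ts - T1 - T2, as an assumption. The function name svpwm_times and its argument names are hypothetical.

```python
import math


def svpwm_times(v_alpha, v_beta, vdc, ts):
    """Return (sector, T1, T2, T0) for a reference vector in the linear range."""
    theta = math.atan2(v_beta, v_alpha) % (2 * math.pi)   # reference angle, 0..2*pi
    sector = int(theta / (math.pi / 3)) % 6 + 1           # sectors 1..6, 60 degrees each
    alpha = theta % (math.pi / 3)                         # angle inside the sector
    k = math.sqrt(3) * ts * math.hypot(v_alpha, v_beta) / vdc
    t1 = k * math.sin(math.pi / 3 - alpha)                # dwell time of first active vector
    t2 = k * math.sin(alpha)                              # dwell time of second active vector
    t0 = ts - t1 - t2                                     # remaining time for zero vectors
    return sector, t1, t2, t0


# 200 V reference at 30 degrees, Vdc = 600 V, 10 kHz switching (Ts = 100 us)
angle = math.radians(30)
print(svpwm_times(200 * math.cos(angle), 200 * math.sin(angle), 600.0, 1e-4))
```

The DC voltage and switching frequency in the usage line are taken from the parameters listed with Table 2; the 200 V reference magnitude is an arbitrary illustrative value.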
4.1 Simulink Model to generate Space Vector PWM
Figure 4.1: Simulink Model for Overall System
Figure 4.2: Subsystem Simulink Model for “Space Vector PWM Generator”
Figure 4.3: Subsystem Simulink Model for “Making Switching Time”
5. Results and Discussions
The control scheme is simple in architecture, which facilitates the realization of the SVPWM controller using an FPGA-based circuit design approach. The designed SVPWM control IC has been realized in a single FPGA. Simulation results for the internal modules and for the final space vector PWM switching pattern have been obtained at a fundamental frequency of 50 Hz. Such wide frequency control with very high switching frequency is possible only with a state-of-the-art VLSI digital circuit design approach. The results indicate that the generated switching pattern reduces the harmonic content and switching losses. A comparison between SPWM and SVPWM for varying modulation index is given in Table 2, and it shows the advantage of controlling the drive by the SVPWM technique. Figure 5 shows the locus comparison of the maximum linear control voltage in sine PWM and SVPWM. Figures 6, 7 and 8 show the axis converter, the delay time and the output of each inverter, respectively. Figures 9 and 10 show the simulation results of Van, Vab and Vac and of the pulse patterns, respectively.
Figure 5: Locus comparison of maximum linear control voltage in Sine PWM and SVPWM.
Fig 6: Three to Two axis converter. (Va, Vb, Vc) are transformed into (Vα, Vβ)
Table 2: Comparison between SPWM and SVPWM for varying modulation index.
Technique  SPWM                              SVPWM
M.I. (M)   Output line voltage   THD (%)     Output line voltage   THD (%)
           (peak V)                          (peak V)
0.4        180.80                162.11      192.70                154.07
0.5        266.50                123.35      312.20                108.78
0.6        289.40                117.12      318.10                105.69
0.7        369.20                94.52       436.60                81.19
0.8        396.10                89.73       442.90                78.56
0.9        472.90                70.69       552.30                53.62
1.0        502.40                64.83       567.90                49.15
Parameters used: fundamental frequency 50 Hz, switching frequency 10 kHz, DC voltage 600 V
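The locus comparison of Figure 5 rests on the maximum phase voltage available in the linear range. The paper does not state the limits, so the sketch below assumes the standard textbook values: Vdc/2 (peak, per phase) for sine PWM and Vdc/sqrt(3) for SVPWM. These are theoretical bounds and are not expected to match the simulated values in Table 2 exactly. The function name is hypothetical.

```python
import math


def max_linear_phase_voltage(vdc):
    """Return the peak phase voltage limits (SPWM, SVPWM) in the linear range."""
    spwm = vdc / 2                 # radius of the sine PWM locus
    svpwm = vdc / math.sqrt(3)     # radius of the circle inscribed in the hexagon
    return spwm, svpwm


spwm, svpwm = max_linear_phase_voltage(600.0)
print(spwm, svpwm, svpwm / spwm)
```

Under these assumptions the ratio is 2/sqrt(3), so SVPWM offers roughly 15% more fundamental voltage than sine PWM for the same DC link.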
Fig 7: Delay time
Fig 8: Output of each inverter
Fig 9: Simulation results of Van, Vab and Vac
Fig 10: Simulation results of pulse patterns
6. Conclusion
In this paper, a theoretical study of the SVPWM control strategy for an FPGA-based voltage inverter is presented. On the one hand, it aims to show the effectiveness of SVPWM in reducing switching power losses; SVPWM is among the best solutions for achieving good voltage transfer and reduced harmonic distortion at the output of an inverter. On the other hand, since field programmable gate arrays (FPGAs) have advantages over microprocessor and DSP control, the modulation technique is implemented in an FPGA by first generating M-files in the MATLAB/SIMULINK environment. The FPGA coding makes it easier to design the vector modulation pattern generator using a field programmable gate array. Moreover, the MATLAB/SIMULINK environment is familiar to a large number of software programmers, and since M-file coding is common to most programmers it is easier for individuals to work in this software. The switching pattern generated reduces the harmonic content, provides efficient as well as flexible control, and reduces the total size of the system. This SVPWM IC can be used as a modulator for high-performance AC drives and power conditioning equipment.
References
[1] Ying-Yu Tzou, Hau-Jean Hsu and Tien-Sung Kuo, “FPGA based SVPWM control IC for 3-phase PWM inverters”, Proceedings of the 1996 IEEE IECON 22nd International Conference on Industrial Electronics, Control, and Instrumentation, vol. 1, 5-10 Aug. 1996, pp. 138-143.
[2] J.J. Rodriguez-Andina, M.J. Moure, and M.D. Valdes, “Features, design tools, and application domains of FPGAs”, IEEE Trans. Ind. Electron., vol. 54, no. 4, pp. 1810-1823, Aug. 2007.
[3] K. Sridharan and T. Priya, “The design of a hardware accelerator for realtime complete visibility graph construction and efficient FPGA implementation”, IEEE Trans. Ind. Electron., vol. 52, no. 4, pp. 1185-1187, Aug. 2005.
[4] L. Franquelo, M. Prats, R. Portillo, J. Galvan, M. Perales, J. Carrasco, E. Diez, and J. Jimenez, “Three-dimensional space-vector modulation algorithm for four-leg multilevel converters using abc coordinates”, IEEE Trans. Ind. Electron., vol. 53, no. 2, pp. 459-466, Apr. 2006.
[5] Xilinx Inc., “Foundation Series ISE 3.11 User Guide”, 2000.