
A 90-Nm CMOS Embedded Low Power SRAM Compiler


Transcript
  • 7/22/2019 A 90-Nm CMOS Embedded Low Power SRAM Compiler


    A 90-nm CMOS Embedded Low Power SRAM Compiler

    Zhao-Yong Zhang, Chia-Cheng Chen and Jian-Bin Zheng

    Abstract - In this paper, a highly flexible low-power single-port Static Random Access Memory (SRAM) compiler design is presented. The Divided Word Line (DWL) and Divided Bit Line (DBL) schemes were implemented to reduce active power. Particular emphasis was put on decreasing standby power consumption in the word line driver. Forced-stack devices were introduced as the pulse generation element for sensing enable. This guarantees that the SRAM can work at low voltage without losing design margin. A test-chip with 17 embedded SRAMs has been fabricated in the UMC 90-nm low leakage CMOS logic process.

    We use the self-timing and replica techniques in the SRAM circuit design for different memory densities, which gives the SRAM compiler both low power and high speed.

    The organization of this paper is as follows. In Section II, a brief overview of the architecture and the replica self-timing technique is given. Section III discusses the design of some SRAM circuits, including the word line driver and the sensing enable pulse generation. Section IV presents the experimental results on the performance of the test-chip. Section V concludes the paper.

    II. ARCHITECTURE

    The SRAM is a synchronous single-port memory. The array core uses a 6T high-threshold-voltage SRAM cell with 9991lm2 area, which is owned by UMC for the 90-nm low leakage CMOS logic process. Different combinations of words, bits, and aspect ratios (BM, Block Multiplexing) can be used to generate the most desirable configuration. Table I shows the configuration information of the SRAM compiler.

    TABLE I
    SINGLE PORT SRAM COMPILER CONFIGURATION

    The SRAM can be organized as one to eight banks (1 to 32 sub-arrays) in a memory by utilizing the DWL and DBL techniques. Each memory array can have a maximum of 256 rows (word lines) and 256 columns (bit lines). Fig. 1 presents an architecture diagram using DWL=2 and DBL=2 as an example. The control circuit and block selector are used to select one of two blocks (BS1, BS2). Each block memory array is divided into left and right halves to decrease the access time [9] as well as the data line (SAOUT) tiling wire. The DBL scheme [5], [8] is embedded in the SRAM compiler and can divide the bit line into one to four partitions by following the rule listed in Table I. In Fig. 1, the highest bit of the X address is used to generate the bit line partition signals (BP1, BP2), which divide the bit line into two partitions. The SRAM array is thus finally split into eight sub-arrays by the DWL and DBL techniques. The local bit line multiplexing circuit (column Mux.) is used to connect the local bit line to the global bit line, and then to the input of the sense amplifier.

    Index Terms - Low power, SRAM compiler, divided word line, divided bit line, forced-stack device, part power-gating, replica technique, self-timing.

    I. INTRODUCTION

    With the scaling of CMOS transistors, a larger fraction of chip area is devoted to embedded SRAM modules. Simultaneously, the need for lighter portable electronic applications with extended battery life has made low-power memory circuit design more and more necessary and important. An SRAM compiler, as a highly flexible memory generation system, can meet many of the increased demands of SOC designers for compact, fully diffused embedded memories [2], [6], [9].

    There are numerous techniques to reduce SRAM power dissipation [1], [3]-[11]. The DWL [1] and DBL [5], [8] techniques were employed by our SRAM compiler to reduce the active power. In this paper a new word line driver circuit with a part power-gating scheme [10] is presented, which can greatly decrease the standby current. The performance of an SRAM generated by the compiler is affected not only by the PVT (Process, Voltage, and Temperature) conditions but also by the configuration parameters or density. A circuit designer of an SRAM compiler must therefore build up a reasonable architecture to deal with those variations. We use memory cells as replica circuits [4] to minimize the effect of operating and configuration variability on speed and power.

    A latch-type sense amplifier is used in our SRAM compiler. It gives a good result in terms of speed and power because it is able to amplify a very low bit line swing voltage.

    Zhao-Yong Zhang is with the Memory Design Department, AiceStar Technology Corporation, Suzhou, China (e-mail: [email protected]).
    Chia-Cheng Chen is with the Module Intellectual Property Development, Faraday Technology Corporation, Hsin-chu City, Taiwan (e-mail: [email protected]).
    Jian-Bin Zheng is with the Memory Design Department, AiceStar Technology Corporation, Suzhou, China (e-mail: [email protected]).

    978-1-4244-3870-9/09/$25.00 ©2009 IEEE

    Parameter                    Ranges
    Words                        64b to 32Kb, increment = BM×16
    Bits                         1b to 128b, increment = 1
    Bytes                        128b to 1b, decrement = 1
    Bit line partitions (DBL)    1 when 16 ≤ WL ≤ 256
                                 2 when 256 < WL ≤ 512
                                 3 when 512 < WL ≤ 768
                                 4 when 768 < WL ≤ 1024
    Aspect ratios (DWL)          Block Mux. (BM) 1, 2, 4, 8
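
    The bit line partition rule in Table I amounts to rounding WL / 256 up to the next integer. The following is a minimal sketch of that rule and of a range check for a configuration. The function names are hypothetical, and reading "increment = BM×16" as "the word count must be a multiple of BM×16" is an assumption, not something the table states explicitly.

```python
def bit_line_partitions(word_lines: int) -> int:
    """Number of bit line partitions (DBL) for a word line count, per Table I."""
    if not 16 <= word_lines <= 1024:
        raise ValueError("Table I only covers 16 <= WL <= 1024")
    # Each partition holds at most 256 word lines, so round WL / 256 up.
    return (word_lines + 255) // 256


def is_valid_config(words: int, bits: int, bm: int) -> bool:
    """Check words/bits/BM against the ranges listed in Table I.

    Assumption: 'increment = BM x 16' means words is a multiple of BM * 16.
    """
    return (
        bm in (1, 2, 4, 8)
        and 64 <= words <= 32 * 1024
        and words % (bm * 16) == 0
        and 1 <= bits <= 128
    )
```

    For example, a bit line with 300 word lines falls in the "256 < WL ≤ 512" row, so bit_line_partitions(300) selects two partitions.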



    The aspect ratio for each configuration can be controlled by selecting one of four block multiplexing (BM) schemes (namely DWL): 1:1, 2:1, 4:1, and 8:1. The local bit line multiplexing is 4:1 in all four BM options.

    The dual-threshold voltage technique [11] is used to trade off high-speed and low-power operation in the SRAM compiler. Memory cells use high-threshold-voltage devices and peripheral circuits use regular-threshold-voltage devices.

    III. CIRCUIT DESIGN

    The design of an SRAM consists of three major blocks: the memory cell, the decoder circuits, and the sense amplifiers. In the following sections, we give a brief overview of the row decoders and the sensing enable circuit. Because the memory cell is owned by UMC, its design is not discussed here.

    A. Row Decoders

    Row decoders are used to assert the word lines based on the input addresses. The decoder structure mainly consists of an initial pre-decoder circuit and a word line decoder circuit. Fig. 3 shows an 8 to 256 decoder structure designed for the global word line decoder.

    Fig. 3. Block diagram of global word line decoder with part power-gating.

    In the pre-decoder stage, the address inputs AX0 - AX2 and the internal clock (ICLK) are combined using a 3 to 8 CMOS static pre-decoding circuit. ICLK is a self-reset pulse signal (refer to Fig. 2), which makes the asserted word line a pulse as well. The other address inputs, AX3 - AX7, are used to generate the pre-decoding signals PB and PC. These two sets of pre-decoder outputs are then combined to give the outputs that drive the NAND gates (PBC0 - PBC31) and the part power-gating PMOS devices (PBC0B - PBC31B). Before a word line is asserted, PB and PC must first be enabled to avoid global word line glitches. Although this increases the address setup time, it decreases the access time.
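
    The two-stage decode described above can be sketched behaviorally as follows. This is an illustration only, not the authors' circuit: the split of AX3 - AX7 into a 2 to 4 group (PB) and a 3 to 8 group (PC), the numbering of the combined PBC signals, and the function names are all assumptions.

```python
from typing import Optional, Tuple


def predecode(ax: int) -> Tuple[int, int, int]:
    """Split an 8-bit row address into pre-decoder indices (PA, PB, PC)."""
    if not 0 <= ax < 256:
        raise ValueError("row address must fit in AX0-AX7")
    pa = ax & 0b111         # AX0-AX2 -> 3 to 8, selects one of PA0..PA7
    pb = (ax >> 3) & 0b11   # AX3-AX4 -> 2 to 4, one of PB0..PB3 (assumed split)
    pc = (ax >> 5) & 0b111  # AX5-AX7 -> 3 to 8, one of PC0..PC7 (assumed split)
    return pa, pb, pc


def global_word_line(ax: int, iclk: bool) -> Optional[int]:
    """Return the index of the asserted global word line, or None if ICLK is low."""
    pa, pb, pc = predecode(ax)
    pbc = pc * 4 + pb       # one of PBC0..PBC31 (assumed numbering)
    if not iclk:
        # PA is gated by the ICLK pulse, so no word line fires outside the pulse.
        return None
    return pbc * 8 + pa     # one of GWL0..GWL255
```

    Under these assumptions every address selects exactly one of the 256 global word lines, and the 32 PBC signals each enable a group of eight.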

    The power-gating technique [10] can reduce leakage power by shutting off idle blocks. But power-gating also has some negative effects, including a combination of noise, performance penalty, and area and power overhead. In the

    Fig. 1. Memory architecture block diagram (BM=2, bit line partition=2). RL = Replica Local Word line Driver, LD = Local Column MUX Driver, RCM = Replica Column MUX, DCM = Dummy Column MUX.

    To ensure fast and low-power operation, the internal timing control path uses the replica technique and a self-timing scheme to match the data path [4]. Fig. 2 presents the waveform of the self-timing read scheme (which uses the replica structure presented in Fig. 1). The replica path uses replica word line cells and bit line cells to track the memory core cells. The replica word line cells are the same as the core word line cells; the replica bit line cells are programmed to always store a zero. Hence, the replica path capacitance is the same as that of the core cells, including gate, junction, and wire parasitic capacitances. The replica scheme makes the global bit line swing about a tenth of the supply by the time the replica bit line cells have just discharged the replica global bit line to about half of the supply during a read cycle. The replica structure presented in Fig. 1 scales with the memory configuration.
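
    The "tenth of the supply versus half of the supply" relationship can be illustrated with a first-order model. The sketch below assumes a constant cell read current, equal capacitance on the core and replica bit lines, and a replica column in which several cells discharge in parallel; the text does not state how the 5:1 ratio is obtained, so the parallel-cell count is purely an assumption for illustration.

```python
def core_swing_at_trigger(vdd: float, replica_cells: int,
                          trigger_fraction: float = 0.5) -> float:
    """Core bit line swing at the moment the replica bit line reaches its trip point.

    First-order model (assumption): with equal capacitance C and cell current I,
    the core swing is I*t/C while the replica swing is n*I*t/C, so the core
    swing is the replica swing divided by n.
    """
    if replica_cells < 1:
        raise ValueError("need at least one discharging replica cell")
    return vdd * trigger_fraction / replica_cells
```

    With a 1.2 V supply and five discharging replica cells, the replica line reaches 0.6 V when the core bit line has swung about 0.12 V, that is, about a tenth of the supply.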

    Fig. 2. Self-timing read scheme waveform.



    Fig. 4. Sensing enable pulse generating circuits and sense amplifier with data output block diagram (BM=2).

    [Figure: curves for forced-stack and long-channel-length devices; x-axis: Voltage (V), 0.60 to 1.50.]


    Fig. 7. Test-chip layout.

    ranging from 64b to 512Kb in a variety of aspect ratios. The features of the test-chip can be found in Table II.

                                32768X16             8192X8
    Configuration               BM       CM          BM       CM
    Area (mm²)                  0.852    0.686       0.146    0.108
    Area Comparison (%)              24.20                35.18
    Static Current (µA)         9.661    8.277       1.320    1.187
    DC Comparison (%)                16.72                11.20
    Dynamic Current (mA/MHz)    0.029    0.041       0.011    0.019
    AC Comparison (%)               -29.27               -42.11
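
    The comparison rows above appear to be the relative difference of the BM macro against the CM macro, in percent. The short sketch below recomputes them from the measured values in the table; the dictionary layout and function names are hypothetical. The recomputed figures agree with the table to within about 0.01, which is consistent with rounding of the printed inputs.

```python
# Measured values from the comparison table, as (BM, CM) pairs.
TABLE_III = {
    "32768X16": {
        "area_mm2": (0.852, 0.686),
        "static_uA": (9.661, 8.277),
        "dynamic_mA_per_MHz": (0.029, 0.041),
    },
    "8192X8": {
        "area_mm2": (0.146, 0.108),
        "static_uA": (1.320, 1.187),
        "dynamic_mA_per_MHz": (0.011, 0.019),
    },
}


def percent_vs_cm(bm: float, cm: float) -> float:
    """Relative difference of the BM value against the CM value, in percent."""
    return round((bm - cm) / cm * 100.0, 2)


def comparisons(macro: str) -> dict:
    """Recompute the Area/DC/AC comparison rows for one macro."""
    return {name: percent_vs_cm(bm, cm)
            for name, (bm, cm) in TABLE_III[macro].items()}


if __name__ == "__main__":
    for macro in TABLE_III:
        print(macro, comparisons(macro))
```

    A negative value means the BM macro is lower than the CM macro, so only the dynamic current rows show a reduction.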

    REFERENCES

    [1] M. Yoshimoto, K. Anami, H. Shinohara, T. Yoshihara, H. Takagi, S. Nagao, S. Kayano, and T. Nakano, "A divided word-line structure in the static RAM and its application to a 64K full CMOS RAM," IEEE Journal of Solid-State Circuits, vol. SC-18, no. 5, pp. 479-485, Oct. 1983.
    [2] J. C. Tou, P. Gee, J. Duh, and R. Eesley, "A submicrometer CMOS embedded SRAM compiler," IEEE Journal of Solid-State Circuits, vol. 27, no. 3, pp. 417-424, Mar. 1992.
    [3] J. S. Caravella, "A low voltage SRAM for embedded applications," IEEE Journal of Solid-State Circuits, vol. 32, no. 3, pp. 428-432, Mar. 1997.
    [4] B. S. Amrutur and M. A. Horowitz, "A replica technique for wordline and sense control in low-power SRAM's," IEEE Journal of Solid-State Circuits, vol. 33, no. 8, pp. 1208-1219, Aug. 1998.
    [5] A. Karandikar and K. K. Parhi, "Low power SRAM design using hierarchical divided bit-line approach," Proceedings of the International Conference on Computer Design: VLSI in Computers and Processors, pp. 82-88, Oct. 1998.
    [6] M. Jagasivamani and D. S. Ha, "Development of a low-power SRAM compiler," IEEE International Symposium on Circuits and Systems (ISCAS), vol. 4, pp. 498-501, May 2001.
    [7] S. Narendra, S. Borkar, V. De, D. Antoniadis, and A. Chandrakasan, "Scaling of stack effect and its application for leakage reduction," Proceedings of the International Symposium on Low Power Electronics and Design, pp. 195-200, Aug. 2001.
    [8] B. Yang and L. Kim, "A low-power SRAM using hierarchical bit line and local sense amplifiers," IEEE Journal of Solid-State Circuits, vol. 40, no. 6, pp. 1366-1376, Jun. 2005.
    [9] S. Singh, S. Azmi, N. Agrawal, P. Phani, and A. Rout, "Architecture and design of a high performance SRAM for SOC design," Design Automation Conference, pp. 447-451, 2002.
    [10] H. Jiang, M. M. Sadowska, and S. R. Nassif, "Benefits and costs of power-gating technique," Proceedings of the 2005 International Conference on Computer Design, pp. 559-566, 2005.
    [11] J. T. Kao and A. P. Chandrakasan, "Dual-threshold voltage techniques for low-power digital circuits," IEEE Journal of Solid-State Circuits, vol. 35, no. 7, pp. 1009-1018, July 2000.

    V. CONCLUSION

    A highly configurable embedded low-power SRAM compiler based on an industrial 90-nm CMOS process has been demonstrated. The compiled SRAMs greatly reduce dynamic current by combining the DWL and DBL techniques with the replica and self-timing schemes. Simulation and verification with sufficient margin, together with robust circuits, further guarantee that the compiled SRAMs have wider margin for correct functionality and accurate characterization. The measurement results of the test-chip have confirmed the correctness and low-power efficiency of the design.

    ACKNOWLEDGMENT

    It is our pleasure to thank Teddy and James for help with the test-chip design, W. T. and Jason for testing of the chips, and Willis, Jack, Alex and Ya-Qi for helpful discussions on the circuit design.

    TABLE II
    FEATURES OF TEST-CHIP

    Foundry           UMC
    Process           90-nm 1P9M low leakage CMOS
    SRAM macros       90-nm 1P5M
    Supply voltage    1.2 V
    Die size          µm × µm
    Package           QFP 208

    Table III gives the power measurement results at the operating voltage of 1.2 V for two SRAM macros (table column BM) in the test-chip. The data for the CM (Column Mux.) macros (with bit line partition architecture) come from Faraday commercial SRAM compiler datasheets. The results show that this design reduces dynamic current by 29% for the 512Kb SRAM and by 42% for the 64Kb SRAM. The static current is actually slightly higher in this work because of the additional circuit overhead of the DWL implementation. However, the total average current dissipation is still reduced, as it is dominated by the dynamic current.
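
    The claim that dynamic current dominates can be checked with a simple linear model, total current = static current + (dynamic current per MHz) × frequency. This model and the function names below are assumptions for illustration; the measured values are the BM and CM entries reported in Table III.

```python
def total_current_mA(static_uA: float, dynamic_mA_per_MHz: float,
                     freq_MHz: float) -> float:
    """Total supply current under the assumed linear model, in mA."""
    return static_uA / 1000.0 + dynamic_mA_per_MHz * freq_MHz


def break_even_MHz(bm: tuple, cm: tuple) -> float:
    """Frequency above which the BM macro draws less total current than CM.

    Each argument is (static current in uA, dynamic current in mA/MHz).
    """
    (bm_static, bm_dyn), (cm_static, cm_dyn) = bm, cm
    extra_static_mA = (bm_static - cm_static) / 1000.0
    return extra_static_mA / (cm_dyn - bm_dyn)


# (static uA, dynamic mA/MHz) for the 512Kb and 64Kb macros, from Table III.
BM_512K, CM_512K = (9.661, 0.029), (8.277, 0.041)
BM_64K, CM_64K = (1.320, 0.011), (1.187, 0.019)

if __name__ == "__main__":
    print("512Kb break-even (MHz):", break_even_MHz(BM_512K, CM_512K))
    print("64Kb break-even (MHz):", break_even_MHz(BM_64K, CM_64K))
```

    Under this model the extra static current is paid back well below 1 MHz for both macros, so at any practical operating frequency the BM macro draws less total current.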

    TABLE III
    COMPARISON WITH OTHER WORKS


    Silicon measurement confirmed complete functionality over the voltage (0.9 - 1.8 V) and temperature (-40 - 125 °C) ranges for all memories. The SCAN, March C-, and March C+ patterns were applied by the memory BIST (Built-In Self-Test) embedded in the test-chip. An embedded PLL was used for SRAM timing measurements and high-speed memory BIST testing (the maximum frequency can reach 500 MHz).
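
    For readers unfamiliar with the March C- pattern mentioned above, the following is a minimal software sketch of its six march elements on a bit-oriented memory model. It is not the BIST hardware of the test-chip; the read/write callback interface and the one-bit-per-address model are assumptions made for illustration.

```python
from typing import Callable


def march_c_minus(read: Callable[[int], int],
                  write: Callable[[int, int], None],
                  size: int) -> bool:
    """Run March C- over addresses 0..size-1; return True if no fault is seen."""
    up = range(size)
    down = range(size - 1, -1, -1)

    for addr in up:                 # M0: any order, write 0
        write(addr, 0)

    # M1..M4: (address order, value expected on read, value written afterwards)
    elements = [(up, 0, 1), (up, 1, 0), (down, 0, 1), (down, 1, 0)]
    for order, expected, new_value in elements:
        for addr in order:
            if read(addr) != expected:
                return False
            write(addr, new_value)

    # M5: any order, read 0
    return all(read(addr) == 0 for addr in up)
```

    A fault-free list-backed memory passes, while a cell that ignores writes (a stuck-at fault) is caught in one of the first read elements.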
