Challenges at Circuits Designs for Nonvolatile Memory and Logics in Dependable Systems Dec. 6, 2013 @ JST DVLSI, Tokyo, Japan Prof. Meng-Fan (Marvin) Chang Memory Design Lab. (MDL) Department of Electrical Engineering National Tsing Hua University (NTHU), Taiwan
Transcript
  • Challenges at Circuits Designs for

    Nonvolatile Memory and Logics in

    Dependable Systems

    Dec. 6, 2013 @ JST DVLSI, Tokyo, Japan

    Prof. Meng-Fan (Marvin) Chang

    Memory Design Lab. (MDL)

    Department of Electrical Engineering

    National Tsing Hua University (NTHU), Taiwan

  • # 2

    Outline

    Nonvolatile memory (NVM) and logics (nvLogics)

    in dependable systems

    Challenges at designing ReRAM

    Challenges at designing Flash

    Challenges at designing 3D NVM & nvLogics

    Summary

  • # 3

    Volatile vs. Nonvolatile Memory

    [Figure: access time (ns, log scale, 10^-1 to 10^4) vs. VDDmin (V, 0.0 to 2.0) for SRAM, DRAM, MRAM, ReRAM, PCM, and Flash (charge-pump). Volatile memories: high-speed operation, low VDDmin. NVM: power-off data storage.]

    Volatile memory:

    Fast, low VDDmin

    High endurance

    Working memory

    Non-volatile memory (NVM):

    Slow, high write-voltage

    Limited endurance

    Power-off storage

    Two-macros (volatile+ NVM) structure in SoCs

  • # 4

    NVM in Dependable Systems

    Typical chips: SRAM + NVM + logics

    NVM enables power-off operations

    Provides power-off storage for program and data (RAM)

    Provides states storage for selected logics (flip-flops)

    Reduce standby power

    Reduce thermal effect

    Reduce voltage/thermal stress time

    [Figure: power vs. time. Labels: data stored to NVM; power-off; data restored to SRAM; program read + computing.]

  • # 5

    Systems Using NVM - Challenges

    Today’s challenges

    Large store power + long store time

    => Limited power on/off frequency

    => Vulnerable to sudden power failure

    Slow restore (wake-up/read) time

    Lost local states/data for logics

    [Figure: power vs. time. Labels: data stored to NVM (slow & large power); power-off; data restored to SRAM (slow); program read + computing; idle period: wasted power & voltage/thermal stress. Typical chips: SRAM + NVM + logics.]

  • # 6

    Using Nonvolatile Logics (nvLogic)

    Two-Macro solution

    Complex interface

    Serial data transfer

    – Slow store/restore

    Large area penalty

    Lost local states

    Nonvolatile SRAM + Flip-flop

    SRAM + NVM within a cell

    – Direct connect (nvSRAM)

    Flip-Flop + NVM (nvFF)

    Fast power on/off

    – parallel store/restore operations

    [Figure: a logic chip with eFlash and SRAM macros vs. an nvLogic chip with nvSRAM and flip-flops paired with NVM. An nvSRAM macro cell combines an NVM cell and an SRAM cell.]

  • # 7

    Using Emerging NVM and nvLogics

    Preferred NVM

    Low-power write

    Low write-voltage

    eliminate HV devices

    Fast read and write

    Low-voltage read

    Using nvLogics

    Fast store/restore

    Store local states

    => Enable frequent

    power interrupts

    [Figure: power vs. time for two cases. Regular-voltage operation: data stored to NVM (slow & large power), power-off, data restore (slow), computing. Low-voltage nvLogic: frequent on/off + low-VDD operation (computing + store + restore), reducing V/T stress.]

  • # 8

    Recent Research in MDL, NTHU

    [Figure: research map covering low-voltage SRAM, NVM & ReRAM, and 3D memory, with publications at ISSCC (2010 to 2014) and VLSI (2009 to 2013). Labeled items: 0.29V NAND-ROM; 100nA CSA OTP; 7.2ns 4Mb ReRAM; 0.5V 4Mb ReRAM; 0.27V ReRAM; 1Mb BJT-ReRAM (4.2ns); ReRAM + SRAM; 3D (TSV) SRAM; 3D NAND (MXIC); SRAM cells at 540mV (A2P8T), 230mV (Z8T), 210mV (D2AW8T), and 260mV (L7T); SRAM Char. (TSMC); eNose L7T.]

  • Challenges at ReRAM Designs

    Examples:

    High-Speed ReRAM

    Area-Efficient ReRAM

    Low-Voltage ReRAM

  • # 10

    Recent ReRAM Devices

    Larger write current is required for

    High uniformity, long data retention,

    Rapid write => Large-area switches

  • # 11

    Recent ReRAM Macros

    [Figure: timeline of ReRAM macros, ~2010 to 2014, with a legend separating embedded (1T1R) from mass-storage (cross-point) designs. Labeled items: 2Mb ReRAM (1T1R), JSSC 2007; 64Mb ReRAM (3D cross-point), ISSCC 2010 (Unity); 4Mb ReRAM (1T1R, 7.2ns-R/W), ISSCC (ITRI+NTHU); 4Mb ReRAM (1T1R), ISSCC; 4Mb ReRAM (1T1R, 0.5V-R), ISSCC; 1Mb ReRAM (1T1R, 0.27V-R), ISSCC; 1Mb BJT-ReRAM (0T1R, 4.2ns-Read), VLSI Symp.; 8Mb ReRAM (cross-point), ISSCC; 32Gb ReRAM (cross-point), ISSCC; 16Gb ReRAM, ISSCC. Link shown on slide: http://tw.sandisk.com/]

  • # 12

    ReRAM Challenges: Disturb vs. Bias

    Write operation

    Set: HRS (Hi-R) to LRS

    Reset: LRS (Low-R) to HRS

    Read operation

    Large VR causes read disturb

    => Requires low BL bias (VBL-R)

             SET        RESET      Read
    WL       VG_SET     VDD        VDD
    BL       VSET       0          VBL-R
    SL       0          VRESET     0
    State    LRS (RL)   HRS (RH)   '1' / '0'
    I        ILRS       IHRS       ICELL

    Lee, H. Y., VLSI-TSA 2010

  • # 13

    ReRAM Challenges: R Variation

    Wide resistance distribution

    Large resistance (R) and ILRS variation

    Ultra-small-R reference cells cause large/tail IREF

    [Figure: histogram of number of samples (0 to 3,000) vs. LRS cell / reference current (-10 to 100 uA) for the 1-cell reference, 2-cell reference, and RH+RL reference, together with LRS and HRS cell distributions. Annotations: IHRS, IREF, ILRS, sensing window, read fail. Insets show the three reference circuits: 1-cell ref. (RL), 2-cell ref. (RL, RL), and RH+RL avg. (RH, RL), each with VREAD and VDD.]

  • # 14

    ReRAM Challenges: Bias & Speed

    Bitline bias fluctuation

    BL bias cannot exceed 0.3V

    Conventional dynamic VBL generation: sensitive to process and temperature variation

    Read access time

    Small ICELL: MLC, low VBL

    Read vs. write speed

    Slow read speed for long BL (large capacity)
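    The link between small ICELL, long bitlines, and slow reads can be illustrated with a first-order model, t = C_BL x dV / ICELL. This is a minimal sketch, not the sensing model used in the talk; the resistance, capacitance, and sensing-swing values are hypothetical, and only the 0.3V bias ceiling comes from the slide.

```python
def bitline_read_time(c_bl_farads, delta_v_volts, i_cell_amps):
    """First-order time for the cell current to develop delta_v on the bitline."""
    if i_cell_amps <= 0:
        raise ValueError("cell current must be positive")
    return c_bl_farads * delta_v_volts / i_cell_amps


# Hypothetical values, chosen only for illustration.
V_BL = 0.3            # read bias in volts (the 0.3V ceiling noted on the slide)
R_LRS = 20e3          # assumed low-resistance state, in ohms
DELTA_V = 0.05        # assumed voltage swing needed by the sense amplifier
C_BL_SHORT = 100e-15  # assumed capacitance of a short bitline, in farads
C_BL_LONG = 400e-15   # assumed capacitance of a bitline four times longer

i_cell = V_BL / R_LRS  # a low bias directly limits the cell current
t_short = bitline_read_time(C_BL_SHORT, DELTA_V, i_cell)
t_long = bitline_read_time(C_BL_LONG, DELTA_V, i_cell)

print(f"I_CELL  = {i_cell * 1e6:.1f} uA")
print(f"t_short = {t_short * 1e9:.2f} ns")
print(f"t_long  = {t_long * 1e9:.2f} ns")
```

    In this model the read time scales linearly with bitline capacitance and inversely with cell current, which is why a low VBL and a large array both work against read speed.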

  • # 15

    A High-Speed ReRAM Device - ITRI

    [Figure: 1T-1R cell configuration with BL, WL, SL, select transistor MSEL, and the resistive device; MLC.]

  • # 16

    Example: High-Speed ReRAM

    Parallel-Series Reference-Cell (PSRC)

    Narrow reference current (IREF) distribution

    Process-Temperature-Aware Dynamic BL-bias circuit (PTADB)

    Stable BL bias to avoid read disturb
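    Why combining several reference cells narrows the IREF distribution can be seen with a small Monte Carlo run. This is a sketch under stated assumptions: the 2x2 parallel-series arrangement, the log-normal resistance spread, and every numeric value are illustrative choices, not the PSRC topology or data from the ISSCC 2011 design.

```python
import random
import statistics


def sample_lrs(rng, r_nominal=10e3, sigma=0.3):
    """Draw one LRS resistance from an assumed log-normal spread."""
    return r_nominal * rng.lognormvariate(0.0, sigma)


def single_cell_ref(rng, v_read):
    """Reference current generated by one LRS cell."""
    return v_read / sample_lrs(rng)


def parallel_series_ref(rng, v_read):
    """Reference current from two parallel branches of two series cells.

    The 2x2 arrangement is an assumption for illustration. Its nominal
    resistance equals that of a single cell, so the nominal current is kept.
    """
    branch_a = sample_lrs(rng) + sample_lrs(rng)
    branch_b = sample_lrs(rng) + sample_lrs(rng)
    r_eq = branch_a * branch_b / (branch_a + branch_b)
    return v_read / r_eq


def relative_spread(samples):
    """Standard deviation divided by the mean."""
    return statistics.pstdev(samples) / statistics.fmean(samples)


rng = random.Random(2013)  # fixed seed so the run is repeatable
V_READ = 0.3               # assumed read voltage in volts
N = 20000

single = [single_cell_ref(rng, V_READ) for _ in range(N)]
combined = [parallel_series_ref(rng, V_READ) for _ in range(N)]

spread_single = relative_spread(single)
spread_combined = relative_spread(combined)
print(f"1-cell reference spread:  {spread_single:.3f}")
print(f"2x2 reference spread:     {spread_combined:.3f}")
```

    Averaging over four independent cells reduces the relative spread of the reference current, which widens the usable sensing window between ILRS and IHRS.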

  • # 17

    Example: High-Speed ReRAM

    4Mb High-Speed ReRAM: 7.2ns random read/write access time

    Small reference variation

    High-speed read circuit

    Read disturb, R-variation

    [Figure: access time (ns) vs. capacity (100Kb to 100Mb, log scale), comparing this work with PCM (ISSCC 2010), MRAM (2007, 2008, ISSCC 2009, ISSCC 2010), STT-RAM (2010), and CBRAM (2007).]

    SS Sheu & MF Chang, ISSCC, 2011

  • # 18

    Low-VDD Read Challenges

    Use RRCS for read

    => Removal of BL clamper

    Body-Drain-Driven CSA (BDD-CSA)

    => Reduced SA headroom

    [Figure: VDD headroom stack-up for CM-CSA + BLC, CM-CSA w/o BLC, BDD-CSA, and RRCS + BDD-CSA, showing the BL bias (VBL), the BL clamper (BLC) headroom, and the CM/diode (M1) headroom. Annotations: lower VDD; dynamic & higher VBL (0.35V~0.2V). Voltage labels: 1V, 0.75V, 0.6V, 0.4V (high yield), 0.3V, 0.25V, 0.05V.]

  • # 19

    Example: Low-VDD Read Scheme

    Body-Drain-Driven (BDD): 1st stage (M1/M2)

    Standby mode

    SE=0, VMAT = VREF = VDD

    BL = DBL = 0V

    Active mode (Ymux on):

    BL-VMAT charge sharing causes drop in VMAT

    M1/M2 precharge BL/DBL (dummy BL)

    [Figure: BDD-CSA schematic with IBL and IREF inputs and transistors M1 to M4.]

  • # 20

    Example: Low-VDD Read Scheme

    Faster read speed at low VDD

    2.9x faster than voltage-mode SA (VSA) at VDD=0.5V

    2.1x faster than conventional CSA (CM-CSA) at VDD=0.5V

  • # 21

    Example: Low-VDD ReRAM Macro

    MF Chang, ISSCC 2012

  • # 22

    Examples: High-Density ReRAM Cells

    Vertical Parasitic-BJT (VPBJT)

    Logic process, npn

    Emitter: NLDD implant

    Base: thin self-aligned P-pocket

    Collector: N-Well (SL)

    Min. 4F2

  • # 23

    VPBJT-ReRAM vs. NMOS-ReRAM

    Larger current density: >10x that of NMOS

    Enables smaller cell area + sufficient write current

    Smaller macro area: 4~7x reduction

    The larger the capacity, the greater the reduction

    (Measured results)

  • # 24

    Thermal-Aware Bitline Bias (TABB)

    Dynamic bitline (BL) bias voltage (VBL-R)

    Track VBE across temperatures (T)

    Constant VR across T

    => Larger ICELL

    MF Chang, VLSI 2013
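    The benefit of a BL bias that tracks VBE can be sketched numerically. The model below assumes the read voltage across the resistive element is VR = VBL-R - VBE(T) and that VBE falls linearly with temperature at about -2 mV per degree C; the function names and all numeric values are hypothetical and are not taken from the VLSI 2013 design.

```python
VBE_AT_25C = 0.65    # volts, assumed base-emitter drop at 25 C
VBE_TEMPCO = -2e-3   # volts per degree C, assumed linear temperature slope
V_R_TARGET = 0.2     # volts, assumed safe read voltage across the cell


def vbe(temp_c):
    """Assumed linear model of the BJT base-emitter voltage."""
    return VBE_AT_25C + VBE_TEMPCO * (temp_c - 25.0)


def cell_read_voltage(v_bl, temp_c):
    """Voltage left across the resistive element (assumed series model)."""
    return v_bl - vbe(temp_c)


def tracking_bl_bias(v_r_target, temp_c):
    """BL bias that follows VBE so the cell always sees v_r_target."""
    return v_r_target + vbe(temp_c)


temps = [-40.0, 25.0, 125.0]
fixed_bias = tracking_bl_bias(V_R_TARGET, 25.0)  # bias set once at 25 C

fixed_vr = [cell_read_voltage(fixed_bias, t) for t in temps]
tracked_vr = [cell_read_voltage(tracking_bl_bias(V_R_TARGET, t), t) for t in temps]

for t, vf, vt in zip(temps, fixed_vr, tracked_vr):
    print(f"T = {t:6.1f} C  fixed-bias VR = {vf:.3f} V  tracked VR = {vt:.3f} V")
```

    With a fixed bias in this model, VR is too small at low temperature (small ICELL) and too large at high temperature (read-disturb risk); a tracking bias holds VR at the target, so the target can be set as high as disturb allows.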

  • # 25

    Technology     0.18um Logic             65nm Logic
    Capacity       1Mb (8b-IO)              2Mb (16b-IO)
    Sub-blocks     256Kb x 4                1Mb x 2
    RRAM Cell      HfO2 RRAM (NTHU+ITRI)    TiON RRAM (NTHU+TSMC)
    Read Power     6.3mA @ 100MHz           2.8mA @ 100MHz
    Read Speed     4.2ns                    4.7ns
    Write Speed    < 5ns

  • Challenges for Fast-Read NOR-Flash

    Example: Calibration-based CSA

  • # 27

    Current-Mode Sense Amplifier (CSA)

    Read-path input offsets

    Variations in BL bias, SA device, Icell and Iref

    [Figure: CSA read path. The cell array supplies the cell current (Icell) and the reference supplies the reference current (Iref); each passes through a current mirror (CM) to the SA, which produces Dout. Offset sources marked: cell mismatch and device mismatch. A second panel plots current at Point-A and Point-B, with labels IPRE, IREF, ICELL0-TAIL, ICELL1-TAIL, IOS-SUM, and ISM.]

  • # 28

    Concept of High-Speed CSA - AVB

    Summed read-path offset: IOS-SUM = (1) + (2) + IOS-SA = (1) + (2) + (3) + (4)

    CSA offset sources

    (1) VBL, CBL variations

    (2) IREF variations

    (3) Input-stage VTH mismatch

    (4) VTH mismatch

    Conventional CM-CSA: long TPEE to suppress (1)

    VTH-nulling CSA: VTH nulling for (4)

    Proposed AVB-CSA: Asymmetric-Voltage-Biased (AVB) + short TPEE; use ΔVAP to compensate (1)~(4)

    [Figure: CSA operation. BL precharge (VBL bias, ICELL, IREF generation) produces VCP and VRP; the sensing operation applies I-V conversion (current mirror, current load, etc.) and a comparator (VCMP) to give the digital output.]

  • # 29

    Schematic of Proposed AVB-CSA

    Use inactive sub-array to provide dummy BLs for IREF

    With ΔIAP-OS = –IOS-SUM to compensate offset

    ΔVAP option unit (VOU) provides trimmed ΔVAP to each AVB-CSA (ΔVAP = VAP-CP – VAP-RP)

    [Figure: two eFlash arrays (Array1, Array2) with WL drivers, WL timer, programmable dummy WL-driver, voltage generator, and WE-/WP-/REF-pages. The selected cell drives ICELL on BL; a dummy BL (DBL) carries IREF. AVB-CSA k (transistors M1 to M6) receives VAP-CP and VAP-RP, with VAP-CP[K-1:0] supplied through the VOU. Annotations: shared with all I/O; trimming for each I/O.]
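    The per-I/O trimming step can be sketched as a search over the available ΔVAP options for the one whose compensation current best cancels the measured offset, following the condition ΔIAP-OS = –IOS-SUM. This is a hedged illustration: the linear relation ΔIAP-OS = gm x ΔVAP, the gm value, the option list, and the function name are all assumptions, not details of the A-SSCC 2013 circuit.

```python
def pick_trim_code(i_os_sum, delta_vap_options, gm):
    """Return (code, residual) for the option that best cancels i_os_sum.

    The compensation current is modeled as gm * delta_vap, which is a
    linear assumption made only for this sketch.
    """
    best_code = None
    best_residual = None
    for code, delta_vap in enumerate(delta_vap_options):
        residual = i_os_sum + gm * delta_vap  # zero when delta_i = -i_os_sum
        if best_residual is None or abs(residual) < abs(best_residual):
            best_code, best_residual = code, residual
    return best_code, best_residual


# Hypothetical trim options: -150 mV to +150 mV in 50 mV steps.
DELTA_VAP_OPTIONS = [(-150 + 50 * k) * 1e-3 for k in range(7)]
GM = 20e-6            # amps per volt, assumed
I_OS_SUM = 1.7e-6     # amps, assumed measured read-path offset for one I/O

code, residual = pick_trim_code(I_OS_SUM, DELTA_VAP_OPTIONS, GM)
print(f"selected code: {code}")
print(f"delta VAP:     {DELTA_VAP_OPTIONS[code] * 1e3:.0f} mV")
print(f"residual:      {residual * 1e6:.2f} uA")
```

    Because each I/O has its own offset, the search would run once per sense amplifier during calibration, while the option voltages themselves could come from a generator shared by all I/Os.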

  • # 30

    High-Speed CSA - Measured Results

    1.15x improvement @ 512-cells/BL, VDD=0.9V

    1.52x improvement @ 2048-cells/BL, VDD=0.9V

    Chip-delay (CLK-DOUT) = TAC + path delay (CLK_DFF-DOUT)

    Less than 0.5% test time overhead compared to regular test operations

    [Figure: bar chart of TAC (ns, 0 to 12) with AVB disabled vs. AVB enabled for 512 rows and 2048 rows at VDD=1.2V and VDD=0.9V; bar annotations 1.15X, 1.16X, 1.48X; note "[email protected] ICELL ~=3.1σ". Measured waveforms (CLK/SE, TAC_Scan, DOUT): TAC=4.5ns and TAC=3.9ns, labeled w/o AVB and w/ AVB (ΔVAP=150mV). Calibration-time chart: 8.7%, 86.3%, 5.3%, with (Erase) and (Prog.) labels. Test chip label: 1Mb Super Flash.]

    MF Chang, A-SSCC 2013

  • Challenges at 3D NVM

    Examples:

    1. 3D Vertical-Gate (3DVG) NAND

    2. 3D Sequential Layered NVM

    3. 3D Nonvolatile Logics

  • # 32

    Published 3D NAND

    Various 3D NAND innovations

    [Figure: timeline of 3D NAND publications, ~2006 to 2012, grouped by storage type (SONOS/TANOS, FG). Concept: simply stacked, one etch. Labeled items: S-SGT (IEDM 2001); Univ. of Tokyo; Stacked NAND (IEDM 2006); Multi-layer TFT (IEDM 2006); BiCS (VLSI Symp.); P-BiCS (VLSI Symp.); VSAT (VLSI Symp.); TCAT (VLSI Symp.); VG-NAND (VLSI Symp.); Island-gate SSL decoded 3D VG (VLSI Symp.); IDG SSL decoded 3D VG (VLSI Symp.); PN diode decoded 3DVG (VLSI Symp. 2012); Hybrid-channel 3D VG (IMW); Split-page 3DVG (IEDM); 3D FG: DC-SF (IEDM); Metal control gate 3D FG (VLSI Symp.). Inset: NAND string with bit line, DSL, PWL, WL00 to WL63, CSL, n+ junction, oxide, and poly channel.]

  • # 33

    3D Vertical-Gate (3DVG) NAND

    Source: Hung and Lue (MXIC), IEDM 2013

    Etching is not perfectly vertical

    Result: about 500mV top-to-bottom Vth difference

  • # 34

    Challenges of 3DVG NAND

    Cross-Layer Variation

    => Require layer-aware scheme

    Higher failure rate than 2D NAND due to the process complexity

    Need more ECC bits

    => Need faster fail-bit-detection scheme

    Layer               Top       Bottom
    Cell Vth            Lower     Higher
    Program speed       Slower    Faster
    PGM & RD disturb    Less      More

    [Figure: forward-read Vt comparison of PL1 (bottom) and PL2 (top): counts (a.u.) vs. VT (-3 to 5 V) for Page0 to Page3 of each layer, with a ~0.5V shift between layers. A second sketch shows bit counts vs. VTH for the "P" state of Layer[1] and Layer[k] with their disturbance, annotated with faster PGM speed and larger disturbance.]

    VLSI 2013, MXIC+NTHU

  • # 35

    Example: Layer-Aware-Program-Verify & Read

    Conventional PV

    Same target threshold voltage (VTHP) across layers

    The top layer (Layer[k]) is programmed to a higher VTHP, which causes endurance degradation

    Proposed LA-PV & R

    Different VTHP across layers

    Lower VTHP for Layer[k] to reduce endurance degradation

    SM: Sensing Margin

    [Figure: VTH distributions after disturbance (bit counts vs. VTH) for the "E" and "P" states of Layer[1] and Layer[k]. Conventional PV: a single VTHP with margins SM and SM2, where SM2 > SM. LA-PV & R: separate VTHP[1] and VTHP[k] with margins SM and SM2', where SM2' = SM.]
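    The difference between a single verify level and per-layer verify levels can be sketched with a simple pulse-and-verify loop. This is an illustrative model only: the step sizes, start voltage, and verify targets are hypothetical integers in millivolts, and the loop is a generic incremental program-verify routine rather than the LA-PV & R circuit reported with MXIC.

```python
def program_verify(vth_start_mv, step_mv, target_mv, max_pulses=64):
    """Apply program pulses until Vth passes the verify level.

    Returns (final_vth_mv, pulses_used). Integer millivolts avoid
    floating-point drift in the loop.
    """
    vth = vth_start_mv
    pulses = 0
    while vth < target_mv and pulses < max_pulses:
        vth += step_mv  # each pulse shifts Vth by the layer's step size
        pulses += 1
    return vth, pulses


# Hypothetical per-layer behaviour: Vth shift per pulse, in mV.
STEP_MV = {"Layer[1]": 250, "Layer[k]": 150}
VTH_ERASED_MV = -1000

# Conventional PV: one target for all layers.
CONVENTIONAL_TARGET_MV = {"Layer[1]": 3000, "Layer[k]": 3000}
# Layer-aware PV: lower target for Layer[k] (an assumed 500 mV offset).
LAYER_AWARE_TARGET_MV = {"Layer[1]": 3000, "Layer[k]": 2500}


def run(targets):
    """Program one cell per layer and collect the results."""
    return {
        layer: program_verify(VTH_ERASED_MV, STEP_MV[layer], targets[layer])
        for layer in STEP_MV
    }


conventional = run(CONVENTIONAL_TARGET_MV)
layer_aware = run(LAYER_AWARE_TARGET_MV)

for layer in STEP_MV:
    print(layer, "conventional:", conventional[layer], "layer-aware:", layer_aware[layer])
```

    In this sketch the layer-aware targets leave Layer[k] at a lower final Vth with fewer pulses, which mirrors the stated goal of reducing endurance degradation; a matching per-layer read level would then be needed to keep the sensing margin.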

  • # 36

    Example: Measurement Result of 3DVG-NAND

    MLC cell Vth distribution with LA-PV & R

    [Figure: bit count (a.u.) vs. Vth (a.u.) for MLC with LAPV and MLC without LAPV.]

  • # 37

    3D Sequential Layered (3DSL) NVM

    A low-thermal process: less impact on gate dielectrics, S/D structures

    Available in NDL, Taiwan

    Design & Test Challenges

    Different cell performance across layers

    Different thermal effect across layers

    In-process monitor/testing

    - Full function test?

    - At-speed test?

    To appear at IEDM 2013 (highlight paper)

    [Figure: 100 nm and 0.8 μm 3D hybrid chips with Layer 1 and Layer 2; FE-like metal-oxide (Eu+3-APS) NVM stack with TaN, epi-like Si, and Eu+3-APS dielectric.]

    NDL/NTHU, IEDM 2013

  • # 38

    Example: 3D nvSRAM/nvLatch Cell

    BL/RRAM-CL sharing

    Two 3D-stacked resistive devices

    [Figure: Rnv8T cell: a 6T SRAM (Q, QB) plus a 2T RRAM switch (RSWL, RSWR, SWL) with resistive devices RL and RR. Charts compare WM, RSNM, and VDDmin for 6T SRAM, Rnv8T w/o RFS, and Rnv8T w/ RFS.]

    Write margin improves 1.64~2.4x

    Trade WM for RSNM

    RSNM is improved 1.42x at TT corner

    => improves VDDmin

    Chou & Chang, NTHU/ITRI, Symp. VLSI 2010 / JSSC 2012

  • # 39

    Example: 3D nvSRAM/nvLatch Cell

    On/Off Energy:

    Store/restore energy

    Standby time vs. on/off frequency
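    The trade-off between store/restore energy and standby time can be expressed as a break-even idle time: powering off pays back only when the standby energy avoided exceeds the energy spent on store and restore. This is a minimal sketch assuming zero leakage while powered off; the function names and all energy and power values are hypothetical.

```python
def break_even_off_time(e_store_j, e_restore_j, p_standby_w):
    """Idle time (seconds) beyond which powering off saves energy."""
    if p_standby_w <= 0:
        raise ValueError("standby power must be positive")
    return (e_store_j + e_restore_j) / p_standby_w


def energy_saved(idle_time_s, e_store_j, e_restore_j, p_standby_w):
    """Net energy saved by powering off for idle_time_s (negative means a loss)."""
    return p_standby_w * idle_time_s - (e_store_j + e_restore_j)


# Hypothetical macro-level numbers, for illustration only.
E_STORE = 2.0e-9     # joules to store SRAM contents into the NVM devices
E_RESTORE = 0.5e-9   # joules to restore them at power-on
P_STANDBY = 10e-6    # watts of standby power if the macro stays on

t_be = break_even_off_time(E_STORE, E_RESTORE, P_STANDBY)
print(f"break-even idle time: {t_be * 1e6:.0f} us")
print(f"saved over 1 ms idle:   {energy_saved(1e-3, E_STORE, E_RESTORE, P_STANDBY) * 1e9:.2f} nJ")
print(f"saved over 100 us idle: {energy_saved(1e-4, E_STORE, E_RESTORE, P_STANDBY) * 1e9:.2f} nJ")
```

    Lower store and restore energy shortens the break-even time, which is what allows more frequent power-off events for the same energy budget.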

  • # 40

    Example: 3D nvSRAM Macro

    A 16Kb 8T2R nvSRAM macro

    ITRI’s RRAM + 0.18um CMOS

    Low-VDDmin & fast power-on/off speed

    Enable Logic-in-Memory

    [Figure: 16Kb Rnv8T macro. Plot of store time (a.u.) vs. store energy (a.u.) on log axes spanning 10^-1 to 10^6, comparing this work with SRAM+MRAM, SRAM+PCM, SRAM+Flash, and 12T-SONOS, annotated 10^6X and 10^5X. Two further plots show normalized store time (0 to 16) vs. normalized store energy (10^0 to 10^2).]

    Chou & Chang, NTHU/ITRI, Symp. VLSI 2010 / JSSC 2012

  • # 41

    Summary – NVM in Dependable Systems

    Nonvolatile memory is one of the enablers for dependable systems (DS)

    Power interrupts to reduce voltage and temperature stress

    Protection against sudden power failure

    Emerging memories

    X-RAM (STT, ReRAM, ..), 3D memory

    Low power and fast read/write operations

    Enable nonvolatile logics

    Challenges for designing NVM

    Read disturbance, resistance variation, reference current generation, area/speed vs. write current, etc.

    Silicon examples

    ReRAM: high-speed, low-voltage & area-efficient

    3D-Memory: TSV-RAM, 3D-VG NAND, 3D-SL NVM

    Nonvolatile latch and SRAM

    Collaboration of system, circuit and device is needed

  • Thank You for Your Attention

    Acknowledgements

    NTHU, ITRI, NDL, TSMC and

    MXIC

