
2018 2020 Research Day

STT-MTJ Properties
• Nonvolatile
• High endurance
• CMOS compatible

Ternary Neural Networks

Ternary Synapse Based STT-MTJ

References:
1) Tzofnat Greenberg-Toledo, Ben Perach, Daniel Soudry, and Shahar Kvatinsky, "MTJ-Based Hardware Synapse Design for Quantized Deep Neural Networks," CoRR abs/1912.12636 (2019).
2) Adrien F. Vincent et al., "Spin-transfer torque magnetic memory as a stochastic memristive synapse for neuromorphic systems," IEEE Transactions on Biomedical Circuits and Systems 9.2 (2015): 166-174.
3) Adrien F. Vincent et al., "Analytical macrospin modeling of the stochastic switching time of spin-transfer torque devices," IEEE Transactions on Electron Devices 62.1 (2015): 164-170.
4) Lei Deng et al., "Gated XNOR Networks: Deep Neural Networks with Ternary Weights and Activations under a Unified Discretization Framework," arXiv preprint arXiv:1705.09283 (2017).

Switching Probability

From [3] the switching probability is given by-

$P\big(\Delta w_{ij}(k) = \kappa_{ij} + \mathrm{sign}(\varrho)\;\big|\;\Delta W_{ij}^{l}(k)\big) = \tau(v_{ij})$

$P\big(\Delta w_{ij}(k) = \kappa_{ij}\;\big|\;\Delta W_{ij}^{l}(k)\big) = 1 - \tau(v_{ij})$

where $\tau$ is the state-transition probability and $\tau(v) = \tanh(m\,v)$.

The switching probability of each memristor:

$P\big(\Delta t^{R}_{u} > \epsilon\big) = 1 - \mathrm{erf}\!\left(\frac{\pi}{2\sqrt{2}}\exp\!\left(\frac{\Delta t\,x}{c_R}\right)\right)$

$P_{right} \propto f\big(\Delta W_{ij}^{l}(k) - \Delta\hat{W}_{ij}^{l}(k)\big)$

$P_{left} \propto f\big(\Delta W_{ij}^{l}(k)\big)$

*assuming high current density

[Plot legend: STT-MRAM based synapse vs. software synapse (ideal)]

Training With STT-MTJ Based Synapse

STT-MRAM based synapse / Software synapse

Plugging the STT-MTJ-based synapse switching behavior into the CNN training algorithm.

Initial results

Setup
• Dataset: MNIST 28x28, 60,000 training images, 10,000 validation images
• Batch size: 10,000
• Number of epochs: 1,200
• CNN architecture: see the layer table below

Results
• The network manages to converge to an error rate similar to the software baseline.
• The convergence suffers from "noise".

TNN Hardware Architecture

[Architecture diagram labels: Main controller, Memory buffer, Shared bus]

Main architecture concept
• Tile-based architecture.
• Each tile contains several synapse arrays, shared local computation units (for example, activation functions and their derivatives), a local buffer, etc.
• A main controller maps each layer to one or more synapse arrays.

Improve power consumption and run-time
• Reduce memory accesses.
• In-memory computation of the GXNOR (= dot-product).

Need to support 3 operations (sketched behaviorally below):
1. GXNOR ("read")
2. Inverse read
3. Write
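
As a behavioral illustration only (not the array circuit itself), the three operations can be sketched in Python; the class name and method signatures are hypothetical, and the weights are modeled directly as a ternary matrix:

```python
import numpy as np

class TernarySynapseArray:
    """Behavioral model of one synapse array holding a ternary weight matrix W."""

    def __init__(self, W):
        self.W = np.asarray(W, dtype=int)   # entries in {-1, 0, 1}

    def gxnor_read(self, x):
        # Feedforward ("read"): for ternary inputs, accumulating GXNOR(w, x)
        # along each row is the same as the dot product W @ x.
        return self.W @ np.asarray(x)

    def inverse_read(self, y):
        # Error propagation: compute W^T y by driving the rows with y.
        return self.W.T @ np.asarray(y)

    def write(self, delta_W):
        # Weight update: apply a ternary increment and keep weights in {-1, 0, 1}.
        self.W = np.clip(self.W + delta_W, -1, 1)
```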

DNN: high overhead, both computation and memory intensive. A TNN constrains the weights to {−1, 0, 1} and replaces the multiply-accumulate operations with the Gated-XNOR (GXNOR) operation.

Training DNN: real-valued weights and neurons, matrix-vector multiplication.
↓ discretization
Training TNN: ternary space {−1, 0, 1}, logic operations between weights and neurons (GXNOR).
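
A minimal sketch of the discretization step, assuming a simple symmetric-threshold ternarizer (the threshold value 0.5 is illustrative, not taken from the poster):

```python
import numpy as np

def ternarize(W_real, threshold=0.5):
    """Map real-valued weights/activations to the ternary space {-1, 0, 1}."""
    W_t = np.zeros_like(W_real)
    W_t[W_real > threshold] = 1
    W_t[W_real < -threshold] = -1
    return W_t

# Example: ternarize(np.array([0.9, -0.2, -0.7])) -> array([ 1.,  0., -1.])
```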

Gated XNOR: if one of the inputs is zero then the output is zero, otherwise XNOR.

a    b    GXNOR(a, b)
-1   -1    1
-1    0    0
-1    1   -1
 0   -1    0
 0    0    0
 0    1    0
 1   -1   -1
 1    0    0
 1    1    1
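
For ternary operands the table above is exactly elementwise multiplication, so a GXNOR-based dot product can be sketched as a drop-in replacement for multiply-accumulate (a minimal illustration, function names are ours):

```python
def gxnor(a, b):
    """Gated XNOR on {-1, 0, 1}: 0 if either input is 0, otherwise XNOR of the signs."""
    return a * b                     # reproduces the truth table for ternary inputs

def gxnor_dot(weights, activations):
    """GXNOR-accumulate: the TNN replacement for multiply-accumulate."""
    return sum(gxnor(w, x) for w, x in zip(weights, activations))

# Example: gxnor_dot([-1, 0, 1], [1, 1, -1]) == -2
```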

Weight Update - based on the work "Gated XNOR Networks: Deep Neural Networks with Ternary Weights and Activations under a Unified Discretization Framework" [4]
• The weight update values are restricted to the discrete ternary space.
• For calculating the update there is no need to keep the full-precision values of the weights.

From [4]: discrete state transition in ternary weight space.

• The weight increment $\Delta w_{ij}(k)$ is a stochastic function of the gradient
$\Delta W_{ij}^{l}(k) = -\eta\,\dfrac{\partial E\big(W(k), Y(k)\big)}{\partial W_{ij}^{l}(k)}$.
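
A hedged sketch of this stochastic update, assuming the real-valued update is split into an integer step $\kappa$ and a remainder $\varrho$ and that $\tau(v)=\tanh(m\,v)$ from the switching-probability section is used as the transition probability (the constant m and the splitting convention are assumptions of this illustration):

```python
import numpy as np

def stochastic_ternary_update(w, delta_W, m=1.0, rng=None):
    """Sample the discrete weight increment from the real-valued update delta_W.

    delta_W = -eta * dE/dW is split into an integer part kappa and a remainder rho;
    the extra +/-1 step is taken with probability tau(|rho|) = tanh(m * |rho|).
    """
    rng = rng or np.random.default_rng()
    kappa = np.sign(delta_W) * np.floor(abs(delta_W))
    rho = delta_W - kappa
    step = kappa
    if rng.random() < np.tanh(m * abs(rho)):
        step += np.sign(rho)
    return int(np.clip(w + step, -1, 1))   # the weight stays in the ternary space
```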

M_L     M_R     State
R_off   R_off    0
R_off   R_on    -1
R_on    R_off    1
R_on    R_on     0

• At the feedforward stage, the voltage supply ("the previous-layer neuron") of both memristors is $u_{l,i} = -u_{r,i} = x_i^l$.
• The current is given by $I = \dfrac{M_R - M_L}{M_L M_R}\,u$.
• Example: $u = -1$

M_L     M_R     w    Out
R_off   R_off    0    0
R_off   R_on    -1    1
R_on    R_off    1   -1
R_on    R_on     0    0
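
A small numerical sketch of this read operation, assuming illustrative resistance values for R_on and R_off (the numbers are placeholders, not device data):

```python
import numpy as np

R_ON, R_OFF = 5e3, 15e3      # placeholder resistances [ohm]

def synapse_current(u, m_left, m_right):
    """Current of one two-memristor synapse driven with +u on the left device
    and -u on the right device: I = u * (M_R - M_L) / (M_L * M_R)."""
    return u * (m_right - m_left) / (m_left * m_right)

# With u = -1 the sign of the current reproduces the "Out" column above:
print(np.sign(synapse_current(-1, R_OFF, R_ON)))   # w = -1  ->  out = +1
print(np.sign(synapse_current(-1, R_ON, R_OFF)))   # w = +1  ->  out = -1
```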

For propagating the error back through the network, the value $W^T y$ needs to be calculated. The rows act as the input with voltage level $y$. The output current per column, per memristor (H/L), is summed and then compared.

The gradient value is $\Delta W = x^T y$. Notice that $x \in \{0, 1\}$.

The left memristor is updated with respect to $\Delta W$:
$e_i^{1} = \begin{cases} \mathrm{sign}(y_i), & 0 < t < |y_i| \\ 0, & |y_i| < t < T_{wr} \end{cases}$

The right memristor is updated with respect to $\Delta W - \Delta\hat{W}$:
$e_i^{2} = \begin{cases} \mathrm{sign}(y_i - \hat{y}_i), & 0 < t < |y_i - \hat{y}_i| \\ 0, & |y_i - \hat{y}_i| < t < T_{wr} \end{cases}$
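
A hedged sketch of these update-voltage waveforms, assuming a write window $T_{wr}$ normalized to 1 and taking $\hat{y}_i$ as the update already realized on the left memristor (both are assumptions of this illustration):

```python
import numpy as np

def write_pulse(y, t, T_wr=1.0):
    """Update voltage during the write window: sign(y) while t < |y|, 0 afterwards."""
    if 0 < t < min(abs(y), T_wr):
        return float(np.sign(y))
    return 0.0

# Left device driven according to y_i, right device according to (y_i - y_hat_i):
y_i, y_hat_i = 0.6, 0.25
e1 = write_pulse(y_i, t=0.3)             # 1.0, since 0.3 < |0.6|
e2 = write_pulse(y_i - y_hat_i, t=0.3)   # 1.0, since 0.3 < |0.35|
```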

Magnetic tunnel junction (MTJ)
• Two ferromagnetic electrodes separated by an insulating barrier.
• Free layer: its polarization can be set by the current flowing through it (spin-transfer torque).
• Fixed ("pinned") layer: fixed polarization, used as the reference layer.

[Device stack diagrams: free layer / insulator / fixed layer]

Parallel polarization: low resistance $R_{on}$
Anti-parallel polarization: high resistance $R_{off}$

u    w    out
-1   -1    1
-1    1   -1
 1   -1   -1
 1    1    1
(for u, w ≠ 0)

• Low power consumption
• High write and read speed
• Stochastic switching delay

Stochastic switching delay

Critical current of the device:
$I_{c0} = \dfrac{2e}{\hbar}\cdot\dfrac{\alpha V (1 \pm P)}{P}\cdot\dfrac{\mu_0 M_s M_{eff}}{2}$

For $I \gg I_{c0}$ (high current regime):
$\tau = \dfrac{C_1}{I - I_{c0}}\,\log\!\left(\dfrac{\pi}{2\theta}\right), \qquad \theta \sim N(0, \theta_0), \quad \theta_0 = \sqrt{\dfrac{k_B T}{\mu_0 H_K M_s V}}$

For $I \ll I_{c0}$ (low current regime):
$\tau = f_0^{-1}\exp\!\left(\dfrac{E_0}{k_B T}\left(1 - \dfrac{I}{I_{c0}}\right)\right)$

Switching probability within a pulse of duration $\Delta t$:
$P_{sw} = 1 - \exp\!\left(-\dfrac{\Delta t}{\tau}\right)$
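
A minimal Monte Carlo sketch of the high-current-regime switching delay, assuming placeholder values for $C_1$, $I_{c0}$, and $\theta_0$ (none of these numbers come from the poster) and folding the sign of the initial angle into its magnitude:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder device parameters (illustrative only)
C1 = 1e-10         # fitting constant [A*s]
I_C0 = 50e-6       # critical current [A]
THETA0 = 0.05      # std of the initial magnetization angle [rad]

def switching_delay(I, n=100_000):
    """Sample tau = C1/(I - I_c0) * log(pi / (2*theta)), theta ~ |N(0, theta0)|
    (high current regime, I >> I_c0)."""
    theta = np.abs(rng.normal(0.0, THETA0, size=n))
    return C1 / (I - I_C0) * np.log(np.pi / (2.0 * theta))

def p_switch(I, dt, n=100_000):
    """Empirical probability that the device switches within a pulse of length dt."""
    return float(np.mean(switching_delay(I, n) <= dt))

# Longer pulses and larger overdrive currents both raise the switching probability.
print(p_switch(I=150e-6, dt=5e-6))
```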

#layer  Type     Size
1       Input    28×28
2       Conv     32×5×5
3       Pooling  2×2
4       Conv     64×5×5
5       Pooling  2×2
6       FC       512
7       FC       10

Future steps
• Improve the STT-MTJ switching models
• "Play" with the STT-MTJ properties

[Plots: "Switching probability for state −1" (probability vs. delta); "Validation error rate" (error rate [%] vs. epoch)]

MTJ-Based Hardware Synapse Design for Ternary Deep Neural Networks

Tzofnat Greenberg, Ben Perach, Daniel Soudry, and Shahar Kvatinsky
