Magnetic Logic Circuits with High Bit Resolution for Hardware Acceleration

by

Sumit Dutta

B.S., Electrical Engineering, University of Illinois at Urbana-Champaign, 2011
S.M., Electrical Engineering and Computer Science, MIT, 2013

Submitted to the Department of Electrical Engineering and Computer Science
in partial fulfillment of the requirements for the degree of

Doctor of Philosophy in Electrical Engineering and Computer Science

at the Massachusetts Institute of Technology

June 2017

© 2017 Massachusetts Institute of Technology
All Rights Reserved.

Signature of Author:

Sumit Dutta
Department of Electrical Engineering and Computer Science

May 19, 2017

Certified by:

Marc A. Baldo
Professor of Electrical Engineering and Computer Science and Director of RLE

Thesis Supervisor

Accepted by:

Leslie A. Kolodziejski
Professor of Electrical Engineering and Computer Science

Chair, Committee for Graduate Students

Magnetic Logic Circuits with High Bit Resolution for Hardware Acceleration

by Sumit Dutta

Submitted to the Department of Electrical Engineering and Computer Science
in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

Abstract

The ever-increasing demand for high-performance and low-power computing warrants an investigation of technologies beyond conventional digital transistor circuits. We explore a logic device based on magnetic domain walls, which are electrically movable boundaries between oppositely magnetized domains of a wire, for applications to hardware acceleration. A domain wall logic device takes current on the input, which moves a magnetic domain wall to a position in a ferromagnetic wire, and this position is the nonvolatile data token read as an output current through a magnetic tunnel junction.

The spatial resolution of discrete magnetic domain wall positions in domain wall logic devices is studied to guide memory and logic applications. Theory, numerical modeling, and experiments on in-plane and perpendicularly magnetized materials demonstrate that the bit resolution, or analog information capacity, of a magnetic nanowire with a single domain wall is limited by the self-affine statistics of the wire edge roughness. The domain wall logic device is extended further into functional design implementations, including a logic-in-memory architecture to perform deep convolutional neural network operations in a hybrid process with magnetic devices and 45 nm CMOS. A 3-terminal magnetic logic device is designed to have a 3-bit resolution, and is used in conjunction with transistors in circuit designs for an efficient logic-in-memory system that can process convolutional neural networks 10× faster than conventional digital CMOS implementations.

Thesis Supervisor: Marc A. Baldo
Title: Professor of Electrical Engineering and Computer Science and Director of RLE

Acknowledgments

I have had an incredible journey pursuing research in circuits with emerging devices at MIT. I must express my gratitude to several groups of people starting from my professional network and extending to my friends and family. I have had a unique, coveted opportunity to do real electrical engineering research using emerging magnetic materials with Prof. Marc Baldo, while being able to consult with Prof. Caroline Ross in the Department of Materials Science and Engineering. I am additionally grateful to have the guidance and suggestions of Prof. Luqiao Liu and Prof. Anantha Chandrakasan, who are on my thesis committee in addition to Prof. Marc Baldo. It has been very worthwhile to consult with these professors and many students in their research groups, and exciting to deliver breakthrough research on magnetic logic circuits together connecting the fields of materials science, device physics, and circuit design.

I must extend a special token of appreciation to Prof. Caroline Ross, who included me and other students studying magnetic devices in her own research group informally. Although Prof. Caroline Ross was not officially my thesis supervisor, she played an integral advisory role for me in many respects.

In my first two years at MIT before I had begun research on magnetic logic circuits, I was advised by Prof. Vladimir Stojanovic, and I appreciate the opportunity to develop some of the most complex relay circuits under his guidance. Much of the circuits expertise I developed working with Prof. Vladimir Stojanovic transferred to my circuit interests with magnetic logic devices in my third through sixth years at MIT.

It has been a delight to meet and work with the other students in the research groups of Prof. Marc Baldo and Prof. Caroline Ross. While I must thank everyone who has been in these groups since I joined, I especially appreciate working with Saima Siddiqui, Joe Finley, Jean Anne Currivan-Incorvia, Tony Wu, Enno Lage, Eduardo Fernandez Martin, Ethan Rosenberg, Jinshuo Zhang, Astera Tang, and Lara Tryputen. I would like to thank David Bono for his help with designing our test infrastructure. Furthermore, I enjoyed working with everyone in the Research Communication Laboratory led by Carol Lynn Alpert and Karine Thate.

I would also like to thank two of my paper coauthors, Michael Price and Felix Buttner, who were in the research groups of Prof. Anantha Chandrakasan and Prof. Geoff Beach, respectively. I thank them for their full participation in our interdisciplinary collaborations and for bringing important insights to the table. I would also like to thank Prof. Harry Lee, Prof. Geoff Beach, Dr. David Paul, and Prof. Jeff Lang for helpful discussions.

Before I began graduate school, I had already dived into research with Prof. Eric Pop, who was at the University of Illinois at Urbana-Champaign. The experience I developed with his students Albert Liao and David Estrada was invaluable and encouraging.

My friends have been an indispensable asset throughout my journey. A far-from-exhaustive list would include Piyush and Toshi Jain, George and Janine Garcia, Sunny Vanderboll, Jean-Philippe Coutu, Shawn Adderly, Xueying Zhao, Ethan Sokol, Kristine Fong, Tom Ogrady, Abraham Neben, Matt Melissa, Aaron Tran, Quan Nguyen, Liz Chen, Jan Iyer, Anumita Das, Bhakti Halkude, Amy Agarwal, Cassiopeia Roychowdhury, Beth Hadley, Alice Wang, Julie Charbonneau, Mariana Rodriguez Buno, Andrea Carney, Karen Ng, Manway Liu, April Wang, Royson Chong, Ella Chan, Vikram Agarwal, Markrete Krikorian, Nathan Einstein, Cipriano Romero, Adi Pathak, Alex Sanchez, Shawn Medford, Neladri Bose, Swami Tyagananda, Henna Chatterjee, Ishwar Mahadeo, Michael and Yanfen Li, Chase Driskell, Terrance Stevenson, Sam Gunter, Drew Blackburn, Justin and Maria Walker, Chris and Stacey Briles, John Scaggs, Brittany Rankin, and Zach Powell.

The freedom I had to explore science and technology back in South River High School in Edgewater, Maryland helped shape my interest in research today, and for that I give special thanks to Mrs. Kim Champagne who taught history and programming, Mr. John Jacobson who advised FIRST Robotics Team 1111, Mrs. Cheryl Young who taught English, and Mr. Jim Hopkins who taught calculus.

Finally, I must thank my brother, Sanjit Dutta, my father and mother, Subijoy and Urmi Dutta, and the rest of my family.

Contents

Abstract

Acknowledgments

List of Figures

1 Introduction
    1.1 Outline
    1.2 Publications

2 Magnetic Domain Wall Logic Devices
    2.1 Circuit model for a magnetic device
    2.2 Fanout and design for multiple stages
    2.3 Clocking
        2.3.1 Clock generators
        2.3.2 Resonant clock trees
    2.4 Pipelining
    2.5 Digital design for medium and large scale integration
    2.6 Experimental testing
    2.7 Scaling and improvements with spin orbit torque

3 Micromagnetic Modeling of Nanowires with Edge Roughness
    3.1 Abstract
    3.2 Introduction
    3.3 Methods
    3.4 Defect pinning in IMA nanowires
        3.4.1 Triangular and rectangular notch defects
        3.4.2 Anisotropy defects
        3.4.3 Comparison to experimental results
    3.5 DW motion in IMA wires with periodic notches
    3.6 DW motion in PMA wires with periodic notches
    3.7 Conclusions

4 Reconfigurable Digital Logic with Nonvolatile Magnetic Devices
    4.1 The analog MTJ trimmer
    4.2 The digital MTJ trimmer
        4.2.1 Abstract
    4.3 Introduction
    4.4 Nonvolatile digital trimming
        4.4.1 MTJ bit cell for static CMOS trimming
        4.4.2 Trimming global clock tree buffers using MTJ bit cells
        4.4.3 Trimming STT-MRAM logic-in-memory
    4.5 Performance analysis
        4.5.1 Static timing analysis
        4.5.2 Clock skew reduction in trimmed global clock trees
            Dot product circuit
            Fast Fourier transform circuit
        4.5.3 Applications to silicon debug
    4.6 Conclusion

5 Multi-Level Cell Magnetic Tunnel Junctions
    5.1 A theoretical MLC MTJ
    5.2 Implementations of the MLC MTJ
    5.3 Conclusions

6 The Information Limit for Domain Walls in Magnetic Nanowires
    6.1 Abstract
    6.2 Introduction
    6.3 Fractal model for fabricated wire edges
    6.4 Resolution limit from domain wall position distributions
    6.5 A realistic model for domain wall motion
    6.6 Conclusion
    6.7 Experimental methods
        6.7.1 Nanowire fabrication
        6.7.2 Domain wall traveling distance measurements
        6.7.3 Micromagnetic modeling
        6.7.4 Power spectral density (PSD) calculations
        6.7.5 Domain wall position spacing models
        6.7.6 Analytical models

7 Circuits for High-Resolution Magnetic Devices
    7.1 Introduction
    7.2 Deterministic domain wall motion applications
    7.3 Stochastic domain wall motion applications
        7.3.1 How to add stochasticity to current applications
        7.3.2 Randomness from thermal effects
        7.3.3 Randomness from the tunnel barrier
    7.4 Conclusions

8 Neuromorphic Computing with High-Resolution Magnetic Devices
    8.1 Abstract
    8.2 Introduction
        8.2.1 Device and circuit design approach
    8.3 The MTJ function evaluator
        8.3.1 Function implementation with a thresholding MTJ
    8.4 Logic-in-memory system design
        8.4.1 Crosspoint array
        8.4.2 System architecture
        8.4.3 Voltage drivers
        8.4.4 The sense amplifier
        8.4.5 Direct layer to layer connections
        8.4.6 System operation
    8.5 Discussion
    8.6 Conclusion

9 Conclusions and Future Studies
    9.1 Tunnel magnetoresistance required
    9.2 Digital logic
    9.3 Machine learning applications
    9.4 Future directions

Bibliography

List of Figures

1.1 Magnetic materials have a magnetization $\vec{M}$ that can be switched collectively, i.e., at very low voltage, and thus present an opportunity to make low-energy logic devices.

1.2 Historically, magnetic materials have been used for memory in computers. Pictured are (a) magnetic core memory pioneered by Jay Forrester of MIT in 1950, (b) magnetic hard disk drives in use since the 1950s, and (c) magnetic random access memory modules under development since the 1990s [8, 9].

1.3 Hardware acceleration options that make complex computing tasks available in more places, going beyond data centers into personal computers and mobile technologies connected by the Internet of things.

2.1 Magnetic domain wall logic device based on spin-transfer torque (STT) with in-plane magnetic anisotropy (IMA). Two states are shown in (a) and (b). This is described in [14] and the image is courtesy of J. A. Currivan-Incorvia and coauthors.

2.2 Magnetic tunnel junction (MTJ) with in-plane magnetic anisotropy (IMA).

2.3 Resistance versus voltage (R-V) characteristic of an MTJ with IMA simulated in HSPICE.

2.4 Magnetic domain wall logic device based on spin-orbit torque (SOT) with IMA.

2.5 Magnetic domain wall logic device based on STT with perpendicular magnetic anisotropy (PMA). This is described in [14] and the image is courtesy of J. A. Currivan-Incorvia and coauthors.

2.6 Magnetic domain wall logic device based on spin-orbit torque (SOT) with PMA. This is the most attractive configuration because it requires the lowest currents and is most practical to make. SOT with PMA is described thoroughly in the literature [17, 18, 19].

2.7 The circuit model used most often for the magnetic domain wall device.

2.8 An alternative circuit model for the magnetic domain wall device. This model is not used often because $C_{MTJ} \sim 10^{-18}$ F in typical scaled devices, which can be neglected [2].

2.9 A magnetic domain wall logic device connected to drive another device, i.e., a circuit with device fanout 1, the same as fanin 1.

2.10 A magnetic domain wall logic device configured with fanout 1, the same as fanin 1, showing how all participating devices can be included even though the model in Fig. 2.9 is sufficiently accurate.

2.11 A magnetic domain wall logic device configured with fanout 4.

2.12 A magnetic domain wall logic device configured with fanin 4.

2.13 A magnetic domain wall logic device configured with fanout 2.

2.14 A magnetic domain wall logic device configured with fanin 2.

2.15 A resonator circuit with a resistor, inductor, and capacitor placed in each branch of the clock tree, designed to distribute the clock signal through the tree. The network between node A and node B would be in the clock tree, which we excite here with a voltage step $V_{DD}u(t)$.

2.16 A resonator circuit with an MTJ, resistor, inductor, and capacitor placed in each branch of the clock tree, designed to distribute the clock signal through the tree. Sharp transitions in the value of $R_{MTJ}$ from $R_{AP}$ to $R_P$ and vice versa help sustain oscillations. As in Fig. 2.15, the step response is seen to the network between node A and node B that would be in the clock tree.

2.17 (a) The current in a step response of a clock distribution network represented as an RLC network with no MTJ, seen in Fig. 2.15. (b) MTJ resistance and (c) the current in the step response of a clock distribution network represented with an RLC network with an MTJ. The circuit in Fig. 2.16 is measured in (b) and (c), where an added MTJ drives more oscillations.

2.18 Full adder with magnetic domain wall logic devices, where each NAND or NOT gate is a single device. This is described in [14] and the image is courtesy of J. A. Currivan-Incorvia and coauthors.

2.19 Experimental prototype of the magnetic domain wall logic device. This is described in [3] and the image is courtesy of J. A. Currivan-Incorvia and coauthors.

3.1 (a) Schematic of a logic device made from a magnetic IMA nanowire from [14] with LER added. Red and blue represent magnetization directions, the output is a tunnel junction with a fixed layer (blue), and the input and clock include an antiferromagnet (not shown) to pin the magnetization direction of the free layer. Simulation model image of DW motion in an (b) IMA nanowire with LER and (c) PMA nanowire with LER. (d) SEM image of CoFeB nanowire and its discretization, from which the shapes of (b) and (c) are derived. Pinning site A labeled in (b) is the strongest pinning site that defined the threshold for that nanowire.

3.2 DWs are pinned by isolated notch defects. The DW core is oriented upward such that the DW is wider on the top of the page. We model pinning for notches of different depths placed on the top or bottom, whose shape is (a) triangular or (b) rectangular. The top panels show a pinned DW in a 30-nm-wide wire. For the 30-nm-wide wire, the effect of moving the notch to the other side of the wire is shown. This produces a chirality filtering effect.

3.3 (a) Moving DWs are pinned by an anisotropy defect, shown as a shaded rectangle in the top panel. This pinning is modeled for various anisotropy magnitudes. (b) Comparison of the range of depinning fields for experimental 60-nm-wide and 80-nm-wide Co nanowires and simulated 60-nm-wide Co nanowires with a rectangular notch, varying the notch depth and including anisotropy increases of up to 10% of $K_{\text{hcp-Co}}$, shown as shaded rectangles around the notch in the top panel. The ranges of depinning fields for the experiment and model are comparable at notch depths around 6–14 nm.

3.4 The DW velocity in IMA nanowires varies with position in periodically notched wires. The DW is driven at a constant magnetic field 3% above the threshold for DW traversal. Insets show snapshots of the DW position at a velocity maximum and a velocity minimum for the 96 nm period. The DW velocity oscillations follow the changes in linewidth when the DW is narrower than the notch period.

3.5 The DW velocity in PMA nanowires driven at a constant magnetic field. The same field is applied in all wires, higher than Walker breakdown. There are small fluctuations in velocity with position along the wire except for a period of 6 nm and below.

4.1 Voltage levels for the programming mode and the regular operation mode of an MTJ, as determined from the resistance-voltage characteristic determined from an HSPICE simulation.

4.2 Example analog MTJ trimmer circuits for a differential pair. Although the design may work in theory, process variations in the MTJ and the difficulty of getting sufficient programming current in selected MTJs make this design impractical. (a) Only a single device is used to provide programming currents to the two devices, using the devices in the trimmed circuit to aid with selection. (b) Transmission gates are used to program each MTJ with more accurate currents in the desired direction.

4.3 MTJ bit cell: The MTJ is put in a low-resistance ($R_P$) or high-resistance ($R_{AP}$) state depending on the write current direction, set digitally. The sense amplifier design to read $R_{MTJ}$ is based on [84]. Transistors use the minimum width, $w_0 = 120$ nm, when possible to reduce area.

4.4 Digital trimmer applied to a generic static CMOS logic gate: Complementary resistances are put above the PUN and below the PDN. Certain parts of the PUN or PDN may bypass the programmable resistances.

4.5 Programmable clock buffer schematic: $w_{P0} = 1.8$ µm and $w_{N0} = 1.2$ µm for high drive strength.

4.6 (a) Relative programmed resistance and propagation delay for the programmable clock buffer of Fig. 4.5, in different CMOS process corners. (b) Clock latency in the programmable clock buffer in the typical process for the 1 → 0 and 0 → 1 output transitions for default, minimum, and maximum programmed delays, showing nearly uniform transition times. The original non-programmable clock buffer latency is added for reference.

4.7 Digital trimmer applied with memory arrays in STT-MRAM to nearby logic circuits in memory.

4.8 MTJ bit cell simulation: The MTJ is written a 0, then the 0 is read, then the MTJ is written a 1, and then the 1 is read.

4.9 Clock tree in a dot product circuit: The upper clock buffer is first found to be in a fast process corner, which is corrected by re-programming that buffer to a slower output latency.

4.10 Correction of clock skew introduced by process variation within the FFT module.

4.11 Useful skew fixing of setup violations for increased clock frequency. The slacks shift in the positive direction by about 16 ps when corrected.

5.1 Circuit schematics of multi-level cell (MLC) designs with multiple magnetic tunnel junctions (MTJs) for one variable resistor.

5.2 MLC with 4 MTJs with series resistances, in parallel, as in Fig. 5.1(c). There are 16 different resistance states in this circuit. The major and minor loops are shown. It is seen that the input voltage $V_{IN}$ must be swept across several intervals before a desired resistance state can be reached.

6.1 Concentric magnetic nanowire rings, each with a magnetic domain wall. The positioning and motion of domain walls in such wires is analyzed with micromagnetic models and described analytically in this study.

6.2 Scanning electron microscope (SEM) images of (a) concentric rings and (b) L-shaped nanowires. (c) SEM image of wire discretized as shown by the dotted line. (d) Power spectral density of the line edge roughness of the discretized wire, the average of 104 synthesized self-affine wires, and the average of 104 synthesized random-edge wires. The correlation length, $\xi$, is extracted from the discretized wire and is applied to synthesize the self-affine wires.

6.3 (a) Final positions into which domain walls relax from their initial positions of domain wall nucleation, for one wire in micromagnetic simulations. The bin width is chosen to be the size of one simulation cell, 3 nm. Inset: Zoom of initial positions. (b) Line width roughness profile of the wire shows that the traps often correspond to local minima in width.

6.4 Distribution of spacing $\Delta x$ between domain wall discrete positions for (a) IMA and (b) PMA wires, including wires with self-affine edge profiles with correlation length $\xi$ and wires with random-edge profiles. The distributions are normalized by area.

6.5 Distributions of distances traveled by domain walls in simulated nanowires. The sample size is 1500 wires. The fractal analytical model is the overlaid line with parameters listed for each applied field.

6.6 (a) Magnetic force micrographs of domain walls in concentric rings with applied magnetic fields of 20 kA/m, 24 kA/m, 28 kA/m, and 32 kA/m along the y-axis. The black and the white arrows indicate the positions of two separate domain walls in two different nanowires at different magnetic fields. (b) Fractal analytical model and (c) exponential analytical model overlaid on distributions of distances traveled by domain walls in experiment. The sample size is 94, 94, 83, and 134 wires in increasing order of applied fields.

6.7 Magnetic force micrograph of domain walls in the initial onion states.

6.8 Magnetic force micrographs of L-shaped nanowires.

6.9 A single domain wall as seen in a close-up top-down view of the magnetic moments, $m_x$ and $m_z$, in micromagnetic models of (a) an IMA wire and (b) a PMA wire, respectively.

7.1 A 3-terminal magnetic logic device including a long MTJ can be used for a variety of circuit applications. The line edge roughness depicted plays a role in setting the device bit resolution, i.e., the number of bits that the device could be used to process.

8.1 Neural network operations: multiply-accumulate for the dot product and one of many nonlinear functions for activation or thresholding.

8.2 Design of the MTJ function evaluator. (a) Device drawing. (b) Micromagnetic model of device with domain wall motion with a variable current density along the wire due to spin-orbit torque.

8.3 (a) Transfer characteristic from input current to output domain wall position and resistance, determined in micromagnetic simulation. Normalized data are on the left and bottom axes and actual data are on the top and right axes. (b) Width function required for the shown shifted sigmoid thresholding function. (c) Current density profile due to the width profile.

8.4 General approach to using Ohm's law to perform an analog multiplication for deep convolutional neural networks. (a) The synaptic function performs the dot product. (b) The activation function evaluator performs thresholding based on a generally nonlinear function $f(I_J)$.

8.5 (a) Circuit symbols and models for the synaptic MTJ and thresholding MTJ. (b) Crosspoint array using synaptic MTJs and thresholding MTJs as analog logic and memory elements.

8.6 Architecture of the logic-in-memory system.

8.7 (a) Read-line driver. (b) Bitline driver. (c) Common source amplifier with an active load used in the driver circuits.

8.8 Multi-bit sense amplifier design based on [6].

8.9 (a) Feed-forward network operation and (b) memory operations: reading (dashed lines only) and writing (solid lines only) weights.

8.10 Any number of layers can be processed in the analog domain, without having to convert each layer result from analog to digital back to analog.

Chapter 1

Introduction

Logic circuits are the backbone of information systems found everywhere. Conventionally, these circuits have been composed of transistors, which are robust but are running into limits that other devices could surpass [1, 2, 3, 4, 5, 6]. Since 2005, processor speeds have been limited to the range of 1 to 6 GHz and processor design is limited by power density [7]. Although transistor scaling is still going strong, it is providing diminishing returns, which warrants an investigation of emerging technologies beyond conventional digital transistor circuits.

The premise of our studies of devices based on magnetic materials is that they enable an alternative pathway to fast, low-energy computation. Independent charges require a minimum amount of voltage change for a transistor to switch, but spins can be controlled collectively in magnetic devices, offering us an opportunity to reduce switching energy with a lower operating voltage. In other words, the domains of ferromagnets can be switched at low voltages because electron spins align collectively in such systems. The fundamentally low energy that is possible for switching magnetic materials is illustrated in Fig. 1.1.

Magnetic logic devices are compatible with the traditional fabrication process for conventional complementary metal-oxide-semiconductor (CMOS) transistor logic devices [3, 5]. Magnetic materials have a history of being used for computer memory, as seen in Fig. 1.2. If we are to advance the performance, density, or energy-efficiency of computational circuits, we could build purely magnetic logic circuits or hybrid circuits with transistors and magnetic devices. The questions then arise: what are the limits to the resolution of those magnetic devices, and how can we maximize capacity with or without the aid of transistor circuits?

Figure 1.1. Magnetic materials have a magnetization $\vec{M}$ that can be switched collectively, i.e., at very low voltage, and thus present an opportunity to make low-energy logic devices.

Figure 1.2. Historically, magnetic materials have been used for memory in computers. Pictured are (a) magnetic core memory pioneered by Jay Forrester of MIT in 1950, (b) magnetic hard disk drives in use since the 1950s, and (c) magnetic random access memory modules under development since the 1990s [8, 9].

A variety of emerging devices can add to the capabilities of conventional transistor-based circuits. Spin-torque nano-oscillators are an example [10]. The industry has been implementing highly parallelized algorithms no longer on general-purpose central processing units (CPUs), but rather on custom hardware such as graphics processing units (GPUs) and field-programmable gate arrays (FPGAs), as seen in Fig. 1.3. The next generation of custom hardware implementations may perhaps benefit from using emerging devices such as magnetic logic devices.

The shift from general-purpose processors to custom hardware accelerators is driven by the fact that computing is becoming embedded in ubiquitous applications, such as sensor processors for self-driving cars and voice recognition systems for cell phones. For example, artificial intelligence computing with neural networks is used by self-driving cars that process sensor images to detect objects such as pedestrians, bikes, and vehicles. While such tasks are typically run today by having the mobile device communicate with a powerful data center, this does not scale well when the number of mobile devices keeps growing. In the Internet of things, devices from cell phones to sensors in buildings and cars will all need to perform intelligent computations. Instead of placing the burden on data centers and communication infrastructure, it is preferable that the mobile devices run their own tasks efficiently on custom hardware accelerators with low power and low latency.

At the circuits and systems level, another barrier to computing performance is the memory wall, the term used to describe the bottleneck in the data transfer between the processor cores and the memory banks in a CPU. The memory wall may be overcome by making logical operations possible within memory, i.e., to have logic-in-memory. Logic-in-memory solutions with CMOS already show reductions in power requirements [11], and this work shows the design and advantages of logic-in-memory solutions using magnetic logic devices.

Figure 1.3. Hardware acceleration options that make complex computing tasks available in more places, going beyond data centers into personal computers and mobile technologies connected by the Internet of things. Image courtesy of A. Ng.

Magnetic materials are readily integrated with conventional CMOS processes, as demonstrated in commercial magnetic random access memory [5]. Current devices can store a single bit, 0 or 1, but it is possible to store more bits in a single magnetic device. We show in this work how the readable and writable position of a magnetic domain wall in a magnetic device can be represented by a multi-bit binary word, thereby making it possible to store multiple bits in a magnetic device [12].

We define the bit resolution of magnetic devices to mean the maximum number of bits that can be written to or read from the magnetic device. The term is derived from its use in analog-to-digital converters, where the bit resolution is the number of bits that the converter outputs from an analog input signal. A pivotal question that this dissertation answers is how to determine and achieve the highest bit resolution of magnetic devices.
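As a minimal sketch of this definition, suppose the domain wall can settle at N distinguishable, equally spaced positions along the wire. The bit resolution is then the largest integer b with 2^b ≤ N, and a position maps to a b-bit word much as an analog-to-digital converter maps a voltage to a code. The function names, the equal spacing, and the 400 nm wire length used in the usage note are assumptions for illustration and are not taken from this work.

```python
def bit_resolution(num_positions: int) -> int:
    """Largest b such that 2**b <= num_positions (illustrative definition)."""
    if num_positions < 1:
        raise ValueError("need at least one distinguishable position")
    return num_positions.bit_length() - 1


def position_to_word(x_nm: float, wire_length_nm: float, bits: int) -> str:
    """Quantize a domain wall position into a binary word that is `bits` wide."""
    levels = 2 ** bits
    # Clamp the position to the wire, then pick the bin that contains it.
    x = min(max(x_nm, 0.0), wire_length_nm)
    index = min(int(x / wire_length_nm * levels), levels - 1)
    return format(index, f"0{bits}b")
```

With eight distinguishable positions, `bit_resolution(8)` gives 3, and a wall sitting halfway along a hypothetical 400 nm wire is encoded by `position_to_word(200, 400, 3)` as `100`.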

1.1 Outline

The remainder of this dissertation begins with a description of magnetic domain wall logic devices in Chapter 2. The nanowires used to make such devices are modeled in Chapter 3. Reconfigurable digital logic with magnetic tunnel junctions is explored in Chapter 4. Chapter 5 explores how multiple bits may be stored with magnetic tunnel junctions. In Chapter 6, a self-affine fractal edge model is developed and used to determine the limit to how many bits of information can be stored in magnetic devices. Magnetic devices that can store multiple bits are then applied to circuits in Chapters 7 and 8, where a neural network is implemented in a logic-in-memory system. The thesis then concludes in Chapter 9.


1.2 Publications

This dissertation covers research contributions by the author reported in the following papers:

1. S. Dutta, S. A. Siddiqui, F. Buttner, L. Liu, C. A. Ross, and M. A. Baldo, “A Logic-in-Memory Design with 3-Terminal Magnetic Tunnel Junction Function Evaluators for Convolutional Neural Networks,” submitted

2. S. Dutta,† S. A. Siddiqui,† J. A. Currivan-Incorvia, C. A. Ross, and M. A. Baldo, “The Spatial Resolution Limit for Domain Walls in Sub-100-nm-wide Magnetic Nanowires,” submitted

3. S. A. Siddiqui, S. Dutta, J. A. Currivan-Incorvia, C. A. Ross, and M. A. Baldo, “The effect of magnetostatic interactions on stochastic domain wall motion in sub-100-nm wide nanowires,” in preparation

4. S. Dutta, M. Price, and M. A. Baldo, “Nonvolatile Online CMOS Trimming with Magnetic Tunnel Junctions,” IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH), July 2016, Beijing, China [4]

5. J. A. Currivan-Incorvia, S. A. Siddiqui, S. Dutta, E. Evarts, J. Zhang, D. C. Bono, C. A. Ross, and M. A. Baldo, “Logic circuit prototypes for three-terminal magnetic tunnel junctions with mobile domain walls,” Nature Communications 7, 10275 (2016) [3]

6. J. A. Currivan-Incorvia, S. A. Siddiqui, S. Dutta, E. R. Evarts, C. A. Ross, and M. A. Baldo, “Spintronic Logic Circuit and Device Prototypes Utilizing Domain Walls in Ferromagnetic Wires with Tunnel Junction Readout,” IEEE International Electron Devices Meeting (IEDM), December 2015, Washington, DC [13]

7. S. Dutta, S. A. Siddiqui, J. A. Currivan-Incorvia, C. A. Ross, and M. A. Baldo, “Micromagnetic Modeling of Domain Wall Motion in Sub-100-nm-Wide Wires with Individual and Periodic Edge Defects,” AIP Advances 5, 127206 (2015) [12]

Chapter 2

Magnetic Domain Wall Logic Devices

The fundamental class of devices used for magnetic logic circuits is described in this chapter, laying the foundation for an exploration of several magnetic logic devices and applications to circuits. The magnetic domain wall logic device, first explored in [14], takes an input current to set its state and has an output current that reports the device state. If the total input current is high enough, the device state will change. Then, the device output current will be high or low depending on the state. The device state is stored in the magnetic tunnel junction (MTJ), an element with variable resistance ($R_{MTJ}$) that is programmed by the input current to a low resistance ($R_P$) or a high resistance ($R_{AP}$). The output current is then high or low depending on the value of $R_{MTJ}$ that was set. The magnetic domain wall logic device promises low-voltage switching and is inherently a storage device in addition to being a logic device. A single device can therefore act as a NAND or NOT logic gate, and can also be regarded as a flip-flop memory element or memory cell. Such characteristics are ideal for logic-in-memory and pipelined circuits. This work presents several types of magnetic domain wall logic devices, one of which is shown in Fig. 2.1.
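The threshold behavior described above can be sketched as a purely behavioral model. The class name, the 10 µA unit input current, the threshold placement, the 0.1 V read voltage, and the choice that a super-threshold write sets the high-resistance state are all assumptions for illustration. Only the two resistance values are taken from the scaled MTJ in Table 2.1. This is not the circuit model used in this work.

```python
from dataclasses import dataclass

R_P = 9.8e3    # ohms, low-resistance state (scaled MTJ, Table 2.1)
R_AP = 12.8e3  # ohms, high-resistance state (scaled MTJ, Table 2.1)


@dataclass
class DomainWallGate:
    """Behavioral model: the state flips when the summed input current crosses a threshold."""
    i_threshold: float      # amperes, hypothetical switching threshold
    r_mtj: float = R_P      # device starts in the low-resistance state

    def write(self, *input_currents: float) -> None:
        # Inputs add on the shared input terminal; only their sum matters here.
        if sum(input_currents) >= self.i_threshold:
            self.r_mtj = R_AP

    def read(self, v_read: float = 0.1) -> int:
        # A high output current (low resistance) is read as logic 1.
        i_out = v_read / self.r_mtj
        i_mid = v_read / ((R_P + R_AP) / 2)
        return 1 if i_out > i_mid else 0


def nand(a: int, b: int, i_unit: float = 10e-6) -> int:
    """Single-device NAND: the threshold sits between one and two unit inputs."""
    gate = DomainWallGate(i_threshold=1.5 * i_unit)
    gate.write(a * i_unit, b * i_unit)
    return gate.read()


def not_gate(a: int, i_unit: float = 10e-6) -> int:
    """Single-device NOT: one unit input alone exceeds the threshold."""
    gate = DomainWallGate(i_threshold=0.5 * i_unit)
    gate.write(a * i_unit)
    return gate.read()
```

Because the written state persists until the next write, the same object also behaves as the one-bit storage element mentioned above.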

In the device in Fig. 2.1, a current of spin-polarized electrons entering the magnetic nanowire from the input contact exerts a spin-transfer torque (STT) on the magnetic moments, which results in domain wall (DW) motion [15].

An MTJ is shown in Fig. 2.2, where between the top and bottom electrical contacts there is a layer with fixed magnetization, a tunnel barrier typically made of MgO, and a free layer whose magnetization direction is set by the direction of current applied between the contacts. The MTJ shown has in-plane magnetic anisotropy (IMA), meaning that the magnetic layers are magnetized in the planar directions. The opposite case would be if magnetic layers had perpendicular magnetic anisotropy (PMA), in which case the magnetizations would be out of the plane of the device.

Figure 2.1. Magnetic domain wall logic device based on spin-transfer torque (STT) with in-plane magnetic anisotropy (IMA). Two states are shown in (a) and (b). This is described in [14] and the image is courtesy of J. A. Currivan-Incorvia and coauthors.

Figure 2.2. Magnetic tunnel junction (MTJ) with in-plane magnetic anisotropy (IMA).

A simulation of an MTJ in Fig. 2.3 shows how the MTJ is put into each of its 2 different resistance states. The circuit model for the simulation in HSPICE is based on a model presented and made openly accessible in [16]. The parameters for the MTJ in Fig. 2.3 are based on the scaled values of the device measurements in [3], and are listed in Table 2.1. The MTJ resistance $R_{MTJ}$ is taken as the slope $(dI_{MTJ}/dV_{MTJ})^{-1}$ during the voltage sweep. Later in Chapter 4, a different set of scaled parameters with a TMR of 100% is assumed instead for practically realizable applications to larger-scale circuits.
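The two quantities used here, the differential resistance extracted from a sweep and the TMR ratio defined in Table 2.1, can be sketched as follows. The function names and the forward finite-difference estimate are assumptions for illustration. The HSPICE post-processing actually used for Fig. 2.3 is not specified in the text.

```python
def tmr(r_ap: float, r_p: float) -> float:
    """Tunnel magnetoresistance, TMR = (R_AP - R_P) / R_P."""
    return (r_ap - r_p) / r_p


def differential_resistance(v: list[float], i: list[float]) -> list[float]:
    """Estimate R = (dI/dV)^-1 between adjacent points of a voltage sweep."""
    return [(v[k + 1] - v[k]) / (i[k + 1] - i[k]) for k in range(len(v) - 1)]


# Resistance states from Table 2.1, in ohms.
tmr_fabricated = tmr(12.8, 11.3)
tmr_scaled = tmr(12.8e3, 9.8e3)
```

With the Table 2.1 resistances, the ratios evaluate to roughly 0.13 and 0.31, in line with the 13% and 30% listed there.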

MTJs have been integrated into the back end of the line of CMOS processes, which has made possible spin-transfer torque magnetic random access memory (STT-MRAM) technology. STT-MRAM features arrays of memory cells, where each cell stores a single bit with 1 MTJ and 1 access transistor [9].

Several types of domain wall logic devices are shown in Fig. 2.1, Fig. 2.4, Fig. 2.5, and Fig. 2.6. The physical mechanism for domain wall motion is spin-transfer torque (STT) in Fig. 2.1 and Fig. 2.5, and the physical mechanism is spin-orbit torque (SOT) in Fig. 2.4 and Fig. 2.6. Furthermore, the magnetic anisotropy of the ferromagnet is designed to be in-plane (IMA) in Fig. 2.1 and Fig. 2.4, and perpendicular (PMA), i.e., out-of-plane, in Fig. 2.5 and Fig. 2.6. DW motion with STT is described in [15] and DW motion with SOT is described in [17, 18].

Figure 2.3. Resistance versus voltage (R-V) characteristic of an MTJ with IMA simulated in HSPICE.

Table 2.1. MTJ device model parameters, following the model schematic in Fig. 2.8 with $R_L = 0$ and $R_R = 0$.

Parameter                                                   Fabricated MTJ in [3]    Scaled MTJ in Fig. 2.3
Tunnel magnetoresistance, TMR = $(R_{AP} - R_P)/R_P$        13%                      30%
Resistance $R_{MTJ}$ states: $R_{AP}$, $R_P$ (kΩ)           0.0128, 0.0113           12.8, 9.8
Critical current to set $R_{AP}$: $I_{C,AP}$ (µA)           -1400                    -16.8
Critical current to set $R_P$: $I_{C,P}$ (µA)               1400                     12.9
Capacitance: $C_{MTJ}$ (fF)                                 80                       0.11

In the device in Fig. 2.4, an electronic charge current density in the bottom heavy metal layer, typically made of Ta or Pt, leads to a spin current density that moves the DW in the free magnetic layer above. SOT with IMA is described in [19].

Of the devices shown in Fig. 2.1, Fig. 2.4, Fig. 2.5, and Fig. 2.6, the most attractive is the device with SOT and PMA in Fig. 2.6, because it is practical to fabricate and its minimum threshold current densities are in the range of $10^{11}$ to $10^{12}$ A/m$^2$, with current research suggesting that current densities can be further reduced to $10^9$ A/m$^2$ for very low energy switching [17, 18, 19, 20].

2.1 Circuit model for a magnetic device

Figure 2.7 shows the circuit model used for a magnetic domain wall device.

Figure 2.4. Magnetic domain wall logic device based on spin-orbit torque (SOT) with IMA.

Figure 2.5. Magnetic domain wall logic device based on STT with perpendicular magnetic anisotropy (PMA). This is described in [14] and the image is courtesy of J. A. Currivan-Incorvia and coauthors.

Figure 2.6. Magnetic domain wall logic device based on spin-orbit torque (SOT) with PMA. This is the most attractive configuration because it requires the lowest currents and is most practical to make. SOT with PMA is described thoroughly in the literature [17, 18, 19].

Figure 2.7. The circuit model used most often for the magnetic domain wall device.

Figure 2.8. An alternative circuit model for the magnetic domain wall device. This model is not used often because $C_{MTJ} \sim 10^{-18}$ F in typical scaled devices, which can be neglected [2].

2.2 Fanout and design for multiple stages

The design of logic gates using the magnetic domain wall logic device is constrained by how many subsequent devices one device can drive and how many device inputs can be accepted by one device. The ability of one device to connect with devices on the output is referred to as fanout, and the ability for one device to connect with devices on the input is referred to as fanin.

In the simplest case in Fig. 2.9, one device should be able to drive another device. Since a single magnetic logic device is read and written to in different cycles, we define fanout as the number of devices a single device can write to and fanin as the number of devices that can be read on the input of a single device.

Figure 2.9. A magnetic domain wall logic device connected to drive another device, i.e., a circuit with device fanout 1, the same as fanin 1.

Note that the circuits in this section are designed with negative voltage pulses, i.e., electrons come from the negative pulses in the clock terminal of the device and exert STT on the devices connected on the output. The left input terminal of every device, except those in the first stage, is connected to the output of another device, meaning that there are many paths to ground from the input pulse. In the circuit model for any one device, one could write the complete network with the schematic in Fig. 2.10, where device resistances farther away from the device under consideration have diminishing effect on the device $I_{IN}$. Rather than consider all the devices connected to the single device, the schematic in Fig. 2.9 is preferred, where only the nearby device resistances that dominate the effects on $I_{IN}$ are considered to model fanout and fanin.

Fanout 4, shown in Fig. 2.11, and fanin 4, shown in Fig. 2.12, would be ideal for magnetic logic devices.

The schematics in Fig. 2.11 and Fig. 2.12 are used to derive the constraints on the device geometry, listed as follows, to make fanout 4 and fanin 4 possible.

1. The current density $J_{OUT}$ in each of the output devices must be in the valid range where the DW in those output devices would switch or not switch their MTJ depending on the MTJ resistance of the device being read. The minimum current density expected with STT here is $10^{12}$ A/m$^2$, but this could be lowered with SOT devices to around $10^{11}$ A/m$^2$ [21].

2. The MTJ must be smaller than the nanowire it is above, in the length and width dimensions.

3. The MTJ cannot break down, meaning the voltage across it, $V_{MTJ} = I_{IN} R_{MTJ}$, should be below a conservative breakdown voltage of 1 V [22].
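The three constraints above can be expressed as a simple feasibility check for one candidate geometry. This is a sketch under stated assumptions: the function name, its arguments, and the idea of passing in precomputed output currents for the two MTJ states are hypothetical. Only the $10^{12}$ A/m$^2$ STT threshold and the 1 V breakdown limit come from the text. The resistor networks of Fig. 2.11 and Fig. 2.12 are not solved here.

```python
J_MIN_STT = 1e12    # A/m^2, minimum current density expected with STT (from the text)
V_BREAKDOWN = 1.0   # V, conservative MTJ breakdown voltage (from the text)


def check_constraints(*, i_out_p, i_out_ap, i_mtj, r_mtj,
                      w_ch, d_ch, l_ch, w_mtj, l_mtj, j_min=J_MIN_STT):
    """Evaluate the three design constraints for one geometry (SI units).

    i_out_p / i_out_ap : current delivered to an output device when the device
                         being read is in its R_P / R_AP state (hypothetical inputs)
    i_mtj, r_mtj       : current through and resistance of the MTJ being read
    """
    area = w_ch * d_ch                      # nanowire cross-section
    j_p, j_ap = i_out_p / area, i_out_ap / area
    return {
        # 1. The output DW moves for the low-resistance state only.
        "switching_window": j_p >= j_min > j_ap,
        # 2. The MTJ footprint fits on the nanowire.
        "mtj_fits": w_mtj < w_ch and l_mtj < l_ch,
        # 3. The MTJ voltage stays below breakdown.
        "no_breakdown": abs(i_mtj * r_mtj) < V_BREAKDOWN,
    }
```

Passing `j_min=1e11` would represent the lower threshold quoted for SOT devices.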

Material parameters for the fanout and fanin analyses are taken from the values reported in Chapter 8 with a scaled hybrid CMOS and MTJ process. The resistivity values and other materials information are listed there in (8.9), (8.10), (8.11), and Table 8.3. In addition to those parameters, we assume $R_{MTJ} A_{MTJ} \approx 1.0 \times 10^{-10}$ Ω-m$^2$, which is lower than the value in Chapter 8 and can be tuned across several orders of magnitude by the tunnel barrier thickness [23, 24].
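The resistance-area product translates into a junction resistance once a footprint is chosen. A minimal sketch, assuming a rectangular junction of width $w_{MTJ}$ and length $L_{MTJ}$ (the function name is hypothetical):

```python
RA_PRODUCT = 1.0e-10  # ohm*m^2, resistance-area product assumed in the text


def mtj_resistance(w_mtj: float, l_mtj: float, ra: float = RA_PRODUCT) -> float:
    """R_MTJ = RA / (w * L) for a rectangular junction, dimensions in meters."""
    return ra / (w_mtj * l_mtj)
```

For example, a 100 nm × 100 nm junction has an area of $10^{-14}$ m$^2$ and therefore a resistance of about 10 kΩ at this resistance-area product.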

The design space, in which up to fanout 4 and fanin 4 is possible with the material set studied, is determined using the models in Fig. 2.9, Fig. 2.11, and Fig. 2.12. Our models, with $|V_{CLK}| = 1$ V, yield a design space with the following geometry constraints:

$120\ \text{nm} < w_{ch}$ (2.1)

$65\ \text{nm} < w_{MTJ} < w_{ch}$ (2.2)

$65\ \text{nm} < L_{MTJ} < w_{ch}$ (2.3)

where $L_{ch}$ was varied between 300 nm and 600 nm and could be shortened or lengthened, $L_{contact}$ was varied between 50 nm and 250 nm and could be shortened or lengthened, and the channel thickness $d_{ch}$ was varied between 2 nm and 4 nm but could be smaller or larger. The resultant $R_{MTJ}$ values were in the range of 1 kΩ to 10 kΩ, and the $R_{ch\_l}$ and $R_{ch\_r}$ values were in the range of 475 Ω to 4.75 kΩ.

Figure 2.10. A magnetic domain wall logic device configured with fanout 1, the same as fanin 1, showing how all participating devices can be included even though the model in Fig. 2.9 is sufficiently accurate.

Figure 2.11. A magnetic domain wall logic device configured with fanout 4.

Figure 2.12. A magnetic domain wall logic device configured with fanin 4.

Figure 2.13. A magnetic domain wall logic device configured with fanout 2.

Figure 2.14. A magnetic domain wall logic device configured with fanin 2.

In order to expand the device geometries possible in the design space, one could limit the device-to-device connectivity requirement to just fanout 2, shown in Fig. 2.13, and fanin 2, shown in Fig. 2.14. In order to allow for the narrower scaled devices presented in the later chapters, it is most practical to require up to only fanout 2 and fanin 2.

Although CMOS logic gates typically have fanout 4 [25], such high fanout in magnetic logic circuits would require high voltages, and thus we settle with magnetic logic devices with fanout 2 and fanin 2.

Figure 2.15. A resonator circuit with a resistor, inductor, and capacitor placed in each branch of the clock tree, designed to distribute the clock signal through the tree. The network between node A and node B would be in the clock tree, which we excite here with a voltage step $V_{DD}u(t)$.

2.3 Clocking

The DW logic device described in [3] requires a clock signal that switches on every cycle. Furthermore, the clock signal must be distributed to all DW logic devices in a circuit. This warrants a look into the circuits that could be used to generate the clock signal and the circuits that could be used to distribute the signal across a chip containing large-scale circuits with magnetic logic devices.

2.3.1 Clock generators

The clock signal must first be generated. Since the magnetic logic device cannot be made to oscillate on its own, the clock signal could come either from an oscillator device made in the magnetic process or from established transistor oscillator circuits. A device made in the magnetic process is the spin-torque nano-oscillator, which can be used to generate a clock signal in the GHz range [26, 27]. The alternative is to use transistors to generate a clock signal with a typical voltage-controlled oscillator circuit using phase-locked loops as needed to increase the clock frequency to the GHz range [25]. If a transistor process is already available in the front end of the line, it is reasonable to use transistor-based clock generators. Whether spintronic oscillators or transistor circuits are used to generate a clock, the clock signal should be able to go up to GHz speeds to match MTJ switching speeds, and that signal must reach a device without much loss.

2.3.2 Resonant clock trees

It is possible to generate a clock signal for domain wall logic using a resistor, inductor, and capacitor (RLC) network in conjunction with a magnetic tunnel junction (MTJ).

We propose using on-chip inductors for the resonant clocking of 3-terminal devices. The circuit schematic in Fig. 2.15 shows a network, between node A and node B, which would be inserted into the clock tree, and a voltage step is applied to that network to measure its response.

Figure 2.16. A resonator circuit with an MTJ, resistor, inductor, and capacitor placed in each branch of the clock tree, designed to distribute the clock signal through the tree. Sharp transitions in the value of $R_{MTJ}$ from $R_{AP}$ to $R_P$ and vice versa help sustain oscillations. As in Fig. 2.15, the step response is seen to the network between node A and node B that would be in the clock tree.

Figure 2.17. (a) The current in a step response of a clock distribution network represented as an RLC network with no MTJ, seen in Fig. 2.15. (b) MTJ resistance and (c) the current in the step response of a clock distribution network represented with an RLC network with an MTJ. The circuit in Fig. 2.16 is measured in (b) and (c), where an added MTJ drives more oscillations.

The driven RLC circuit with MTJs is shown in the schematic in Fig. 2.16. This circuit schematic also has a network between node A and node B to which the step response is measured, but the network includes an MTJ. SPICE simulations in Fig. 2.17 show that up to 5 square wave pulses output from an MTJ are possible with a voltage step response with $R_{AP} = 45$ Ω, $R_P = 18$ Ω, $R = 1$ Ω, $C = 1$ pF, and $L = 1$ nH. The clock speed in those simulations is 10 GHz.
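For orientation, the characteristic quantities of a linear RLC branch with these component values can be estimated by hand. The sketch below assumes a series connection with the MTJ treated as a fixed resistor added to $R$. Both are assumptions, since the exact topology of Fig. 2.15 and Fig. 2.16 is not restated in the text, and the function name is hypothetical. It does not model MTJ switching and is not intended to reproduce the 10 GHz clock rate reported for the SPICE simulations.

```python
import math


def series_rlc(r: float, l: float, c: float) -> dict:
    """Characteristic quantities of a series RLC branch (SI units)."""
    w0 = 1.0 / math.sqrt(l * c)     # undamped angular resonance, rad/s
    alpha = r / (2.0 * l)           # damping rate, 1/s
    return {
        "f0_hz": w0 / (2.0 * math.pi),
        "q": w0 * l / r,
        "underdamped": alpha < w0,
    }


L_H, C_F, R_OHM = 1e-9, 1e-12, 1.0              # values quoted in the text
bare = series_rlc(R_OHM, L_H, C_F)              # no MTJ in the branch
with_rp = series_rlc(R_OHM + 18.0, L_H, C_F)    # MTJ held at R_P
with_rap = series_rlc(R_OHM + 45.0, L_H, C_F)   # MTJ held at R_AP
```

Under these assumptions the bare LC resonance is about 5 GHz with a quality factor near 32. Adding either MTJ resistance lowers the quality factor to roughly 1.7 ($R_P$) or 0.7 ($R_{AP}$) while the branch remains underdamped.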

Specific device dimensions need to be known since geometry matters in the design of spiral inductors and the clock buffer network. The inductance can be calculated for a physical geometry as shown in [28].

Sine waves are better for full resonance, but square waves are still aided by resonance from added inductors [29, 30]. The improvement from adding a resonant clock tree is seen in Fig. 2.17.

One could combine resonant clocks with constrained pipelining, limiting the number of stages and using pipelining for all logic.

A major issue with resonant clock trees driven by a switching MTJ is that the required inductors and capacitors have large values and must match with the MTJ switching times.

2.4 Pipelining

Magnetic domain wall logic devices are well suited to pipelined circuits, because the devices are logical elements as well as storage elements. Any multi-stage combinational logic can be pipelined because, for any stage, data stored from the previous stage is processed in the current stage. The stages can be divided into pairs, where the first stage of each pair is read in order to write the following stage. This enables input to be fed in and output to come out every 2 cycles, regardless of the depth of the logic. Thus, the pipeline depth is 2.
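The two-cycle throughput can be illustrated with a toy simulation in which every stage is both a gate and a latch. The scheduling rule, namely that a stage is written on cycles whose parity matches its index and is read on the other cycles, is an assumption chosen to mimic the read/write pairing described above. The function name is hypothetical.

```python
def run_pipeline(inputs, stage_fns):
    """Simulate a two-phase pipeline of storing logic stages.

    inputs    : list of input tokens, one accepted every 2 cycles
    stage_fns : one single-argument function per stage
    Returns a list of (cycle, output_token) pairs observed at the last stage.
    """
    n = len(stage_fns)
    state = [None] * n                      # each stage stores its last result
    outputs = []
    for cycle in range(2 * len(inputs) + n):
        for k in range(n):
            if k % 2 != cycle % 2:
                continue                    # this stage is read, not written, now
            if k == 0:
                t = cycle // 2
                src = inputs[t] if t < len(inputs) else None
            else:
                src = state[k - 1]          # previous stage holds its value this cycle
            state[k] = None if src is None else stage_fns[k](src)
        if (n - 1) % 2 == cycle % 2 and state[n - 1] is not None:
            outputs.append((cycle, state[n - 1]))
    return outputs
```

With a chain of four inverters, `run_pipeline([1, 0, 1], [lambda b: 1 - b] * 4)` returns `[(3, 1), (5, 0), (7, 1)]`. Lengthening the chain delays the first output but leaves the 2-cycle spacing between outputs unchanged.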

Furthermore, considering that a single device is essentially a flip-flop, the logic can include sequential elements in addition to the combinational elements and still have low latency from efficient pipelining.

A similar logic device known as all-spin logic has also been designed for circuit pipelining, using the fact that a single device can store its state between cycles [31].

The logic circuits with magnetic domain wall logic devices designed in [14] can operate in a pipelined fashion, though we focus on the most complex design presented in that work in the next section.

2.5 Digital design for medium and large scale integration

A full adder design with the magnetic domain wall logic device is presented in [14] and is shown in Fig. 2.18. This full adder has 3 clocks, meaning that in every 3 consecutive stages, one stage is in isolation, another is being read, and the remaining stage is being written to. In this case, the pipeline depth is 3. The result of simulating the full adder circuit shows, from the logical values of inputs A, B, and C, and outputs F and S, that $A + B + C = 2F + S$ [14].
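The identity $A + B + C = 2F + S$ can be checked exhaustively for a NAND-only full adder. The sketch below uses the textbook nine-NAND construction as a stand-in. It is not claimed to be the netlist of Fig. 2.18, and clocking is ignored.

```python
from itertools import product


def nand(a: int, b: int) -> int:
    return 1 - (a & b)


def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Return (F, S): carry F and sum S, built from nine NAND gates."""
    n1 = nand(a, b)
    n4 = nand(nand(a, n1), nand(b, n1))   # A xor B
    n5 = nand(n4, c)
    s = nand(nand(n4, n5), nand(c, n5))   # (A xor B) xor C
    f = nand(n1, n5)                      # AB + C(A xor B)
    return f, s


# Exhaustive check of A + B + C = 2F + S over all eight input combinations.
for a, b, c in product((0, 1), repeat=3):
    f, s = full_adder(a, b, c)
    assert a + b + c == 2 * f + s
```

Each `nand` call corresponds to one single-device gate in the sense used in this chapter.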

2.6 Experimental testing

Figure 2.19 shows a magnetic domain wall logic device that we fabricated and tested in the lab. We showed that the device performs a logical NAND correctly. The experimental details are in [2] and [3].

Figure 2.18. Full adder with magnetic domain wall logic devices, where each NAND or NOT gate is a single device. This is described in [14] and the image is courtesy of J. A. Currivan-Incorvia and coauthors.

Figure 2.19. Experimental prototype of the magnetic domain wall logic device. This is described in [3] and the image is courtesy of J. A. Currivan-Incorvia and coauthors.

2.7 Scaling and improvements with spin orbit torque

The MTJ and 4 flavors of 3-terminal magnetic domain wall devices are presented in this chapter. The most promising of the 4 devices is that based on SOT with PMA in Fig. 2.6 because the materials physics leads to a reduction of required electrical current density, from $\sim 10^{12}$ A/m$^2$ with STT to $\sim 10^{11}$ A/m$^2$ with SOT, to move the DW or ultimately switch the magnetic layer [2, 32, 33]. Effects relevant to SOT including the spin Hall effect (SHE), the Rashba effect, and the Dzyaloshinskii-Moriya interaction (DMI) are discussed extensively in other literature [2, 17, 18, 19, 34, 35]. The essential difference between STT and SOT in how DWs are moved is that in STT devices the electrical current along the magnetic layer moves the DW, and in SOT devices the electrical current in a heavy metal layer underneath the magnetic layer primarily moves the DW.

Although the mechanisms to move DWs are an active area of research, the physical theory of magnetic domains and DWs is understood well [36]. The theory about DWs matters to device design, and this is explored in Chapter 3.


Chapter 3

Micromagnetic Modeling of Nanowires with Edge Roughness

Magnetic domain walls are significantly affected by the edge of the magnetic nanowire in which they reside, especially when the wire width is below 100 nm. This chapter explores the effects of individual and periodic edge defects on the motion of a passing domain wall.

Parts of this chapter appear in the paper titled “Micromagnetic modeling of domain wall motion in sub-100-nm-wide wires with individual and periodic edge defects” by S. Dutta, S. A. Siddiqui, J. A. Currivan-Incorvia, C. A. Ross, and M. A. Baldo in AIP Advances [12].

3.1 Abstract

Reducing the switching energy of devices that rely on magnetic domain wall motion requires scaling the devices to widths well below 100 nm, where the nanowire line edge roughness (LER) is an inherent source of domain wall pinning. We investigate the effects of periodic and isolated rectangular notches, triangular notches, changes in anisotropy, and roughness measured from images of fabricated wires, in sub-100-nm-wide nanowires with in-plane and perpendicular magnetic anisotropy using micromagnetic modeling. Pinning fields calculated for a model based on discretized images of physical wires are compared to experimental measurements. When the width of the domain wall is smaller than the notch period, the domain wall velocity is modulated as the domain wall propagates along the wire. We find that in sub-30-nm-wide wires, edge defects determine the operating threshold and domain wall dynamics.

3.2 Introduction

Devices based on domain wall (DW) motion in magnetic nanowires are a promising alternative to conventional transistors for memory and digital logic applications with low switching energy [14, 37, 38, 39]. The logical state of the device depends on the position of a DW, which can be read back using, for example, a magnetic tunnel junction (MTJ), as shown in the elementary logic gate of Fig. 3.1(a) [14]. In DW-based devices, the DW must move predictably in the presence of a spin-polarized current or external magnetic field. However, in practice, DW motion can be affected by variations in nanowire width resulting from the lithography and pattern transfer processes, for example roughness with amplitude around 2-6 nm reported in [40]. Consequently, the pinning of the DW by edge roughness is critical for determining the scalability of DW devices.

High-resolution patterning has enabled the fabrication of sub-100-nm-wide nanowires. At this scale, line edge roughness (LER) of even a few nm provides significant interactions with DWs. Studies have shown the effect of LER on DW depinning in both in-plane magnetic anisotropy (IMA) and perpendicular magnetic anisotropy (PMA) nanowires [41, 42, 43, 44, 45, 46]. A DW can also be pinned by local variations in anisotropy, which can arise from misorientation between grains in a polycrystalline film or from other perturbations in the nanowire anisotropy, due to inhomogeneous strain for example [42, 47]. On the other hand, LER can facilitate the nucleation of a DW or the confinement of a DW to a region within the nanowire, which can be useful in device applications [48].

Micromagnetic simulations provide a tool to study the effect of edge roughness or anisotropy variations, subject to the limitations of the discretization cell size, which should be similar to the exchange length. LER can be introduced into a simulation by removing cells or blocks of cells from the edge of a nanowire in a model with cuboidal [49, 50] or tetrahedral cells [41]. Other studies use a Voronoi cell geometry [51]. Despite the differences in approach, edge defects can only be resolved when their sizes are larger than the cell size [41].
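To give a sense of scale for the exchange length, the sketch below evaluates the common magnetostatic definition $l_{ex} = \sqrt{2A/(\mu_0 M_{sat}^2)}$. The saturation magnetization is the Co value used in this chapter. The exchange stiffness $A = 3.0 \times 10^{-11}$ J/m is an assumed, literature-typical value for Co and is not taken from this work, so the result is indicative only.

```python
import math

MU_0 = 4.0e-7 * math.pi   # vacuum permeability, T*m/A


def exchange_length(a_ex: float, m_sat: float) -> float:
    """Magnetostatic exchange length sqrt(2A / (mu_0 * M_sat^2)) in meters."""
    return math.sqrt(2.0 * a_ex / (MU_0 * m_sat ** 2))


A_EX_ASSUMED = 3.0e-11    # J/m, assumed exchange stiffness (not from this work)
M_SAT_CO = 1.40e6         # A/m, saturation magnetization used for Co in this chapter
l_ex = exchange_length(A_EX_ASSUMED, M_SAT_CO)
```

Under this assumption the exchange length comes out near 5 nm, the same order as the few-nanometer cell sizes typical of such models.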

To date, studies on DW motion in nanowires with edge roughness have focused on wires with widths over 100 nm. As the wire width increases, the DW also becomes larger in size, making it less sensitive to small-scale edge fluctuations, and LER contributes little to DW pinning in wires that are wider than 200 nm [47, 52], within which artificial pinning sites are engineered in some cases [43, 53]. Studies have found that DW pinning fields increase linearly with LER, but narrower linewidths have not been explored [41, 45, 54, 55]. An analytical model has shown the effect of periodic pinning potentials on thresholds for DW traversal [56]. Here, we examine the effects of defects in wires with linewidths below 60 nm, examining both edge roughness and anisotropy variations, on the threshold and on DW velocity. We begin with a description of the modeling techniques in Section II. In Section III, we analyze the effects of anisotropy and notch defects on DW pinning in IMA nanowires. We explain the effect of periodic LER on DW motion in IMA nanowires and PMA nanowires in Sections IV and V, respectively.

3.3 Methods

We model magnetic nanowires with the Object-Oriented Micro-Magnetic Framework (OOMMF) software package [57]. Figures 3.1(b) and 3.1(c) show a top-down view of the magnetization vectors in an IMA Co nanowire and a PMA CoFeB nanowire, respectively, each containing a DW, after relaxation at remanence. The shape of the wire in Figs. 3.1(b) and 3.1(c) is taken from Fig. 3.1(d), which shows the discretization of a scanning electron microscope (SEM) image of a 30-nm-wide CoFeB nanowire fabricated using a bilayer resist lithography process [40].

Figure 3.1. (a) Schematic of a logic device made from a magnetic IMA nanowire from [14] with LER added. Red and blue represent magnetization directions, the output is a tunnel junction with a fixed layer (blue), and the input and clock include an antiferromagnet (not shown) to pin the magnetization direction of the free layer. Simulation model image of DW motion in an (b) IMA nanowire with LER and (c) PMA nanowire with LER. (d) SEM image of CoFeB nanowire and its discretization, from which the shapes of (b) and (c) are derived. Pinning site A labeled in (b) is the strongest pinning site that defined the threshold for that nanowire.

We modeled field-driven DWs by initializing a DW in the wire, then measuring the threshold for DW traversal as a function of applied magnetic field, which is parallel to the wire for IMA or out-of-plane for PMA simulations. The threshold is defined as the magnetic field value that allows a DW to traverse the entire nanowire without being pinned. The simulations did not include any thermal effects, which require a much more computationally intensive model [44, 58]. Thermal effects are generally known to lower the threshold field [15, 49, 59, 60], and thermally induced changes in DW velocity can change the positions at which a DW could be trapped in a wire with LER [49]. Once we determined the threshold, we calculated the DW velocity at a constant field that is 3% above the threshold field for that nanowire. The fields were below the Walker breakdown limit in IMA simulations.
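The threshold search described above can be sketched as a bisection over the applied field. This is a minimal illustration, not the procedure used in this work: `traverses` is a hypothetical callback standing in for one micromagnetic run that reports whether the DW crossed the whole wire, and the bracket, tolerance, and toy pinning value are assumptions. The sketch also assumes that traversal is monotonic in field within the bracket.

```python
from typing import Callable


def find_threshold(traverses: Callable[[float], bool],
                   h_low: float, h_high: float, tol: float = 0.01) -> float:
    """Bisect for the smallest field (kA/m) at which the DW crosses the wire.

    Assumes the DW is pinned at h_low, traverses at h_high, and that
    traversal is monotonic in field between the two.
    """
    if traverses(h_low) or not traverses(h_high):
        raise ValueError("bracket must straddle the threshold")
    while h_high - h_low > tol:
        h_mid = 0.5 * (h_low + h_high)
        if traverses(h_mid):
            h_high = h_mid   # DW got through: threshold is at or below h_mid
        else:
            h_low = h_mid    # DW was pinned: threshold is above h_mid
    return h_high


def drive_field(threshold: float, margin: float = 0.03) -> float:
    """Field used for the velocity runs: a fixed margin above threshold."""
    return (1.0 + margin) * threshold


# Toy stand-in for a simulation: the DW passes once H reaches 14.0 kA/m.
def toy_run(h: float) -> bool:
    return h >= 14.0


h_th = find_threshold(toy_run, h_low=0.0, h_high=50.0)
h_drive = drive_field(h_th)
```

Each call to `traverses` would correspond to a full simulation, so the tolerance sets the number of runs needed per wire.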

In our simulations of IMA Co nanowires, the cell size was 3 nm × 3 nm × 2.5 nm, and the wire was 30, 45, or 60 nm wide with thickness 5 nm and length 900 nm, containing about 10⁴ cells. The damping parameter, α, described in [50], was set to 0.018, and the saturation magnetization M_sat = 1.40 × 10⁶ A/m. A uniaxial magnetocrystalline anisotropy of 4100 J/m³ with random direction was present in each cell. This anisotropy value was 10% of the fcc-Co anisotropy, K_fcc-Co, based on the assumption that the anisotropy of the thin film is less than that of bulk [61], and the weak crystalline texture leads to an approximately random anisotropy [62]. K_fcc-Co is about 10% of the hcp-Co anisotropy, K_hcp-Co = 4.1 × 10⁵ J/m³ [63, 64]. In some nanowire simulations we added anisotropy fluctuations, in addition to the preexisting random-direction anisotropy. An anisotropy fluctuation could correspond to a region where the crystallographic axes of several grains happen to align, leading to a net anisotropy. It was modeled as a region of the wire with a local anisotropy increase that was up to K_fcc-Co (or 10% of K_hcp-Co), as suggested by the thin film microstructure [64, 65]. For simulations of IMA permalloy (Ni₈₀Fe₂₀), M_sat = 7.96 × 10⁵ A/m and a randomly-directed uniaxial anisotropy of 100 J/m³ were used with other parameters the same as IMA Co. We initialized the DW far enough from the wire ends to avoid the influence of their stray fields. The relaxed DW had a transverse structure in both Co and NiFe.

[Figure 3.2 image: pinning H-field (kA/m) versus notch depth, d (nm), for (a) triangular and (b) rectangular notch defects in wires with w = 30, 45, and 60 nm. Each panel marks the field ranges where the DW is pinned and where the DW continues, with the notch on the top side (w = 30, 45, and 60 nm) or on the bottom side (w = 30 nm).]

Figure 3.2. DWs are pinned by isolated notch defects. The DW core is oriented upward such that the DW is wider on the top of the page. We model pinning for notches of different depths placed on the top or bottom, whose shape is (a) triangular or (b) rectangular. The top panels show a pinned DW in a 30-nm-wide wire. For the 30-nm-wide wire, the effect of moving the notch to the other side of the wire is shown. This produces a chirality filtering effect.

PMA nanowires were modeled corresponding to 5-nm-thick CoFeB, illustrated in Fig. 3.1(c). Studies report CoFeB nanowires can be fabricated with strong PMA [66, 67]. The cell size for PMA wires was 3 nm × 3 nm × 2.5 nm. We used α = 0.01, M_sat = 7.96 × 10⁵ A/m and an anisotropy in the out-of-plane direction of K_z = 7.82 × 10⁵ J/m³ with an additional random-direction anisotropy in each cell of 100 J/m³ [40].
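The IMA Co mesh size and anisotropy values quoted in this section can be cross-checked with a few lines of arithmetic. The sketch below only restates numbers from the text; the variable names are illustrative, and taking "about 10%" as exactly 0.10 is an assumption.

```python
# Cross-check of the IMA Co mesh size and anisotropy values quoted above.
CELL_NM = (3.0, 3.0, 2.5)          # cell size along length, width, thickness
LENGTH_NM, THICKNESS_NM = 900.0, 5.0
K_HCP_CO = 4.1e5                   # J/m^3, hcp-Co anisotropy from the text


def cell_count(width_nm: float) -> int:
    """Number of cuboidal cells in a smooth wire of the given width."""
    nx = round(LENGTH_NM / CELL_NM[0])
    ny = round(width_nm / CELL_NM[1])
    nz = round(THICKNESS_NM / CELL_NM[2])
    return nx * ny * nz


cells = {w: cell_count(w) for w in (30.0, 45.0, 60.0)}
k_fcc_co = 0.10 * K_HCP_CO         # "about 10%" of the hcp-Co value
k_random = 0.10 * k_fcc_co         # random-direction anisotropy per cell

for width, n in cells.items():
    print(f"w = {width:.0f} nm: {n} cells")
print(f"K_fcc-Co ~ {k_fcc_co:.3g} J/m^3, per-cell K ~ {k_random:.3g} J/m^3")
```

The three wire widths give between 6000 and 12000 cells, consistent with the stated order of 10⁴, and the per-cell anisotropy works out to the quoted 4100 J/m³.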


3.4 Defect pinning in IMA nanowires

3.4.1 Triangular and rectangular notch defects

In Fig. 3.2, we model the effect of an isolated notch, whose shape is (a) triangular or (b) rectangular, by setting zero magnetization in appropriate cells in the wire edge. The DW is initialized on the left of the wire and moves to the right as the in-plane field is applied. The DW core magnetization is oriented towards the top of the page such that the DW is wider at the upper edge of the wire where the notch is. Figure 3.2 shows the ranges of fields at which the DW is pinned or when it continues past the notch, for different notch depths, d, and wire widths, w. The notch width at the wire edge is (2d − 3) nm for d = 3, 6, 9, ... nm. The pinning fields increase by up to a factor of 2 as the wire width decreases from 60 nm to 30 nm.
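The notch construction described above, where selected edge cells are given zero magnetization, can be sketched as a boolean cell mask. This is an illustration under stated assumptions rather than the script used for the simulations: the function name and mask layout are hypothetical, the triangular notch is taken to narrow by one cell on each side per row (which reproduces the stated edge width of (2d − 3) nm on a 3 nm grid), and the rectangular notch is assumed to keep that same edge width over its full depth.

```python
CELL_NM = 3  # in-plane cell size from the text


def notch_mask(n_x: int, n_y: int, depth_nm: int, center_col: int,
               shape: str = "triangular") -> list:
    """Return mask[row][col], True where the cell keeps its magnetization.

    Row 0 is the top edge of the wire. The notch is cut from the top edge
    and is (2d - 3) nm wide there, as stated in the text.
    """
    if depth_nm % CELL_NM:
        raise ValueError("depth must be a multiple of the cell size")
    n_rows = depth_nm // CELL_NM
    mask = [[True] * n_x for _ in range(n_y)]
    for row in range(min(n_rows, n_y)):
        # Half-width in cells: shrinks by one cell per row for a triangle,
        # stays constant for a rectangle (an assumption of this sketch).
        half = (n_rows - 1 - row) if shape == "triangular" else (n_rows - 1)
        for col in range(center_col - half, center_col + half + 1):
            if 0 <= col < n_x:
                mask[row][col] = False   # zero magnetization in this cell
    return mask


# 900 nm x 30 nm wire (300 x 10 cells) with a 9-nm-deep notch at mid-length.
tri = notch_mask(300, 10, depth_nm=9, center_col=150, shape="triangular")
rect = notch_mask(300, 10, depth_nm=9, center_col=150, shape="rectangular")
removed_tri = sum(row.count(False) for row in tri)
removed_rect = sum(row.count(False) for row in rect)
edge_width_nm = tri[0].count(False) * CELL_NM
```

For the same depth, the rectangular mask removes more cells than the triangular one, which is one way to picture the steeper profile discussed in the following paragraph.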

As the notch depth increases, the pinning field increases, in agreement with the trend seen in [46]. For a given notch depth, the rectangular notch has a higher pinning field than the triangular notch in wires that are 45 nm and 60 nm wide, but the difference is smaller in 30-nm-wide wires. Rectangular notches have a steeper profile in our model, and their higher pinning field in wider wires agrees with the observation in [55] that in 1.4-µm-wide wires the depinning field is higher for steeper-profile notches. In the depinning process, the DW oscillates as seen in [44] and then breaks free of the notch.

Reversing the orientation of the transverse core of the DW, or, equivalently, moving the notch to the opposite side of the wire, increases the pinning field as predicted by [58]. The different pinning in the cases of notch depths around 3 to 9 nm is shown in Fig. 3.2 for the 30-nm-wide wire. This leads to a chirality filtering effect as described in [68], where within a certain field range, DWs with one core chirality pass but DWs with the opposite chirality are pinned because the pinning is greater when the narrower side interacts with the notch.

3.4.2 Anisotropy defects

We simulate anisotropy defects as regions of the wire with an increase in anisotropy, ∆K, of up to 10% of K_hcp-Co. Anisotropy defects consisted of sections of the wire that are 9 nm long, similar to the grain size [65]. Each section of an anisotropy defect is treated as a row of cells with higher anisotropy, which spans across the entire width and entire thickness of the magnetic nanowire [63, 69], shown in the red-shaded region of the top panel of Fig. 3.3.

In Fig. 3.3, we plot the pinning field vs. ∆K for each nanowire width. Although the pinning strength of anisotropy defects nearly doubles when halving the nanowire width, the overall pinning strength of anisotropy defects is almost an order of magnitude smaller than that of the notch defects described earlier, and hence is not expected to be the dominant factor in determining the depinning fields for the geometry and IMA material parameters studied here.


[Figure 3.3 image: (a) pinning H-field (kA/m) versus ∆K (0 to 40 kJ/m³, i.e. up to 0.1 K_hcp-Co) for an anisotropy defect in wires with w = 30, 45, and 60 nm, marking where the DW is pinned and where the DW continues; (b) pinning H-field versus notch depth, d (nm), for a rectangular notch defect with local anisotropy variations, with the range for fabricated nanowires indicated.]

Figure 3.3. (a) Moving DWs are pinned by an anisotropy defect, shown as a shaded rectangle in the top panel. This pinning is modeled for various anisotropy magnitudes. (b) Comparison of the range of depinning fields for experimental 60-nm-wide and 80-nm-wide Co nanowires and simulated 60-nm-wide Co nanowires with a rectangular notch, varying the notch depth and including anisotropy increases of up to 10% of K_hcp-Co, shown as shaded rectangles around the notch in the top panel. The ranges of depinning fields for the experiment and model are comparable at notch depths around 6 – 14 nm.

3.4.3 Comparison to experimental results

The depinning fields observed in experimentally fabricated 5-nm-thick and 60-nm-wide or 80-nm-wide Co nanowires are in the range of 16 kA/m to 32 kA/m. The LER amplitude is up to ∼7 nm, or 10% of the wire width, found by analyzing nanowire SEM images using the method of [40].

The experimental depinning fields can be compared with modeled depinning fields for a 60-nm-wide Co wire with a single notch and distributed anisotropy defects, shown in Fig. 3.3(b). Distributed anisotropy defects were modeled as blocks of 3 cells × (3 to 11 cells), i.e. 9 nm wide and 9 to 33 nm long, subject to an increased anisotropy ∆K = 1% to 10% of K_hcp-Co in the direction along the wire length. Ten of these defects with varying size and ∆K were placed within the wire around the notch. Notch depths of 6 – 14 nm resulted in depinning fields that were comparable to the range of measured values.
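One way to generate a set of distributed anisotropy defects like the one described above is sketched below. The block dimensions, the count of ten, and the ∆K range come from the text. The rest is assumed for illustration: the uniform sampling of size and ∆K, the `spread_cols` window that defines "around the notch", the choice of the 3-cell side as the direction across the wire, and the function name itself.

```python
import random

K_HCP_CO = 4.1e5   # J/m^3, from the text


def random_anisotropy_defects(n_defects: int, n_x: int, n_y: int,
                              notch_col: int, spread_cols: int,
                              rng: random.Random) -> list:
    """Draw defect blocks of 3 cells x (3 to 11 cells) near a notch.

    Each block is a dict with its first cell, its size in cells, and its
    anisotropy increase dK (1% to 10% of K_hcp-Co, along the wire length).
    """
    defects = []
    for _ in range(n_defects):
        size_y = 3                       # 3 cells = 9 nm across the wire
        size_x = rng.randint(3, 11)      # 9 to 33 nm along the wire
        col0 = rng.randint(max(0, notch_col - spread_cols),
                           min(n_x - size_x, notch_col + spread_cols))
        row0 = rng.randint(0, n_y - size_y)
        d_k = rng.uniform(0.01, 0.10) * K_HCP_CO
        defects.append({"col0": col0, "row0": row0,
                        "size_x": size_x, "size_y": size_y, "dK": d_k})
    return defects


# 60-nm-wide, 900-nm-long wire: 300 x 20 cells, notch at mid-length.
defects = random_anisotropy_defects(10, n_x=300, n_y=20, notch_col=150,
                                    spread_cols=30, rng=random.Random(0))
```

Passing a seeded `random.Random` keeps a given defect layout reproducible between runs.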

Furthermore, we model Co nanowires with an edge profile determined from discretized SEM images such as the nanowire in Fig. 3.1(d), with and without local anisotropy variations modeled as blocks of cells of size (2 to 4 cells) × (2 to 11 cells) within the nanowire with ∆K values uniformly distributed from 1% to 10% of K_hcp-Co. The pinning field for such nanowires increases by about 2 kA/m, from 25 kA/m to 27 kA/m, as a result of adding the local anisotropy variations. Hence, based on Figs. 3.2 and 3.3, LER is confirmed to be a significant source of pinning in nanowires that are 30 to 60 nm wide while anisotropy defects increase the pinning field by a lesser amount.

3.5 DW motion in IMA wires with periodic notches

To understand how a DW interacts with the series of notches present at a rough edge, we identify the peak spatial frequencies along the nanowire edge from a spectral analysis of the edge deviation seen in an SEM image of a nanowire. The peak spatial periods for our example nanowire are 200 nm with the highest peak, 91 nm with a smaller peak, 47 nm, 29 nm, and below, determined from the analysis of a micrograph with a resolution of 1.4 nm. We then model DW motion in wires with an array of notches, each with one spatial period, and compare the DW velocity in nanowires with monochromatic edge spatial frequencies with the DW velocity in the discretized nanowire with a range of frequencies.
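A spectral analysis of this kind can be sketched with a plain discrete Fourier transform of the edge deviation. The example below is illustrative only: the edge trace is synthetic (two sinusoids with assumed amplitudes and periods of 200 nm and 70 nm), only the 1.4 nm sampling step is taken from the text, and the function name is hypothetical. A real analysis would more likely use an FFT library and a window function.

```python
import cmath
import math


def dominant_periods(edge_nm: list, dx_nm: float, n_peaks: int) -> list:
    """Return the n_peaks strongest spatial periods (nm) of an edge profile.

    Uses a plain discrete Fourier transform of the mean-subtracted edge
    deviation and keeps local maxima of the power spectrum.
    """
    n = len(edge_nm)
    mean = sum(edge_nm) / n
    dev = [y - mean for y in edge_nm]
    power = []
    for k in range(n // 2 + 1):
        coeff = sum(d * cmath.exp(-2j * math.pi * k * i / n)
                    for i, d in enumerate(dev))
        power.append(abs(coeff) ** 2)
    # A bin is a peak if it is at least as strong as both neighbours.
    peaks = [k for k in range(1, n // 2)
             if power[k] > power[k - 1] and power[k] >= power[k + 1]]
    peaks.sort(key=lambda k: power[k], reverse=True)
    return [n * dx_nm / k for k in peaks[:n_peaks]]


DX_NM = 1.4          # micrograph resolution quoted in the text
N_SAMPLES = 1000     # synthetic trace length, chosen for this example
profile = [3.0 * math.sin(2 * math.pi * i * DX_NM / 200.0)
           + 1.0 * math.sin(2 * math.pi * i * DX_NM / 70.0)
           for i in range(N_SAMPLES)]
periods = dominant_periods(profile, DX_NM, n_peaks=2)
```

The periods are returned in order of decreasing spectral power, so the first entry corresponds to the "highest peak" in the sense used above.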

In simulations of NiFe nanowires with periodic notches of depth d = 3 nm, the threshold field is 2 to 15 kA/m, well below the Walker breakdown field, which we determined to be 185 kA/m. The threshold in the 36-nm-wide wires with LER is higher than the thresholds of around 0.3 kA/m reported in [70] for NiFe nanowires that are 100 to 300 nm wide, whose ratio of edge roughness to width is an order of magnitude smaller.

Figure 3.4 shows the modeling results for DW movement under a field, H_x, that is 3% greater than threshold, as well as a definition of the notch array geometry. A smooth wire with no defects has a threshold close to but not exactly zero due to the random-direction anisotropy.

In periodically notched wires, the DW velocity varies periodically with distance, higher as the DW encounters a narrow section where the DW area decreases, and lower as it encounters a wider section of the wire where its area increases. The DW width, which is defined as the full width at half maximum of the magnetization component transverse to the wire at the wire midpoint, is 29, 40, and 50 nm for smooth Co wires that are 30, 45, and 60 nm wide, respectively. When the spatial period of the notches is smaller than the DW width, the DW velocity shows less variation, as seen in the 6-nm-period and 18-nm-period cases. In the 36-nm-wide wire, the DW width varied from 29 nm in narrow sections to 32 nm in wide sections. The DW is wide enough to average over the fluctuations in wire width for high frequency roughness, leading to less effective pinning. In these IMA nanowires, the DW does not change chirality and retains its transverse structure while moving through the narrow and wide sections.
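The DW width definition used above, the full width at half maximum of the transverse magnetization component, can be evaluated from a sampled profile as in the following sketch. The function name is hypothetical, and the test profile is an assumed sech-shaped wall with a width parameter of 11 nm sampled on the 3 nm grid; it is not simulation output.

```python
import math


def fwhm(x_nm: list, m_t: list) -> float:
    """Full width at half maximum of |m_t(x)| using linear interpolation."""
    mag = [abs(v) for v in m_t]
    i_peak = max(range(len(mag)), key=mag.__getitem__)
    half = 0.5 * mag[i_peak]

    def crossing(i_in: int, i_out: int) -> float:
        # Interpolate between a sample above half maximum and one below it.
        frac = (mag[i_in] - half) / (mag[i_in] - mag[i_out])
        return x_nm[i_in] + frac * (x_nm[i_out] - x_nm[i_in])

    left = i_peak
    while left > 0 and mag[left - 1] >= half:
        left -= 1
    right = i_peak
    while right < len(mag) - 1 and mag[right + 1] >= half:
        right += 1
    if left == 0 or right == len(mag) - 1:
        raise ValueError("profile does not fall below half maximum")
    return crossing(right, right + 1) - crossing(left, left - 1)


DELTA_NM = 11.0   # assumed wall-width parameter for the test profile
xs = [3.0 * i for i in range(-50, 51)]
m_y = [1.0 / math.cosh(x / DELTA_NM) for x in xs]
width = fwhm(xs, m_y)
```

For a sech profile the analytic value is 2 Δ arccosh(2), so the interpolated result can be checked against it; the small difference comes from the 3 nm sampling.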

When d and w are increased by the same amount, oscillations in DW velocity are seen but the threshold is higher because more energy is needed for the DW to expand into a larger increase in volume. We also plot the DW velocity for a simulated nanowire whose shape is taken from a discretized SEM image, shown in Fig. 3.1(b,d). The DW velocity is at a minimum where the DW reaches a deep notch and then expands into a wider section of the wire.


[Figure 3.4 image: DW velocity versus position in IMA nanowires with w = 36 nm, L = 900 nm, and d = 3 nm. Legend: LER period = 6 nm, H = 2.2 kA/m; LER period = 18 nm, H = 6.8 kA/m; LER period = 54 nm, H = 12.2 kA/m; LER period = 96 nm, H = 14.5 kA/m; discretized SEM image, H = 32.7 kA/m. A schematic defines the notch array geometry (w, d, and LER period).]

Figure 3.4. The DW velocity in IMA nanowires varies with position in periodically notched wires. The DW is driven at a constant magnetic field 3% above the threshold for DW traversal. Insets show snapshots of the DW position at a velocity maximum and a velocity minimum for the 96 nm period. The DW velocity oscillations follow the changes in linewidth when the DW is narrower than the notch period.

We performed an additional set of simulations on nanowires of 3 nm thickness. In these cases, the DW velocity variation becomes even more pronounced, especially for the nanowire with a notch period of 96 nm, in which the amplitude of the velocity oscillation increased by 158% when reducing the nanowire thickness from 8 nm to 3 nm.

In comparison to the DWs in [71], which move at high field across periodic anti-notches above the Walker breakdown threshold, the DWs of Fig. 3.4 retain their transverse structure below Walker breakdown but their velocity fluctuates. The DW energy is about 1 aJ. The simulations indicate about 0.2 aJ difference in energy between when the DW is in a narrow or wide section, an energy provided by the applied field as the DW expands into the wide sections. The results here show that even small 3-nm-deep periodic variations in the wire width can lead to a modulation in DW velocity, with the velocity modulation most pronounced for periods greater than the DW width.

[Figure 3.5 image: DW velocity versus position in PMA nanowires with w = 36 nm, L = 900 nm, and d = 3 nm, all driven at H = 200 kA/m. Legend: smooth (d = 0); LER period = 6, 18, 54, and 96 nm; discretized SEM image. A schematic defines the notch array geometry (w, d, and LER period).]

Figure 3.5. The DW velocity in PMA nanowires driven at a constant magnetic field. The same field is applied in all wires, higher than Walker breakdown. There are small fluctuations in velocity with position along the wire except for a period of 6 nm and below.

3.6 DW motion in PMA wires with periodic notches

Figure 3.5 shows an analogous model study of CoFeB PMA nanowires with a periodic notch structure with d = 3 nm and w = 36 nm. The Walker breakdown field is 10 kA/m for a smooth wire. The threshold field is close to zero in the smooth wire, but the threshold field exceeds the Walker breakdown field when there are periodic notch defects. The DW is therefore moving in the precessional regime with its core orientation changing in Fig. 3.5.

The velocity varies periodically as the DW encounters changes in width, but the changes in velocity are less than 20% for the notch geometries examined here, a smaller modulation than seen in the IMA model of Fig. 3.4. The velocity modulation is similar across the range of periods. The DW velocity decreases when d and w are increased by the same amount, not shown in Fig. 3.4. The velocity modulation is consistent with [41], which shows oscillations in DW velocity when a DW in a rough PMA nanowire is driven above Walker breakdown.

Since the DW is in Walker breakdown in the PMA nanowires, the DW core orientation rotates. In a smooth wire, this rotation has a period of 20 nm and the DW velocity is constant, increasing with width. Due to the rotation of the DW, the DW is more likely to get pinned in one orientation compared to another. This means that even if a DW continues past one notch period, it could be pinned in a subsequent period depending on its core orientation there. We drive the DW at H_z = 200 kA/m in all PMA wires, which is sufficiently high to avoid any pinning. Hence, the changes in DW velocity are not only due to the periodic notches but also due to the rotating DW. Figure 3.5 shows that the DW velocity varies by about ±3 m/s as the DW traverses the larger period notched wires.

In Fig. 3.5, we plot the DW velocity for a simulated nanowire with the same shape as the discretized SEM image in Fig. 3.1(c,d). Its threshold field, which is determined by the most effective pinning site along the wire labeled as Pinning site A, is higher than that for the periodic notches, but the fluctuations in velocity are similar to those of the longest period notch array.

Our modeling shows that for the parameters explored here, LER leads to smaller fluctuations in the velocity of DWs under an applied field in the PMA case compared to the IMA case. Comparing the same discretized geometry wires for IMA (Fig. 3.1(b)) and PMA (Fig. 3.1(c)), the lowest DW velocity occurs at the same location, labeled as Pinning site A in Fig. 3.1(b). However, the velocity dip associated with Pinning site A is an 82% reduction for the IMA case and only a 7% reduction for the PMA case.

3.7 Conclusions

In Co nanowires with IMA, we find with micromagnetic modeling that the line edge roughness is an important source of pinning as the linewidth decreases to 30 nm, while fluctuations in anisotropy also contribute to pinning. The pinning increased as linewidth decreased and depended on the notch shape. In NiFe wires with a periodic array of shallow notches, the DW velocity fluctuated with a periodic oscillation as the DW encountered the changes in linewidth. In these IMA wires, the DW became less sensitive to roughness as the period of the notches decreased below the DW width. In the PMA case for CoFeB, the DW was narrower and interacted even with high frequency roughness. The micromagnetic modeling of Co nanowires with defects predicted pinning fields of similar magnitude to those measured experimentally, showing that our modeling can be applied to physical devices. Models with discretized wire images could further explain actual pinning sites observed in experiment. Further fabrication advances that lower LER will lower the threshold field or current density needed to translate DWs.


Chapter 4

Reconfigurable Digital Logic with Nonvolatile Magnetic Devices

Magnetic tunnel junctions store information even when they are powered off. This makes it possible to make reconfigurable circuits where the configuration information for the circuit is stored in MTJs. This chapter shows how digital CMOS circuits can be reconfigured or tuned by setting nonvolatile MTJ elements.

We introduce nonvolatile, programmable resistances that we call MTJ trimmers into digital circuits. This allows digital logic blocks to be reconfigured to have lower or higher input-to-output delay for system-level improvements such as removing timing violations. The main advantage of digital MTJ trimmers is that changes to propagation delay can be made without interrupting regular circuit operation and without losing state when turned off. Prior work has shown programmable delay elements, in the form of capacitors on an output selected by transmission gates [72, 73]. Coarse adjustments in delay time can be made by introducing extra buffers into a path [74]. In another case, multiplexers select the best of many clock signal lines that each has a different latency [75]. We complement these various delay-programming techniques by using MTJs for nonvolatile delay programming.

4.1 The analog MTJ trimmer

An MTJ can be used directly as a variable resistor in a CMOS circuit to tune the circuit properties. We refer to this tuning modification with the addition of MTJs as the analog MTJ trimmer. Since the MTJ resistance R_MTJ can be programmed, the resistance can be put in series with the pull-up or pull-down path of a logic gate to modify the propagation delay of the logic gate, which is directly proportional to the resistance of the pull-up or pull-down path. While R_MTJ would be a passive element that does not change value during the regular operation of the logic gate it tunes, R_MTJ could be programmed by a separate programming path activated outside of regular logical operation, where higher write currents in either direction could be applied to program the MTJ resistance.
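The proportionality between delay and path resistance can be illustrated with the standard first-order RC estimate, in which the 50% propagation delay is ln(2) times the series resistance times the load capacitance. The sketch below uses that textbook model, not a result from this chapter, and every numerical value in it is hypothetical.

```python
import math


def propagation_delay(r_path_ohm: float, r_mtj_ohm: float,
                      c_load_f: float) -> float:
    """First-order RC delay to the 50% point: ln(2) * (R_path + R_MTJ) * C."""
    return math.log(2.0) * (r_path_ohm + r_mtj_ohm) * c_load_f


# Hypothetical values chosen only to illustrate the trend.
R_PATH = 5e3      # on-resistance of the pull-down path, ohms
R_P = 2e3         # MTJ parallel (low-resistance) state, ohms
R_AP = 4e3        # MTJ antiparallel (high-resistance) state, ohms
C_LOAD = 2e-15    # load capacitance, farads

t_fast = propagation_delay(R_PATH, R_P, C_LOAD)
t_slow = propagation_delay(R_PATH, R_AP, C_LOAD)
ratio = t_slow / t_fast
```

In this model the ratio of the two delays depends only on the resistances, so any spread in the MTJ resistance maps directly onto a spread in delay.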

If an MTJ is to be used as a variable resistor but with only two terminals, the voltage and current levels must be different for the resistance programming operation and the resistance reading operation. Figure 4.1 shows the two different regions of operation for an MTJ.

[Figure 4.1 image: analog MTJ trimmer operation. The MTJ is programmed at high current magnitudes (two programming-current regions), and the circuit is operated at about 10× less current (normal circuit operation region).]

Figure 4.1. Voltage levels for the programming mode and the regular operation mode of an MTJ, as determined from the resistance-voltage characteristic determined from an HSPICE simulation.

As it turns out with present-day commercial CMOS and MTJ process technologies, process variations tend to be greater with the MTJ process [76]. In other words, the direct use of R_MTJ as a tunable circuit element in the analog MTJ trimmer may lead to more unwanted variations. For this reason, in the next section we consider instead a trimmer design with the tuning configuration stored in MTJs but where MTJ process variations would be decoupled from the tuning of logic circuits.

Example analog MTJ trimmer circuits are shown in Fig. 4.2, where a differential pair is modified. The design here is similar to a previous floating-gate trimmer design [77].

4.2 The digital MTJ trimmer

In order to make sure that process variations are limited to those arising from CMOS variations, a digital MTJ trimmer is more attractive for practical applications. The rest of this chapter focuses on the digital MTJ trimmer.

Parts of this chapter appear in the paper titled “Nonvolatile Online CMOS Trimming with Magnetic Tunnel Junctions” by S. Dutta, M. Price, and M. A. Baldo in the IEEE/ACM International Symposium on Nanoscale Architectures [4].

[Figure 4.2 image: two circuit schematics, (a) and (b), of a differential pair (gate inputs VG1 and VG2, branch currents I1 and I2, resistors R1, R2, and Rτ, bias Vτ) with MTJ resistances RMTJ1 and RMTJ2 and programming access transistors (signals VGP1, VSP1, VGN1, VSN1, VGN2, VSN2).]

Figure 4.2. Example analog MTJ trimmer circuits for a differential pair. Although the design may work in theory, process variations in the MTJ and the difficulty of getting sufficient programming current in selected MTJs make this design impractical. (a) Only a single device is used to provide programming currents to the two devices, using the devices in the trimmed circuit to aid with selection. (b) Transmission gates are used to program each MTJ with more accurate currents in the desired direction.

4.2.1 Abstract

We design a programmable output delay for digital circuits, making use of the nonvolatile bit storage in magnetic tunnel junctions (MTJs), which are available in hybrid processes with CMOS. We introduce our programmable clock buffers in VLSI dot product and fast Fourier transform (FFT) circuits at the 45 nm node. We reduce the FFT clock skew by 39%. These performance improvements and the ability to reduce timing violations come at less than a 1% increase in area and power.

4.3 Introduction

The integration of magnetic tunnel junctions (MTJs) into modern CMOS processes has created a pathway to proceed beyond CMOS scaling limits toward low-power circuits such as spin-transfer torque magnetic random access memory (STT-MRAM) [78, 79, 80]. An MTJ is put in a low- or high-resistance state by a write current, and can enable one bit of nonvolatile memory [79]. MTJs allow for improved energy efficiency over CMOS because they can be powered off when not in use [80]. While other nonvolatile technologies such as resistive RAM could be used, MTJs have the potential for high density and have been integrated well in hybrid processes with STT-MRAM and CMOS [79, 81, 82]. Nonvolatile bits could be used to configure CMOS circuits as well.


This study introduces a trimmer that is based on MTJs and is resistant to CMOS and MTJ process variations. The trimmer design here complements and adds to the delay-programming techniques presented in prior work [72, 73, 74, 75, 77].

We use NMOS and PMOS switches as the variable resistance because they track changes in process, voltage, and temperature (PVT) that affect the rest of the circuit. We use MTJs as used in STT-MRAM to store configuration bits for a programmable resistance. We could have used MTJs directly as variable resistors, but they would have high leakage current and follow different variation mechanisms.

This paper describes the design and application of nonvolatile digital trimmer circuits. In Section 4.4, we design nonvolatile digital trimmers for static CMOS gates and logic-in-memory. In Section 4.5, we illustrate MTJ-based delay trimming in 45 nm CMOS standard cell designs. Finally, Section 4.6 concludes the paper.

4.4 Nonvolatile digital trimming

Very large-scale integrated (VLSI) digital circuits often have signals that are distributed across a chip. Such signals require precise or synchronized timing to their loads. One common example is clock trees, which are high-fanout nets with multiple levels of buffers that drive groups of registers throughout the chip. We propose a method to program changes to the propagation delay of these buffers. If PVT variations change propagation delays in a certain part of the chip, one could use our trimmers to program an opposing change in delay and restore matching with the rest of the chip. This improves tolerance to local variation without disrupting functionality.

We now show how digital trimming can be performed for static CMOS and logic-in-memory circuits in a hybrid CMOS and MTJ process.

4.4.1 MTJ bit cell for static CMOS trimming

We use MTJ bit cells to provide the inputs to the programmable delay networks for digital circuits. An MTJ bit cell stores one bit with one or more MTJs, augmented with transistor circuits to read and write the MTJ state. MTJs can be read using a sense amplifier, as shown in [82]. Pre-charged sense amplifiers are often used to read MTJs [80]. In contrast to MTJ bit cells such as the flip-flop in [83], we create such a circuit with minimal area overhead and minimal static power dissipation without much concern for the dynamic power dissipated infrequently for sensing and writing.

Figure 4.3 shows our MTJ bit cell circuit. Our design is based on a specific pre-charged sense amplifier topology known as Chung’s sense amplifier, because it provides an equalization device, M7 in Fig. 4.3, and trades off speed to minimize power. This is desirable for configuration applications where the state changes infrequently [84]. Each stored bit requires one MTJ and sense amplifier, similar to designs in [80].

Static CMOS logic gates may be trimmed by adding the additional circuit elements shown in Fig. 4.4. We connect a programmable resistance in series with the pull-down network (PDN) of the logic gate and another programmable resistance in series with the pull-up network (PUN) of the logic gate. The PUN programmable resistance is tied to the power supply V_DD and the PDN programmable resistance is tied to ground V_SS. The PDN programmable resistance is a set of parallel binary-coded NMOS switches, while the PUN programmable resistance uses PMOS switches. This is one way to have an adjustable circuit propagation delay.

[Figure 4.3 image: MTJ bit cell schematic with transistors M1 to M14 (widths from w0 to 10w0), R_MTJ and R_REF branches, capacitances CL and CB, complementary outputs (out), and control signals write_mtj_n+ (BL2), write_mtj_n− (BL1), write_enable (SL), access (WL), sense_enable (SAE), and read_enable (EN). An inset shows that the direction of the write current sets R_MTJ to R_P or R_AP.]

Figure 4.3. MTJ bit cell: The MTJ is put in a low-resistance (R_P) or high-resistance (R_AP) state depending on the write current direction, set digitally. The sense amplifier design to read R_MTJ is based on [84]. Transistors use the minimum width, w0 = 120 nm, when possible to reduce area.

We choose the transistor relative widths in Fig. 4.4 to create a 3-bit conductance scale. Our default configuration is to have M1 programmed on, M2 programmed off, M3 programmed off, and M4 always on. This corresponds to an input code of 1,0,0. This allows one to re-program to lower and higher delay values with the same granularity. The resistance values are determined by the parallel combination of enabled transistors in the programmable resistances. The programmed resistance decreases with increasing input code from 0,0,0 to 1,1,1. The resistance adjustment of this network is asymmetrical, as shown in Fig. 4.6(a). There is more latitude to slow down the gate than to speed it up. Other transistor geometry schemes could be devised for adjustable delay; e.g., evenly spaced values, or more commonly used values [73]. Our binary scheme allows for a monotonic decrease in delay as the programmed input code values are increased.
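The 3-bit conductance scale can be tabulated with a short sketch. It rests on two assumptions that are not stated in the text: the first code bit (M1) is taken to control the widest switch, with relative widths 4, 2, and 1 for M1, M2, and M3, and each switch is treated as an ideal resistor whose conductance scales with its width. The always-on M4 contributes one unit of width.

```python
from itertools import product

# Assumed bit weights: M1 = 4, M2 = 2, M3 = 1 (relative widths), plus the
# always-on M4 with weight 1. The text gives the widths, but which switch
# carries which bit is an assumption of this sketch.
WEIGHTS = (4, 2, 1)
ALWAYS_ON = 1
DEFAULT_CODE = (1, 0, 0)


def relative_conductance(code: tuple) -> int:
    """Total switch width that is on, in units of the minimum width."""
    return ALWAYS_ON + sum(w * b for w, b in zip(WEIGHTS, code))


def relative_resistance(code: tuple) -> float:
    """Programmed resistance normalized to the default code 1,0,0."""
    return relative_conductance(DEFAULT_CODE) / relative_conductance(code)


# Codes are generated in increasing binary order, 0,0,0 through 1,1,1.
table = {code: relative_resistance(code)
         for code in product((0, 1), repeat=3)}
for code, r in table.items():
    print(code, f"{r:.3f}")
```

Under these assumptions the resistance falls monotonically with the input code, and the range is lopsided: relative to the default, code 0,0,0 raises the resistance fivefold while code 1,1,1 lowers it only to five eighths, in line with the asymmetry noted above.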


[Figure 4.4 image: schematic of a static CMOS logic gate (Inputs, Pull-Up Network, Pull-Down Network, output VOUT, load CL) with programmable resistances built from switches M1 to M8 (PMOS widths wP0, 2wP0, 4wP0, and wP0; NMOS widths wN0, 2wN0, 4wN0, and wN0) whose gates are set by the outputs of three MTJ bit cells. The extracted page also carries block-diagram labels: a Controller, STT-MRAM memory subarrays and a program bit subarray (each with address decoder, row buffer, MTJ array, column mux, write drivers, sense amplifiers, and compute logic). Initially the Controller sets B1, B2, B3 = 1, 0, 0; next it reads the program bit subarrays to set B1, B2, B3; the Controller drives CLK1, CLK2, and CLK3.]

Figure 4.4. Digital trimmer applied to a generic static CMOS logic gate: Complementary resistances are put above the PUN and below the PDN. Certain parts of the PUN or PDN may bypass the programmable resistances.

4.4.2 Trimming global clock tree buffers using MTJ bit cells

In standard cell digital designs, clock trees are constructed by the placement and routing tool from a collection of inverters and buffers of varying size. These cells are modified following the static CMOS digital trimming technique shown in Fig. 4.4.

We show how to design large VLSI blocks such that their global clock tree can be modified in different sections. A global clock network includes a hierarchy of buffers, where the first stage of buffers is spread across large areas of a chip, and the last stage of buffers is densely packed to supply the clock directly to a group of sequential elements (e.g. registers). Programmable buffers can be introduced at any level of this hierarchy. For example, programmable buffers near the top levels of the clock tree influence the clock latency over a large region. Delay adjustments can be used to correct setup or hold violations, whether by reducing the clock skew or introducing useful skew.

The programmable clock buffer schematic is shown in Fig. 4.5. Figure 4.6(b) shows the amount that the delay can be changed. Only the first of the two inverter stages in the buffer is trimmed, in order to preserve the drive strength of the second (output) inverter stage. The transistors for the programmable resistance must be at least as wide as those in the PUN or PDN that they are connected to.

[Figure 4.5 schematic. Recoverable labels: VIN, VOUT, CL, VDD; three MTJ Bit Cells with sense_enable, access, and out signals; gate-control labels VGC1-VGC3 and VG1-VG3; width labels w = wP0, 2wP0, 4wP0, wN0, 2wN0, 4wN0, 1.08wP0, 1.08wN0, 4.34wP0, 4.32wN0.]

Figure 4.5. Programmable clock buffer schematic: wP0 = 1.8 µm and wN0 = 1.2 µm for high drive strength.

Initially the clock buffers are programmed with the default configuration of 1,0,0. The designer can choose which clock buffers to adjust (faster or slower) based on static timing analysis or automated in-situ measurements [85]. The programming bits are then written to the MTJ bit cells once. The circuit should then work optimally without any need for re-programming. The option to re-program the bits online during operation still exists, nevertheless.

[Figure 4.6 plots. (a) Propagation Delay (ps) and Normalized Stage 1 Resistance (arb. units) versus Programmable Net Input Code, from 0,0,0 to 1,1,1, for the Typical, Fast, and Slow process corners; VDD = 1.0 V, CL = 1 fF. (b) VOUT (V) versus Time from VIN 1→0 Transition (ps) and Time from VIN 0→1 Transition (ps) for Program 1,0,0, Program 0,0,0, Program 1,1,1, and Original, with the tunable range marked.]

Figure 4.6. (a) Relative programmed resistance and propagation delay for the programmable clock buffer of Fig. 4.5, in different CMOS process corners. (b) Clock latency in the programmable clock buffer in the typical process for the 1→0 and 0→1 output transitions for default, minimum, and maximum programmed delays, showing nearly uniform transition times. The original non-programmable clock buffer latency is added for reference.


4.4.3 Trimming STT-MRAM logic-in-memory

The concept of logic-in-memory has been used to address the processor-memory bottleneck in data-intensive computations [86]. Logic-in-memory circuits can be designed around any memory technology, whether it be SRAM or an STT-MRAM such as [87].

When using non-volatile memory, configuration bits for programmable resistances can be stored in a subset of the memory array, thus sharing sense amplifiers with other data and reducing the area overhead of the trimming technique. This approach is illustrated in Fig. 4.7.

CMOS logic gates, whether they are inside the memory or separate, can be trimmed as shown in Fig. 4.7. On the left, the memory array is partitioned into Memory Subarrays and Program Bit Subarrays. On the right, the trimmed CMOS gate is fed by latches that, at startup, load values stored in the Program Bit Subarrays. This requires some additional control logic, seen in the lower left, but eliminates the need for separate MTJ bit cells. During normal operation, the memory continues to function using only the Memory Subarrays.

Reconfiguration to address process variation in STT-MRAM has been studied in [78], where blocks can be reconfigured at the architecture level. Our design could complement this approach by trimming existing digital blocks to meet chip constraints. Clock trees in the STT-MRAM could be divided and trimmed in a manner similar to the global clock trees in the previous section.

4.5 Performance analysis

In this section, we analyze timing in circuits with nonvolatile digital trimmers and determine the energy and area overhead of our trimming technique.

MTJs have been introduced into CMOS process technologies at the 90 nm node [88] and below. We perform simulations using the Cadence GPDK 45 nm CMOS process models, which are representative of the general performance trends encountered. VDD = 1.0 V in this process. Simulation models have been developed for MTJs [16, 80, 83]. We use the SPICE model in [16] and modify it with the parameters in Table 4.1 to represent a scaled version of the MTJ in [3]. We use the models in Table 4.1 to show that the MTJ bit cell in Fig. 4.3 works in the slow, typical, and fast CMOS process corners.

[Figure 4.7 block diagram. Recoverable labels: STT-MRAM, Address Decoder, Row Buffer, Controller, Memory Subarray, Program Bit Subarray, MTJ Array, Sense Amp., Col. Mux, Write Drivers, Compute Logic, Subarray Compute Logic or Other Compute Logic, Pull-Up Network, Pull-Down Network; signals B1, B2, B3 and CLK1, CLK2, CLK3. Annotations: "Initially, the Controller sets: B1, B2, B3 = 1, 0, 0"; "Next, the Controller reads the Program Bit Subarrays to set B1, B2, B3"; "The Controller drives CLK1, CLK2, and CLK3".]

Figure 4.7. Digital trimmer applied with memory arrays in STT-MRAM to nearby logic circuits in memory.

The operation of the MTJ bit cell to set its state is shown in Fig. 4.8. In the simulation, SL enables writing, EN enables reading, SAE activates the sense amplifier, WL must be on for reading or writing, BL1 (with BL2 driven to its complement) is the bit that is written during the write stage, and out is the read bit output. We first write a 0 to the MTJ with BL1 = 0 and BL2 = VDD, pulsing on SL and WL for the write. The M11/M12 transmission gate and M9 allow a current to pass through the MTJ in the desired direction to change its stored bit. We then read this on out by turning on EN for sense amplifier precharge and then turning on SAE and WL while EN turns back off. The sense amplifier output remains 0 even after EN, SAE, and WL are turned off. Next, we write a 1 to the MTJ, with BL1 = VDD and BL2 = 0, with an on pulse on SL and WL. Finally, we read that 1 by turning on EN and later turning on SAE and WL while EN turns back off, changing out to 1. There is no power dissipation besides transistor leakage when there is no sensing or writing, i.e., when SL = WL = EN = SAE = 0.
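The write and read sequence described above can be summarized with a purely behavioral sketch. This is an idealized logic-level model, not the SPICE simulation: the class name, the `step` method, and the `precharged` flag are hypothetical, and analog effects such as critical currents, pulse widths, and sense margins are ignored.

```python
from dataclasses import dataclass

@dataclass
class MTJBitCellModel:
    stored: int = 0           # nonvolatile MTJ state (0 or 1)
    out: int = 0              # latched sense-amplifier output
    precharged: bool = False  # set when EN has precharged the sense amplifier

    def step(self, SL=0, WL=0, EN=0, SAE=0, BL1=0):
        """Apply one set of control levels and return the value on out."""
        if SL and WL:
            self.stored = BL1          # write phase: BL1 carries the bit
        if EN:
            self.precharged = True     # read phase 1: precharge
        elif SAE and WL and self.precharged:
            self.out = self.stored     # read phase 2: sense with EN off
            self.precharged = False
        return self.out                # out holds its value when idle

cell = MTJBitCellModel()
sequence = [
    dict(SL=1, WL=1, BL1=0),  # write 0
    dict(EN=1),               # precharge
    dict(SAE=1, WL=1),        # sense the stored 0
    dict(),                   # idle: SL = WL = EN = SAE = 0
    dict(SL=1, WL=1, BL1=1),  # write 1
    dict(EN=1),               # precharge
    dict(SAE=1, WL=1),        # sense the stored 1
    dict(),                   # idle: out keeps the last sensed value
]
trace = [cell.step(**levels) for levels in sequence]
print(trace)
```

In this sketch out changes only during a sense step and is retained while all control signals are low, mirroring the behavior described for Fig. 4.8.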

4.5.1 Static timing analysis

[Figure 4.8 waveforms versus Time (ns): SL, BL1, BL2, WL, SAE, EN, R_MTJ (kΩ), and VOUT (V), with the write 0, read 0, write 1, read 1, and sense phases marked.]

Figure 4.8. MTJ bit cell simulation: The MTJ is written a 0, then the 0 is read, then the MTJ is written a 1, and then the 1 is read.

We use static timing analysis to identify issues that can be corrected by re-programming trimmed logic. We use Cadence Encounter to generate the design and use Silicon Smart and PrimeTime for timing analysis. These tools approximate signal latency with a lumped element propagation delay model. The lumped elements include the PDN resistance R_PDN, the PUN resistance R_PUN, and the load capacitance C_L, which include parasitics in each device and in interconnects. R_PDN and R_PUN include the programmable resistances. The propagation delay, t_P, is then modeled as:

$$t_{PLH} = 0.69\,R_{PUN}\,C_L \tag{4.1}$$

$$t_{PHL} = 0.69\,R_{PDN}\,C_L \tag{4.2}$$

$$t_P = 0.5\,(t_{PLH} + t_{PHL}) \tag{4.3}$$
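As a numerical illustration of (4.1)-(4.3), the sketch below evaluates the lumped model for one set of values. The resistances are hypothetical placeholders, not extracted from the design; the 1 fF load matches the C_L quoted in Fig. 4.6.

```python
def propagation_delays(r_pun_ohm, r_pdn_ohm, c_load_f):
    """Return (t_PLH, t_PHL, t_P) in seconds from the lumped RC model."""
    t_plh = 0.69 * r_pun_ohm * c_load_f   # Eq. (4.1): output rising through the PUN
    t_phl = 0.69 * r_pdn_ohm * c_load_f   # Eq. (4.2): output falling through the PDN
    t_p = 0.5 * (t_plh + t_phl)           # Eq. (4.3): average of the two
    return t_plh, t_phl, t_p

# Hypothetical resistances of 30 kΩ (PUN) and 20 kΩ (PDN) with C_L = 1 fF.
t_plh, t_phl, t_p = propagation_delays(30e3, 20e3, 1e-15)
print(f"t_PLH = {t_plh * 1e12:.2f} ps, t_PHL = {t_phl * 1e12:.2f} ps, t_P = {t_p * 1e12:.2f} ps")
```

Because each delay is linear in its resistance, any change in a programmable resistance maps proportionally onto the corresponding edge delay in this model.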

4.5.2 Clock skew reduction in trimmed global clock trees

We analyze the impact of clock buffer trimming on two circuits of differing complexities. In these circuits, flip-flops (FFs) are the only storage elements before adding clock buffer trimming. Each of our analyzed CMOS circuit designs is placed and routed with programmable clock buffers.

In Fig. 4.6(a), we show the relative programmable resistance values and t_P of our programmable clock buffer under each of the 8 possible configurations. The buffer has similar adjustability in all global process corners: slow, typical, and fast. The place and route tool works well with the case where the first inverter stage is trimmed as seen in Fig. 4.5. The case where both inverter stages are trimmed by the same programmable resistances does not work as well, because the output drive strength is lower. The programmable resistances for the clock buffer are designed such that the programmable clock buffer has the same output drive strength as the original clock buffer to prevent major changes to placement and routing. This is seen in the similar output transition times in Fig. 4.6(b). The transistor widths in the programmable resistances of Fig. 4.4 increase by powers of two, but this comes with the trade-off that the programmable resistance values are more spread out in the higher values. The range of adjustable delay values is sufficient for our examples, and we do not optimize the transistor widths further.

Table 4.1. Device model parameters for the 45 nm Cadence GPDK CMOS process technology hybrid with an MTJ process technology

NMOS and PMOS from 45 nm CMOS PDK
    Gate length (nm)                               45
    Minimum width (nm)                             120
MTJ compatible with 45 nm CMOS
    (unlabeled in source)                          100%
    Resistance states: R_AP, R_P (kΩ)              3, 1.5
    Critical current to set R_AP: I_C,AP (µA)      -67
    Critical current to set R_P: I_C,P (µA)        52
    Length, width (nm)                             100, 43

In the following example circuits, we introduce CMOS process variations into selected regions including the programmable clock buffers and MTJ bit cells.

Dot product circuit

Our first example is a 32-bit fixed-point dot product module. With only 66 registers, this module is small enough to have a 2-level clock tree as shown in Fig. 4.9. In this example that we synthesize, place, and route, there are only two clock buffers. If both buffers operate under nominal PVT conditions, the clock latency to all registers is 43 ps. However, if a circuit region including the upper clock buffer is in a fast process corner, the clock latency to the upper registers through that buffer decreases to 37 ps. In this case, we can re-program the upper clock buffer to maximum resistance, which restores the clock latency to 43 ps. This means the registers driven by the upper buffer will receive clock edges in sync with the registers connected to the lower buffer, even in the presence of severe process skew.

[Figure 4.9 diagram. Before: upper clock buffer programmed 1,0,0 (default) in the fast process corner, tP = 37 ps; lower clock buffer programmed 1,0,0 (default) in the typical process corner, tP = 43 ps. After re-programming the MTJs: upper clock buffer programmed 0,0,0 (slower) in the fast process corner, tP = 43 ps; lower clock buffer programmed 1,0,0 (default) in the typical process corner, tP = 43 ps.]

Figure 4.9. Clock tree in a dot product circuit: The upper clock buffer is first found to be in a fast process corner, which is corrected by re-programming that buffer to a slower output latency.

Introducing programmable clock buffers increases the dot product circuit area by 16 µm², as seen in Table 4.2. This area increase is approximately the area of the clock buffers.

Fast Fourier transform circuit

We also applied delay trimming to an 18-bit, 256-point radix-2 fast Fourier transform (FFT). This is a significantly more complex logic block, with 16x more logic cells and 298x more registers than the dot product module. The FFT is designed to run at 0.6 GHz.

Table 4.2. Circuit area and power, before (orig.) and after (mod.) the clock buffers are replaced with the programmable clock buffer in Fig. 4.5

Circuit                                  Dot Prod.              FFT
                                         Orig.      Mod.        Orig.      Mod.
Area (µm²)                               11570      11586       259569     260733
No. of cells: Logic, FF, Clk. Buf.       3789, 66, 2            59683, 19649, 410
Clock speed (GHz)                        0.6                    0.6
Static power (µW)                        0.72       4.27        21.4       152.3
Dynamic power (mW)                       9.0        9.0         161        160

After placement and routing, the FFT has a 3-level clock tree. Figure 4.10 shows the distribution of clock latency in four scenarios. The upper left histogram shows the nominal scenario, with a standard deviation of σ = 5.8 ps. We then introduce a worst-case process skew across the second level clock buffers, where about one-third are assigned to each process corner: slow, typical, and fast. This introduces additional clock skew, resulting in σ = 16.1 ps. The bottom plots show two methods of skew correction. First, we program the second level clock buffers to compensate for the process variation. We then also program the third level buffers (which are unaffected by process variation in this scenario) to assist the corrections being made at the second level. These configuration changes result in a 39% skew reduction to σ = 9.8 ps.
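The quoted reduction follows directly from the standard deviations. The short check below uses the σ values reported in the text and in the panels of Fig. 4.10; the helper name is ours.

```python
def skew_reduction_percent(sigma_before_ps, sigma_after_ps):
    """Percentage reduction in clock-latency standard deviation."""
    return 100.0 * (sigma_before_ps - sigma_after_ps) / sigma_before_ps

level2_only = skew_reduction_percent(16.1, 11.3)   # second-level buffers corrected
levels_2_and_3 = skew_reduction_percent(16.1, 9.8) # second- and third-level buffers corrected
print(f"Level 2 only: {level2_only:.1f}%")
print(f"Levels 2 and 3: {levels_2_and_3:.1f}%")
```

Correcting only the second level recovers roughly three quarters of the total improvement; the third-level adjustment supplies the remainder.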

MTJ delay trimming can also be used to extract higher operating frequencies from a circuit by exploiting imbalances between setup and hold slack; this technique is called “useful skew.” The effect can be visualized by plotting the histogram of negative slack (i.e. setup violations). The slack is the required signal arrival time minus the actual arrival time. The FFT has no such violations at the clock frequency of f = 0.600 GHz, but such violations appear at higher f. The top plot of Fig. 4.11 shows the setup violations that result from the overly aggressive clock speed of f = 0.625 GHz. Introducing useful skew by re-programming lower-level clock buffers eliminates 2 of the 25 setup violations in this scenario. The bottom plots of Fig. 4.11 show how the corrections enable the FFT to become operational at f = 0.606 GHz. The amount of negative slack that can be corrected is limited by the adjustment range of the clock buffer delay, 20 ps here.
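A short calculation relates these clock frequencies to slack. It rests on the simplifying assumption that the required arrival time of a path tracks the clock period one-for-one; the function names are illustrative.

```python
def period_ps(f_ghz):
    """Clock period in picoseconds for a frequency in GHz."""
    return 1000.0 / f_ghz

def setup_slack_ps(required_ps, arrival_ps):
    """Slack as defined in the text: required arrival time minus actual arrival time."""
    return required_ps - arrival_ps

# A path that just meets timing at 0.600 GHz (zero slack) ...
arrival = period_ps(0.600)
# ... has this much (negative) slack when clocked at 0.625 GHz:
slack_at_625 = setup_slack_ps(period_ps(0.625), arrival)
# Period shortening available when moving from 0.600 GHz to 0.606 GHz:
gain_at_606 = period_ps(0.600) - period_ps(0.606)
print(f"{slack_at_625:.1f} ps, {gain_at_606:.1f} ps")
```

Under this assumption the worst path is about 67 ps short at 0.625 GHz, and 0.606 GHz corresponds to a period about 16.5 ps shorter than at 0.600 GHz, in line with the roughly 16 ps slack shift noted for Fig. 4.11.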

Table 4.2 shows that the total area of the FFT increases by 0.4% after synthesizing, placing, and routing with the clock buffer of Fig. 4.5. The static power consumption increases by 0.13 mW when replacing 410 clock buffers with the programmable clock buffer.

In more complex VLSI circuits beyond the FFT, there are more levels in the clock tree, where more clock buffers in series provide a wider range to tune clock latency.

[Figure 4.10 plots: Histograms of Clock Latency. Four histograms over Clock Latency (ps), 160 to 230 ps: Nominal (σ = 5.8 ps), Skewed (σ = 16.1 ps), Level 2 Corrected (σ = 11.3 ps), and Levels 2 and 3 Corrected (σ = 9.8 ps).]

Figure 4.10. Correction of clock skew introduced by process variation within the FFT module.

4.5.3 Applications to silicon debug

Costly design re-spins can be avoided using nonvolatile digital trimming. Other studies show how volatile delay elements can be programmed post-silicon based on characterizations [72, 85]. Statistical timing analysis, clock scheduling, and signal distribution analysis can be done pre-silicon to identify the best logic to digitally trim. It is best if the range of programmable resistance values is as wide as needed to correct for known process variations, and the programmable logic elements have the same drive strength in the default configuration as the original logic element in the synthesis, place, and route flow. In post-silicon verification, the clock skew can be systematically reduced, timing violations can be eliminated, or the design can be programmed for overclocking, as we show for the FFT.

[Figure 4.11 plots: Distributions of Slack for Violating Paths. Three histograms over Slack (ps) at f = 0.625 GHz: Nominal (no violations operating at f = 0.600 GHz), Level 3 Corrected (no violations operating at f = 0.600 GHz), and Levels 3 and 2 Corrected (no violations operating at improved f = 0.606 GHz).]

Figure 4.11. Useful skew fixing of setup violations for increased clock frequency. The slacks shift in the positive direction by about 16 ps when corrected.

4.6 Conclusion

We present digital trimmers that configure the output delay of CMOS circuits, storing the configuration bits in nonvolatile MTJs. Trimmed circuits in a design can be re-programmed after fabrication to reduce clock skew and resolve timing violations in the face of PVT variations. We show these improvements in example dot product and FFT circuits with programmable clock buffers.

The configuration bits of programmable buffers could be accessible through existing scan chain infrastructure to avoid excessive I/O pins, although we have not implemented this.

Signal distribution networks can be expanded to larger blocks with the aid of trimming to ensure optimal timing for system functionality in the end. This would be useful as chips move towards further integration and 3D stacking. We will consider further applications of trimming to logic-in-memory circuits with STT-MRAM, where we expect a similar area overhead because the area of a latch is similar to that of an MTJ bit cell. More complex VLSI circuits offer more levels in the clock tree for further latency optimization.


Chapter 5

Multi-Level Cell Magnetic Tunnel Junctions

The typical circuits with magnetic tunnel junctions (MTJs) seen thus far have only one bit stored in the MTJ, but more bits could be stored in a multi-level cell (MLC) MTJ.

5.1 A theoretical MLC MTJ

There have been proposals to use a single-level cell (SLC) MTJ, designed to store one bit, as an MLC MTJ that stores multiple bits [89]. The design in [89] places two magnetic domains in the magnetic layer above the tunnel barrier and two magnetic domains in the magnetic layer below the tunnel barrier, in an SLC MTJ to yield an MLC MTJ. Since the current values to switch each of the domains are stochastic due to unknown pinning sites and the tunnel barrier, and there are only 2 terminals, it may not be possible to reach all of the 4 possible states in such a device. We attempt to ameliorate the former problem by connecting SLC MTJs in series and parallel configurations to realize an MLC MTJ in the next section.

5.2 Implementations of the MLC MTJ

In an MLC MTJ made with the shape of one SLC MTJ, there is a domain wall in the magnetic layers. This type of MLC MTJ based on a single MTJ is illustrated in [76] and [89].

An alternative method to construct an MLC MTJ would be to connect several single-bit MTJs in series or parallel combinations to form one MLC MTJ. Different combinations using 4 MTJs are shown in Fig. 5.1.

An MLC with 4 parallel MTJs with series resistances is simulated as shown in Fig. 5.2. MLC MTJ designs where MTJs are in parallel are better than designs where MTJs are in series since in the former, all states can be accessed, and states are not skipped. Indeed, all 16 states are accessed in Fig. 5.2.
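The role of the series resistances can be illustrated by counting the distinct total resistances of the parallel network. The sketch below uses the R_P and R_AP values of Table 4.1; the series resistance values and function names are hypothetical, and the model only enumerates static resistance levels. It does not model the switching sequence or which states are reachable.

```python
from itertools import product

R_P, R_AP = 1.5, 3.0  # kΩ, MTJ resistance states from Table 4.1

def total_resistance(series_kohm, states):
    """Total resistance (kΩ) of parallel branches, each a series resistor plus one MTJ.

    states[i] is 0 for the parallel (R_P) state and 1 for antiparallel (R_AP).
    """
    conductance = sum(
        1.0 / (r + (R_AP if s else R_P)) for r, s in zip(series_kohm, states)
    )
    return 1.0 / conductance

def distinct_levels(series_kohm):
    """Sorted distinct total resistances over all MTJ state combinations."""
    n = len(series_kohm)
    return sorted(
        {round(total_resistance(series_kohm, s), 9) for s in product((0, 1), repeat=n)}
    )

identical = distinct_levels([0.0, 0.0, 0.0, 0.0])   # 4 parallel MTJs, no series resistors
staggered = distinct_levels([0.5, 1.0, 2.0, 4.0])   # hypothetical series resistances (kΩ)
print(len(identical), len(staggered))
```

With identical branches, only the number of antiparallel MTJs matters, so many of the 16 combinations collapse onto the same resistance. Unequal series resistances separate the branches so that each of the 16 combinations gives its own level, consistent with the 16 states reported for Fig. 5.2.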

Although MLC MTJs may offer many states, one major concern with MLC MTJs is that probabilistic switching effects are not shown. Due to the tunnel barrier, there is a probability derived from quantum mechanics governing whether electrons will pass through the tunnel barrier or not, making it a truly random process. In other words, it may be difficult to get to any given state deterministically.

[Figure 5.1 schematics: Multiple MTJs for 1 Variable Resistor. (a) 4 Series MTJs; (b) 4 Parallel MTJs; (c) 4 Parallel MTJs with Series Resistances. Labels: RMTJ1-RMTJ4, R1-R4, Increasing IC, Increasing R.]

Figure 5.1. Circuit schematics of multi-level cell (MLC) designs with multiple magnetic tunnel junctions (MTJs) for one variable resistor.

[Figure 5.2 plots: Rtotal (kΩ) in two panels. Number labels indicate which MTJ is switching.]

Figure 5.2. MLC with 4 MTJs with series resistances, in parallel, as in Fig. 5.1(c). There are 16 different resistance states in this circuit. The major and minor loops are shown. It is seen that the input voltage VIN must be swept across several intervals before a desired resistance state can be reached.

5.3 Conclusions

Device designs and underlying physical mechanisms must be explored further to realize an MLC MTJ. The remainder of this dissertation shows how a domain wall (DW) in a nanowire can be used to store multiple bits, explaining the underlying physical mechanisms making this possible and the practical applications to devices and circuits.

Chapter 6

The Information Limit for Domain Walls in Magnetic Nanowires

It may seem that a single domain wall could be placed anywhere along an individual nanowire, though this would only be true if nanowires were perfectly smooth with no defects. The reality is that nanowires are not perfectly smooth, but rather have line edge roughness (LER) of a self-affine nature that results in a discrete number of positions in which a domain wall could be placed. This chapter explores the fundamental limit to this discrete number of domain wall positions in a wire and relates that limit to the resolution of device applications with magnetic nanowires.

Parts of this chapter appear in the paper titled “The Spatial Resolution Limit for Domain Walls in Magnetic Nanowires” by S. Dutta,† S. A. Siddiqui,† J. A. Currivan-Incorvia, C. A. Ross, and M. A. Baldo where † indicates equal authorship and contributions.

6.1 Abstract

Magnetic nanowires are the foundation of several promising nonvolatile computing devices, most notably magnetic racetrack memory and domain wall logic. Here, we determine the analog information capacity in these technologies, analyzing a magnetic nanowire containing a single domain wall. Although wires can be deliberately patterned with notches to define discrete positions for domain walls, the line edge roughness of the wire can also trap domain walls at dimensions below the resolution of the fabrication process, determining the fundamental resolution limit for the placement of a domain wall. Using a fractal model for the edge roughness, we show theoretically and experimentally that the analog information capacity for wires is limited by the self-affine statistics of the wire edge roughness, a relevant result for domain wall devices scaled to regimes where edge roughness dominates the energy landscape in which the walls move.

6.2 Introduction

The position of a magnetic domain wall within a nanowire can be used as an information token in spintronic memory and logic devices [3, 14, 38, 90]. Magnetic domain walls are characterized by their wall energy, and relax into minima in the energy landscape of the wire to lower their total energy [91]. When the width of a magnetic nanowire decreases below ∼100 nm, domain walls are increasingly influenced by inhomogeneities at the edges of the wire that affect the energy landscape [12, 92, 93]. Thus, in principle, the nanowire fabrication process determines the edge profile and consequently the spatial resolution limit for a single domain wall positioned along the length of a nanowire.

Figure 6.1. Concentric magnetic nanowire rings, each with a magnetic domain wall. The positioning and motion of domain walls in such wires are analyzed with micromagnetic models and described analytically in this study.

As with all fabricated structures, the intrinsic line edge roughness of nanowires such as those in Fig. 6.1 is statistically self-affine [94, 95, 96]. In general, edges are self-affine if they look smooth over large length scales, but close-up, appear like the coastlines familiar from fractals [95, 97]. The transition between smooth and rough length scales is defined by the correlation length, ξ, a property of the particular fabrication process used to make the wires. The central result of this work is that the correlation length determines the spatial quantization in the wire because it is the shortest length scale before the amplitude of the edge roughness begins to decrease at finer resolution [94, 98, 99]. Thus, the coastline of the wire can be understood as a series of harbors of width ∼ξ. Each harbor contains finer edge roughness, but if the domain wall width is smaller than ξ, then each harbor roughly corresponds to a single dominant trap site for a domain wall.

6.3 Fractal model for fabricated wire edges

To analyze the impact of edge roughness, we fabricate magnetic nanowires, characterize their edge profiles, and compare to self-affine fractal models. Previously, the Weierstrass-Mandelbrot (WM) model has been used to describe the edges of wires, such as those of antennas [95]. The edge deviation ∆y as a function of the position x along the wire length is given by the WM model in one dimension:

$$\Delta y(x) = B_{LER} \sum_{p=1}^{\infty} C_p\, \nu^{-p H_Q} \sin\!\left[\frac{2\pi}{\xi}\,\nu^{p}\,(x \cos\Psi_p) + \Phi_p\right] \tag{6.1}$$

The WM model is appropriate to model a self-affine wire because it starts with a sinusoid with an amplitude B_LER, defined by the line edge roughness of our fabricated wires, and a period ξ, defined by the correlation length. Then, additional higher-frequency modes are added, with decreasing amplitude and increasing spatial frequency. The Hurst exponent, H_Q = 0.3, and the seed of the geometric progression that accounts for spectral separation, ν = 1.5, are empirical parameters taken from the antenna model in [95]. H_Q represents how much weaker higher frequency modes become. Since ν is not unity, the spatial frequency of higher modes increases nonlinearly rather than as integer multiples like harmonics. Furthermore, the amplitudes C_p of each mode are randomized with a Gaussian distribution with zero mean and a standard deviation of 1, and the phases Ψ_p and Φ_p of each mode are randomized with a uniform distribution on [-π, π].
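A minimal sketch of synthesizing one edge profile from the WM sum is given below, using the parameter values quoted in the text. The truncation of the infinite sum to a finite number of modes, the sampling step, the random seed, and the function name are assumptions made for illustration, not the procedure used for the simulations in this chapter.

```python
import math
import random

def wm_edge_profile(xs_nm, b_ler=2.0, xi=255.0, h_q=0.3, nu=1.5, n_modes=10, seed=0):
    """Edge deviation (nm) at positions xs_nm from a truncated WM sum."""
    rng = random.Random(seed)
    modes = []
    for p in range(1, n_modes + 1):
        c_p = rng.gauss(0.0, 1.0)                # Gaussian amplitude, zero mean, unit std
        psi_p = rng.uniform(-math.pi, math.pi)   # uniform phase inside the cosine factor
        phi_p = rng.uniform(-math.pi, math.pi)   # uniform phase offset
        amplitude = c_p * nu ** (-p * h_q)       # weaker amplitude for higher modes
        wavenumber = (2.0 * math.pi / xi) * nu ** p * math.cos(psi_p)
        modes.append((amplitude, wavenumber, phi_p))
    return [
        b_ler * sum(a * math.sin(k * x + phi) for a, k, phi in modes)
        for x in xs_nm
    ]

xs = [3.0 * i for i in range(200)]   # assumed 3 nm sampling along a 600 nm segment
profile = wm_edge_profile(xs)
print(len(profile), min(profile), max(profile))
```

Fixing the seed makes a profile reproducible; averaging a statistic such as the PSD over many seeds would give the ensemble behavior of the synthesized wires.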

Figure 6.2(a)-(c) shows scanning electron micrographs of Co wires with in-plane magnetic anisotropy (IMA), fabricated in the shape of concentric rings, L-shaped nanowires, and straight lines, respectively, with an average width of 80 nm (Fig. 6.2(a)-(b)) and thickness of 5 nm [40]. The power spectral density (PSD) of line edge roughness in the measured wires is shown in Fig. 6.2(d). Straight lines such as Fig. 6.2(c) are characterized to extract the parameters: B_LER = 2 nm and ξ = 255 nm, as determined from the knee of the PSD at 1/ξ [100]. The concentric ring nanowires exhibit similar edge roughness, but longer correlation lengths of ξ = 0.99 µm.

For comparison with the experimentally determined edge roughness distribution of the Co wires, we numerically synthesize wires with self-affine statistics using (6.1). In addition, we synthesize wires with a random-edge profile, where the edge deviation of each edge cell is randomly assigned a value between −6 nm and 6 nm with a uniform distribution. Good agreement is found between the PSD of the self-affine wires and the experimental wires, but the random-edge wires clearly diverge from the characteristics of real wires. This contrast highlights the key role of self-affine statistics suppressing roughness at short length scales in real wires.

The edge roughness of the wires leads to a variation in the total energy of a domain wall as a function of distance along the wire, with the consequence that a domain wall will be trapped at certain locations as it moves along the wire in response to a magnetic field. In order to determine the distribution of domain wall trap sites and relate domain wall traveling distances to the applied field, we use micromagnetic models of 60-nm-wide, 5-nm-thick Co wires with self-affine or random-edge roughness, discretized by the cell size of 3 × 3 × 2.5 nm³. The Co has random magnetocrystalline anisotropy and its magnetic easy axis is dominated by shape, leading to in-plane magnetic anisotropy (IMA). For this wire geometry, the domain walls are transverse walls with core magnetization lying in plane, transverse to the length of the wire, and the domain wall width is similar to the wire width. Typical domain wall configurations are shown in Fig. 6.9.

[Figure 6.2 images and plot. Panel labels a-d; scale and axis labels 2 µm, 10 µm, 80 nm, x, y, H; plot axis Power Spectral Density; marker 1/ξ = (255 nm)⁻¹.]

Figure 6.2. Scanning electron microscope (SEM) images of (a) concentric rings and (b) L-shaped nanowires. (c) SEM image of wire discretized as shown by the dotted line. (d) Power spectral density of the line edge roughness of the discretized wire, the average of 104 synthesized self-affine wires, and the average of 104 synthesized random-edge wires. The correlation length, ξ, is extracted from the discretized wire and is applied to synthesize the self-affine wires.

6.4 Resolution limit from domain wall position distributions

A single domain wall is placed at a specific location along a wire, and then allowed to relax at zero field with its final position recorded; this is repeated for all possible starting locations along the wire. Figure 6.3(a) shows the initial and final domain wall positions for one wire, indicating that there are specific pinning sites or traps along the wire (Fig. 6.3(b)) to which the domain wall is attracted. These are typically locations of local minima in the wire width at which the energy of the domain wall is minimized.

[Figure 6.3 plots, panels a and b. Axis labels: Position Along the Wire, x (µm); x (nm); DW Count; LWR (nm).]

Figure 6.3. (a) Final positions into which domain walls relax from their initial positions of domain wall nucleation, for one wire in micromagnetic simulations. The bin width is chosen to be the size of one simulation cell, 3 nm. Inset: Zoom of initial positions. (b) Line width roughness profile of the wire shows that the traps often correspond to local minima in width.

We define ∆x to be the distance from one discrete domain wall trap to the next one along the wire. In other words, the ∆x values are the distances between the discrete domain wall traps in Fig. 6.3(a). Figure 6.4(a) shows the distribution of spacing between domain wall traps in simulations of 41 synthetically generated self-affine wires with ξ = 255 nm. There is a peak, ∆x0, for self-affine wires at 85 nm ≈ ξ/3. Note that the distribution is non-uniform. The mean is ∆xµ = 125 nm and the standard deviation is ∆xσ = 77 nm.
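The definition of ∆x can be illustrated with a minimal sketch that turns a list of relaxed domain wall positions into trap sites and spacings. The function name `trap_spacings`, the snapping of positions to the 3 nm cell grid, and the position values are assumptions made for illustration; they are not data or code from this work.

```python
import numpy as np

def trap_spacings(final_positions_nm, cell_nm=3.0):
    """Return the sorted unique trap sites and the spacings between neighbours."""
    # Snap each relaxed position to the simulation grid so that walls which
    # settle within the same cell-sized bin are counted as one trap.
    snapped = np.round(np.asarray(final_positions_nm) / cell_nm) * cell_nm
    traps = np.unique(snapped)  # np.unique also sorts the sites
    return traps, np.diff(traps)

# Hypothetical relaxed positions (nm) for one wire, chosen for illustration.
final_positions = [30, 31, 29, 120, 121, 119, 204, 205, 390, 391, 389]
traps, dx = trap_spacings(final_positions)

print("trap sites (nm):", traps)
print("spacings dx (nm):", dx)
print("mean:", dx.mean(), "sample std:", dx.std(ddof=1))
```

Pooling the `dx` arrays from many wires and histogramming them would give a distribution of the kind described for Fig. 6.4. Traps that straddle two neighbouring bins would need a more careful clustering step than the simple snapping used here.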

In simulations of 30 self-affine wires with a shorter correlation length ξ = 127 nm, we find ∆x0 = 54 nm, showing that ∆x0 decreases with decreasing ξ. In 30 wires with a random-edge profile, the ∆x distribution shows that domain wall traps are more likely to be spaced closely together. This is consistent with the higher amplitude of roughness at short length scales in a random-edge wire.

In contrast, domain walls are narrower in wires with perpendicular magnetic anisotropy (PMA) [12, 93]. In Fig. 6.4(b), we simulate 30 CoFeB wires with PMA. As in the IMA simulations, we find that domain walls relax to positions distributed at discrete locations along the wire. The standard deviation of the trap locations is less than that found in the IMA Co wire simulations since the narrow PMA domain walls interact with only high-frequency roughness. We further observe strong anticorrelations between the pinning sites, suggesting that the self-affine edge roughness statistics manifest similarly in PMA and IMA.


Figure 6.4. Distribution of spacing ∆x between domain wall discrete positions for (a) IMA and (b) PMA wires, including wires with self-affine edge profiles with correlation length ξ and wires with random-edge profiles. The distributions are normalized by area. [Axes: normalized DW count versus ∆x (µm). Panel labels: self-affine with ξ = 255 nm; random-edge, uniform within ±6 nm. Annotation: ∆xµ = 125 nm, ∆xσ = 77 nm.]

The information density of a nanowire, i.e., the number of distinct possible locations for a domain wall, is determined from the average spacing between domain wall traps. We find from Fig. 6.4 that when ξ = 255 nm, the information density of IMA wires is 1/∆xµ ∼ 10 positions/µm and that of PMA wires is 1/∆xµ ∼ 14 positions/µm. These results are expected to scale with the reciprocal of the correlation length, 1/ξ, and consequently the attainable information density fundamentally depends on the resolution of the fabrication process as long as the domain wall width is smaller than ξ. Notably, the maximum spatial resolution is below the correlation length. Thus, the natural edge roughness of a nanowire yields a statistically higher information density than any fabricated pattern.
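As a short worked example of the reciprocal-spacing estimate, the sketch below converts the two IMA spacings quoted in this section (mean 125 nm, peak 85 nm, both for ξ = 255 nm) into positions per micrometre. The helper name `positions_per_um` is an assumption for illustration.

```python
def positions_per_um(spacing_nm):
    """Number of distinct trap positions per micrometre for a given spacing."""
    return 1000.0 / spacing_nm

dx_mean_nm = 125.0  # mean trap spacing quoted in the text (IMA, xi = 255 nm)
dx_peak_nm = 85.0   # peak of the spacing distribution quoted in the text

density_from_mean = positions_per_um(dx_mean_nm)
density_from_peak = positions_per_um(dx_peak_nm)

print(f"from mean spacing: {density_from_mean:.1f} positions/um")
print(f"from peak spacing: {density_from_peak:.1f} positions/um")
```

The two values, 8 and roughly 12 positions/µm, bracket the order-of-magnitude figure of ∼10 positions/µm given for IMA wires.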

Figure 6.5. Distributions of distances traveled by domain walls in simulated nanowires. The sample size is 1500 wires. The fractal analytical model is the overlaid line with parameters listed for each applied field. [Axes: normalized DW count versus DW distance traveled (µm).]

Fit parameters listed in the panels of Fig. 6.5 (ξ = 0.255 µm in all panels):
H = 7.0 kA/m: k = 6750 µm⁻², l0 = 0.086 µm
H = 7.5 kA/m: k = 6500 µm⁻², l0 = 0.110 µm
H = 8.0 kA/m: k = 6400 µm⁻², l0 = 0.125 µm
H = 8.5 kA/m: k = 5300 µm⁻², l0 = 0.141 µm

6.5 A realistic model for domain wall motion

After identifying the zero-field pinning sites in self-affine nanowires, we apply steady-state magnetic fields to move the domain walls along 1500 micromagnetic models of synthetic IMA wires. A domain wall is initialized in a trap on one end of each wire, and the traveling distance of the domain wall to its final position is then calculated after applying the field. Figure 6.5 shows the histograms of the distances traveled by domain walls from an initial trap to a final trap, under four different steady-state magnetic field values. We do not perform this simulation in PMA because domain walls have a high critical field for motion, and as such operate only in the precessional regime above Walker breakdown and are completely detrapped, i.e., once they start moving they continue until the end of the wire [12].

A similar procedure was conducted experimentally using the lithographically patterned Co wires. We apply steady-state magnetic fields and measure the distance traveled by domain walls in Co wires using magnetic force microscopy. Figure 6.6 shows our experimental results, including the magnetic force micrographs of the nanowire rings after applying different magnetic fields along the wires. The head-to-head and tail-to-tail transverse domain walls show as bright or dark contrast, respectively.

Figure 6.6. (a) Magnetic force micrographs of domain walls in concentric rings with applied magnetic fields of 20 kA/m, 24 kA/m, 28 kA/m, and 32 kA/m along the y-axis. The black and the white arrows indicate the positions of two separate domain walls in two different nanowires at different magnetic fields. (b) Fractal analytical model and (c) exponential analytical model overlaid on distributions of distances traveled by domain walls in experiment. The sample size is 94, 94, 83, and 134 wires in increasing order of applied fields. [Axes in (b) and (c): normalized DW count versus DW distance traveled (µm). Scale bar in (a): 2 µm.]

Fit parameters listed in the panels of Fig. 6.6(b), fractal model (ξ = 0.99 µm in all panels):
H = 20 kA/m: k = 325 µm⁻², l0 = 2.25 µm
H = 24 kA/m: k = 300 µm⁻², l0 = 3.81 µm
H = 28 kA/m: k = 205 µm⁻², l0 = 5.32 µm
H = 32 kA/m: k = 150 µm⁻², l0 = 6.07 µm

Fit parameters listed in the panels of Fig. 6.6(c), exponential model (λ = 0.40 µm in all panels):
H = 20 kA/m: b = 0.01 µm⁻¹, l0 = 2.25 µm
H = 24 kA/m: b = 0.02 µm⁻¹, l0 = 3.18 µm
H = 28 kA/m: b = 0.06 µm⁻¹, l0 = 3.40 µm
H = 32 kA/m: b = 0.07 µm⁻¹, l0 = 6.07 µm

We propose an analytical model for the distribution of the propagation distances of domain walls when applying a steady-state magnetic field H:

\[
\frac{df}{dx} = \left( -\frac{1}{l_0} + k\,\Delta y\,(x - x_{\min}) \right) f \tag{6.2}
\]

Here, ∆y(x) is the edge profile, e.g., from the WM model in (6.1), and f is the fraction of domain walls that are still propagating at distance x. We define x = 0 to be the initial position of each domain wall, and xmin is the value of x at the minimum of ∆y(x) within 0 < x < 1 µm. The fit parameters, k and l0, account for the effects on trapping probability that are dependent on and independent of edge roughness, respectively. We assume that l0 characterizes all spatially independent sources of pinning, such as the effect of fluctuations in the magnetic anisotropy within the microstructure [101]. While it is likely that there are spatial correlations in the magnetic anisotropy, we presume that these are on the order of the grain size of ∼10 nm, and much smaller than ξ. The fit parameter, k, weights the significance of trapping due to edge roughness.

The detailed solution of (6.2) is in (6.5). The solution is a function of random variables, thus for fitting purposes we determine (−df/dx)mean, the average of (−df/dx) for N = 1500 wires. In Fig. 6.5 and Fig. 6.6(b), (−df/dx)mean is fitted to the distributions of distances traveled by domain walls in IMA Co wires found in simulation and experiment, respectively.

The analytical model parameters, l0 and k, obtained from fits to the simulation data, are listed in Fig. 6.5. As expected, l0 increases with increasing H since domain walls move farther at higher fields before encountering a trap that can prevent further motion. The fit between (−df/dx)mean and the experimental data in Fig. 6.6(b) also shows an increase in l0 with field. The parameter k decreases with increasing H in fits to both simulation and experimental data. We find that ξ = 0.99 µm in the analytical model fits the experimental data well, and this value of ξ matches the correlation length extracted from the PSD of the specific concentric ring wires (Fig. 6.2(a)-(b)) fabricated for the experiment.

As shown in Fig. 6.5, the simulations and analytic model for domain wall propagation in nanowires with self-affine edge profiles show oscillations in trapping probability over distances equivalent to multiple correlation lengths. The strongest impact of edge roughness, however, occurs within the first correlation length. Under this approximation, we can then simplify the analytic model of (6.2) to the width-independent term, −1/l0, and a width-dependent exponentially decaying auto-correlation term:

\[
\frac{df}{dx} = \left( -\frac{1}{l_0} + b \exp\left( -\frac{x}{\lambda} \right) \right) f \tag{6.3}
\]

This model, (6.3), is solved in detail in (6.7). We fit this simplified model in (6.7) to the experimental data in Fig. 6.6(c). We choose λ = 0.4ξ because the distance to the first trap can be approximated by ∆x0, which is roughly ξ/3 and 0.45ξ in Fig. 6.4, for ξ = 255 nm and 127 nm, respectively. In the fit in Fig. 6.6(c), ξ remains the same as in Fig. 6.6(b), and l0 has values similar to those in Fig. 6.6(b) and also increases with H. The fit constant b represents the weight of the exponentially decaying term corresponding to line edge roughness. We have sufficient experimental data and resolution in the magnetic force micrographs to group the propagation distances into 0.5-µm-wide bins. Given this constraint, we require fields high enough that l0 exceeds 0.5 µm in order to image any effects of roughness. Indeed, the simplified model of (6.3) diverges from an exponential decay with b > 0, as observed in the highest field case of H = 32 kA/m, when there is at least enough energy to move domain walls beyond the first trap, confirming the effect of edge roughness in these wires.

6.6 Conclusion

In conclusion, we study nanowires in the regime where the magnetic domain wall width is smaller than the self-affine correlation length ξ of the wire edge roughness. We find a discrete distribution of domain wall traps. The maximum density of locations at which a domain wall can be found at remanence is observed in the peak of ∆x distributions, found to be below ξ. For a representative correlation length of ξ = 255 nm, the information density of PMA wires of 14 positions/µm is higher than that of IMA wires, 10 positions/µm, for the same wire width. When domain walls are moved by a magnetic field or current, the distribution of their distance traveled can be modeled analytically by (6.2), whose width-independent component creates the shape of an exponential decay with increasing distance and whose width-dependent component is based on the edge deviation fractal model. Our wire edge model applies to narrower wires, considering that the root-mean-square edge roughness does not decrease in narrower wires [40]. Notches with a higher pinning potential than that derived from line edge roughness may be introduced for predictable control, but at the cost of decreasing the maximum information density. These results are relevant to the scaling performance of domain wall devices into regimes where line edge roughness dominates the energy landscape in which the walls move. Furthermore, similar limits to information capacity are expected to apply to a wider class of domain wall nanoelectronics, such as ferroelectric devices [102].

6.7 Experimental methods

6.7.1 Nanowire fabrication

We fabricated 80-nm-wide Co wires as shown in Fig. 6.2(a)-(b). The fabrication process started with the deposition of thin films composed of nominally Ta(5 nm) / Co(8 nm) / Au(3 nm) using ion-beam sputtering at a base pressure of 5 × 10⁻⁸ torr, Ar pressure of 1 × 10⁻⁴ torr and target power of 920 W over 3-inch-diameter targets, yielding deposition rates of 0.029 nm s⁻¹. The saturation magnetization of the film suggested an actual Co thickness of 5 nm. The films were patterned into rings using a poly(methyl methacrylate) (PMMA)/hydrogen silsesquioxane (HSQ) bilayer resist and ion beam etching [40]. First, we spun 2% PMMA and then 2% HSQ on the thin film. Then the HSQ was exposed with an Elionix ELS-F125 electron beam lithography tool operated at 125 keV and 19.2 mC cm⁻² dose. The exposed HSQ was developed with salty developer (1 wt % NaOH and 4 wt % NaCl in deionized water) at room temperature and the underlying PMMA removed with oxygen plasma except under the HSQ patterns. Using the bilayer resist as an etch mask, the metal film was ion-beam etched with Ar+ at 0.45 kV accelerating voltage and the resist was removed with N-methyl-2-pyrrolidone heated at 135 °C on a hot plate. Further details of the patterning process are found in [40]. The root-mean-square (RMS) edge roughness of fabricated Co wires is ∼2 nm. Interwire interactions are minimal at the wire spacing of 40 nm, which we verified by micromagnetic modeling, since the stray field from a domain wall in the neighboring wire is well below the pinning field observed in experiments.

6.7.2 Domain wall traveling distance measurements

Our experiment was designed to understand how domain wall pinning sites are distributed in sub-100-nm-wide magnetic wires. We measured the edge deviations in our wires using scanning electron micrographs and measured domain wall positions using magnetic force micrographs [92]. The concentric rings in Fig. 6.6 had domain walls initialized with a 239 kA/m magnetic field to saturate the rings in the +x direction, resulting in remanent onion states with two 180° domain walls (one head-to-head, one tail-to-tail) along the diameter in each ring [103, 104]. Circular rings and the curvature in L-shaped nanowires with similar radii (5-5.5 µm) were used for convenience in nucleating domain walls with an applied field; the walls then relaxed toward nearby pinning sites at remanence. After nucleating the tail-to-tail domain walls, we applied the magnetic field along the length of the wires to move them and imaged their final positions afterwards. When the applied field was high enough to depin the domain walls, they started moving and were eventually trapped by the next pinning site that was sufficient to prevent further motion. Domain walls were re-nucleated for each applied magnetic field value, and we recorded both their initial and final positions. The traveling distances below 2.5 µm were taken from the concentric rings and those above 2.5 µm were taken from the L-shaped nanowires.

Figure 6.7 shows a high-resolution magnetic force micrograph of domain walls initialized with a 239 kA/m magnetic field to saturate the rings in the +x direction, resulting in remanent onion states. Figure 6.8 shows magnetic force micrographs of the L-shaped nanowires for the different applied fields.

The domain walls in the nanowires experienced the magnetic stray field from the magnetic force microscope probes during scanning, which can perturb their positions. To exclude this possibility, we note there was no dragging of domain walls evident in low-resolution (large scan area) magnetic force micrographs. The dragging effect is more prominent in samples imaged after applying small fields and it decreases with higher applied fields. Domain wall displacements showing any evidence of domain wall dragging by the probe were removed from the statistical analysis. Otherwise, magnetic force microscopy is considered as a noninvasive process, and domain wall positions can be determined to within 50 nm.

Figure 6.7. Magnetic force micrograph of domain walls in the initial onion states.

Figure 6.8. Magnetic force micrographs of L-shaped nanowires. [Panels: 20 kA/m, 24 kA/m, 28 kA/m, 32 kA/m; scale bar: 2 µm.]

The translation of domain walls in the nanowire rings required fields > 16 kA/m, much higher than the coercive field of the continuous Co film, which is 2 kA/m. Figure 6.6 shows the magnetic force micrographs of nanowire rings after applying different magnetic fields along the wire. The average traveling distances of the domain walls increased with field, but some of the domain walls remained at the same position even after applying a higher magnetic field. For example, the domain wall marked by a black arrow in Fig. 6.6 remained at the same position at 20 kA/m and 24 kA/m, was translated after applying 28 kA/m, then remained pinned at 32 kA/m. Similarly, the domain wall marked by the white arrow changed its position when the field increased from 20 kA/m to 24 kA/m and remained at that site at 28 kA/m and 32 kA/m. In this case, the depinning field of the first site was between 20 kA/m and 24 kA/m, and the second was more than 32 kA/m.

6.7.3 Micromagnetic modeling

We modeled IMA Co wires with a micromagnetic solver, the Object-Oriented MicroMagnetic Framework (OOMMF), which solves the Landau-Lifshitz-Gilbert (LLG) equation deterministically [57]. The Co wires were modeled using: damping parameter α = 0.018, saturation magnetization Msat = 1.40 × 10⁶ A/m, a uniaxial magnetocrystalline anisotropy of 4100 J/m³ with random direction in each cell, a wire thickness of 5 nm, and a cell size of 3 nm × 3 nm × 2.5 nm. In our modeled PMA CoFeB wires, we used: α = 0.01, Msat = 7.96 × 10⁵ A/m, an anisotropy in the out-of-plane direction of Kz = 7.82 × 10⁵ J/m³ with an additional random-direction anisotropy in each cell of 100 J/m³, a wire thickness of 5 nm, and a cell size of 3 nm × 3 nm × 2.5 nm. For IMA and PMA, the exchange constant is 10⁻¹¹ J/m. These parameters are justified in our previous study [12].

In our micromagnetic modeling of synthetic wires, we chose a wire length of 1.26 µm because that is significantly longer than ξ = 255 nm. This wire length supported a distribution of pinning sites while minimizing the computation time needed, which grows by the cube of wire length. We fixed the magnetization direction in the left and right ends of the wire to ensure a domain wall is nucleated in the wire. To limit the perturbation of the domain wall by the fringing field from these fixed ends, the simulated wire included a smooth region 150 nm long on the left and right of the 1.26-µm-long region with line edge roughness. About 40% of the domain walls in simulated wires did not move due to the stray field. These were discarded from the data. The discretization of wire edges into cells for micromagnetic modeling results in pinning fields whose scale may not agree with pinning fields in experimental measurements, but the models still agree qualitatively with experiments [90]. The wire size chosen was long enough for a full range of motion while allowing for a large sample size.

The domain walls have a transverse structure with the magnetization of the core along y, transverse to the wire. Although a domain wall may be affected by one wire edge more than the other based on its core magnetization direction [91, 105], our simulation data and analytical models used a large number of non-deterministic edge profiles and thus were not biased by the fact that we set domain wall core magnetization in the +y direction.

Figure 6.9 shows a domain wall in an IMA wire and a PMA wire in micromagnetic models. The typical domain wall widths in IMA wires and PMA wires are ∼40 nm and ∼10 nm, respectively.

6.7.4 Power spectral density (PSD) calculations

Figure 6.9. A single domain wall as seen in a close-up top-down view of the magnetic moments, mx and mz, in micromagnetic models of (a) an IMA wire and (b) a PMA wire, respectively. [Color scales: mx and mz from −1 to 1.]

The PSD of synthesized self-affine and random-edge profiles in Fig. 6.2(d) is the average PSD of 104 such synthesized edges in order to reduce the variation of points typically seen in the PSD of only one wire edge. This shows the characteristic PSD of self-affine and random-edge profiles without wire-to-wire variations. A standard rectangular window is used for the PSD of the self-affine and random-edge profiles. However, for the PSD of the single discretized wire edge, the Hann window is used because it has low spectral leakage, which is useful considering the smaller number of points, especially at low spatial frequency, in the single discretized wire edge. The discretized wire edge is taken from a scanning electron micrograph of 1024×1024 pixels with a 1 nm pixel size. All wire edges in Fig. 6.2(d) have a root-mean-square line edge roughness of 3 nm.
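A windowed PSD estimate of an edge profile can be sketched with NumPy as below. This is a generic periodogram, not the analysis code used for Fig. 6.2(d); the function name `edge_psd`, the normalization by window power, and the sinusoidal test profile are assumptions made for illustration.

```python
import numpy as np

def edge_psd(dy_nm, pixel_nm=1.0, window="hann"):
    """Periodogram of an edge profile at non-negative spatial frequencies."""
    dy = np.asarray(dy_nm, dtype=float)
    dy = dy - dy.mean()                       # remove the mean edge position
    n = dy.size
    w = np.hanning(n) if window == "hann" else np.ones(n)  # else: rectangular
    spectrum = np.fft.rfft(dy * w)
    # Dividing by the window power keeps different windows comparable.
    psd = np.abs(spectrum) ** 2 * pixel_nm / np.sum(w ** 2)
    freq = np.fft.rfftfreq(n, d=pixel_nm)     # spatial frequency in nm^-1
    return freq, psd

# Hypothetical test edge: 1024 pixels of 1 nm with a 256 nm period ripple.
x_nm = np.arange(1024)
edge = 3.0 * np.sin(2.0 * np.pi * x_nm / 256.0)

freq, psd_hann = edge_psd(edge, window="hann")
_, psd_rect = edge_psd(edge, window="rect")
peak_freq = freq[np.argmax(psd_hann)]
print("peak spatial frequency (1/nm):", peak_freq)
```

Averaging the `psd` arrays of many synthesized edges, for example with `np.mean(..., axis=0)`, would mimic the averaging step described above. Normalization conventions for PSDs differ between tools, so absolute values from this sketch should not be compared directly with the plotted data.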

6.7.5 Domain wall position spacing models

The ∆x distributions in Fig. 6.4 are from extensive micromagnetic modeling of individual wires. For each wire, there is a simulation for each of the 421 3-nm-long cells in which a domain wall could be initialized. Hence, the number of wires for each histogram in Fig. 6.4 is much smaller than the number of wires simulated for each applied H field in Fig. 6.5.

6.7.6 Analytical models

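The analytical model (−df/dx)mean is found by solving (6.2) with (6.1) by separating variables:

\[
f(x) = \exp\left( -\frac{x}{l_0} \right) \prod_{p=1}^{\infty} \exp\left( \frac{k B_{\mathrm{LER}} C_p\, \xi}{2\pi}\, \frac{\nu^{-p(H_Q+1)}}{\cos\Psi_p} \left( \cos\Phi_p - \cos\left( \frac{2\pi}{\xi}\, \nu^{p} \left( (x - x_{\min}) \cos\Psi_p \right) + \Phi_p \right) \right) \right) \tag{6.4}
\]

\[
-\frac{df}{dx} = -\left( -\frac{1}{l_0} + k B_{\mathrm{LER}} \sum_{p=1}^{\infty} C_p\, \nu^{-p H_Q} \sin\left( \frac{2\pi}{\xi}\, \nu^{p} \left( (x - x_{\min}) \cos\Psi_p \right) + \Phi_p \right) \right)
\times \exp\left( -\frac{x}{l_0} \right) \prod_{p=1}^{\infty} \exp\left( \frac{k B_{\mathrm{LER}} C_p\, \xi}{2\pi}\, \frac{\nu^{-p(H_Q+1)}}{\cos\Psi_p} \left( \cos\Phi_p - \cos\left( \frac{2\pi}{\xi}\, \nu^{p} \left( (x - x_{\min}) \cos\Psi_p \right) + \Phi_p \right) \right) \right) \tag{6.5}
\]

\[
\left( -\frac{df}{dx} \right)_{\mathrm{mean}} = \frac{1}{N} \sum_{w=1}^{N} \left( -\frac{df}{dx} \right)_{w} \tag{6.6}
\]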

We use −df/dx because it is the derivative of 1 − f that gives how many DWs are trapped at a certain distance x. We expect df/dx < 0 because f(x = 0) = 1 and f decreases with x until all domain walls have been trapped. We determine the analytical model (−df/dx)mean in (6.6) by calculating (6.5) for N = 1500 wires. The coefficient k, which weights the edge profile ∆y(x) in the substitution of (6.1) in (6.2), represents the magnitude of width effects and is chosen such that (−df/dx)mean is still positive and the total area under the curve converges to unity.
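The wire-averaging step in (6.6) can also be sketched numerically for any sampled edge profile, without the closed form in (6.4). In the sketch below, the function names, the trapezoidal integration, and the single-sinusoid test profiles with a 1 nm amplitude are assumptions for illustration; the profiles are not the WM model of (6.1). The values of l0, k, and ξ are those listed in Fig. 6.5 for H = 7.0 kA/m.

```python
import numpy as np

def surviving_fraction(x, dy_shifted, l0, k):
    """Integrate df/dx = (-1/l0 + k*dy) f with f(0) = 1 on the grid x.

    dy_shifted is assumed to already hold the edge profile evaluated
    at (x - x_min), in the same length unit as x.
    """
    rate = -1.0 / l0 + k * np.asarray(dy_shifted, dtype=float)
    # Cumulative trapezoidal integral of the rate from 0 to each grid point.
    steps = 0.5 * (rate[1:] + rate[:-1]) * np.diff(x)
    return np.exp(np.concatenate(([0.0], np.cumsum(steps))))

def mean_trapping_density(x, profiles, l0, k):
    """Average -df/dx over all wires, following the form of (6.6)."""
    densities = []
    for dy in profiles:
        dy = np.asarray(dy, dtype=float)
        f = surviving_fraction(x, dy, l0, k)
        densities.append(-(-1.0 / l0 + k * dy) * f)
    return np.mean(densities, axis=0)

x = np.linspace(0.0, 1.2, 401)       # distance in um
l0, k, xi = 0.086, 6750.0, 0.255     # Fig. 6.5 values for H = 7.0 kA/m

# Hypothetical edge profiles: one random-phase sinusoid per wire (um).
rng = np.random.default_rng(0)
phases = rng.uniform(0.0, 2.0 * np.pi, size=200)
profiles = [0.001 * np.sin(2.0 * np.pi * x / xi + p) for p in phases]

density = mean_trapping_density(x, profiles, l0, k)
f_flat = surviving_fraction(x, np.zeros_like(x), l0, k)
print("density at x = 0:", density[0])
```

For a perfectly smooth wire (zero edge deviation) the sketch reduces to the pure exponential decay exp(−x/l0), which is a convenient sanity check before substituting a measured or synthesized profile.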

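The simplified analytical model given in (6.3) is solved as follows:

\[
-\frac{df}{dx} = -\left( -\frac{1}{l_0} + b \exp\left( -\frac{x}{\lambda} \right) \right) \exp\left( -\frac{x}{l_0} + b\lambda \left( 1 - \exp\left( -\frac{x}{\lambda} \right) \right) \right) \tag{6.7}
\]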
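A quick numerical check of (6.7) is sketched below: the closed-form f(x) obtained by integrating (6.3) with f(0) = 1 is differentiated by finite differences and compared with the closed-form density, and the area under the density is compared with 1 − f. The function names are assumptions for illustration; the parameter values are those listed in Fig. 6.6(c) for H = 32 kA/m.

```python
import numpy as np

def f_closed(x, l0, b, lam):
    """Fraction of walls still moving at distance x, from integrating (6.3)."""
    return np.exp(-x / l0 + b * lam * (1.0 - np.exp(-x / lam)))

def pdf_closed(x, l0, b, lam):
    """Trapping density -df/dx in the form of (6.7)."""
    return -(-1.0 / l0 + b * np.exp(-x / lam)) * f_closed(x, l0, b, lam)

l0, b, lam = 6.07, 0.07, 0.40        # um, um^-1, um (Fig. 6.6(c), 32 kA/m)

x = np.linspace(0.0, 15.0, 15001)
f = f_closed(x, l0, b, lam)
pdf = pdf_closed(x, l0, b, lam)

# Check 1: a finite-difference derivative of f reproduces -df/dx.
fd = -np.gradient(f, x)
max_err = np.max(np.abs(fd[1:-1] - pdf[1:-1]))

# Check 2: the area under -df/dx over [0, X] equals 1 - f(X).
area = np.sum(0.5 * (pdf[1:] + pdf[:-1]) * np.diff(x))

print("max finite-difference error:", max_err)
print("area:", area, "1 - f(X):", 1.0 - f[-1])
```

Because f decays toward zero at large x, the area approaches unity as the upper limit grows, consistent with reading −df/dx as a probability density of trapping distances.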


Chapter 7

Circuits for High-Resolution Magnetic Devices

The discrete positions at which domain walls may be placed in a nanowire, which arise from its self-affine edge roughness, lead to several possibilities for device applications. For one, nanowires with mobile domain walls can be used with magnetic tunnel junctions (MTJs) for high-resolution tunable resistances. Furthermore, the stochastic nature of domain wall motion, in part due to the fabrication process, can be used for applications requiring randomness. This work evaluates deterministic logic and memory applications of domain walls in nanowires as well as potential applications for when domain wall motion is designed to be stochastic.

7.1 Introduction

Just as single-bit MTJs are applied to deterministic memory cells and stochastic random number generators [106], magnetic nanowires with domain walls (DWs) can be used for both deterministic and stochastic applications.

Figure 7.1 shows a typical example of a 3-terminal magnetic logic device that could be used for nonvolatile memory and logic circuit applications. The nanowire in the device contains a DW whose position determines the resistance of the MTJ above. The roughness of the wire edge defines how many positions the DW could be in. This in turn defines the bit resolution of the device.

7.2 Deterministic domain wall motion applications

The modeling of distances traveled by domain walls in Chapter 6 helps identify the regime where domain walls can be predictably moved by an electric current or a magnetic field.

A major result of those analytical models, (6.2) and (6.3), is that a DW can predictably travel at least one correlation length given enough applied field or current. Domain wall motion can be predicted using (6.2) and (6.3) in conjunction with the one-dimensional model in [45] for spin-transfer torque (STT) or [34] for spin-orbit torque (SOT).

Figure 7.1. A 3-terminal magnetic logic device including a long MTJ can be used for a variety of circuit applications. The line edge roughness depicted plays a role in setting the device bit resolution, i.e., the number of bits that the device could be used to process.

The current required to move DWs can be reduced by a factor of 10, and the latency can also be reduced, by changing the domain switching mechanism from STT in ferromagnets to SOT with antiferromagnets [33]. Further physical research on antiferromagnets and topological insulators is in progress, but for the present, our devices are based on ferromagnets with STT or SOT.

Notches are often engineered into wires to create deterministic pinning sites. Another opportunity to create deterministic pinning sites is to use voltage-controlled magnetic anisotropy [107]. Engineered pinning sites could serve to perform logical operations, or could serve to discretize the positions that a device uses to store and read data. The fundamental spacing between discrete DW positions is the natural spacing between DW positions, but engineered pinning sites could improve the regularity and accuracy of DW positioning at the expense of higher energy to move the DW.

If the magnetic nanowire with a domain wall is placed under a long MTJ to form a 3-terminal MTJ, the applied current pulse should have its amplitude and length designed such that the read value is deterministic.

The 3-terminal MTJ with a long MTJ and long nanowire makes it possible to perform essentially a majority gate operation, used in [3] for a NAND operation, which is logically complete. Complex logic blocks may be synthesized using majority gate design as in [108] or NAND gate combinational logic as in [14, 109]. The MTJ is long to allow the device to store and output multiple bits if desired.

The latest advances with SOT in PMA wires leave a growing opportunity for low-energy and high-speed deterministic logic and memory devices. Such characteristics are also ideal for logic-in-memory.

7.3 Stochastic domain wall motion applications

One application of the stochastic variation in domain wall traveling distances is to use that variation for physically unclonable functions (PUFs) [110]. A PUF is a function that produces a unique value, or signature, for every copy of a chip, where the unique signature is usually due to process variations from chip to chip.

For a single domain wall in a nanowire, there is a probability that the domain wall will get to a certain position. For a given wire, the process variations defining the wire edge determine that probability. In other words, domain wall motion is described probabilistically for many wires, but it should be fully deterministic for an individual fabricated wire. Each wire has the unique signature of its own stochastic self-affine edge profile, which is ideal for application in a PUF.

However, it may not be ideal to use domain walls in nanowires for a continuous stream of random numbers. This is because once a wire is patterned, its stochastic self-affine edge profile is already set. A domain wall distance traveled could be used as a seed for a different random number generator, but the domain wall distance traveled would not be an inherently repeatable random process.


7.3.1 How to add stochasticity to current applications

While most applications with domain walls in nanowires assume deterministic domain wall motion [3, 38, 111], a new class of devices could be based on the regime where domain wall traveling distances are stochastic. Stochastic domain wall motion need not come only from the self-affine wire edge under standard applied current or magnetic field; other sources of random behavior could be introduced by applying a high effective field to reach Walker breakdown or by raising the temperature to strengthen random thermal effects [111]. If there are enough stochastic processes in effect on the domain wall in a nanowire, randomized elements in logic circuits are possible. The randomized logic elements would need to be checked for true randomness using the tests by the National Institute of Standards and Technology [112]. The randomized logic elements could be useful to applications in cryptography.

7.3.2 Randomness from thermal effects

If thermal effects are to be included for randomness, it would be worthwhile to apply the stochastic LLG equation for micromagnetic modeling [113]. This could be used to test how to break from the thermal stability found in the typical room-temperature devices in [3, 38].

Since DW motion is essentially deterministic once the nanowires are fabricated, it may be difficult to extract random behavior from the standard room temperature behavior under standard operating conditions modeled in Chapter 6. One possible method to extract random behavior in these conditions is to fabricate over 10× as many wires as the desired resolution of a random number, and apply a low field or current. The few wires in which the DW moves past the first trap should have an essentially random distance traveled. However, this random value is repeated and new random values would require new wires. An alternative method that would work with higher fields and fewer wires would be to perform a transformation on the DW distance traveled. That is, using the DW distance traveled probability density function g(x) = −df/dx, one would derive the cumulative distribution function G(x) = ∫_0^x g(x′) dx′. The measured DW distance traveled would then be mapped through G(x), so that the transformed value would be random with a uniform distribution. Although this is random the first time, this too is not repeatably random.
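The transformation to a uniform distribution can be illustrated with the standard probability integral transform: if a distance X has a continuous cumulative distribution function G, then G(X) is uniformly distributed on [0, 1]. The sketch below assumes the simplified model of (6.3), for which G(x) = 1 − f(x), with the parameters listed in Fig. 6.6(c) for H = 32 kA/m. The "measured" distances are synthetic samples drawn from that same model, not experimental data, and the function name `cdf` is an assumption.

```python
import numpy as np

def cdf(x, l0, b, lam):
    """G(x) = 1 - f(x) for the simplified model of (6.3)."""
    return 1.0 - np.exp(-x / l0 + b * lam * (1.0 - np.exp(-x / lam)))

l0, b, lam = 6.07, 0.07, 0.40        # um, um^-1, um (Fig. 6.6(c), 32 kA/m)

# Draw synthetic "measured" distances by inverting G on a fine grid.
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 100.0, 100001)
G = cdf(grid, l0, b, lam)
distances = np.interp(rng.random(100_000), G, grid)

# Probability integral transform: map each distance through G.
u = cdf(distances, l0, b, lam)

counts, _ = np.histogram(u, bins=10, range=(0.0, 1.0))
print("mean of u:", u.mean())
print("counts per decile:", counts)
```

The roughly equal decile counts indicate a uniform distribution over the ensemble of wires. As noted in the text, a single fabricated wire would still return the same value on every repeat.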

Since thermal effects introduce true randomness to DWs in very short distance scales usually smaller than the DW, thermal effects could be amplified by a large number of switching events. A single DW could be moved back and forth by an oscillating AC input current or magnetic field. In each oscillation period, the DW is depinned from a pinning site, i.e., a local minimum in wire width, and moves to one of several pinning sites on the other side depending on thermal noise. After a large number of oscillations, the DW final position becomes hard to predict and can be considered random. The AC input techniques would work if the energy of thermal noise is high enough in comparison to the energy differences at neighboring DW pinning sites, which is usually not the case for wires used for deterministic applications in other chapters of this work. Hence, thermal effects are random but may not be of high enough energy to affect DW-based devices.

Although this dissertation studies DW-based devices exclusively for the case where the device has a single DW, another technique would be to generate multiple DWs with an AC magnetic field, as in [105], to make it possible to read out possibly random resistance values with an MTJ on top, though this has not been studied.

7.3.3 Randomness from the tunnel barrier

When magnetic nanowires are placed under a long MTJ, stochastic behavior may arise from the tunnel barrier as well. The applied current pulse amplitude and duration may be tuned for maximum stochasticity in this case. Analogous to this in single-bit MTJs is the characterization of current pulse amplitude and length to optimize for the most randomness [106].

7.4 Conclusions

We find that magnetic devices with nanowires are primarily best suited for deterministic logic and memory applications, though it may be possible to extract some stochastic electrical behavior from devices engineered for that use. Although the standard conditions for DW motion in nanowires are not enough to generate repeatable random behavior, some repeatable randomness could be sourced from thermal effects, tunneling, and high effective field breakdown effects.


Chapter 8

Neuromorphic Computing with High-Resolution Magnetic Devices

Now that we understand how to get high bit resolution in magnetic domain wall logic devices, we can develop practical applications of such devices to hardware acceleration. This chapter focuses on a specific device and application: a 3-terminal magnetic tunnel junction device applied to neural network hardware acceleration.

Parts of this chapter appear in the paper titled “A Logic-in-Memory Design with 3-Terminal Magnetic Tunnel Junction Function Evaluators for Convolutional Neural Networks” by S. Dutta, S. A. Siddiqui, F. Buttner, L. Liu, C. A. Ross, and M. A. Baldo.

8.1 Abstract

Analog implementations of neuromorphic circuits within digital systems are becoming increasingly attractive due to the high throughput and low energy per operation they offer. Magnetic logic devices based on spin-orbit torque offer a pathway to low-power resistive analog circuits. We present the magnetic tunnel junction (MTJ) function evaluator, a design for a logic device that evaluates a wide variety of functions for neural networks. We extend the device into a functional design implementation in a logic-in-memory architecture in a hybrid process with magnetic device layers and 45 nm CMOS.

8.2 Introduction

Deep neural networks have become effective in automating tasks such as image and speech recognition, but are increasingly bottlenecked by processing limitations [3, 90, 114]. While deep neural networks are being optimized for conventional hardware, reductions in latency and power could further arise from dedicated circuits based on emerging nonvolatile devices such as resistive memory or magnetic devices. We propose using magnetic domain wall (DW) devices for logic and memory applications. This technology provides a variable resistor that is read and written electrically [3], and we show here that it can be designed to evaluate an arbitrary nonlinear function. The variable resistor is a magnetic tunnel junction (MTJ), which has a fixed magnetic layer and a free magnetic layer to provide nominally two resistance states, $R_P$ and $R_{AP}$. When there is a moving DW in the soft free layer under the MTJ, the MTJ resistance $R_{MTJ}$ can take one of many values between $R_P$ and $R_{AP}$ depending on the position of the DW. The wall is moved by an electrical current due to spin-orbit torque (SOT) [19, 115], which requires less current to move a wall than spin-transfer torque (STT) [15]. This technology is compatible with conventional complementary metal-oxide-semiconductor (CMOS) circuits in electrical design and process technology. We propose a logic-in-memory system to implement an efficient convolutional neural network (CNN) and high-density memory based on magnetic logic components.

Figure 8.1. Neural network operations: multiply-accumulate for the dot product and one of many nonlinear functions for activation or thresholding. [Diagram: inputs in_i, in_{i+1}, in_{i+2}, ..., in_n feed a summation (Σ) followed by a nonlinear function, producing the output out.]

A generic multiply-accumulate and activation function operation is shown in Fig. 8.1.

8.2.1 Device and circuit design approach

A 3-terminal MTJ, shown in Fig. 8.2, is a device whose resistance value is set by an electrical current from the input terminal to the ground terminal, and is read by an electrical current through the tunnel junction terminal on top.

The line edge roughness (LER) of a magnetic nanowire leads to a discrete number of DW positions along the nanowire. Pinning may also be intrinsic from wire anisotropy variations [93, 96]. The fundamental limit to the number of DW positions in a nanowire ultimately sets how many resistance values can be used for $R_{MTJ}$. Nevertheless, the discrete domain wall positions are spaced closely enough to provide the resolution needed in this work.

Figure 8.2. Design of the MTJ function evaluator. (a) Device drawing. (b) Micromagnetic model of device with domain wall motion with a variable current density along the wire due to spin-orbit torque. [Figure labels: input current $I_{IN}$, GND, conductive substrate/spin Hall heavy metal, patterned MTJ, MTJ fixed layer, MTJ tunnel barrier, MTJ free layer, domain wall, width profile $w(x)$, and a color scale for $m_z$ from -1 to 1.]

Our logic-in-memory system design borrows several architectural elements from [6], a logic-in-memory system based on resistive random access memory (RRAM). MTJs have a tunable resistance, like memristors in RRAM [116]. Similar systems have been designed using MTJs [117], but here we show that the MTJ-based approach can be extended to the evaluation of nonlinear functions, a significant advantage for systems that seek to integrate logic in memory for the purposes of accelerating hardware implementations of neuromorphic algorithms.

The main contributions of this work are highlighted in the following sections. Section 8.3 describes the design of an MTJ function evaluator, a device that can evaluate any custom linear or nonlinear function used in convolutional neural networks. Section 8.4 describes a system architecture for a logic-in-memory using the MTJ function evaluator, including the relevant circuits. The chapter then reports results from simulation in Section 8.5 and concludes with Section 8.6.

8.3 The MTJ function evaluator

Let $x_0(I_{IN})$ be the function that maps the input current $I_{IN}$ to the output domain wall position $x_0$. The output of the MTJ function evaluator is an analog resistance value, which can be read with a small signal or a sense amplifier. The output resistance of our device is given by:

$$R_{MTJ} = R_P \left(\frac{x_0}{L}\right) + R_{AP} \left(1 - \frac{x_0}{L}\right) \tag{8.1}$$

where $x_0$ is the final domain wall position in an MTJ of length $L$. We derive the width $w(x)$ as a function of distance $x$ along the wire from the initial DW position, as needed to satisfy $x_0(I_{IN})$. We start with the DW velocity function $v(x)$:

$$v(x) = \begin{cases} 0, & J < J_0 \ \text{(i.e., } x = 0\text{)} \\ \eta \left(J(x) - J_0\right), & J \ge J_0 \ \text{(i.e., } x > 0\text{)} \end{cases} \tag{8.2}$$

where $J_0$ is the critical current density for DW motion, and $\eta$ is the proportionality constant between the current density and domain wall velocity. Next, we note that the input current, $I$, is related to the current density by:

$$\frac{dx}{dt} = \eta \left(J(x) - J_0\right) = \eta\, \frac{I(x_0) - I_0}{w(x)\, d} \tag{8.3}$$

where $d$ is the thickness of the nanowire and $I_0$ is the critical current. If we assume the current is applied in a pulse of width $t_0$ and that $x > 0$, it is possible to show that:

$$w(x_0) = \frac{\eta\, t_0}{d} \left(\frac{dI}{dx_0}\right) \tag{8.4}$$

Here, $I(x_0)$ is the inverse function of $x_0(I_{IN})$. Note that because only the derivative of $I(x_0)$ enters (8.4), we do not need to know the constant offset of $I(x_0)$ in order to obtain $w(x)$. The shape with width $w(x)$ is then fabricated.
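The step from (8.3) to (8.4) is not spelled out in the text. One plausible route, sketched here under the stated assumptions that the current is constant over the pulse and that the wall travels from $x = 0$ to $x_0$ during the pulse of width $t_0$, is to separate variables, integrate over the pulse, and differentiate with respect to $x_0$:

```latex
\begin{align*}
w(x)\,\frac{dx}{dt} &= \frac{\eta}{d}\,\bigl(I(x_0) - I_0\bigr)
  && \text{(rearranging (8.3))} \\
\int_0^{x_0} w(x)\,dx &= \frac{\eta\,t_0}{d}\,\bigl(I(x_0) - I_0\bigr)
  && \text{(integrating over } 0 \le t \le t_0\text{)} \\
w(x_0) &= \frac{\eta\,t_0}{d}\,\frac{dI}{dx_0}
  && \text{(differentiating with respect to } x_0\text{)}
\end{align*}
```

The constant critical current $I_0$ drops out in the last step, which is consistent with the remark that the offset of $I(x_0)$ is not needed.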

Further, the desired transfer function is constrained such that $x_0$ must increase monotonically with $I_{IN}$, so that $dI/dx_0$ in (8.4), and hence the width, stays positive. However, there are no further constraints on higher-order derivatives of the function.

Figure 8.3. (a) Transfer characteristic from input current to output domain wall position and resistance, determined in micromagnetic simulation. Normalized data are on the left and bottom axes and actual data are on the top and right axes. (b) Width function required for the shown shifted sigmoid thresholding function. (c) Current density profile due to the width profile.


Table 8.1. Analytical model fit parameters for Fig. 8.3 using the shifted sigmoid equation (8.5)

Parameter   Data Fit (Normalized)   Data Fit (Actual)   Ideal
x_A         0.4548                  288 nm              1/2
x_B         0.4060                  367 nm              1/2
I_1         0.3608                  2.85 µA             1/2
I_2         0.2600                  1.97 µA             1/4

8.3.1 Function implementation with a thresholding MTJ

The thresholding function, or activation function, performed on the dot product output is an arbitrary nonlinear function. We choose a shifted sigmoid function in this work, and derive the width function required for an implementation with a single thresholding MTJ device.

We have the following general input-to-output analytical relation for a shifted sigmoid function:

$$x_0(I) = x_A \tanh\left(\frac{I - I_1}{I_2}\right) + x_B \tag{8.5}$$

The MTJ width function required for the shifted sigmoid thresholding function in Fig. 8.3 is determined using (8.4):

$$w(x) = \frac{\eta\, t_0}{d} \left(\frac{x_A I_2}{x_A^2 - (x - x_B)^2}\right) \tag{8.6}$$

Table 8.1 shows the values of the parameters in (8.5) that best match the micromagnetic modeling results. Note that (8.5) has an ideal shifted sigmoidal shape if the ideal parameter values in Table 8.1 are used. In Fig. 8.3, the input $I$ and output $x_0$ are also shown normalized, yielding a plot of $x_0(I_{IN})$.
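As a numerical cross-check of (8.6), the following minimal sketch inverts (8.5), differentiates the inverse numerically as in (8.4), and compares the result with the closed form. It uses the normalized fit values from Table 8.1. The prefactor $\eta t_0 / d$ is set to 1 as an assumption, since its value is not given here, so only the shape of $w(x)$ is meaningful; all function names are illustrative.

```python
import math

# Normalized fit parameters from Table 8.1 (dimensionless).
X_A, X_B, I_1, I_2 = 0.4548, 0.4060, 0.3608, 0.2600

# Assumption: the prefactor eta * t0 / d is set to 1, so widths are in
# arbitrary units. Only the shape of w(x) is meaningful here.
PREFACTOR = 1.0


def position(current):
    """Shifted sigmoid transfer function x0(I), equation (8.5)."""
    return X_A * math.tanh((current - I_1) / I_2) + X_B


def inverse_current(x0):
    """Inverse of (8.5): the current I(x0) that places the wall at x0."""
    return I_1 + I_2 * math.atanh((x0 - X_B) / X_A)


def width(x):
    """Closed-form width profile, equation (8.6)."""
    return PREFACTOR * X_A * I_2 / (X_A**2 - (x - X_B) ** 2)


def width_numeric(x, step=1e-6):
    """Width from (8.4) using a central-difference estimate of dI/dx0."""
    slope = (inverse_current(x + step) - inverse_current(x - step)) / (2 * step)
    return PREFACTOR * slope


if __name__ == "__main__":
    for x in (0.1, 0.25, 0.406, 0.6, 0.8):
        print(f"x = {x:.3f}  w(8.6) = {width(x):.4f}  w(8.4) = {width_numeric(x):.4f}")
```

The closed form is only defined for $x_B - x_A < x < x_B + x_A$, which is the range of positions the sigmoid in (8.5) can reach; the width grows without bound toward both ends of that range.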

The material parameters listed in Table 8.2 are chosen such that the wire exhibits perpendicular magnetic anisotropy (PMA). The saturation magnetization is low enough that the terminal domain wall velocity is not reached [34].

We use Mumax3 to simulate the micromagnetic dynamics, capturing spin-orbit torque [118]. The parameters for the micromagnetic models are listed in Table 8.2, including relevant SOT parameters for the spin Hall effect (SHE) and the Dzyaloshinskii-Moriya interaction (DMI), which are responsible for the spin currents that move the domain wall, derived from charge currents in the underlying layer. While the domain wall velocity depends on input current in spin-transfer torque (STT) devices [15], SHE and DMI greatly enhance the efficiency, providing greater domain wall velocity for a given input current. The completed micromagnetic simulations from Mumax3 are then visualized in the Object-Oriented Micromagnetic Framework (OOMMF) [57].

Table 8.2. Micromagnetic model parameters for 3-terminal MTJs using CoFeB with perpendicular magnetic anisotropy

Parameter                                              Value
Damping parameter (α)                                  0.01
Saturation magnetization (M_sat)                       7.96 × 10^6 A/m
Out-of-plane anisotropy (K_z)                          7.82 × 10^5 J/m^3
Spin Hall angle (θ_SH)                                 0.15
Exchange constant (A)                                  10^-11 J/m
Cell size                                              1.5 nm × 1.5 nm × 2 nm
Interfacial Dzyaloshinskii-Moriya strength (D_DMI)     -0.5 mJ/m^2

Current advances in devices based on SHE allow the critical current density to be further reduced, making circuits such as the one presented here possible.

8.4 Logic-in-memory system design

We design a system that can perform an analog multiply-accumulate for convolutional neural networks, and store and load data as a memory. Our architecture is based on the MTJ function evaluator, which we use in a crosspoint array both as a synapse and as a thresholding function.

8.4.1 Crosspoint array

A CNN has a synaptic function and an activation function. Figure 8.4(a) shows the synaptic function where a generic analog multiply-accumulate is performed using programmed conductances $G_{i,j}$ in each row $i$ and column $j$. Figure 8.4(b) shows the activation function, also known as a thresholding function.

In our crosspoint array in Fig. 8.5, we implement the current summation in (8.7) with analog voltages $V_i$ on the inputs, and we implement the thresholding function in (8.8) by providing an analog voltage $V_{OUT,j}$ on the output that depends on $I_j$:

$$I_j = \sum_i V_i G_{i,j} \tag{8.7}$$

$$V_{OUT,j} = f(I_j) \tag{8.8}$$
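Equations (8.7) and (8.8) can be illustrated with a short numerical sketch. The input voltages and conductances below are illustrative values, not taken from the simulations (the conductances are merely chosen to lie between $1/R_{AP}$ and $1/R_P$ from Table 8.3), and the `threshold` function is a hypothetical stand-in for $f$ using a shifted sigmoid normalized to the range 0 to 1.

```python
import math


def column_currents(voltages, conductances):
    """Equation (8.7): I_j = sum_i V_i * G[i][j] for every column j."""
    n_cols = len(conductances[0])
    return [
        sum(v * row[j] for v, row in zip(voltages, conductances))
        for j in range(n_cols)
    ]


def threshold(current, i_mid, i_scale):
    """Equation (8.8) with a hypothetical f: a shifted sigmoid in (0, 1)."""
    return 0.5 * math.tanh((current - i_mid) / i_scale) + 0.5


if __name__ == "__main__":
    # Illustrative values only (not from the source): 2 inputs, 2 columns.
    v_in = [0.10, 0.20]                 # volts
    g = [[50e-6, 40e-6],                # siemens, indexed as g[row i][column j]
         [60e-6, 70e-6]]
    currents = column_currents(v_in, g)
    outputs = [threshold(i, i_mid=15e-6, i_scale=5e-6) for i in currents]
    print(currents, outputs)
```

In the hardware described here the summation is performed by the bitline itself and the thresholding by the MTJ, so this sketch only mirrors the arithmetic, not the circuit.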


Figure 8.4. General approach to using Ohm’s law to perform an analog multiplication for deep convolutional neural networks. (a) The synaptic function performs the dot product: input voltages $V_i$ (converted from digital) are applied to conductances $G_{i,j}$, and the output column currents $I_j = \sum_i V_i G_{i,j}$ go to the thresholding devices. (b) The activation function evaluator performs thresholding based on a generally nonlinear function $f(I_j)$: the input currents come from the columns above, and the output voltages $V_{OUT,j}$ go to the next layer or are converted to digital.

The crosspoint array of 3-terminal MTJs has only one access transistor. While current paths could be further decoupled with additional access transistors for 3-terminal MTJs [119], this work trades that for less area and adds other line impedances to inactive cells to reduce leakage. The WL access transistors ensure accurate writing. The resistive circuit model for 3-terminal MTJs in Fig. 8.5(a) is based on the model in [14].

8.4.2 System architecture

The architecture for the logic-in-memory system, including the crosspoint array and its associated circuits, is shown in Fig. 8.6. The cell arrays are physically identical but separated into memory and CNN subarrays as in [6]; here, however, the CNN data flow is able to cycle fully in the analog domain.

We do not implement column multiplexing in our design, but suggest it as an option for wider memory subarrays. This is because reading our multi-level cell MTJ makes multiple bits of data available at once. In other words, column multiplexing is less important at higher bit resolution.

Figure 8.5. (a) Circuit symbols and models for the synaptic MTJ and thresholding MTJ. (b) Crosspoint array using synaptic MTJs and thresholding MTJs as analog logic and memory elements. [Figure labels: device terminals V_MTJ, V_HML, and V_HMR; model resistances R_MTJ, R_HML, and R_HMR; array lines WL, SL, BL, RL, FBL, RST, and RSTP.]

8.4.3 Voltage drivers

Figure 8.7 shows the voltage driver circuits using transmission gate multiplexers. A common source amplifier with an active load is used because it provides a high gain magnitude at a low power expense. Since the common source amplifier has negative gain, it has an inverting effect on data. We see from (8.1) that high $I_{IN}$ results in high $x_0$, in turn resulting in low $R_{MTJ}$. Thus, when a small voltage is read in the voltage divider across $R_{MTJ}$, the output can be made large again by an inverting amplifier, i.e., one with negative gain. The common source amplifier designed and shown in Fig. 8.7(c) has a sufficient gain of -4.0, which could be higher in magnitude in exchange for more area.

Figure 8.6. Architecture of the logic-in-memory system. [Block labels: controller; address in; voltage drivers; row decoder and multiplexers for RL, WL, and BL; memory subarrays and CNN subarrays, each a crosspoint array; sense amplifiers (no optional multiplexer); row buffer; data out.]

8.4.4 The sense amplifier

The design of the sense amplifier for analog-to-digital conversion is shown in Fig. 8.8. It is the design in [120], which is used by [6], and is also suitable for this logic-in-memory system.

8.4.5 Direct layer to layer connections

Figure 8.10 shows that a single layer of a deep neural network implemented in a CNN subarray can have its output feed directly into another layer, meaning that no additional overhead is needed for the analog-to-digital conversion, storage in RAM, and digital-to-analog conversion typical of other CNN implementations. The connection for the analog signal between the thresholding function output and the input to another layer is a long interconnect with an amplifier. The amplifier in Fig. 8.10 is the common source amplifier in Fig. 8.7. In the overall system architecture, one CNN subarray output could be connected with a long interconnect and amplifier to another CNN subarray input. However, some CNN single-layer outputs may be designed to connect to the inputs of multiple other CNN subarrays. The programmer would choose which CNN subarrays to use depending on the connectivity between CNN subarrays for deep layers in their neural networks.


Figure 8.7. (a) Read-line driver. (b) Bitline driver. (c) Common source amplifier with an active load used in the driver circuits. [Figure labels: 2:1, 4:1, and N:1 multiplexers with Select and Control inputs; driver inputs from the prior layer, Sense, "Set to highest R_MTJ", and "Set to lowest R_MTJ"; driver outputs RL_i and BL_j; amplifier terminals V_IN and V_OUT with transistor widths w_M0, 6w_M0, and 8w_M0.]

8.4.6 System operation

Figure 8.9 illustrates how the logic-in-memory system is operated in perceptron mode or in memory mode. In perceptron mode, a feed-forward network layer involves these steps:

1. Reset the thresholding MTJs by setting all RST_j = 1 and inject a DW with RSTP.

2. Set RST_j = 0 and WL_i = 0. Apply input voltages on RL_i. The currents add up on BL_j. For each column, the current into the thresholding MTJ sets its resistance.

3. Set BL_j = 0. Pass the output to the next layer as in Fig. 8.10. If on the final cycle, sense the thresholding MTJ resistances.

Figure 8.8. Multi-bit sense amplifier design based on [6]. [Figure labels: bias voltages V_BIAS1 and V_BIAS2; enable EN; nodes NSENSE, VREF, and OUT; R_MTJ and a switched reference R_REF2 built from R_1 to R_N with switches S_0 to S_{N-1}, driven by counter and control logic with CLK; transistor widths from w_M0 to 3w_M0.]

In memory mode, the steps to read a row of bits are: select one row with RL_i = 0, set the other RL_i = Z, set WL_i = 0, and sense the resistance of each MTJ on BL_j.

In both modes, the steps to write the synaptic MTJs are: write one row of the matrix at a time, driving WL_i = 1, RL_i = Z, and SL_j = 0, and injecting a DW with each BL_j; repeat for every row.
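The perceptron-mode steps above can be mirrored in a small behavioral sketch that chains (8.7), (8.5), and (8.1). This is an illustration only, not the circuit simulation used in this work. It assumes that a reset leaves each thresholding MTJ with its wall at $x_0 = 0$, it uses the normalized fit from Table 8.1 and the resistance limits from Table 8.3, and the `i_full_scale` normalization of the column current is a hypothetical parameter introduced here.

```python
import math

R_P, R_AP = 12.9e3, 25.8e3          # ohms, resistance limits from Table 8.3
X_A, X_B, I_1, I_2 = 0.4548, 0.4060, 0.3608, 0.2600  # normalized fit, Table 8.1


def wall_position(i_norm):
    """Normalized DW position x0/L from (8.5), clipped to the wire [0, 1]."""
    x = X_A * math.tanh((i_norm - I_1) / I_2) + X_B
    return min(max(x, 0.0), 1.0)


def mtj_resistance(x_norm):
    """Equation (8.1) with x_norm = x0 / L."""
    return R_P * x_norm + R_AP * (1.0 - x_norm)


def perceptron_cycle(v_in, g, i_full_scale):
    """One feed-forward cycle; returns the thresholding MTJ resistance per column."""
    # Step 1: reset. Assumed here to leave every wall at x0 = 0, i.e. R_AP.
    resistances = [mtj_resistance(0.0) for _ in g[0]]
    for j in range(len(g[0])):
        # Step 2: input voltages drive the rows; currents sum on bitline j (8.7).
        i_col = sum(v * row[j] for v, row in zip(v_in, g))
        # The column current moves the wall, which sets the MTJ resistance.
        resistances[j] = mtj_resistance(wall_position(i_col / i_full_scale))
    # Step 3: the stored resistances are passed to the next layer or sensed.
    return resistances


if __name__ == "__main__":
    # Illustrative inputs and conductances, not taken from the source.
    print(perceptron_cycle([0.10, 0.20], [[50e-6, 40e-6], [60e-6, 70e-6]], 20e-6))
```

A larger column current gives a larger $x_0$ and therefore a lower resistance, which is the inversion that the common source amplifier in Section 8.4.3 is described as undoing.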

8.5 Discussion

The logic-in-memory system was modeled in a hybrid process with 45 nm CMOS from the Cadence GPDK and a scaled magnetic tunnel junction process based on [3]. Simulations were performed in Cadence Spectre using the process parameters in Table 8.3, and the results are in Table 8.4.

Figure 8.9. (a) Feed-forward network operation and (b) memory operations: reading (dashed lines only) and writing (solid lines only) weights. [Figure labels: inputs V_IN1 and V_IN2; outputs I_OUT1 and I_OUT2; write signals V_WRITE and I_WRITE; sense signals V_NSENSE and I_SENSE.]

Figure 8.10. Any number of layers can be processed in the analog domain, without having to convert each layer result from analog to digital and back to analog.

Table 8.3. Device model parameters for the 45 nm Cadence GPDK CMOS process technology hybrid with an MTJ process technology

NMOS and PMOS from 45 nm CMOS PDK
  Gate length, L_MG (nm)                               45
  Minimum width, w_M0 (nm)                             120

3-Terminal MTJ Compatible with 45 nm CMOS
  TMR                                                  100%
  Resistance limits: R_AP, R_P (kΩ)                    25.8, 12.9
  Critical current to set R_AP: I_C,AP (µA)            -7.6
  Critical current to set R_P: I_C,P (µA)              7.6
  Length, width of magnetic layer (nm)                 900, 36

Power is reported directly from simulations. However, latency also accounts for the MTJ switching time. In the micromagnetic models, our pulse length $t_0$ is 4 ns in Fig. 8.3, which sets the MTJ switching time. The critical current density requirement is met because the MTJ write currents set the background current density to over $10^{11}$ A/m$^2$, well above values reported in the literature [121].

In our devices, CoFeB is used for the magnetic layers, Ta forms the heavy metal layer underneath, and Ru is used for the contacts. These materials are chosen such that PMA behavior is exhibited by the device. The resistivities we assume are based on our previous work in [3] and parameters reported in [19]:

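$$\rho_{\mathrm{CoFeB}} = 2.1 \times 10^{-6}\ \Omega\cdot\mathrm{m} \tag{8.9}$$

$$\rho_{\mathrm{Ta}} = 1.9 \times 10^{-6}\ \Omega\cdot\mathrm{m} \tag{8.10}$$

$$\rho_{\mathrm{Ru}} = 5.0 \times 10^{-7}\ \Omega\cdot\mathrm{m} \tag{8.11}$$

$$R_{MTJ} A_{MTJ} \approx 1.0 \times 10^{-8}\ \Omega\cdot\mathrm{m}^2 \tag{8.12}$$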

The convolutional neural network based on this logic-in-memory system has several advantages over conventional designs. The variable resistance provided by the synaptic MTJ ensures high accuracy at a low energy cost, because the same scaled CNN on a general-purpose processor consumes at least 50× more energy [114]. Our design overcomes the limitation of binary MTJ designs, noting that the sense amplifier bit resolution is designed together with the available DW positions, i.e., $R_{MTJ}$ values. The available DW locations are determined by self-affine statistics [90]. Alternatively, notches can be patterned in the wire to create pinning sites deterministically [12]. Based on our nanowire models, we determined that the synaptic MTJ and thresholding MTJ in this work could store up to 3 bits. It was predicted that binary weights lead to a 20% accuracy loss [122]. Our design ensures correct operation even though the TMR of 100% here is modest compared to the highest reported TMR of 604% [3, 123]. The ability to store layer outputs in the inherently nonvolatile thresholding MTJ reduces the need for the significant overhead to store or convert the activation function results.
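To make the 3-bit figure concrete, the sketch below lists the resistance levels that (8.1) gives for eight wall positions, using the resistance limits in Table 8.3. The even spacing of positions along the wire is an idealizing assumption made here; the text above attributes the actual positions to self-affine edge statistics or patterned notches. The function names are illustrative.

```python
R_P, R_AP = 12.9e3, 25.8e3   # ohms, resistance limits from Table 8.3
BITS = 3                     # bit resolution reported in this chapter


def mtj_resistance(x_norm):
    """Equation (8.1) with x_norm = x0 / L in [0, 1]."""
    return R_P * x_norm + R_AP * (1.0 - x_norm)


def resistance_levels(bits=BITS):
    """Resistances for 2**bits wall positions, assumed evenly spaced along L."""
    n = 2 ** bits
    return [mtj_resistance(k / (n - 1)) for k in range(n)]


def nearest_code(resistance, bits=BITS):
    """Digital code whose ideal level is closest to a sensed resistance."""
    levels = resistance_levels(bits)
    return min(range(len(levels)), key=lambda k: abs(levels[k] - resistance))


if __name__ == "__main__":
    for code, level in enumerate(resistance_levels()):
        print(f"code {code}: {level / 1e3:.2f} kOhm")
```

Under these assumptions adjacent levels are separated by $(R_{AP} - R_P)/7$, which is the margin a sense amplifier would have to resolve.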

8.6 Conclusion

Full VLSI systems can be built with CMOS and 3-terminal MTJs. The ability of MTJs to support the evaluation of nonlinear functions is expected to provide a significant advantage to this class of devices in logic-in-memory applications, as shown here for deep convolutional neural networks. A single-layer CNN computation has the latency of just a single cycle, outperforming conventional digital CMOS systems where a single layer requires multiple cycles of approximately the same latency [124]. Considering the reduced number of cycles for a single-layer computation and the reduction of cycles when passing information from one layer to the next in the design here, we project a 10× reduction in total latency for neural network computations compared to digital CMOS implementations.

Table 8.4. Power and delay analysis for the logic-in-memory in Fig. 8.6

Feed-forward network operation
  Static power for 2×2 convolution       68.6 µW
  Dynamic power for 2×2 convolution      15.4 nW
  Latency per layer                      4 ns

Weight or memory cell write operation
  Static power for 2×2 array             68.6 µW
  Dynamic power for 2×2 array            10.7 nW
  Latency                                4 ns

Memory cell read operation
  Static power for 2×2 array             68.6 µW
  Dynamic power for 2×2 array            129 µW
  Latency                                2 ns
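The entries in Table 8.4 can be combined into a rough energy-per-operation estimate. The sketch below is a back-of-the-envelope calculation rather than a reported result: it assumes that both the static and the dynamic power are drawn for the full latency of the operation, which the table does not state.

```python
# (static power in W, dynamic power in W, latency in s), copied from Table 8.4.
OPERATIONS = {
    "feed-forward": (68.6e-6, 15.4e-9, 4e-9),
    "write": (68.6e-6, 10.7e-9, 4e-9),
    "read": (68.6e-6, 129e-6, 2e-9),
}


def energy_per_operation(static_w, dynamic_w, latency_s):
    """Energy in joules, assuming both powers are drawn for the full latency."""
    return (static_w + dynamic_w) * latency_s


if __name__ == "__main__":
    for name, row in OPERATIONS.items():
        print(f"{name}: {energy_per_operation(*row) * 1e15:.1f} fJ")
```

With these numbers the static term dominates the feed-forward and write operations, while the read operation is dominated by the dynamic term.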

Chapter 9

Conclusions and Future Studies

This dissertation shows a complete vertical integration of magnetic logic devices from the materials and physics of a device up to circuits and systems applications with multiple devices. Starting with a general class of 3-terminal magnetic logic devices presented in Chapter 2, several instances of such devices are introduced throughout the thesis. Key materials effects governing domain wall pinning and motion are studied in Chapters 3 and 6. Circuit applications evolve from using magnetic devices storing a single bit in Chapter 4 to using magnetic devices storing multiple bits in Chapter 5. Multi-bit applications are understood further by uncovering the fundamental spatial resolution limit for domain wall positions in Chapter 6, which is applied to circuits in Chapters 7 and 8. Indeed, long MTJs need to be fabricated to realize multi-bit devices. The limits we uncover are for a single DW in a magnetic wire, but higher information density with multiple DWs could also be studied.

9.1 Tunnel magnetoresistance required

One general finding of this work is that MTJs should have a TMR of 100% or more to be useful in circuit applications. Digital logic with only magnetic devices in Chapter 2 requires this much TMR in order for the range of output current for one device to match the range of input current of the next device. In the circuits in Chapters 4 and 8, built in a hybrid process with MTJs and CMOS, amplification from transistors allows the requirement on TMR to be relaxed, but we keep TMR at 100%. The circuits taking MTJ resistances as inputs are sense amplifiers for analog-to-digital conversion, and common source amplifiers in the neuromorphic logic-in-memory system for feeding one stored layer result in an MTJ to MTJs in the next layer.
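As a consistency check, the resistance limits in Table 8.3 can be converted to a TMR value. The worked equation below assumes the common definition of TMR relative to the parallel-state resistance, which the text does not state explicitly.

```latex
\begin{equation*}
\mathrm{TMR} = \frac{R_{AP} - R_{P}}{R_{P}}
             = \frac{25.8\ \mathrm{k\Omega} - 12.9\ \mathrm{k\Omega}}{12.9\ \mathrm{k\Omega}}
             = 1.0 = 100\%
\end{equation*}
```

Under that definition, a TMR of 100% corresponds to $R_{AP} = 2 R_P$, so the read-out resistance spans a factor of two.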

9.2 Digital logic

Magnetic logic devices are one of many emerging technologies that can be used to perform digital logic [3, 5, 125]. Digital logic gates implemented with in-plane magnetic devices in [3] show functionality, but new means to scale down switching energy, as suggested in Chapter 1, are becoming available to make digital logic with magnetic devices competitive with conventional CMOS logic gates. In addition to the efficient switching of ferromagnets with SOT instead of STT [2, 17, 18, 19, 34, 35], magnetic devices could be made with antiferromagnets and topological insulators instead of ferromagnets to realize even lower switching energies [33, 126, 127, 128, 129]. Continued research on switching antiferromagnets and magnetic insulators with less current may make digital logic with magnetic devices more energy-efficient, and faster, with speeds in the GHz to THz range.

9.3 Machine learning applications

A variety of emerging devices are under consideration for the hardware acceleration of machine learning, and magnetic devices are particularly competitive in low switching energy and low latency in comparison to other devices [130]. We show that convolutional neural networks can run at high throughput using spintronic elements in analog circuits within a digital implementation. Nevertheless, there are some other areas of machine learning beyond neural network circuits that would benefit from spintronic elements as well. Although we implement neural networks, spiking networks are another type of machine learning that could be implemented with spintronic devices [131].

Statistical learning involves multiplications, which could be carried out by Ohm’s law. Essentially, linear regression functions whose parameters are determined for an application could be implemented with circuits that multiply an input voltage by a stored conductance, as in Fig. 8.4.

Some other machine learning techniques involve nonlinear math, such as adding squares of differences together [132]. These calculations could be approximated in the digital domain, or the crosspoint array in Fig. 8.4 could be programmed such that the multiplication result gives the sums of squares. Critically, the thresholding MTJ designed in this work could be used to implement any nonlinear function, whose value can be stored in every cycle.

9.4 Future directions

Magnetic logic devices provide a unique set of logic and memory capabilities outlined in this thesis. Ongoing research in physics, materials science, and circuit design is likely to make further practical applications of the devices possible.

Bibliography

[1] S. Dutta. Floating-point unit (FPU) designs with nano-electromechanical (NEM) relays. Master’s thesis, Massachusetts Institute of Technology, 2013.

[2] J. A. Currivan-Incorvia. Nanoscale Magnetic Materials for Energy-Efficient Spin-Based Transistors. PhD thesis, Harvard University, 2015.

[3] J. A. Currivan-Incorvia, S. Siddiqui, S. Dutta, E. R. Evarts, J. Zhang, D. Bono, C. A. Ross, and M. A. Baldo. Logic circuit prototypes for three-terminal magnetic tunnel junctions with mobile domain walls. Nat. Commun., 7:10275, Jan. 2016. doi:10.1038/ncomms10275.

[4] S. Dutta, M. Price, and M. A. Baldo. Nonvolatile Online CMOS Trimming with Magnetic Tunnel Junctions. In IEEE/ACM Int. Symp. Nanoscale Archit., pages 61–66, 2016. doi:10.1145/2950067.2950091.

[5] G. W. Burr, M. J. Brightsky, A. Sebastian, H. Y. Cheng, J. Y. Wu, S. Kim, N. E. Sosa, N. Papandreou, H. L. Lung, H. Pozidis, E. Eleftheriou, and C. H. Lam. Recent Progress in Phase-Change Memory Technology. IEEE J. Emerg. Sel. Top. Circuits Syst., 6(2):146–162, 2016. doi:10.1109/JETCAS.2016.2547718.

[6] P. Chi, S. Li, and C. Xu. PRIME: A Novel Processing-in-memory Architecture for Neural Network Computation in ReRAM-based Main Memory. In IEEE Int. Symp. Comput. Archit., Jun. 2016. doi:10.1109/ISCA.2016.13.

[7] I. L. Markov. Limits on fundamental limits to computation. Nature, 512(7513):147–154, Aug. 2014. doi:10.1038/nature13570.

[8] F. Castano, C. A. Ross, A. Eilez, W. Jung, and C. Frandsen. Magnetic configurations in 160-520-nm-diameter ferromagnetic rings. Phys. Rev. B, 69(14):144421, Apr. 2004. doi:10.1103/PhysRevB.69.144421.

[9] S. S. P. Parkin and S.-H. Yang. Memory on the racetrack. Nat. Nanotechnol., 10(3):195–198, Mar. 2015. doi:10.1038/nnano.2015.41.


[10] M. R. Stan, P. D. Franzon, S. C. Goldstein, J. C. Lach, and M. M. Ziegler. Molecular electronics: From devices and interconnect to circuits and architecture. Proc. IEEE, 91(11):1940–1957, Nov. 2003. doi:10.1109/JPROC.2003.818327.

[11] M. Sihotang, S. Matsunaga, and T. Hanyu. A fine-grained power gating scheme of a nonvolatile logic-in-memory circuit for low-power motion-vector extraction. In IEEE Int. New Circuits Syst. Conf., pages 485–488, 2012. doi:10.1109/NEWCAS.2012.6329062.

[12] S. Dutta, S. A. Siddiqui, J. A. Currivan-Incorvia, C. A. Ross, and M. A. Baldo. Micromagnetic modeling of domain wall motion in sub-100-nm-wide wires with individual and periodic edge defects. AIP Adv., 5(12):127206, Dec. 2015. doi:10.1063/1.4937557.

[13] J. A. Currivan-Incorvia, S. A. Siddiqui, S. Dutta, E. R. Evarts, C. A. Ross, and M. A. Baldo. Spintronic logic circuit and device prototypes utilizing domain walls in ferromagnetic wires with tunnel junction readout. In IEEE Int. Electron Devices Meet., pages 847–850, 2015. doi:10.1109/IEDM.2015.7409817.

[14] J. A. Currivan, Y. Jang, M. D. Mascaro, M. A. Baldo, and C. A. Ross. Low energy magnetic domain wall logic in short, narrow, ferromagnetic wires. IEEE Magn. Lett., 3:3000104, Apr. 2012. doi:10.1109/LMAG.2012.2188621.

[15] G. S. D. Beach, M. Tsoi, and J. L. Erskine. Current-induced domain wall motion. J. Magnetism and Magnetic Materials, 320(7):1272–1281, Apr. 2008. doi:10.1016/j.jmmm.2007.12.021.

[16] J. D. Harms, F. Ebrahimi, X. Yao, and J.-P. Wang. SPICE Macromodel of Magnetic Tunnel Junctions. IEEE Trans. Electron Devices, 57(6):1425–1430, Jun. 2010. doi:10.1109/TED.2010.2047073.

[17] D. Bhowmik, O. Lee, L. You, and S. Salahuddin. Magnetization Switching and Domain Wall Motion Due to Spin-Orbit Torque. In Nanomagnetic and Spintronic Devices for Energy-Efficient Memory and Computing. John Wiley & Sons, Inc., West Sussex, United Kingdom, 2016. doi:10.1002/9781118869239.ch6.

[18] D. Bhowmik, M. E. Nowakowski, L. You, O. Lee, D. Keating, J. Bokor, and S. Salahuddin. Deterministic Domain Wall Motion Orthogonal To Current Flow Due To Spin Orbit Torque. Sci. Rep., 5:11823, 2015. doi:10.1038/srep11823.

[19] L. Liu, C.-F. Pai, Y. Li, H. W. Tseng, D. C. Ralph, and R. A. Buhrman. Spin-Torque Switching with the Giant Spin Hall Effect of Tantalum. Science, 336(6081):555–558, May 2012. doi:10.1126/science.1218197.

[20] Y. Fan, P. Upadhyaya, X. Kou, M. Lang, S. Takei, Z. Wang, J. Tang, L. He, L.-T. Chang, M. Montazeri, G. Yu, W. Jiang, T. Nie, R. N. Schwartz, Y. Tserkovnyak, and K. L. Wang. Magnetization switching through giant spin-orbit torque in a magnetically doped topological insulator heterostructure. Nat. Mater., 13(7):699–704, 2014. doi:10.1038/nmat3973.

[21] S. Emori, D. C. Bono, and G. S. D. Beach. Interfacial current-induced torques in Pt/Co/GdOx. Appl. Phys. Lett., 101(4), Jul. 2012. doi:10.1063/1.4737899.

[22] W.-G. Wang, M. Li, S. Hageman, and C. L. Chien. Electric-field-assisted switching in magnetic tunnel junctions. Nat. Mater., 11(1):64–68, Nov. 2011. doi:10.1038/nmat3171.

[23] R. Heindl, W. H. Rippard, S. E. Russek, M. R. Pufall, and A. B. Kos. Validity of the thermal activation model for spin-transfer torque switching in magnetic tunnel junctions. J. Appl. Phys., 109(7):1–5, Apr. 2011. doi:10.1063/1.3562136.

[24] S. Yuasa, T. Nagahama, A. Fukushima, Y. Suzuki, and K. Ando. Giant room-temperature magnetoresistance in single-crystal Fe/MgO/Fe magnetic tunnel junctions. Nat. Mater., 3(12):868–71, Oct. 2004. doi:10.1038/nmat1257.

[25] J. M. Rabaey, A. Chandrakasan, and B. Nikolic. Digital Integrated Circuits. Pearson, Upper Saddle River, New Jersey, 2nd edition, 2003. ISBN 0130909963.

[26] P. Villard, U. Ebels, D. Houssameddine, J. A. Katine, D. Mauri, B. Delaet, P. Vincent, M. C. Cyrille, B. Viala, J. P. Michel, J. Prouvee, and F. Badets. A GHz spintronic-based RF oscillator. IEEE J. Solid-State Circuits, 45(1):214–223, Jan. 2010. doi:10.1109/JSSC.2009.2034432.

[27] M. R. Stan, M. Kabir, S. Wolf, and J. Lu. Spin torque nano oscillators as key building blocks for the systems-on-chip of the future. In IEEE/ACM Int. Symp. Nanoscale Archit., pages 37–38, Jul. 2014.

[28] C. Patrick Yue and S. Simon Wong. Physical modeling of spiral inductors on silicon. IEEE Trans. Electron Devices, 47(3):560–568, 2000. doi:10.1109/16.824729.

[29] S. C. Chan, K. L. Shepard, and P. J. Restle. Uniform-phase uniform-amplitude resonant-load global clock distributions. IEEE J. Solid-State Circuits, 40(1):102–109, 2005. doi:10.1109/JSSC.2004.838005.

[30] A. M. Niknejad. Analysis, design, and optimization of spiral inductors and transformers for Si RFICs. PhD thesis, University of California, Berkeley, 1998.

[31] N. Kani and A. Naeemi. Pipeline Design in Spintronic Circuits. In IEEE/ACM Int. Symp. Nanoscale Archit., Jul. 2014. doi:10.1109/NANOARCH.2014.6880496.

[32] J. Zhang. Geometrical Control of Domain Walls and the Study of Domain Wall Properties of Materials with Perpendicular Magnetic Anisotropy. PhD thesis, Massachusetts Institute of Technology, 2016.


[33] J. Finley and L. Liu. Spin-Orbit-Torque Efficiency in Compensated Ferrimagnetic Cobalt-Terbium Alloys. Phys. Rev. Appl., 6(5):1–6, Nov. 2016. doi:10.1103/PhysRevApplied.6.054001.

[34] E. Martinez, S. Emori, N. Perez, L. Torres, and G. S. D. Beach. Current-driven dynamics of Dzyaloshinskii domain walls in the presence of in-plane fields: Full micromagnetic and one-dimensional analysis. J. Appl. Phys., 115(21):213909, Jun. 2014. doi:10.1063/1.4881778.

[35] A. Quindeau, C. O. Avci, W. Liu, C. Sun, M. Mann, A. Tang, M. C. Onbasli, D. C. Bono, P. M. Voyles, Y. Xu, J. Robinson, G. S. D. Beach, and C. A. Ross. Tm3Fe5O12/Pt heterostructures with perpendicular magnetic anisotropy for spintronic applications. Adv. Electron. Mater., Dec. 2016. doi:10.1002/aelm.201600376.

[36] C. Kittel. Physical theory of ferromagnetic domains. Rev. Mod. Phys., 21:541–583, Oct. 1949. doi:10.1103/RevModPhys.21.541.

[37] D. A. Allwood, G. Xiong, C. C. Faulkner, and D. Atkinson. Magnetic domain-wall logic. Science, 309(5741):1688–1692, Sep. 2005. doi:10.1126/science.1108813.

[38] S. S. P. Parkin, M. Hayashi, and L. Thomas. Magnetic Domain Wall Racetrack Memory. Science, 320(5873):190–194, Apr. 2008. doi:10.1126/science.1145799.

[39] S. Bandyopadhyay and M. Cahay. Electron spin for classical information processing: a brief survey of spin-based logic devices, gates and circuits. Nanotechnology, 20(41):412001, Oct. 2009. doi:10.1088/0957-4484/20/41/412001.

[40] J. A. Currivan, S. A. Siddiqui, S. Ahn, L. Tryputen, G. S. D. Beach, M. A. Baldo, and C. A. Ross. Polymethyl methacrylate/hydrogen silsesquioxane bilayer resist electron beam lithography process for etching 25 nm wide magnetic wires. J. Vac. Sci. Technol. B, 32(2):021601, Mar. 2014. doi:10.1116/1.4867753.

[41] M. Albert, M. Franchin, T. Fischbacher, G. Meier, and H. Fangohr. Domain wall motion in perpendicular anisotropy nanowires with edge roughness. J. Phys. Condens. Matter, 24(2):024219, Jan. 2012. doi:10.1088/0953-8984/24/2/024219.

[42] S. Fukami, Y. Nakatani, T. Suzuki, K. Nagahara, N. Ohshima, and N. Ishiwata. Relation between critical current of domain wall motion and wire dimension in perpendicularly magnetized Co/Ni nanowires. Appl. Phys. Lett., 95(23):232504, Nov. 2009. doi:10.1063/1.3271827.

[43] M.-Y. Im, L. Bocklage, P. Fischer, and G. Meier. Direct Observation of Stochastic Domain-Wall Depinning in Magnetic Nanowires. Phys. Rev. Lett., 102(14):147204, Apr. 2009. doi:10.1103/PhysRevLett.102.147204.


[44] M. Klaui. Head-to-head domain walls in magnetic nanostructures. J. Phys. Condens. Matter, 20(31):313001, Aug. 2008. doi:10.1088/0953-8984/20/31/313001.

[45] T. Suzuki, S. Fukami, N. Ohshima, K. Nagahara, and N. Ishiwata. Analysis of current-driven domain wall motion from pinning sites in nanostrips with perpendicular magnetic anisotropy. J. Appl. Phys., 103(11):113913, Apr. 2008. doi:10.1063/1.2938843.

[46] H. Y. Yuan and X. R. Wang. Domain wall pinning in notched nanowires. Phys. Rev. B, 89(5):054423, Feb. 2014. doi:10.1103/PhysRevB.89.054423.

[47] V. Uhlíř, S. Pizzini, N. Rougemaille, J. Novotny, V. Cros, E. Jimenez, G. Faini, L. Heyne, F. Sirotti, C. Tieg, A. Bendounan, F. Maccherozzi, R. Belkhou, J. Grollier, A. Anane, and J. Vogel. Current-induced motion and pinning of domain walls in spin-valve nanowires studied by XMCD-PEEM. Phys. Rev. B, 81(22):224418, Jun. 2010. doi:10.1103/PhysRevB.81.224418.

[48] M. Jamali, K.-J. Lee, and H. Yang. Metastable magnetic domain wall dynamics. New J. Phys., 14(3):033010, Mar. 2012. doi:10.1088/1367-2630/14/3/033010.

[49] E. Martinez, L. Lopez-Diaz, L. Torres, C. Tristan, and O. Alejos. Thermal effects in domain wall motion: Micromagnetic simulations and analytical model. Phys. Rev. B, 75(17):174409, May 2007. doi:10.1103/PhysRevB.75.174409.

[50] J. Leliaert, B. Van de Wiele, A. Vansteenkiste, L. Laurson, G. Durin, L. Dupre, and B. Van Waeyenberge. Current-driven domain wall mobility in polycrystalline Permalloy nanowires: A numerical study. J. Appl. Phys., 115(23):233903, Jun. 2014. doi:10.1063/1.4883297.

[51] Y. Nakatani, A. Thiaville, and J. Miltat. Faster magnetic walls in rough wires. Nat. Mater., 2(8):521–3, Aug. 2003. doi:10.1038/nmat931.

[52] C. Burrowes, D. Ravelosona, C. Chappert, S. Mangin, E. E. Fullerton, J. A. Katine, and B. D. Terris. Role of pinning in current driven domain wall motion in wires with perpendicular anisotropy. Appl. Phys. Lett., 93(17):172513, Sep. 2008. doi:10.1063/1.2998393.

[53] S. Glathe and R. Mattheis. Magnetic domain wall pinning by kinks in magnetic nanostripes. Phys. Rev. B, 85(2):024405, Jan. 2012. doi:fxmmcq.

[54] T. Komine, H. Murakami, T. Nagayama, and R. Sugita. Influence of Notch Shape and Size on Current-Driven Domain Wall Motions in a Magnetic Nanowire. IEEE Trans. Magn., 44(11):2516–2518, Nov. 2008. doi:10.1109/TMAG.2008.2002614.

[55] S. Lepadatu, A. Vanhaverbeke, D. Atkinson, R. Allenspach, and C. Marrows. Dependence of Domain-Wall Depinning Threshold Current on Pinning Profile. Phys. Rev. Lett., 102(12):127203, Mar. 2009. doi:10.1103/PhysRevLett.102.127203.


[56] J. Ryu and H.-W. Lee. Current-induced domain wall motion: Domain wall velocity fluctuations. J. Appl. Phys., 105(9):093929, May 2009. doi:10.1063/1.3125522.

[57] OOMMF: Object Oriented MicroMagnetic Framework, 2016. URL http://math.nist.gov/oommf.

[58] D. Petit, A.-V. Jausovec, H. Zeng, E. Lewis, L. O’Brien, D. Read, and R. P. Cowburn. Mechanism for domain wall pinning and potential landscape modification by artificially patterned traps in ferromagnetic nanowires. Phys. Rev. B, 79(21):214405, Jun. 2009. doi:10.1103/PhysRevB.79.214405.

[59] A. Thiaville, Y. Nakatani, J. Miltat, and Y. Suzuki. Micromagnetic understanding of current-driven domain wall motion in patterned nanowires. Europhys. Lett., 69(6):990–996, Mar. 2005. doi:10.1209/epl/i2004-10452-6.

[60] M. Laufenberg, W. Buhrer, D. Bedau, P.-E. Melchy, M. Klaui, L. Vila, G. Faini, C. A. F. Vaz, J. A. C. Bland, and U. Rudiger. Temperature Dependence of the Spin Torque Effect in Current-Induced Domain Wall Motion. Phys. Rev. Lett., 97(4):046602, Jul. 2006. doi:10.1103/PhysRevLett.97.046602.

[61] X. Han, Q. Liu, J. Wang, S. Li, Y. Ren, R. Liu, and F. Li. Influence of crystal orientation on magnetic properties of hcp Co nanowire arrays. J. Phys. D. Appl. Phys., 42(9):095005, Apr. 2009. doi:10.1088/0022-3727/42/9/095005.

[62] H. Luo, D. Wang, J. He, and Y. Lu. Magnetic cobalt nanowire thin films. J. Phys. Chem. B, 109(5):1919–1922, Jan. 2005. doi:10.1021/jp045554t.

[63] S. Armyanov. Crystallographic structure and magnetic properties of electrodeposited cobalt and cobalt alloys. Electrochim. Acta, 45(20):3323–3335, Jun. 2000. doi:10.1016/S0013-4686(00)00408-4.

[64] R. C. O’Handley. Modern Magnetic Materials: Principles and Applications. Wiley-Interscience, Hoboken, New Jersey, 1999.

[65] S. Li, C. Potter, D. Palmer, D. D. Eberl, T. Klemmer, J. Spear, C. Reiss, D. Brown, and A. Morrone. Determination of grain size distributions in magnetic recording media by grazing incidence X-ray diffraction. IEEE Trans. Magn., 37(4):2000–2002, Jul. 2001. doi:10.1109/20.951017.

[66] S. Fukami, T. Suzuki, Y. Nakatani, N. Ishiwata, M. Yamanouchi, S. Ikeda, N. Kasai, and H. Ohno. Current-induced domain wall motion in perpendicularly magnetized CoFeB nanowire. Appl. Phys. Lett., 98(8):082504, Feb. 2011. doi:10.1063/1.3558917.

[67] R. Mantovan, A. Lamperti, G. Tallarida, L. Baldi, M. Mariani, B. Ocker, S. Ahn, I. Barisic, and D. Ravelosona. Perpendicular magnetic anisotropy in Ta/CoFeB/MgO systems synthesized on treated SiN/SiO2 substrates for magnetic memories. Thin Solid Films, 533:75–78, Apr. 2013. doi:10.1016/j.tsf.2012.12.111.

[68] E. R. Lewis, D. Petit, A.-V. Jausovec, L. O’Brien, D. E. Read, H. T. Zeng, and R. P. Cowburn. Measuring Domain Wall Fidelity Lengths Using a Chirality Filter. Phys. Rev. Lett., 102:057209, Feb. 2009. doi:10.1103/PhysRevLett.102.057209.

[69] D. L. Mills and J. A. C. Bland. Nanomagnetism: Ultrathin Films, Multilayers and Nanostructures. Elsevier, Amsterdam, The Netherlands, 2006.

[70] M. Hayashi, L. Thomas, C. Rettner, R. Moriya, X. Jiang, and S. S. P. Parkin. Dependence of Current and Field Driven Depinning of Domain Walls on Their Structure and Chirality in Permalloy Nanowires. Phys. Rev. Lett., 97(20):207205, Nov. 2006. doi:10.1103/PhysRevLett.97.207205.

[71] E. R. Lewis, D. Petit, L. O’Brien, J. Sampaio, A.-V. Jausovec, H. T. Zeng, D. E. Read, and R. P. Cowburn. Fast domain wall motion in magnetic comb structures. Nat. Mater., 9:980–983, Oct. 2010. doi:10.1038/NMAT2857.

[72] J.-L. Tsai, D. H. Baik, C. C.-P. Chen, and K. K. Saluja. A yield improvement methodology using pre- and post-silicon statistical clock scheduling. In IEEE/ACM Int. Conf. Comput. Aided Des., pages 611–618, Nov. 2004. doi:10.1109/ICCAD.2004.1382649.

[73] V. B. Suresh and W. P. Burleson. Variation Aware Design of Post-Silicon Tunable Clock Buffer. In IEEE Comput. Soc. Annu. Symp. VLSI, pages 1–6, Jul. 2014. doi:10.1109/ISVLSI.2014.95.

[74] Y. Elboim, A. Kolodny, and R. Ginosar. A Clock-Tuning Circuit for System-on-Chip. IEEE Trans. Very Large Scale Integr. Syst., 11(4):616–626, Sep. 2003. doi:10.1109/TVLSI.2003.812371.

[75] C.-Y. Yeh and M. Marek-Sadowska. Skew-programmable Clock Design for FPGA and Skew-aware Placement. In ACM/SIGDA Int. Symp. Field-Programmable Gate Arrays, pages 33–40, Feb. 2005. doi:10.1145/1046192.1046198.

[76] Y. Chen, W.-F. Wong, H. Li, C.-K. Koh, Y. Zhang, and W. Wen. On-chip caches built on multilevel spin-transfer torque RAM cells and its optimizations. J. Emerg. Technol. Comput. Syst., 9(2), 2013. doi:10.1145/2463585.2463592.

[77] V. Srinivasan, D. Graham, and P. Hasler. Floating-gates transistors for precision analog circuit design: an overview. In Midwest Symp. Circuits Syst., pages 71–74 Vol. 1, 2005. doi:10.1109/MWSCAS.2005.1594042.

[78] W. Kang, L. Zhang, J.-O. Klein, Y. Zhang, D. Ravelosona, and W. Zhao. Reconfigurable Codesign of STT-MRAM Under Process Variations in Deeply Scaled Technology. IEEE Trans. Electron Devices, 62(6):1769–1777, Mar. 2015. doi:10.1109/TED.2015.2412960.

[79] D. Suzuki, M. Natsui, T. Endoh, H. Ohno, and T. Hanyu. Six-input lookup table circuit with 62 fewer transistors using nonvolatile logic-in-memory architecture with series/parallel-connected magnetic tunnel junctions. J. Appl. Phys., 111:07E318, Feb. 2012. doi:10.1063/1.3672411.

[80] W. Zhao and G. Prenat. Spintronics-based Computing. Springer, Cham, Switzerland, 2015. doi:10.1007/978-3-319-15180-9.

[81] C. Xu, D. Niu, Y. Zheng, S. Yu, and Y. Xie. Impact of Cell Failure on Reliable Cross-Point Resistive Memory Design. ACM Trans. Des. Autom. Electron. Syst., 20(4):1–21, Sep. 2015. doi:10.1145/2753759.

[82] W. Xu, T. Zhang, and Y. Chen. Design of spin-torque transfer magnetoresistive RAM and CAM/TCAM with high sensing and search speed. IEEE Trans. Very Large Scale Integr. Syst., 18(1):66–74, Jan. 2010.

[83] W. Zhao, E. Belhaire, and C. Chappert. A spin-MTJ based non-volatile flip-flop. In IEEE Int. Conf. on Nanotechnol., pages 399–402, Aug. 2007. doi:10.1109/NANO.2007.4601218.

[84] C. T. Cheng, Y. C. Tsai, and K. H. Cheng. A high-speed current mode sense amplifier for spin torque transfer magnetic random access memory. In Midwest Symp. Circuits Syst., pages 181–184, Aug. 2010. doi:10.1109/MWSCAS.2010.5548588.

[85] A. Chattopadhyay and Z. Zilic. Built-in clock skew system for on-line debug and repair. In Des. Autom. Test Europe, pages 248–251, Mar. 2008. doi:10.1109/DATE.2008.4484890.

[86] S. Paul, R. Karam, S. Bhunia, and R. Puri. Energy-efficient hardware acceleration through computing in the memory. In Des. Autom. Test Europe, page 266, Mar. 2014. doi:10.7873/DATE2014.279.

[87] E. Kultursay, M. Kandemir, A. Sivasubramaniam, and O. Mutlu. Evaluating STT-RAM as an energy-efficient main memory alternative. In IEEE Int. Symp. Perform. Analysis of Systems and Software, pages 256–267, Apr. 2013. doi:10.1109/ISPASS.2013.6557176.

[88] D. Suzuki, M. Natsui, A. Mochizuki, S. Miura, H. Honjo, H. Sato, S. Fukami, S. Ikeda, T. Endoh, H. Ohno, and T. Hanyu. Fabrication of a 3000-6-Input-LUTs Embedded and Block-Level Power-Gated Nonvolatile FPGA Chip Using p-MTJ-Based Logic-in-Memory Structure. In Symp. VLSI Circuits, pages 172–173, Jun. 2015.


[89] Y. Zhang, L. Zhang, W. Wen, G. Sun, and Y. Chen. Multi-level cell STT-RAM: Is it realistic or just a dream? In IEEE/ACM Int. Conf. Comput. Des., pages 526–532, Nov. 2012. doi:10.1145/2429384.2429498.

[90] X. Jiang, L. Thomas, R. Moriya, and S. S. P. Parkin. Discrete domain wall positioning due to pinning in current driven motion along nanowires. Nano Lett., 11(1):96–100, Dec. 2011. doi:10.1021/nl102890h.

[91] M. Hayashi, L. Thomas, C. Rettner, R. Moriya, and S. S. P. Parkin. Direct observation of the coherent precession of magnetic domain walls propagating along permalloy nanowires. Nat. Phys., 3:21–25, Jan. 2007. doi:10.1038/nphys464.

[92] X. P. Huang, Z. L. Shi, M. Wang, M. Konoto, H. S. Zhou, G. B. Ma, D. Wu, R. Peng, and N. B. Ming. Formation of regular magnetic domains on spontaneously nanostructured cobalt filaments. Adv. Mater., 22(24):2711–2716, Apr. 2010. doi:10.1002/adma.200904066.

[93] T. Koyama, D. Chiba, K. Ueda, K. Kondou, H. Tanigawa, S. Fukami, T. Suzuki, N. Ohshima, N. Ishiwata, Y. Nakatani, K. Kobayashi, and T. Ono. Observation of the intrinsic pinning of a magnetic domain wall in a ferromagnetic nanowire. Nat. Mater., 10(3):194–7, Feb. 2011. doi:10.1038/nmat2961.

[94] G. Palasantzas. Roughness spectrum and surface width of self-affine fractal surfaces via the K-correlation model. Phys. Rev. B, 48(19):14472–14478, Nov. 1993. doi:10.1103/PhysRevB.48.14472.

[95] R. Pike and P. Sabatier. Scattering: scattering and inverse scattering in pure and applied science. Elsevier, 2002.

[96] J. S. Urbach, R. C. Madison, and J. T. Markert. Interface depinning, self-organized criticality, and the Barkhausen effect. Phys. Rev. Lett., 75(2):276–279, Jul. 1995. doi:10.1103/PhysRevLett.75.276.

[97] A. Majumdar and C. L. Tien. Fractal characterization and simulation of rough surfaces. Wear, 136(2):313–327, Mar. 1990. doi:10.1016/0043-1648(90)90154-3.

[98] A. Hiraiwa and A. Nishida. Statistical-noise effect on autocorrelation function of line-edge and line-width roughness. J. Vac. Sci. Technol. B, 28(6):1242, Nov. 2010. doi:10.1116/1.3514206.

[99] Z. Moktadir, B. Darquie, M. Krafty, E. A. Hinds, M. Kraft, and E. A. Hinds. The effect of self-affine fractal roughness of wires on atom chips. J. Mod. Opt., 54(13-15):2149–2160, Sep. 2007. doi:10.1080/09500340701427151.

[100] H. Ji and M. O. Robbins. Percolative, self-affine, and faceted domain growth in random three-dimensional magnets. Phys. Rev. B, 46(22):160–193, Dec. 1992. doi:10.1103/PhysRevB.46.14519.


[101] J. Zhang, S. A. Siddiqui, P. Ho, J. A. Currivan-Incorvia, L. Tryputen, E. Lage, D. C. Bono, M. A. Baldo, and C. A. Ross. 360° Domain Walls: Stability, Magnetic Field and Electric Current Effects. New J. Phys., 18(5):053028, May 2016. doi:10.1088/1367-2630/18/5/053028.

[102] L. J. McGilly, P. Yudin, L. Feigl, A. K. Tagantsev, and N. Setter. Controlling domain wall motion in ferroelectric thin films. Nat. Nanotechnol., 10(2):145–50, Jan. 2015. doi:10.1038/nnano.2014.320.

[103] J. Rothman, M. Klaui, L. Lopez-Diaz, C. A. F. Vaz, A. Bleloch, J. A. C. Bland, Z. Cui, and R. Speaks. Observation of a Bi-domain state and nucleation free switching in mesoscopic ring magnets. Phys. Rev. Lett., 86(6):1098–1101, Feb. 2001. doi:10.1103/PhysRevLett.86.1098.

[104] S. P. Li, D. Peyrade, M. Natali, A. Lebib, Y. Chen, U. Ebels, L. D. Buda, and K. Ounadjela. Flux closure structures in cobalt rings. Phys. Rev. Lett., 86(6):1102–1105, Feb. 2001. doi:10.1103/PhysRevLett.86.1102.

[105] Y. Jang, S. R. Bowden, M. Mascaro, J. Unguris, and C. A. Ross. Formation and structure of 360 and 540 degree domain walls in thin magnetic stripes. Appl. Phys. Lett., 100(6):2–5, Feb. 2012. doi:10.1063/1.3681800.

[106] W. H. Choi, Y. Lv, J. Kim, A. Deshpande, G. Kang, J. P. Wang, and C. H. Kim. A Magnetic Tunnel Junction based True Random Number Generator with conditional perturb and real-time output probability tracking. In IEEE Int. Electron Devices Meet., pages 12.5.1–12.5.4, Feb. 2015. doi:10.1109/IEDM.2014.7047039.

[107] U. Bauer, S. Emori, and G. S. D. Beach. Voltage-controlled domain wall traps in ferromagnetic nanowires. Nat. Nanotechnol., 8(6):411–6, May 2013. doi:10.1038/nnano.2013.96.

[108] S. Amarel, G. Cooke, and R. O. Winder. Majority gate networks. IEEE Trans. Electron. Comput., EC-13(1):4–13, Feb. 1964. doi:10.1109/PGEC.1966.264384.

[109] B. Behin-Aein, D. Datta, S. Salahuddin, and S. Datta. Proposal for an all-spin logic device with built-in memory. Nat. Nanotechnol., 5(4):266–70, Apr. 2010. doi:10.1038/nnano.2010.31.

[110] S. Ghosh. Spintronics and security: Prospects, vulnerabilities, attack models, and preventions. Proc. IEEE, 104(10), Oct. 2016.

[111] K. A. Omari and T. J. Hayward. Chirality-based vortex domain-wall logic gates. Phys. Rev. Appl., 2(4):1–9, 2014. doi:10.1103/PhysRevApplied.2.044001.

[112] L. E. Bassham III, A. L. Rukhin, J. Soto, J. R. Nechvatal, M. E. Smid, E. B. Barker, S. D. Leigh, M. Levenson, M. Vangel, D. L. Banks, N. A. Heckert, J. F. Dray, and S. Vo. A statistical test suite for random and pseudorandom number generators for cryptographic applications. Technical report, National Institute of Standards & Technology, Gaithersburg, Maryland, Sep. 2010.

[113] S. Moretti, V. Raposo, and E. Martinez. Influence of Joule heating on current-induced domain wall depinning. J. Appl. Phys., 213902(21):213902, Jun. 2016. doi:10.1063/1.4953008.

[114] G. W. Burr, P. Narayanan, R. M. Shelby, S. Sidler, I. Boybat, C. Di Nolfo, and Y. Leblebici. Large-scale neural networks implemented with non-volatile memory as the synaptic weight element: Comparative performance analysis (accuracy, speed, and power). In IEEE Int. Electron Devices Meet., pages 4.4.1–4.4.4, Dec. 2015. doi:10.1109/IEDM.2015.7409625.

[115] K.-S. Ryu, L. Thomas, S.-H. Yang, and S. S. P. Parkin. Chiral spin torque at magnetic domain walls. Nat. Nanotechnol., 8(7):527–33, Jun. 2013. doi:10.1038/nnano.2013.102.

[116] B. Li, P. Gu, Y. Wang, and H. Yang. Exploring the Precision Limitation for RRAM-Based Analog Approximate Computing. IEEE Des. Test, 33(1):51–58, Feb. 2016. doi:10.1109/MDAT.2015.2487218.

[117] A. Sengupta, Y. Shim, and K. Roy. Proposal for an All-Spin Artificial Neural Network: Emulating Neural and Synaptic Functionalities Through Domain Wall Motion in Ferromagnets. IEEE Trans. Biomed. Circuits Syst., 10(6):1152–1160, Dec. 2016. doi:10.1109/TBCAS.2016.2525823.

[118] A. Vansteenkiste, J. Leliaert, M. Dvornik, M. Helsen, F. Garcia-Sanchez, and B. Van Waeyenberge. The design and verification of MuMax3. AIP Adv., 4(10):107133, Oct. 2014. doi:10.1063/1.4899186.

[119] A. Sengupta, A. Banerjee, and K. Roy. Hybrid Spintronic-CMOS Spiking Neural Network with On-Chip Learning: Devices, Circuits and Systems. Phys. Rev. Appl., 6:064003, Nov. 2016. doi:10.1103/PhysRevApplied.6.064003.

[120] J. Li, C. I. Wu, S. C. Lewis, J. Morrish, T. Y. Wang, R. Jordan, T. Maffitt, M. Breitwisch, A. Schrott, R. Cheek, H. L. Lung, and C. Lam. A novel reconfigurable sensing scheme for variable level storage in phase change memory. In IEEE Int. Mem. Work., pages 1–4, May 2011. doi:10.1109/IMW.2011.5873227.

[121] E. Martinez, S. Emori, and G. S. D. Beach. Current-driven domain wall motion along high perpendicular anisotropy multilayers: The role of the Rashba field, the spin Hall effect, and the Dzyaloshinskii-Moriya interaction. Appl. Phys. Lett., 103(7):072406, Aug. 2013. doi:10.1063/1.4818723.

[122] M. Courbariaux and Y. Bengio. BinaryNet: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1. arXiv, page 9, Feb. 2016.


[123] S. Ikeda, J. Hayakawa, Y. Ashizawa, Y. M. Lee, K. Miura, H. Hasegawa, M. Tsunoda, F. Matsukura, and H. Ohno. Tunnel magnetoresistance of 604% at 300 K by suppression of Ta diffusion in CoFeB/MgO/CoFeB pseudo-spin-valves annealed at high temperature. Appl. Phys. Lett., 93(8):082508, Aug. 2008. ISSN 00036951. doi:10.1063/1.2976435.

[124] Y.-H. Chen, T. Krishna, J. Emer, and V. Sze. Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks. IEEE J. Solid-State Circuits, 52(1):127–138, Jan. 2017. doi:10.1109/JSSC.2016.2616357.

[125] S. Dutta and V. Stojanovic. Floating-point unit design with nano-electro-mechanical (NEM) relays. In IEEE/ACM Int. Symp. Nanoscale Archit., pages 145–150, Jul. 2014. doi:10.1109/NANOARCH.2014.6880487.

[126] P. Li, T. Liu, H. Chang, A. Kalitsov, W. Zhang, G. Csaba, W. Li, D. Richardson, A. DeMann, G. Rimal, H. Dey, J. S. Jiang, W. Porod, S. B. Field, J. Tang, M. C. Marconi, A. Hoffmann, O. Mryasov, and M. Wu. Spin-orbit-torque-assisted switching in magnetic insulator thin films with perpendicular magnetic anisotropy. Nat. Commun., 7:12688, Sep. 2016. doi:10.1038/ncomms12688.

[127] D. I. Paul. Interaction of antiferromagnetic spin waves with a Bloch wall. Phys. Rev., 126(1):78–82, Apr. 1962. doi:10.1103/PhysRev.126.78.

[128] D. I. Paul. Spin waves and nuclear magnetic resonance in NiF2 domain walls. Phys. Rev., 131(1):178–182, Jul. 1962.

[129] T. Suzuki, R. Chisnell, A. Devarakonda, Y.-T. Liu, W. Feng, D. Xiao, J. W. Lynn, and J. G. Checkelsky. Large anomalous Hall effect in a half-Heusler antiferromagnet. Nat. Phys., pages 1–6, Jul. 2016. doi:10.1038/nphys3831.

[130] H.-S. P. Wong and S. Salahuddin. Memory leads the way to better computing. Nat. Nanotechnol., 10:191–194, Mar. 2015. doi:10.1038/nnano.2015.29.

[131] B. V. Benjamin, P. Gao, E. McQuinn, S. Choudhary, A. R. Chandrasekaran, J. M. Bussat, R. Alvarez-Icaza, J. V. Arthur, P. A. Merolla, and K. Boahen. Neurogrid: A mixed-analog-digital multichip system for large-scale neural simulations. Proc. IEEE, 102(5):699–716, May 2014. doi:10.1109/JPROC.2014.2313565.

[132] H. W. Lin and M. Tegmark. Why does deep and cheap learning work so well? arXiv, May 2017.
