AC-DIMM: Associative Computing with STT-MRAM

Qing Guo, Xiaochen Guo, Ravi Patel

Engin Ipek, Eby G. Friedman

University of Rochester

Published In: ISCA-2013

Mustafa Shihab: 02/28/2014

Motivation

Prevalent Trends in Modern Computing:

1. Technology Scaling (creates power and bandwidth challenges)

– Transistor density doubles every two years, but power efficiency does not scale proportionally

– Number of pins grows at only about 16% per year

2. Data-Intensive Workloads

Resultant Bottlenecks:

– On-Chip Power Dissipation

– Off-Chip Memory Bandwidth

One Promising Solution: Associative Computing Using Content-Addressable Memories (CAM)
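To make the gap between the two growth rates concrete, the short calculation below compounds both over ten years. The ten-year horizon is an arbitrary choice for illustration; only the two rates come from the slide.

```python
# Compare compound growth over a decade under the two rates quoted above.
YEARS = 10  # illustrative horizon, not from the slides


def growth(annual_factor: float, years: int) -> float:
    """Total growth factor after compounding for the given number of years."""
    return annual_factor ** years


density_per_year = 2 ** 0.5  # doubling every two years, expressed per year
pins_per_year = 1.16         # 16% growth per year

density = growth(density_per_year, YEARS)
pins = growth(pins_per_year, YEARS)

print(f"transistor density: {density:.1f}x")
print(f"pin count:          {pins:.1f}x")
print(f"density per pin:    {density / pins:.1f}x")
```

Under these rates, on-chip logic grows several times faster than the pins available to feed it, which is the off-chip bandwidth bottleneck named above.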

Content Addressable Memory

• Simultaneously compares all stored keys against a search key

• Energy- and bandwidth-efficient on an important subset of data-intensive applications

[Figure: CAM array, with labels: stored keys, search key, searchlines, matchlines, number of matches, highest-priority match]
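A minimal software sketch of the CAM behavior described above. The function name `cam_search` and the rule that the lowest row index has the highest priority are assumptions for illustration; real hardware evaluates all rows at once rather than looping.

```python
from typing import List, Optional, Tuple


def cam_search(stored_keys: List[int], search_key: int) -> Tuple[int, Optional[int]]:
    """Compare search_key against every stored key.

    Returns (number of matches, index of the highest-priority match).
    Assumption: the lowest row index has the highest priority.
    """
    # Hardware drives all matchlines in parallel; the comprehension only
    # models the resulting match vector.
    matchlines = [key == search_key for key in stored_keys]
    count = sum(matchlines)
    first = matchlines.index(True) if count else None
    return count, first


keys = [0b011, 0b101, 0b011, 0b000]
print(cam_search(keys, 0b011))  # (2, 0)
print(cam_search(keys, 0b111))  # (0, None)
```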

Current Challenges With CAMs

• Commercial uses of CAMs are limited

– Highly associative caches, TLBs

– Microarchitectural queues

– Networking routers

• CMOS-based CAMs are large, costly, and power-hungry

[Goel and Gupta, 2010]

Resistive CAMs

• Resistive memories (e.g., PCM and STT-MRAM) offer high density and very low leakage power

• Previously proposed PCM-based TCAM accelerator [MICRO’11]

– A gigabyte, DDR3-compatible DIMM

– TCAM caters to a wide range of search-intensive applications

[Figure: the processor sends a search key and receives the number of matches and the highest-priority match]

+ Optimized for density and search throughput

− Limited functionality

Associative Computing Paradigm

• Broadens the use of CAMs to a more general programming framework

• Data organized by key-value pairs

– Linked list, array, stack, queue

– Matrix, tree, graph

[Figure: example data layout with elements a, b, c, d]

L. Potter, “ASC: an associative-computing paradigm”, 1994
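As a rough illustration of organizing data by key-value pairs, the sketch below stores a linked list and a small matrix as flat rows of (key, value) and accesses them purely by key match. The encodings and helper names (`lookup`, `traverse`, `column`) are assumptions chosen for the example, not the layouts used by the authors.

```python
# Linked list 'a -> b -> c' as rows of (key, value): the key is the node id,
# the value holds the payload and the id of the next node (None ends the list).
linked_list = [
    (0, ("a", 1)),
    (1, ("b", 2)),
    (2, ("c", None)),
]

# 2x2 matrix as rows keyed by (row, column).
matrix = [
    ((0, 0), 1.0), ((0, 1), 2.0),
    ((1, 0), 3.0), ((1, 1), 4.0),
]


def lookup(rows, key):
    """Associative lookup: return the values of all rows whose key matches."""
    return [value for k, value in rows if k == key]


def traverse(rows, head):
    """Walk the list by repeatedly looking up the next node id."""
    out, node = [], head
    while node is not None:
        payload, node = lookup(rows, node)[0]
        out.append(payload)
    return out


def column(rows, col):
    """Select a matrix column by matching only the column field of the key."""
    return [value for (r, c), value in rows if c == col]


print(traverse(linked_list, 0))  # ['a', 'b', 'c']
print(column(matrix, 1))         # [2.0, 4.0]
```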

AC-DIMM

• AC-DIMM combines associative lookup and processing in memory

[Figure: the processor sends <keys, ops> to AC-DIMM and receives results]

• Microcontroller: runs user-defined kernels

• STT-MRAM array: enhanced endurance, lower write energy, shorter access latency

• Key-value co-location

System Interface

• AC-DIMM is a DDR3-compatible module

[Figure: the processor connects over DDR3 buses to the AC-DIMM chips and to the DRAM DIMMs]

Programming Model

• Program accesses AC-DIMM via a user-level library

[Figure, two steps: (1) the processor sends a key and μCode to the μCtrl, and the key is searched across the arrays; (2) the μCtrl returns the result to the processor]
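The slides do not show the library's interface, so the following is only a software stand-in for the flow described above: the program supplies a key and a kernel, matching rows are found, the kernel runs on them, and a single result comes back. The class `ACDimmModel` and the method `search_and_run` are hypothetical names.

```python
from typing import Callable, Iterable, List, Tuple

Row = Tuple[int, int]  # (key, value) co-located in one row


class ACDimmModel:
    """Software stand-in for the module; not the real library interface."""

    def __init__(self, rows: Iterable[Row]) -> None:
        self.rows: List[Row] = list(rows)

    def search_and_run(self, key: int, kernel: Callable[[List[int]], int]) -> int:
        # Step 1: the key is searched across the arrays.
        matching_values = [value for k, value in self.rows if k == key]
        # Step 2: the microcontroller runs the user-defined kernel on the
        # matching rows; only the result travels back to the processor.
        return kernel(matching_values)


dimm = ACDimmModel([(7, 10), (3, 5), (7, 32), (9, 1)])
total = dimm.search_and_run(7, sum)
count = dimm.search_and_run(7, len)
print(total, count)  # 42 2
```

The point of the model is the data flow: the values of matching rows never cross the memory bus, only the kernel's result does.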

Array Organization

• Memory row can be searched, read, and written

• Co-locate key-value pairs in the same row

[Figure: array with Search and Read/Write ports]

Bit-Serial Search

• Progressively searches column-by-column across the array

• Improves power efficiency and simplifies cell structure

Example: search for 011

[Figure, four-step animation: rows with Key and Value fields are compared against the search key 011 one key column at a time; the final step marks the matching row with "Match"]
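The column-by-column search can be sketched in software as follows. This is a behavioral model only: the stored rows are made up for the example (they are not the contents of the slide's figure), and the function name `bit_serial_search` is an assumption.

```python
from typing import List


def bit_serial_search(keys: List[str], search_key: str) -> List[int]:
    """Return one match bit per row after scanning the key columns in order.

    keys are equal-length bit strings, one per row.
    """
    match = [1] * len(keys)  # every row starts as a candidate
    for col, key_bit in enumerate(search_key):
        for row, stored in enumerate(keys):
            # XNOR of the stored bit and the search bit, ANDed into the
            # row's running match bit.
            match[row] &= int(stored[col] == key_bit)
    return match


rows = ["111", "010", "011", "001"]  # hypothetical stored keys
print(bit_serial_search(rows, "011"))  # [0, 0, 1, 0]
```

Each pass touches a single column, so a row drops out as soon as one of its bits disagrees with the search key, and only rows that agree in every column remain marked as matches.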

Microcontroller

• Microcontroller runs user-defined kernel on the matching rows

• 4 arrays share a μController

• A total of 64 μControllers on a 256Mb chip (4% area)

[Figure label: reduction tree]
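The slide names a reduction tree without saying what it combines. As a sketch, assume each microcontroller produces one partial result and the tree merges them pairwise, level by level; the function `tree_reduce` and the sample values are hypothetical.

```python
from typing import Callable, List


def tree_reduce(values: List[int], op: Callable[[int, int], int]) -> int:
    """Combine values pairwise, level by level, like a binary reduction tree."""
    if not values:
        raise ValueError("nothing to reduce")
    level = list(values)
    while len(level) > 1:
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # an odd element passes through to the next level
            nxt.append(level[-1])
        level = nxt
    return level[0]


# Hypothetical per-microcontroller partial results (e.g. local match counts).
partials = [3, 0, 5, 1, 2]
print(tree_reduce(partials, lambda a, b: a + b))  # 11
print(tree_reduce(partials, max))                 # 5
```

With 64 inputs, a binary tree of this shape needs six levels, since each level halves the number of partial results.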

AC-DIMM Cell Structure: 2T1R CAM Cell Using STT-MRAM

• Data is stored in a magnetic tunnel junction (MTJ)

• Bitlines act as read and write ports

• Matchline acts as a search port

[Figure: 2T1R cell schematic with the MTJ]

Reading

• Stored data is read by bitline sense amplifiers

[Figure: read path with VDD, IREAD, and the bitline sense amplifier]

Writing

• Programming an MTJ requires a bi-directional write current

[Figure: write currents ISET and IRESET through the MTJ]

• Resetting an AC-DIMM cell

• Setting an AC-DIMM cell

[Figure: VDD connections for resetting and for setting a cell]

Searching

• Accomplished by reading a column of bits, and comparing against the search key

• Outputs a 1 on a match, a 0 otherwise

[Figure: search path with VDD, ISEARCH, the matchline sense amplifier, and an XNOR gate]
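A purely behavioral sketch of one cell with its three operations (write, read, search). It ignores currents, voltages, and where the comparison physically takes place. The class name `CamCellModel` and the mapping of "set" to 1 and "reset" to 0 are assumptions; the slides only state that writing needs current in two directions and that a search outputs 1 on a match.

```python
class CamCellModel:
    """Behavioral model of one cell: a stored bit plus read, write, search."""

    def __init__(self, bit: int = 0) -> None:
        self.bit = bit

    def write(self, direction: str) -> None:
        # Assumption: "set" stores 1 and "reset" stores 0. The two
        # operations stand for the two opposite write-current directions.
        if direction not in ("set", "reset"):
            raise ValueError("direction must be 'set' or 'reset'")
        self.bit = 1 if direction == "set" else 0

    def read(self) -> int:
        return self.bit

    def search(self, key_bit: int) -> int:
        # XNOR: 1 when the stored bit equals the search bit, else 0.
        return 1 - (self.bit ^ key_bit)


cell = CamCellModel()
cell.write("set")
print(cell.read(), cell.search(1), cell.search(0))  # 1 1 0
cell.write("reset")
print(cell.read(), cell.search(1), cell.search(0))  # 0 0 1
```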

Experimental Setup

• System configuration

– Processor: 8 cores, 4 GHz

– Memory bus: DDR3-1066

• Simulation tools

– Cadence (Spectre), Encounter RTL Compiler with FreePDK

– SESC simulator

• Applications

– NU-MineBench

– MiBench

– Phoenix

– SPEC INT 2000

System Performance

• AC-DIMM outperforms the previous TCAM-DIMM when search key is short (<32 bits)

[Figure: performance of AC-DIMM compared with TCAM-DIMM]

System Performance (AC-DIMM only)

• AC-DIMM caters to a broader range of applications

System Energy

• Dynamic energy saved by eliminating data movement

• Leakage energy saved by reducing execution time

[Figure: system energy for AC-DIMM and TCAM-DIMM, plus AC-DIMM-only results]

Summary

• AC-DIMM is an STT-MRAM based compute engine

– DDR3 compatible module

– Applicable to other RAM-based technologies

– Integrates programmable microcontrollers

– Co-locates key-value pairs

• Improves energy and bandwidth efficiency

– Eliminates unnecessary data movement

– Reduces instruction and address processing overheads


Thank you
