18-447 Computer Architecture Lecture 30: In-memory Processing Vivek Seshadri Carnegie Mellon University Spring 2015, 4/13/2015

18-447 Computer Architecture

Lecture 30: In-memory Processing

Vivek Seshadri Carnegie Mellon University Spring 2015, 4/13/2015

Goals for This Lecture

• Understand DRAM technology
  – How is it built?
  – How does it operate?
  – What are the trade-offs?

• Can we use DRAM for more than just storage?
  – In-DRAM copying
  – In-DRAM bitwise operations

DRAM Module and Chip

Goals of DRAM Design

• Cost
• Latency
• Bandwidth
• Parallelism
• Power
• Energy
• Reliability

DRAM Chip

[Figure: a DRAM chip, with one bank labeled]

DRAM Cell – Capacitor

• Empty state: logical "0"
• Fully charged state: logical "1"

1. Small – cannot drive circuits
2. Reading destroys the state

Sense Amplifier

[Figure: sense amplifier circuit built from inverters, with terminals labeled top, bottom, and enable]

Sense Amplifier – Two Stable States

[Figure: two enabled (en) sense amplifiers, one in each stable state. In one state, one terminal sits at VDD and the other at 0 (logical "1"); in the other state the two voltages are swapped (logical "0").]

Sense Amplifier Operation

[Figure: the amplifier starts disabled (dis) with terminal voltages VT (top) and VB (bottom), where VT > VB. Once enabled (en), it drives the top terminal to VDD and the bottom terminal to 0.]
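The amplification step on this slide can be sketched as a small behavioral model. This is only an illustration: it assumes ideal amplification and an arbitrary supply voltage of 1.2 V, and the names `sense_amplify`, `vt`, and `vb` are hypothetical, not taken from the lecture.

```python
VDD = 1.2  # assumed supply voltage in volts (illustrative value only)


def sense_amplify(vt, vb, vdd=VDD):
    """Return the (top, bottom) voltages after the amplifier is enabled.

    Behavioral model: whichever terminal starts at the higher voltage
    is driven to VDD, and the other terminal is driven to 0.
    """
    if vt == vb:
        # With no difference there is nothing to amplify in this model.
        raise ValueError("no voltage difference to amplify")
    return (vdd, 0.0) if vt > vb else (0.0, vdd)


# VT > VB, as on the slide: top goes to VDD, bottom goes to 0.
print(sense_amplify(0.65, 0.60))
# The opposite case settles in the other stable state.
print(sense_amplify(0.55, 0.60))
```

The two return values correspond to the two stable states of the amplifier.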

Capacitor to Sense Amplifier

[Figure: a cell capacitor next to enabled (en) sense amplifiers in each stable state (0/VDD and VDD/0). Open question: how should the capacitor be connected to the sense amplifier?]

DRAM Cell Operation

[Figure: reading a DRAM cell. Both sense amplifier terminals start at ½VDD with the amplifier disabled (dis). Connecting a charged cell raises its side to ½VDD+δ while the cell loses charge. Enabling the amplifier (en) drives the terminals to VDD and 0, and the cell regains its charge.]
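The read sequence above can be worked through numerically. The sketch below uses the standard charge-sharing relation between two capacitors; the supply voltage and both capacitance values are assumptions chosen for illustration, and `read_cell` is a hypothetical name.

```python
VDD = 1.2            # assumed supply voltage (V)
C_CELL = 25e-15      # assumed cell capacitance (F)
C_BITLINE = 100e-15  # assumed bitline capacitance (F)


def read_cell(cell_voltage):
    """Model one cell read; returns (delta, bitline_after, cell_after)."""
    half = VDD / 2
    # Step 1: the cell shares charge with the bitline precharged to 1/2 VDD.
    # The cell loses (or gains) charge and the bitline moves by a small delta.
    shared = (C_CELL * cell_voltage + C_BITLINE * half) / (C_CELL + C_BITLINE)
    delta = shared - half
    # Step 2: the enabled sense amplifier drives the bitline to a full level.
    bitline_after = VDD if shared > half else 0.0
    # Step 3: the cell is still connected, so it is restored to that level.
    cell_after = bitline_after
    return delta, bitline_after, cell_after


print(read_cell(VDD))  # charged cell: positive delta, restored to VDD
print(read_cell(0.0))  # empty cell: negative delta, restored to 0
```

With these assumed capacitances the deviation is one fifth of the gap between the cell voltage and ½VDD, which is why a sense amplifier is needed to detect it.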

Amortizing Cost – DRAM Tile

[Figure: a DRAM tile with its row driver]

DRAM Subarray

[Figure: a DRAM subarray made of multiple tiles, with row drivers and a row decoder]

DRAM Bank

[Figure: a DRAM bank. Each row decoder serves cell arrays that connect to an array of sense amplifiers (8Kb). The bank I/O (64b) connects the sense amplifiers to the outside. Inputs: address; input/output: data.]

DRAM Chip

[Figure: a DRAM chip with eight banks. Each bank contains row decoders, cell arrays, arrays of sense amplifiers (8Kb), and a bank I/O (64b). A shared internal bus connects the banks to the memory channel (8 bits).]

DRAM Operation

1. ACTIVATE row (row address)
2. READ/WRITE column (column address)
3. PRECHARGE

[Figure: a bank with row decoders, cell arrays, an array of sense amplifiers, and bank I/O carrying the data]
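The three-command sequence can be sketched as a toy bank model. This is a simplification under stated assumptions: it ignores timing, models a row as a `bytearray`, and the class and method names (`Bank`, `activate`, `read`, `write`, `precharge`) are hypothetical.

```python
class Bank:
    """Toy model of one DRAM bank: rows of bytes plus a row buffer."""

    def __init__(self, num_rows, row_bytes):
        self.rows = [bytearray(row_bytes) for _ in range(num_rows)]
        self.open_row = None    # index of the activated row, if any
        self.row_buffer = None  # data latched in the sense amplifiers

    def activate(self, row):
        """ACTIVATE: latch the addressed row into the sense amplifiers."""
        if self.open_row is not None:
            raise RuntimeError("PRECHARGE required before another ACTIVATE")
        self.open_row = row
        self.row_buffer = bytearray(self.rows[row])

    def read(self, col):
        """READ: return one column of the activated row."""
        self._require_open()
        return self.row_buffer[col]

    def write(self, col, value):
        """WRITE: update the row buffer; the connected cells follow."""
        self._require_open()
        self.row_buffer[col] = value
        self.rows[self.open_row][col] = value

    def precharge(self):
        """PRECHARGE: close the row and prepare for the next ACTIVATE."""
        self.open_row = None
        self.row_buffer = None

    def _require_open(self):
        if self.open_row is None:
            raise RuntimeError("no row is activated")


bank = Bank(num_rows=4, row_bytes=8)
bank.activate(2)      # 1. ACTIVATE row
bank.write(3, 0xAB)   # 2. READ/WRITE column
print(hex(bank.read(3)))
bank.precharge()      # 3. PRECHARGE
```

Requiring a PRECHARGE before the next ACTIVATE mirrors the ordering on the slide; real devices also enforce timing constraints that this model leaves out.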

Goals for This Lecture

• Understand DRAM technology
  – How is it built?
  – How does it operate?
  – What are the trade-offs?

• Can we use DRAM for more than just storage?
  – In-DRAM copying
  – In-DRAM bitwise operations

Trade-offs in DRAM Design

• Cost
• Latency
• Bandwidth
• Parallelism
• Power
• Energy
• Reliability

Design parameters noted alongside the list: rows per subarray; data width and chips per DIMM; banks per chip

Goals for This Lecture

✓ Understand DRAM technology
  – How is it built?
  – How does it operate?
  – What are the trade-offs?

• Can we use DRAM for more than just storage?
  – In-DRAM copying
  – In-DRAM bitwise operations

RowClone: Fast and Energy-Efficient In-DRAM Bulk Data Copy and Initialization

Vivek Seshadri

Y. Kim, C. Fallin, D. Lee, R. Ausavarungnirun, G. Pekhimenko, Y. Luo, O. Mutlu, P. B. Gibbons, M. A. Kozuch, T. C. Mowry

Memory Channel – Bottleneck

[Figure: cores and cache connect through the memory controller (MC) and the memory channel to memory]

• Limited bandwidth
• High energy

Goal: Reduce Memory Bandwidth Demand

[Figure: cores and cache connect through the memory controller (MC) and the memory channel to memory]

• Reduce unnecessary data movement

Bulk Data Copy and Initialization

• Bulk data copy: src → dst
• Bulk data initialization: val → dst

Bulk Copy and Initialization – Applications

• Forking
• Zero initialization (e.g., security)
• VM cloning
• Deduplication
• Checkpointing
• Page migration
• Many more

Shortcomings of Existing Approach

[Figure: data moves from src in memory over the channel, through the memory controller (MC) and cache to the cores, and back to dst]

• High latency (1046ns to copy 4KB)
• Interference
• High energy (3600nJ to copy 4KB)

Our Approach: In-DRAM Copy with Low Cost

[Figure: cores, cache, memory controller (MC), and channel; src is copied to dst inside DRAM, marked with a "?"]

X High latency
X Interference
X High energy

RowClone: In-DRAM Copy

Bulk Copy in DRAM – RowClone

[Figure: a source cell holding 1 and a destination cell holding 0 share a sense amplifier whose terminals start at ½VDD. Activating the source raises its side to ½VDD+δ, and the amplifier drives the terminals to VDD and 0. Connecting the destination cell then overwrites it with the amplified value.]

Data gets copied
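The copy on this slide can be sketched behaviorally. The model below assumes the copy is triggered by activating the destination row while the sense amplifiers still hold the source row's data; the names `Subarray` and `rowclone_fpm` are hypothetical, and rows are modeled as lists of bits.

```python
class Subarray:
    """Toy subarray: rows of bits sharing one array of sense amplifiers."""

    def __init__(self, num_rows, row_bits):
        self.rows = [[0] * row_bits for _ in range(num_rows)]
        self.sense_amps = None  # None models the precharged (1/2 VDD) state

    def activate(self, row):
        if self.sense_amps is None:
            # Precharged bitlines: the cells perturb them and the sense
            # amplifiers latch the row's data.
            self.sense_amps = list(self.rows[row])
        else:
            # The sense amplifiers already drive full 0/VDD levels, so the
            # newly connected cells are overwritten with the latched data.
            self.rows[row] = list(self.sense_amps)

    def precharge(self):
        self.sense_amps = None


def rowclone_fpm(subarray, src, dst):
    """Copy row src to row dst within one subarray (assumed sequence)."""
    subarray.activate(src)  # latch the source row
    subarray.activate(dst)  # overwrite the destination row
    subarray.precharge()    # return to the precharged state


sa = Subarray(num_rows=4, row_bits=8)
sa.rows[0] = [1, 0, 1, 1, 0, 0, 1, 0]
rowclone_fpm(sa, src=0, dst=3)
print(sa.rows[3])
```

No data crosses the memory channel in this model, which is the property the slide is illustrating.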

Fast Parallel Mode – Benefits

Bulk data copy (4KB across a module):

• Latency: 1046ns to 90ns (11X)
• Energy: 3600nJ to 40nJ (74X)

• No bandwidth consumption
• Very few changes to the DRAM chip

Fast Parallel Mode – Constraints

• Location constraint
  – Source and destination in same subarray

• Size constraint
  – Entire row gets copied (no partial copy)

1. Can still accelerate many existing primitives (copy-on-write, bulk zeroing)
2. Alternate mechanism to copy data across banks (pipelined serial mode – lower benefits than Fast Parallel)
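The constraints above amount to a small decision rule, sketched here. The `RowAddress` type, the function name, and the mode strings are hypothetical; the fallback for cases the slide does not cover (partial rows, or different subarrays of the same bank) is an assumption, not something the lecture states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RowAddress:
    bank: int
    subarray: int
    row: int


def choose_copy_mode(src, dst, full_row=True):
    """Pick a copy mechanism for one row-sized copy (illustrative only)."""
    if not full_row:
        # Size constraint: only entire rows can be copied in DRAM.
        return "over-channel"
    if src.bank == dst.bank and src.subarray == dst.subarray:
        # Location constraint satisfied: same subarray.
        return "fast-parallel"
    if src.bank != dst.bank:
        # Alternate mechanism for copies across banks.
        return "pipelined-serial"
    # Same bank, different subarray: not covered on the slide, so this
    # sketch assumes a conventional copy over the memory channel.
    return "over-channel"


a = RowAddress(bank=0, subarray=2, row=10)
b = RowAddress(bank=0, subarray=2, row=11)
c = RowAddress(bank=1, subarray=0, row=5)
print(choose_copy_mode(a, b))
print(choose_copy_mode(a, c))
```

A rule like this is why page placement matters: the more source and destination rows share a subarray, the more often the fast mode applies.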

End-to-end System Design

• Software interface
  – memcpy and meminit instructions

• Managing cache coherence
  – Use existing DMA support!

• Maximizing use of Fast Parallel Mode
  – Smart OS page allocation

Applications Summary

[Figure: fraction of memory traffic (0 to 1) by type (Zero, Copy, Write, Read) for bootup, compile, forkbench, mcached, mysql, and shell]

Results Summary

[Figure: IPC improvement and memory energy reduction compared to baseline (0% to 70%) for bootup, compile, forkbench, mcached, mysql, and shell]

Goals for This Lecture

✓ Understand DRAM technology
  – How is it built?
  – How does it operate?
  – What are the trade-offs?

• Can we use DRAM for more than just storage?
  – In-DRAM copying
  – In-DRAM bitwise operations

Triple Row Activation

[Figure: three cells A, B, and C connect at the same time to a sense amplifier whose terminals start at ½VDD while it is disabled (dis). The shared side deviates (for example, to ½VDD+δ), and once enabled (en) the amplifier drives the terminals to VDD and 0.]

Final state: AB + BC + AC = C(A + B) + ~C(AB)
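The identity in the final state can be checked exhaustively. The sketch below treats each cell as a single bit; the function names are hypothetical. It also confirms the consequence that matters for bitwise operations: fixing C to 0 yields A AND B, and fixing C to 1 yields A OR B.

```python
from itertools import product


def majority(a, b, c):
    """AB + BC + AC: the value at least two of the three cells hold."""
    return (a & b) | (b & c) | (a & c)


def mux_form(a, b, c):
    """C(A + B) + ~C(AB): the same function written as a choice on C."""
    return (c & (a | b)) | ((1 - c) & (a & b))


# The two forms agree on all eight input combinations.
for a, b, c in product((0, 1), repeat=3):
    assert majority(a, b, c) == mux_form(a, b, c)

# C selects the operation: 0 gives AND, 1 gives OR.
for a, b in product((0, 1), repeat=2):
    assert majority(a, b, 0) == (a & b)
    assert majority(a, b, 1) == (a | b)

print("identity holds for all inputs")
```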

In-DRAM Bitwise AND/OR

Required operation: perform a bitwise AND of two rows A and B and store the result in C

• R0 – reserved zero row, R1 – reserved one row
• D1, D2, D3 – designated rows for triple activation

1. RowClone A into D1
2. RowClone B into D2
3. RowClone R0 into D3
4. ACTIVATE D1, D2, D3
5. RowClone result into C
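The five steps can be traced in a short sketch that models each row as an integer bit vector. The 16-bit row width is an arbitrary choice for readability, the function names are hypothetical, and the OR variant (cloning R1 instead of R0 in step 3) is inferred from the reserved one row rather than spelled out on the slide.

```python
ROW_BITS = 16                # illustrative width; real rows are far wider
ALL_ONES = (1 << ROW_BITS) - 1


def triple_activate(d1, d2, d3):
    """Bitwise majority of three rows, as produced by triple activation."""
    return (d1 & d2) | (d2 & d3) | (d1 & d3)


def in_dram_bitwise(a, b, op):
    """Compute A AND B or A OR B using only row copies and one activation."""
    if op not in ("and", "or"):
        raise ValueError("op must be 'and' or 'or'")
    r0, r1 = 0, ALL_ONES                  # reserved zero row and one row
    d1 = a                                # 1. RowClone A into D1
    d2 = b                                # 2. RowClone B into D2
    d3 = r0 if op == "and" else r1        # 3. RowClone R0 (or R1) into D3
    result = triple_activate(d1, d2, d3)  # 4. ACTIVATE D1, D2, D3
    return result                         # 5. RowClone result into C


a = 0b1100110011110000
b = 0b1010101000111100
print(format(in_dram_bitwise(a, b, "and"), "016b"))
print(format(in_dram_bitwise(a, b, "or"), "016b"))
```

Copying A and B into designated rows first keeps the originals intact, since triple activation overwrites all three activated rows with the result.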

Throughput Results

[Figure: AND/OR throughput (GB/s, 0 to 80) versus size of vectors involved in AND/OR (8KB to 32MB), comparing Intel AVX (one core) with our proposal (aggressive, one bank); regions are labeled L1, L2, L3, and memory]

Bitmap Index

• Alternative to B-tree and its variants
• Efficient for performing range queries and joins

• Bitmap 1: age < 18
• Bitmap 2: 18 < age < 25
• Bitmap 3: 25 < age < 60
• Bitmap 4: age > 60
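A range query over such an index reduces to a bitwise OR of the relevant bitmaps, which the sketch below illustrates. The ages are made-up sample data, the helper names are hypothetical, and the bins use inclusive lower bounds (for example, 18 <= age < 25) as an assumption, since the slide's strict inequalities leave the boundary values unassigned.

```python
ages = [12, 17, 22, 30, 45, 61, 19, 70]  # made-up sample records


def build_bitmap(values, predicate):
    """Return an int whose bit i is set when values[i] satisfies predicate."""
    bits = 0
    for i, value in enumerate(values):
        if predicate(value):
            bits |= 1 << i
    return bits


def matching_rows(bitmap):
    """List the record indices whose bit is set."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


bitmap1 = build_bitmap(ages, lambda a: a < 18)
bitmap2 = build_bitmap(ages, lambda a: 18 <= a < 25)
bitmap3 = build_bitmap(ages, lambda a: 25 <= a < 60)
bitmap4 = build_bitmap(ages, lambda a: a >= 60)

# Range query "age < 25": OR the bitmaps of the bins that cover the range.
under_25 = bitmap1 | bitmap2
print(matching_rows(under_25))
```

Queries spanning many bins need many ORs over long bit vectors, which is the kind of bulk bitwise work the in-DRAM operations target.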

Performance Evaluation

[Figure: performance relative to baseline (0.0 to 1.4) versus number of OR bins (3, 9, 20, 45, 98, 118, 128) for four configurations: conservative (1 bank), aggressive (1 bank), conservative (4 banks), aggressive (4 banks)]

Goals for This Lecture

✓ Understand DRAM technology
  – How is it built?
  – How does it operate?
  – What are the trade-offs?

✓ Can we use DRAM for more than just storage?
  – In-DRAM copying
  – In-DRAM bitwise operations

