Summarizer: Trading Communication with Computing Near Storage

Gunjae Koo*, Kiran Kumar Matam*, Te I†, H.V. Krishna Giri Nara*, Jing Li‡, Hung-Wei Tseng†, Steven Swanson‡, Murali Annavaram*

*University of Southern California
†North Carolina State University
‡University of California, San Diego

Page 2: Motivation – High Data Movement Cost

[Diagram: a host CPU reaches storage through the storage interface. The execution timeline is split into data computation @ host and data transfer from storage; the transfer has an external (host – storage) part and an internal part.]

• Limited data bandwidth
• High access latency

Page 3: Near Data Processing (NDP)

[Diagram: host CPU, storage interface, and a storage processor (SP). Timeline: data computation @ host; data transfer from storage, external (host – storage) and internal.]

Page 4: Near Data Processing (NDP)

[Diagram: host CPU, storage interface, and a storage processor (SP). Two timelines are compared. W/O NDP: data computation @ host plus data transfer from storage, external (host – storage) and internal. With NDP: adds data computation @ storage.]

Page 5: Near Data Processing (NDP) on SSDs

[Diagram: host CPU, storage interface, and SP. Timelines W/O NDP and With NDP: data computation @ host; data transfer from storage, external (host – storage) and internal; data computation @ storage. The SP also handles garbage collection and wear-leveling alongside data computation @ storage.]

Page 6: Near Data Processing (NDP) on SSDs

[Diagram: host CPU, storage interface, and SP. Timelines W/O NDP and With NDP. The SP handles garbage collection, wear-leveling, and data computation @ storage.]

Obstacles to in-SSD processing

• Less powerful embedded processor

• Dynamic computation resource availability

• Manual workload partitioning is difficult

Summarizer: Dynamic NDP framework for SSD

Page 7: Summarizer – Basic Concept

[Diagram: host CPU, storage interface, and AP, with the label "Monitoring resources".]


Page 9: Summarizer – Detailed Firmware Architecture

[Block diagram. Components shown:]

• Host: User Applications / Operating Systems, NVMe Host Driver, Host CPU, Host Memory (SQ, CQ)
• Storage Interface (PCIe / NVMe)
• SSD Firmware: I/O Controller (NVMe command decoder), Flash Translation Layer (FTL), Summarizer (Task Controller, User Functions, TQ), Request queue, Response queue
• SSD hardware: SSD Embedded Processors, SSD SoC Interconnection, DRAM Controller, SSD DRAM, Flash Controller, NAND Flash

Page 10: Normal Page Read Request

[Firmware architecture diagram, read step 1: the host issues RD(LBA); inside the SSD the request becomes RD(PPA).]

Page 11: Normal Page Read Request

[Firmware architecture diagram, read step 2: the request is issued to flash as RD(PPA 1) and RD(PPA 2); page data comes back.]

Page 12: Normal Page Read Request

[Firmware architecture diagram, read step 3: page data is delivered to the host.]
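The read path on the "Normal Page Read Request" slides can be sketched as a toy model. This is only an illustration under stated assumptions: the class `ToySSD`, its methods, and the `"DONE"` completion tuple are hypothetical names, not the firmware's interfaces, and one logical block is assumed to map to two flash pages because the slide shows RD(PPA 1) and RD(PPA 2).

```python
from collections import deque


class ToySSD:
    """Toy model of the normal read path; every name here is illustrative."""

    def __init__(self):
        self.l2p = {}      # FTL mapping table: LBA -> list of PPAs
        self.flash = {}    # NAND flash contents: PPA -> page data
        self.next_ppa = 0  # next free physical page (no garbage collection here)

    def write(self, lba, pages):
        """Store the pages of one logical block and record the mapping."""
        ppas = []
        for page in pages:
            self.flash[self.next_ppa] = page
            ppas.append(self.next_ppa)
            self.next_ppa += 1
        self.l2p[lba] = ppas

    def service(self, sq, cq, host_memory):
        """Drain the submission queue, one RD(LBA) command at a time."""
        while sq:
            opcode, lba = sq.popleft()
            if opcode != "RD":
                raise ValueError(f"unsupported opcode: {opcode}")
            ppas = self.l2p[lba]                       # FTL: RD(LBA) -> RD(PPA 1), RD(PPA 2), ...
            page_data = [self.flash[p] for p in ppas]  # flash controller reads each page
            host_memory[lba] = page_data               # page data is copied to host memory
            cq.append(("DONE", lba))                   # completion is posted to the CQ


ssd = ToySSD()
ssd.write(lba=7, pages=[b"page-A", b"page-B"])
sq, cq, host_memory = deque([("RD", 7)]), deque(), {}
ssd.service(sq, cq, host_memory)
```

Every page crosses the storage interface in this path, which is the data movement cost the motivation slide points at.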

Page 13: Summarizer – Initialization (Function Offloading)

[Firmware architecture diagram with labels: INIT(foo), sent as a new NVMe command; function offloading of foo(); function registration of foo() as f#1 in User Functions.]

Page 14: Summarizer – Computation (Dynamic mode)

[Firmware architecture diagram with labels: registered functions foo() f#1 and goo() f#2; RD&PROC(LBA, foo), sent as a new NVMe command; new NVMe command decode; RD&PROC(PPA, foo).]

Page 15: Summarizer – Computation (Dynamic mode)

[Firmware architecture diagram with labels: registered functions foo() f#1 and goo() f#2; RD&PROC(PPA, foo); RD&P(PPA1, foo) and RD&P(PPA2, foo); page data returned for RD&P(PPA1, foo).]

Page 16: Summarizer – Computation (Dynamic mode)

[Firmware architecture diagram with labels: registered functions foo() f#1 and goo() f#2; RD&PROC(PPA, foo); page data for RD&P(PPA1, foo); (buf1, foo); CC/Proc; Register in TQ.]

Page 17: Summarizer – Computation (Dynamic mode)

[Firmware architecture diagram with labels: registered functions foo() f#1 and goo() f#2; RD&PROC(PPA, foo); page data for RD&P(PPA1, foo); CC; TQ is full.]
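The two outcomes shown on the dynamic-mode slides ("Register in TQ" versus "TQ is full") can be sketched as a per-page decision. This is a minimal sketch, assuming the task controller offloads a page only when the task queue has room; the class name `TaskController`, the capacity value, and the method names are hypothetical and not taken from the firmware.

```python
from collections import deque

TQ_CAPACITY = 2  # hypothetical task-queue depth, chosen only for illustration


class TaskController:
    """Decides per page whether it is processed in the SSD or sent to the host."""

    def __init__(self, capacity=TQ_CAPACITY):
        self.tq = deque()    # task queue (TQ): pending (buffer, function id) pairs
        self.capacity = capacity
        self.functions = {}  # user functions registered at initialization, e.g. f#1 -> foo

    def register(self, fid, fn):
        self.functions[fid] = fn

    def on_page_read(self, buf, fid):
        """Return None if the page was queued for in-SSD processing,
        otherwise return the page data so it can be sent to the host."""
        if len(self.tq) < self.capacity:
            self.tq.append((buf, fid))  # "Register in TQ"
            return None
        return buf                      # "TQ is full": fall back to a normal read

    def run_one(self, state):
        """Run the oldest queued task on the embedded processor."""
        buf, fid = self.tq.popleft()
        return self.functions[fid](state, buf)
```

Because the decision is taken per page, the share of work done in the SSD follows whatever processing capacity is free at the moment, which is the dynamic resource availability named among the obstacles.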

Page 18: Summarizer – Finalization

[Firmware architecture diagram with labels: registered functions foo() f#1 and goo() f#2; FINAL(foo), sent as a new NVMe command; Results.]

Page 19: Summarizer API and NVMe commands

Initialization

• NVMe command: INIT_TSKn
• Transfer an in-SSD procedure to SSD memory
• Initialize data structures and temporary variables for in-SSD computation

Computation

• NVMe command: READ_PROC_TSKn
• A page read command is issued with a flag indicating the user procedure embedded in SSD memory
• Return a special code if the requested page is processed in the SSD
• Page data is transferred to the host if the requested page is NOT computed in the SSD

Finalization

• NVMe command: FINAL_TSKn
• Gather the final in-SSD computation results and transfer them to the host
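The three API phases can be sketched from the host's point of view. This is a hedged illustration, not the real driver interface: `FakeSummarizerSSD`, its method names, the sentinel `PROCESSED_IN_SSD` (standing in for the "special code"), and the even/odd offloading rule are all assumptions made so the example runs without hardware. It also assumes the host and SSD partial results can be combined by addition, which holds for a counting query like the one used here.

```python
PROCESSED_IN_SSD = object()  # stand-in for the "special code" returned by the SSD


class FakeSummarizerSSD:
    """In-memory stand-in for an SSD that supports the three commands."""

    def __init__(self, pages):
        self.pages = pages  # LBA -> page contents
        self.fn = None
        self.state = None

    def init_tsk(self, fn, initial):
        """INIT_TSKn: register the procedure and initialize its variables."""
        self.fn, self.state = fn, initial

    def _has_spare_cycles(self, lba):
        # Placeholder policy: pretend the SSD is free on even LBAs only.
        return lba % 2 == 0

    def read_proc_tsk(self, lba):
        """READ_PROC_TSKn: process in the SSD if possible, else return the page."""
        page = self.pages[lba]
        if self._has_spare_cycles(lba):
            self.state = self.fn(self.state, page)
            return PROCESSED_IN_SSD
        return page

    def final_tsk(self):
        """FINAL_TSKn: hand the in-SSD result back to the host."""
        return self.state


def count_large(acc, page):
    """Example user function: count values greater than 10 in one page."""
    return acc + sum(1 for value in page if value > 10)


def host_query(ssd, lbas, fn, initial):
    ssd.init_tsk(fn, initial)
    host_acc = initial
    for lba in lbas:
        reply = ssd.read_proc_tsk(lba)
        if reply is not PROCESSED_IN_SSD:
            host_acc = fn(host_acc, reply)  # the host processes pages the SSD skipped
    return host_acc + ssd.final_tsk()       # merge host and SSD partial results
```

The host issues the same command for every page and only learns afterwards where the page was processed, so the application does not have to partition the workload by hand.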

Page 20: Evaluation Platform

• LS2085a intelligent SSD development platform

• ARM cores running FTL and Summarizer firmware

• FPGA implementing NAND flash controller

• PCIe Gen. 3 with 4 lanes for host communication

[Block diagram of the LS2085a: CPUs, each with L1D (32KB) and L1I (48KB), with L2 (1MB) caches; interconnection; DDR4 memory controller with DRAM; PCIe (host – LS2085a); PCIe (LS2085a – FPGA); FPGA (ALTERA Stratix V); NAND flash DIMMs.]

Page 21: Evaluation - Performance

[Chart: TPC-H Query6; x-axis: Static 0, 0.2, 0.4, 0.6, 0.8, 1 and Dynamic; y-axis ticks 0 to 4; legend: SSD time, Host time. Annotation: static workload offloading.]

Page 22: Evaluation - Performance

[Chart: TPC-H Query6; x-axis: Static 0, 0.2, 0.4, 0.6, 0.8, 1 and Dynamic; y-axis ticks 0 to 4; legend: SSD time, Host time. Annotations: CPU only processing (baseline); SSD only processing.]

Page 23: Evaluation - Performance

[Chart: TPC-H Query6; x-axis: Static 0, 0.2, 0.4, 0.6, 0.8, 1 and Dynamic; y-axis ticks 0 to 4; legend: SSD time, Host time. Annotation: Summarizer dynamic offloading.]

Page 24: Evaluation - Performance

[Chart: TPC-H Query6; x-axis: Static 0, 0.2, 0.4, 0.6, 0.8, 1 and Dynamic; y-axis ticks 0 to 4.]

Legend:
• SSD time: SSD processing + transfer time (internal + external + in-SSD processing)
• Host time: host CPU processing time

Page 25: Evaluation - Performance

[Chart: TPC-H Query6; x-axis: Static 0, 0.2, 0.4, 0.6, 0.8, 1 and Dynamic; y-axis ticks 0 to 4; legend: SSD time, Host time.]

Execution time normalized to baseline (CPU only)

Page 26: Evaluation - Performance

[Chart: TPC-H Query6; x-axis: Static 0, 0.2, 0.4, 0.6, 0.8, 1 and Dynamic; y-axis: execution time (normalized to baseline), ticks 0 to 4; legend: SSD time, Host time.]

Page 27: Evaluation - Performance

[Chart: TPC-H Query6; x-axis: Static 0, 0.2, 0.4, 0.6, 0.8, 1 and Dynamic; y-axis ticks 0 to 4; legend: SSD time, Host time.]

[Second chart: CPU only vs. Dynamic; y-axis: execution time (normalized to baseline), ticks 0.0 to 1.2; legend: SSD time, Host time; data labels 0.70, 0.60, 0.30, 0.24.]

Page 28: Evaluation - Performance

[Chart: TPC-H Query6; x-axis: Static 0, 0.2, 0.4, 0.6, 0.8, 1 and Dynamic; y-axis ticks 0 to 4; legend: SSD time, Host time.]

[Second chart: CPU only vs. Dynamic; y-axis ticks 0.0 to 1.2; legend: SSD time, Host time; data labels 0.70, 0.62, 0.30, 0.24.]

Performance improved by 14%

[Timeline legend: data computation @ host; data transfer from storage, external (host – storage) and internal; W/O NDP vs. With NDP; data computation @ storage.]
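The 14% figure can be checked against the data labels on this slide. The sketch below assumes the labels pair up as SSD time (0.70 for CPU only, 0.62 for Dynamic) and host time (0.30 and 0.24); that pairing is read off the label order and is an assumption, although the CPU-only bars must sum to 1.00 because the chart is normalized to that baseline. Values are kept in hundredths of the baseline so the arithmetic is exact.

```python
# Data labels from the slide, in hundredths of the baseline execution time.
# The assignment of labels to series is an assumption (see the text above).
cpu_only = {"ssd_time": 70, "host_time": 30}
dynamic = {"ssd_time": 62, "host_time": 24}


def total(bars):
    """Stacked bar height: SSD time plus host time."""
    return bars["ssd_time"] + bars["host_time"]


baseline_total = total(cpu_only)  # 100, i.e. 1.00 on the normalized axis
dynamic_total = total(dynamic)    # 86, i.e. 0.86 on the normalized axis
reduction_percent = 100 * (baseline_total - dynamic_total) // baseline_total

print(f"execution time reduced by {reduction_percent}%")
```

Under this reading both components shrink: SSD time by 8 points and host time by 6 points of the baseline.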

Page 29: Evaluation - Performance

[Chart: TPC-H Query6; x-axis: Static 0, 0.2, 0.4, 0.6, 0.8, 1 and Dynamic; y-axis ticks 0 to 4; legend: SSD time, Host time.]

Performance degraded by static NDP

Page 30: Evaluation - Performance

[Four charts, each with y-axis: execution time (normalized to baseline); annotated with 16%, 10%, 20%, and 7%.]

Page 31: Design Exploration – Higher Internal Bandwidth

[Diagram: host CPU and storage interface, marked as the data transfer bottleneck.]

Commercial SSDs maintain internal bandwidth ≈ external bandwidth

Page 32: Design Exploration – Higher Internal Bandwidth

[Diagram: host CPU, storage interface, and SP; the storage interface is marked as the data transfer bottleneck.]

Higher internal bandwidth without increasing external bandwidth

Page 33: Design Exploration – Higher Internal Bandwidth

[Chart: speedup (0% to 100%) versus external : internal bandwidth ratio (1:1, 1:2, 1:3, 1:4) for TPC-H Query 6, TPC-H Query 1, TPC-H Query 14, String Similarity Join, and Average.]

Page 34: Design Exploration – Higher Internal Bandwidth

[Chart: speedup (0% to 100%) versus external : internal bandwidth ratio (1:1, 1:2, 1:3, 1:4) for TPC-H Query 6, TPC-H Query 1, TPC-H Query 14, String Similarity Join, and Average.]

Summarizer is effective if an SSD platform has higher internal bandwidth

Page 35: Design Exploration – Better SSD Processor

[Diagram: host CPU, storage interface, and AP.]

Better embedded processor is cost effective

Page 36: Design Exploration – Better SSD Processor

[Chart: speedup (0% to 120%) versus embedded processor performance (X1, X2, X4, X8, X16) for TPC-H Query6, TPC-H Query1, TPC-H Query14, String Similarity Join, and Average.]

Page 37: Design Exploration – Better SSD Processor

[Chart: speedup (0% to 120%) versus embedded processor performance (X1, X2, X4, X8, X16) for TPC-H Query6, TPC-H Query1, TPC-H Query14, String Similarity Join, and Average.]

Summarizer is a cost-effective NDP solution with powerful storage processors

Page 38: Conclusion

▪ Dynamic computation offloading framework
• Opportunistic in-SSD computation
• Page-level task control
• Optimal performance improvement

▪ Summarizer programming model

✓ Dynamic NDP framework for SSDs
• Opportunistically enables in-SSD processing
• Page-level NDP control
• Automatic workload partitioning

✓ Summarizer programming model
• Evaluation on the real development platform
• Explored design space for future SSDs

Page 39: Thank you

Summarizer: Trading Communication with Computing Near Storage

Gunjae Koo, Kiran Kumar Matam, Te I, H. V. Krishna Giri Nara, Jing Li, Hung-Wei Tseng, Steven Swanson, Murali Annavaram

(We thank Dell EMC for supporting the SSD development board)

