Date post: 21-May-2020
FPGA-based Accelerator for Long Short-Term Memory Recurrent Neural Networks

Yijin Guan (1), Zhihang Yuan (1), Guangyu Sun (1,3), Jason Cong (2,3,1)

(1) Center for Energy-Efficient Computing and Applications, Peking University, China
(2) Computer Science Department, University of California, Los Angeles, USA
(3) PKU/UCLA Joint Research Institute in Science and Engineering

Page 2: Deep Learning

[Figure panels: Scenarios; Applications]

Page 3: Recurrent Neural Network

[Figure: two network diagrams side by side. Feed-forward NN: Input Layer, Hidden Layer, Output Layer. RNN: Input Layer (t), Hidden Layer (t), Output Layer (t), with Hidden Layer (t-1) attached through a Recurrent Connection.]

Page 4: Recurrent Neural Network

[Figure: the RNN drawn at three time steps, t-1, t, and t+1, each with its own Input Layer, Hidden Layer, and Output Layer.]

RNN unfolds into a DNN over time
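The unfolding idea can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the slides: the slides give no formula for the plain RNN, so the tanh update and the names `rnn_step`, `rnn_unrolled`, `W_xh`, `W_hh`, and `b_h` are assumptions. The point it shows is that looping over time applies the same weights once per step, which is the same as stacking one layer per time step.

```python
import math

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One recurrent step (assumed form): h_t = tanh(W_xh x_t + W_hh h_prev + b_h)."""
    h_t = []
    for j in range(len(b_h)):
        acc = b_h[j]
        acc += sum(W_xh[j][k] * x_t[k] for k in range(len(x_t)))
        acc += sum(W_hh[j][k] * h_prev[k] for k in range(len(h_prev)))
        h_t.append(math.tanh(acc))
    return h_t

def rnn_unrolled(xs, h0, W_xh, W_hh, b_h):
    """Run the network over a sequence: one 'layer' per time step, shared weights."""
    h = h0
    states = []
    for x_t in xs:
        # The hidden state of step t-1 feeds step t (the recurrent connection).
        h = rnn_step(x_t, h, W_xh, W_hh, b_h)
        states.append(h)
    return states
```

Calling `rnn_unrolled` on a three-element sequence gives the same hidden states as nesting three `rnn_step` calls by hand, which is the "DNN over time" view.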

Page 5: Long Short-Term Memory

[Figure: LSTM cell diagram. Blocks: Input Gate (Wxi & Whi). Signals: x_t, h_{t-1}, i_t.]

Input gate:

$$i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i)$$

Page 6: Long Short-Term Memory

[Figure: LSTM cell diagram. Blocks: Input Gate (Wxi & Whi), Forget Gate (Wxf & Whf). Signals: x_t, h_{t-1}, i_t, f_t.]

Forget gate:

$$f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f)$$

Page 7: Long Short-Term Memory

[Figure: LSTM cell diagram. Blocks: Input Gate (Wxi & Whi), Cell Gate (Wxc & Whc), Forget Gate (Wxf & Whf). Signals: x_t, h_{t-1}, i_t, f_t, and the cell gate output.]

Cell gate (the candidate cell value, written $\tilde{c}_t$ to keep it apart from the cell state $c_t$):

$$\tilde{c}_t = \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c)$$

Page 8: Long Short-Term Memory

[Figure: LSTM cell diagram. Blocks: Input Gate (Wxi & Whi), Cell Gate (Wxc & Whc), Forget Gate (Wxf & Whf). Signals: x_t, h_{t-1}, i_t, f_t, the cell gate output, c_{t-1}, c_t.]

Cell state, where $\tilde{c}_t$ denotes the cell gate output and $\odot$ is element-wise multiplication:

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$

Page 9: Long Short-Term Memory

[Figure: LSTM cell diagram. Blocks: Input Gate (Wxi & Whi), Cell Gate (Wxc & Whc), Forget Gate (Wxf & Whf), Output Gate (Wxo & Who). Signals: x_t, h_{t-1}, i_t, f_t, the cell gate output, c_{t-1}, c_t, o_t.]

Output gate:

$$o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + b_o)$$

Page 10: Long Short-Term Memory

[Figure: LSTM cell diagram. Blocks: Input Gate (Wxi & Whi), Cell Gate (Wxc & Whc), Forget Gate (Wxf & Whf), Output Gate (Wxo & Who). Signals: x_t, h_{t-1}, i_t, f_t, the cell gate output, c_{t-1}, c_t, o_t, h_t.]

Hidden state (cell output):

$$h_t = o_t \odot \tanh(c_t)$$
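Taken together, the six formulas on pages 5 to 10 describe one LSTM time step. The sketch below strings them together in plain Python so the data flow is easy to follow. It is an illustration only: the function names, the parameter dictionary `p`, and its keys are hypothetical, and the accelerator itself was built with Vivado High-level Synthesis, not Python. The cell gate output (the tanh term from page 7) is called `c_tilde` here to keep it apart from the cell state `c_t`.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W_x, x, W_h, h, b):
    """Compute W_x x + W_h h + b for list-of-lists matrices and list vectors."""
    return [
        sum(wx * xv for wx, xv in zip(W_x[j], x))
        + sum(wh * hv for wh, hv in zip(W_h[j], h))
        + b[j]
        for j in range(len(b))
    ]

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step; p maps names such as 'W_xi' to parameters."""
    # Input, forget, and output gates use the sigmoid; the cell gate uses tanh.
    i_t = [sigmoid(z) for z in affine(p["W_xi"], x_t, p["W_hi"], h_prev, p["b_i"])]
    f_t = [sigmoid(z) for z in affine(p["W_xf"], x_t, p["W_hf"], h_prev, p["b_f"])]
    c_tilde = [math.tanh(z) for z in affine(p["W_xc"], x_t, p["W_hc"], h_prev, p["b_c"])]
    o_t = [sigmoid(z) for z in affine(p["W_xo"], x_t, p["W_ho"], h_prev, p["b_o"])]
    # Element-wise state update, then the gated output.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t
```

The four `affine` calls are the matrix-vector products that dominate the work per step; the remaining operations are element-wise.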

Page 11: Why FPGA

[Figure: four radar charts, one each for GPU, FPGA, ASIC, and CPU, with axes Adaptability, Performance, Energy Efficiency, Programmability, and Scalability.]

Page 12: Design Challenges and Optimizations

[Figure: block diagram. Blocks: FPGA Chip, Computation Engine, Data Buffers, Off-chip Memory.]

Page 13: Design Challenges and Optimizations

[Figure: block diagram. Blocks: FPGA Chip, Computation Engine, Data Buffers, Off-chip Memory.]

Challenge: Computation Resources & Performance
Optimizations:
    Loop Unroll
    Deep Pipeline
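The slides name loop unrolling without showing code. As a rough sketch of the transformation, the dot product at the heart of each gate can be rewritten so that several multiply-accumulate operations per iteration are independent of each other. In a Vivado HLS flow this restructuring would normally be requested through unroll and pipeline directives on C/C++ loops; the Python below only illustrates the loop shape. The function names and the unroll factor of 4 are arbitrary choices for illustration, and pipelining is not modelled.

```python
def dot_rolled(w, x):
    """Baseline: one multiply-accumulate per iteration."""
    acc = 0.0
    for k in range(len(w)):
        acc += w[k] * x[k]
    return acc

def dot_unrolled4(w, x):
    """Same dot product with the loop body replicated four times."""
    n = len(w)
    acc0 = acc1 = acc2 = acc3 = 0.0
    k = 0
    while k + 4 <= n:
        # Four independent partial sums: no dependence between these lines,
        # so hardware could evaluate them side by side.
        acc0 += w[k] * x[k]
        acc1 += w[k + 1] * x[k + 1]
        acc2 += w[k + 2] * x[k + 2]
        acc3 += w[k + 3] * x[k + 3]
        k += 4
    acc = (acc0 + acc1) + (acc2 + acc3)
    # Handle the leftover elements when n is not a multiple of 4.
    while k < n:
        acc += w[k] * x[k]
        k += 1
    return acc
```

Note that the unrolled version adds the terms in a different order, so floating-point results can differ in the last bits from the rolled loop.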

Page 14: Design Challenges and Optimizations

[Figure: block diagram. Blocks: FPGA Chip, Computation Engine, Data Buffers, Off-chip Memory.]

Challenge: On-chip Memory Resources
Optimizations:
    Loop Tiling
    Eclectic Data Partition
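Loop tiling addresses the case where a weight matrix does not fit in on-chip memory. A minimal sketch, assuming a matrix-vector product and hypothetical tile sizes `tile_rows` and `tile_cols`: only one tile of the matrix and the matching slice of the vector are held in small local buffers at a time. The buffer names are stand-ins for on-chip memory and are not taken from the slides, which do not give the actual tile sizes.

```python
def matvec_tiled(W, x, tile_rows, tile_cols):
    """Compute y = W x by visiting W one tile at a time."""
    rows, cols = len(W), len(x)
    y = [0.0] * rows
    for r0 in range(0, rows, tile_rows):
        for c0 in range(0, cols, tile_cols):
            r1 = min(r0 + tile_rows, rows)
            c1 = min(c0 + tile_cols, cols)
            # Copy one tile into small local buffers (stand-ins for on-chip memory).
            w_buf = [W[r][c0:c1] for r in range(r0, r1)]
            x_buf = x[c0:c1]
            # Accumulate this tile's contribution to the output rows it covers.
            for r, w_row in enumerate(w_buf):
                y[r0 + r] += sum(wv * xv for wv, xv in zip(w_row, x_buf))
    return y
```

The local buffers never hold more than `tile_rows * tile_cols` weights, regardless of the full matrix size, which is the property that matters for a fixed on-chip memory budget.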

Page 15: Design Challenges and Optimizations

[Figure: block diagram. Blocks: FPGA Chip, Computation Engine, Data Buffers, Off-chip Memory.]

Challenge: Bandwidth
Optimizations:
    Ping-pong Buffers
    Reshaping Data Layout
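Ping-pong (double) buffering can be sketched as two buffers that swap roles every iteration: one is read by the computation while the other receives the next block of data. The Python below is a sequential simulation under assumed names (`process_with_ping_pong`, `tiles`, `compute`); in hardware the load and the compute would overlap in time, which ordinary Python cannot show.

```python
def process_with_ping_pong(tiles, compute):
    """Process tiles using two buffers that swap roles each iteration."""
    if not tiles:
        return []
    buffers = [None, None]
    results = []
    # Prologue: fill the first buffer before any computation starts.
    buffers[0] = list(tiles[0])
    for n in range(len(tiles)):
        cur = n % 2       # buffer the computation reads in this iteration
        nxt = 1 - cur     # buffer that receives the next tile
        # In hardware this load would overlap with the compute below;
        # here the two steps simply run back to back.
        if n + 1 < len(tiles):
            buffers[nxt] = list(tiles[n + 1])
        results.append(compute(buffers[cur]))
    return results
```

For example, `process_with_ping_pong([[1, 2], [3, 4], [5]], sum)` alternates between the two buffers and returns the per-tile sums in order.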

Page 16: System Design

[Figure: system block diagram. Blocks: FPGA Chip, MicroBlaze, LSTM Accelerator, Timer, Data Dispatcher, UART, DDR3 DRAM. Interconnect: AXI4 Bus, AXI4-Lite Bus.]

Vivado High-level Synthesis (v2015.4)
Vivado Design Suite (v2015.4)

Page 17: Accelerator Design

[Figure: accelerator block diagram. Blocks: Input Group 0, Input Group 1, units labeled i, f, c, and o, Cell Buffer, LSTM Functional Logic, Output Group 0, Output Group 1.]

Page 18: Experimental Results

[Figure: bar chart of speedup (axis 0 to 25) for three configurations: 1 thread -O3, 16 threads -O3, Our Imp. Annotations on the chart: 5.4x, 20.2x.]

Device  Model                Freq.     Development Env.
CPU     Xeon E5-2430         2.20 GHz  gcc -O3 & OpenMP
FPGA    Xilinx Virtex7-485t  150 MHz   Vivado Design Suite

Page 19: Experimental Results

[Figure: bar chart of speedup (axis 0 to 25) for four configurations: Previous Imp. A, Previous Imp. B, Our Imp. A, Our Imp. B. Annotations on the chart: ~47%, 15.5x, 2x.]

Imp.           Model                Freq.    Data Precision
Previous Imp.  Xilinx Zynq7020      142 MHz  Fixed-16
Our Imp.       Xilinx Virtex7-485t  150 MHz  Float-32

Page 20: Future Work

Data quantization
    Low-precision fixed-point numbers
Model compression
    Connection pruning
    Matrix compression (e.g. SVD)
General architecture
    Support for all LSTM variants
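As a rough illustration of the first future-work item, the sketch below converts a floating-point value to a signed fixed-point code and back. The 16-bit width matches the "Fixed-16" precision listed for the previous implementation on page 19; the split into 8 fractional bits, the saturation behaviour, and the function names are assumptions made for this example and do not come from the slides.

```python
def to_fixed(value, frac_bits=8, total_bits=16):
    """Quantize a float to a signed fixed-point integer code, saturating at the limits."""
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    code = int(round(value * scale))
    return max(lo, min(hi, code))

def from_fixed(code, frac_bits=8):
    """Convert a fixed-point integer code back to a float."""
    return code / (1 << frac_bits)
```

With 8 fractional bits, values are represented in steps of 1/256, so the round-trip error for in-range values is at most half a step. Choosing the bit split is a trade-off between range and resolution.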

Page 21: Conclusions

An accelerator for LSTM-RNN
Optimizations for computation & communication at architecture level
On-board implementation with high-performance computation engines & data dispatcher
Outperforms CPU-based and other FPGA-based implementations

Page 22: Thank You

Q & A

