A Science & Technology Center Creating Intelligence at the Edge - Part 2 Vladimir Stojanović E3S Retreat September 20, 2018

A Science & Technology Center

Creating Intelligence at the Edge - Part 2

Vladimir Stojanović

E3S Retreat

September 20, 2018

Computing is moving toward the Edge

Big issues – speed, available power for local computation

Opportunity for new technologies to help – enter E3S

Autonomous Driving: What networks do we need to run?

[Lin et al. ASPLOS18]

DET (object detection) - YOLO

TRA (object tracking) - GOTURN

LOC (localization) - ORB-SLAM

~ Level 3 pipeline

Autonomous driving hw requirements

Fastest human driver reaction: 100-150 ms

Automated driving system requirements:
• 99.99th-percentile latency < 100 ms
• Frame rate > 10 fps

Driving range reduction on Chevy Bolt

Overhead of storage and cooling ~100%

[Lin et al. ASPLOS18]
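The two system requirements above can be checked against measured end-to-end latencies. The following is a minimal sketch, not the method used in the cited work: the function names are hypothetical, the percentile uses the nearest-rank definition, and the frame rate is estimated under the assumption that frames are processed back to back with no pipelining (so fps = 1000 / mean latency in ms).

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at
    least pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def meets_requirements(latencies_ms, tail_pct=99.99,
                       max_latency_ms=100.0, min_fps=10.0):
    """Check the tail-latency and frame-rate targets from the slide.

    Assumption: frames are processed one after another, so the
    achievable frame rate is 1000 / (mean latency in ms).
    """
    tail_ms = percentile(latencies_ms, tail_pct)
    mean_ms = sum(latencies_ms) / len(latencies_ms)
    fps = 1000.0 / mean_ms
    return tail_ms < max_latency_ms and fps > min_fps


if __name__ == "__main__":
    # Hypothetical measurements: mostly 40 ms, one 250 ms outlier.
    measured = [40.0] * 100 + [250.0]
    print(meets_requirements(measured))
```

A single slow frame is enough to fail the check when the sample count is small, which is why the requirement is stated on the tail rather than on the mean.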

CPU performance is not enough

DNNs take most of the time in all tasks!

Need a few orders of magnitude improvement

Opportunity for acceleration

[Lin et al. ASPLOS18]

Acceleration results

GPU+FPGA/ASIC needed for <10% driving range impact

FPGA/ASIC needed for <5% driving range impact

[Lin et al. ASPLOS18]

Platforms: Xeon (CPU), TitanX (GPU), Stratix V (FPGA), EIE and Eyeriss (ASIC, 45nm SOI)

What does the future hold?

Would like to achieve Level 5 (fully autonomous)
• Higher accuracy required => More complex DNNs
• More sensors (e.g. LIDAR) => Additional/more complex DNNs
• Higher resolution => More storage/computation
• Additional algorithms => Human-machine interaction [Dragan]

GPU/ASIC latency o.k. at Full HD, but no design meets QHD

Current inference accelerators

PX Xavier (12nm): 30W, 30 INT8 Top/s, 1 Tops/W

PX Pegasus (12nm): 500W, 320 INT8 Top/s, 0.6 Tops/W

Stanford EIE: 600mW, 0.1 INT8 Top/s, 0.5 Tops/W (process normalized to 16nm)
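The efficiency figures above are throughput divided by power. As a quick check, this sketch recomputes Tops/W from the throughput and power numbers on the slide; the function name and the dictionary layout are illustrative only. The two PX parts come out at 1.00 and 0.64 Tops/W. For EIE the raw ratio is about 0.17 Tops/W, lower than the 0.5 Tops/W quoted, because the slide's figure is process normalized to 16nm; that normalization is not reproduced here since the slide does not give the scaling used.

```python
def tops_per_watt(tops, watts):
    """Energy efficiency as throughput (Top/s) divided by power (W)."""
    if watts <= 0:
        raise ValueError("power must be positive")
    return tops / watts


# (INT8 Top/s, power in W) as listed on the slide.
accelerators = {
    "PX Xavier": (30.0, 30.0),
    "PX Pegasus": (320.0, 500.0),
    "Stanford EIE": (0.1, 0.6),  # raw numbers, before process normalization
}

if __name__ == "__main__":
    for name, (tops, watts) in accelerators.items():
        print(f"{name}: {tops_per_watt(tops, watts):.2f} Tops/W")
```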

Last time: PBP impact on micro-architecture

Algorithmic transformations enable a simple, systolic (flow-through) architecture with in-situ coefficient storage (minimal energy)

Fixed or reconfigurable input/output shuffle (permutation)

Page 9, 11/1/2018

Multiply-accumulates with local weight storage (dense sub-blocks)

Input vector shuffle

Output vector shuffle
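The three stages named on this slide (input vector shuffle, multiply-accumulates over dense sub-blocks, output vector shuffle) can be sketched as a small functional model. This is an illustration under stated assumptions, not the chip's implementation: the function names are hypothetical, a shuffle is modeled as `out[i] = vector[perm[i]]`, and the weight matrix is assumed to be a list of independent dense sub-blocks, each consuming its own contiguous slice of the shuffled input.

```python
def shuffle(vector, perm):
    """Permute a vector: out[i] = vector[perm[i]]."""
    return [vector[p] for p in perm]


def block_diagonal_matvec(blocks, x):
    """Multiply x by a block-diagonal matrix.

    `blocks` is a list of dense sub-blocks (lists of rows). Each block
    only sees its own slice of x, so its weights can stay local to the
    multiply-accumulate units that use them.
    """
    y = []
    offset = 0
    for block in blocks:
        width = len(block[0])
        segment = x[offset:offset + width]
        for row in block:
            y.append(sum(w * a for w, a in zip(row, segment)))
        offset += width
    return y


def permuted_block_layer(blocks, in_perm, out_perm, x):
    """Input shuffle -> dense sub-block MACs -> output shuffle."""
    shuffled = shuffle(x, in_perm)
    partial = block_diagonal_matvec(blocks, shuffled)
    return shuffle(partial, out_perm)


if __name__ == "__main__":
    blocks = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
    print(permuted_block_layer(blocks, [0, 3, 1, 2], [2, 0, 3, 1], [1, 0, 0, 1]))
```

With two 2x2 blocks the layer stores 8 weights instead of the 16 a dense 4x4 layer would need, and no weight ever moves between blocks; only the activations are rerouted by the two shuffles.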

Optimizing accelerator energy

Minimize data-movement

Store weights on-chip
• Pruning
• Dense-memory

In-memory computation
• Reconfigurable interconnect
• Dense-memory

Computation flexibility
• Need a mix of acceleration and regular CPU

20 Top/s, 0.4W

50 Tops/W

50-100x improvement

90% of the power in the accelerator

[Figure labels: Rocket Core, SPACE Accelerator]

Berkeley AI core chip [Naous, Kang, Stojanovic]

Taped out May 2018

2.5mm x 2.5mm, 16nm
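The headline numbers on this slide follow from simple arithmetic: 20 Top/s at 0.4W is 50 Tops/W. The slide does not state what the 50-100x improvement is measured against; the sketch below assumes the baselines are the efficiencies listed for current inference accelerators earlier in the talk (1, 0.6, and 0.5 Tops/W). Under that assumption the ratios are roughly 50x, 83x, and 100x, consistent with the quoted range. Function and variable names are illustrative.

```python
def efficiency(tops, watts):
    """Tops/W from throughput in Top/s and power in W."""
    return tops / watts


def improvement(new_eff, baseline_eff):
    """How many times more efficient the new design is."""
    return new_eff / baseline_eff


# Slide figures for the Berkeley AI core chip.
chip_eff = efficiency(20.0, 0.4)

# Assumed baselines (Tops/W) taken from the earlier accelerator slide.
baselines = {"PX Xavier": 1.0, "PX Pegasus": 0.6, "Stanford EIE": 0.5}

if __name__ == "__main__":
    print(f"chip: {chip_eff:.0f} Tops/W")
    for name, eff in baselines.items():
        print(f"vs {name}: {improvement(chip_eff, eff):.0f}x")
```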

Top Level Accelerator View

[Figure: block diagram. Labels: Tile Interface; Routing Matrix (three instances); rows of processing elements PE1 ... PEn1 and PE1 ... PEn2]

Network layer accelerator implementation

[Figure: PE array (PE1 ... PEn) and PE datapath. Labels: Input Activations, Input Permutations, Latches, Multiply Units, SRAM (Weights), Adder Tree, Quantizer, ReLU, SRAM (Output Activations), Output Activations]

Minimize the size of the routing matrix through smart output SRAM/Mux scheduling
• Input permutations via small muxes
• Output permutations via output SRAM reads
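The datapath blocks named in the figure can be strung together in a small behavioral model. This is only a sketch under assumptions: the stage order (multiply, adder tree, quantizer, ReLU) is inferred from the figure labels, the quantizer is modeled as an arithmetic right shift with saturation to a signed 8-bit range, and all function names and parameters are hypothetical. Output permutation is modeled as reading the output SRAM in a permuted address order rather than moving data through a routing matrix.

```python
def adder_tree(values):
    """Sum values pairwise, level by level, like a hardware adder tree."""
    level = list(values)
    if not level:
        return 0
    while len(level) > 1:
        if len(level) % 2:
            level.append(0)  # pad odd levels with a zero input
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0]


def quantize(value, shift=4, bits=8):
    """Assumed quantizer: arithmetic right shift, then saturate."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, value >> shift))


def relu(value):
    return max(0, value)


def pe_output(activations, in_perm, weights, shift=4):
    """One processing element producing one output activation."""
    permuted = [activations[p] for p in in_perm]           # small input muxes
    products = [w * a for w, a in zip(weights, permuted)]  # multiply units
    return relu(quantize(adder_tree(products), shift))


def read_outputs(output_sram, out_perm):
    """Output permutation done by choosing the SRAM read order."""
    return [output_sram[address] for address in out_perm]


if __name__ == "__main__":
    out = pe_output([10, -3, 7, 2], [2, 0, 3, 1], [4, 5, -6, 1])
    print(out, read_outputs([out, 0, 5], [2, 0, 1]))
```

Folding the output permutation into the read address sequence is what lets the routing matrix stay small: the reordering costs an address computation instead of extra wires.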

Parameterized generator allows evaluation of various scenarios

SRAM still dominates power and area, and limits throughput

[Figure: PE power breakdown]

Where next?

Now have a full architecture generator and advanced CMOS benchmark

Architecture that can be tuned further for new devices
• e.g. NEMS reconfigurable interconnect
• e.g. NEMS or other dense memory

Fully benchmark the architecture with new devices vs. advanced CMOS design
