
SeeDot: Compiling ML to IoT Devices (PLDI 2019) · sridhargopinath.in/files/SeeDot - PLDI 2019.pdf

Date post: 13-Jul-2020
Page 1

SeeDot: Compiling ML to IoT Devices

Sridhar Gopinath, Nikhil Ghanathe, Vivek Seshadri, Rahul Sharma

aka.ms/SeeDot

Page 2

World of smart devices

• Smart healthcare
• Smart cities
• Smart homes
• Smart factories

Page 3

ML on the cloud

Sensor/IoT devices send their data to the cloud, where the ML models run.

Page 4

Limitations of ML in the cloud

• Connectivity
• Battery life
• Privacy

Examples: FarmBeats, GesturePod

Page 5

Limitations of ML in the cloud

• Connectivity
• Battery life
• Privacy

Intelligent edge: run ML on the IoT device itself

Page 6

IoT devices (microcontrollers, FPGAs) raise two challenges:

1. Low memory/compute resources. New ML algorithms have low memory/compute requirements, but they are expressed in floating-point.
2. No floating-point unit. These algorithms must therefore be translated to integer code.

Page 7

SeeDot overview

ML inference algorithm → SeeDot compiler → efficient integer program

Language
• Mathematical syntax
• Linear algebra operations
• Supports ML operators such as conv, maxpool, and relu

Compiler
• Automatic floating-point to fixed-point compiler

Page 8

Related work

[Figure: performance (y-axis, low to high) against classification accuracy (x-axis, low to high) for floating-point emulation, high-bitwidth fixed-point, low-bitwidth fixed-point, and SeeDot (low-bitwidth fixed-point).]

Page 9

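Fixed-point Representation

A floating-point number $x$ is represented in 8-bit fixed point as the pair $(y, k)$, where $y = \lfloor x \cdot 2^{k} \rfloor$. Here $y$ is an 8-bit signed integer and $k$ is the scale: a higher $k$ gives better precision, but $y$ overflows once $x \cdot 2^{k}$ leaves the 8-bit range.

Value          k = 6 (overflow)   k = 5 (ideal)   k = 4 (low precision)
pi = 3.1415…   (-55, 6)           (100, 5)        (50, 4)
e  = 2.7182…   (-83, 6)           (86, 5)         (43, 4)

Computing pi + e directly as (100, 5) + (86, 5) overflows and yields (-70, 5) ✕; adding at scale 4, (50, 4) + (43, 4), yields (93, 4) ✓.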
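A minimal Python sketch of this representation, assuming conversion by flooring (which reproduces the numbers on this slide) and two's-complement wraparound when a value leaves the 8-bit range. The helper names `wrap8`, `to_fixed`, and `to_float` are illustrative, not SeeDot's API.

```python
import math

def wrap8(v):
    """Wrap an integer into the signed 8-bit range [-128, 127], as two's-complement hardware would."""
    return ((v + 128) % 256) - 128

def to_fixed(x, k):
    """Represent real x as (y, k) with y = floor(x * 2**k), stored in 8 bits."""
    return (wrap8(math.floor(x * 2**k)), k)

def to_float(a):
    """Recover the approximate real value y / 2**k."""
    y, k = a
    return y / 2**k

# Reproduce the table: scale 6 overflows, scale 5 is ideal, scale 4 loses precision.
for name, x in [("pi", math.pi), ("e", math.e)]:
    print(name, [to_fixed(x, k) for k in (6, 5, 4)])
# pi [(-55, 6), (100, 5), (50, 4)]
# e [(-83, 6), (86, 5), (43, 4)]

# Adding the integers of pi and e at scale 5 overflows: 100 + 86 = 186 > 127.
pi5, e5 = to_fixed(math.pi, 5), to_fixed(math.e, 5)
print("pi + e:", (wrap8(pi5[0] + e5[0]), 5))   # pi + e: (-70, 5)
```

Scale 5 is the largest scale at which both constants fit, which is why it is the "ideal" choice for storing them, yet their sum already needs scale 4.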

Page 10

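Standard Fixed-point Arithmetic

Let $a = (x, k)$ and $b = (y, k)$ share the scale $k$.

8-bit fixed-point addition: $a + b = \big((x \gg 1) + (y \gg 1),\ k - 1\big)$
8-bit fixed-point multiplication: $a \cdot b = \big((x \gg 4) \cdot (y \gg 4),\ 2k - 8\big)$

The right shifts are scale-down operations: they keep the result within 8 bits, but the result has a smaller scale, and hence less precision, than the original numbers.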
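The two rules can be checked with a short Python sketch. It relies on Python's `>>`, which is an arithmetic (flooring) shift on negative integers; `fx_add` and `fx_mul` are hypothetical names.

```python
def wrap8(v):
    """Wrap an integer into the signed 8-bit range [-128, 127]."""
    return ((v + 128) % 256) - 128

def fx_add(a, b):
    """Standard 8-bit addition: (x>>1 + y>>1, k-1). Operands must share scale k."""
    (x, k), (y, kb) = a, b
    assert k == kb, "the standard rule assumes equal scales"
    return (wrap8((x >> 1) + (y >> 1)), k - 1)

def fx_mul(a, b):
    """Standard 8-bit multiplication: (x>>4 * y>>4, 2k-8). Operands must share scale k."""
    (x, k), (y, kb) = a, b
    assert k == kb, "the standard rule assumes equal scales"
    return (wrap8((x >> 4) * (y >> 4)), 2 * k - 8)

def to_float(a):
    y, k = a
    return y / 2**k

pi, e = (100, 5), (86, 5)   # pi and e at scale 5
s = fx_add(pi, e)           # (93, 4) -> 5.8125, true value about 5.8598
p = fx_mul(pi, e)           # (30, 2) -> 7.5, true value about 8.5397
print(s, to_float(s), p, to_float(p))
```

The shift amounts are what make the rules safe: `x >> 1` lies in [-64, 63], so the sum of two halves stays within [-128, 126], and `x >> 4` lies in [-8, 7], so the product stays within [-56, 64]. The price is precision: pi * e comes out as 7.5 instead of about 8.54.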

Page 11

Naïve fixed-point program

ML algorithm:
    u = a * b
    v = c + d
    w = ...
    x = u * w
    y = x + v

Generated code, using standard fixed-point rules:
    u = a»4 * b»4
    v = c»1 + d»1
    w = ...
    x = u»4 * w»4
    y = x»1 + v»1

Because every operation scales down, precision is lost at each step, and the resulting program is equivalent to a random classifier.
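To see the imprecision concretely, here is a Python sketch that runs this program with the standard rules. The input values are hypothetical. Since the slide's rules assume equal scales, the sketch also assumes that an addition first shifts the operand with the larger scale down to the smaller one, and that a multiplication of scales kx and ky produces scale kx + ky - 8.

```python
import math

def wrap8(v):
    """Wrap an integer into the signed 8-bit range [-128, 127]."""
    return ((v + 128) % 256) - 128

def to_fixed(x, k):
    return (wrap8(math.floor(x * 2**k)), k)

def to_float(a):
    y, k = a
    return y / 2**k

def fx_mul(a, b):
    # Standard rule (x>>4 * y>>4); for mixed scales we assume result scale kx + ky - 8.
    (x, kx), (y, ky) = a, b
    return (wrap8((x >> 4) * (y >> 4)), kx + ky - 8)

def fx_add(a, b):
    # Assumption: shift the larger-scale operand down to the smaller scale,
    # then apply the standard rule (x>>1 + y>>1, k-1).
    (x, kx), (y, ky) = a, b
    k = min(kx, ky)
    x, y = x >> (kx - k), y >> (ky - k)
    return (wrap8((x >> 1) + (y >> 1)), k - 1)

# Hypothetical inputs, all stored at scale 6.
vals = {"a": 0.9, "b": 0.8, "c": 0.3, "d": 0.2, "w": 0.1}
fx = {name: to_fixed(val, 6) for name, val in vals.items()}

# Floating-point reference for y = (a * b) * w + (c + d).
ref = vals["a"] * vals["b"] * vals["w"] + (vals["c"] + vals["d"])

# Generated code with standard rules: every operation scales down.
u = fx_mul(fx["a"], fx["b"])   # (9, 4)
v = fx_add(fx["c"], fx["d"])   # (15, 5)
x = fx_mul(u, fx["w"])         # (0, 2): u >> 4 is already 0
y = fx_add(x, v)               # (0, 1)
print(f"float: {ref:.3f}, naive fixed-point: {to_float(y)}")
```

The floating-point result is about 0.572, while the fixed-point result collapses to 0 after only four operations.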

Page 12

Our insight – 1 of 2

ML algorithm:
    u = a * b
    v = c + d
    w = ...
    x = u * w
    y = x + v

Generated code:
    Prefix (standard fixed point):
    u = a»4 * b»4
    v = c»1 + d»1
    w = ...
    Suffix (no scaling down):
    x = u * w
    y = x + v

Avoiding scale-down operations towards the end of the program improves the precision of the generated program.
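A Python sketch of this idea, under stated assumptions: hypothetical inputs a = 0.9, b = 0.8, c = 0.3, d = 0.2, w = 0.1 stored at scale 6; operands with different scales are aligned by right-shifting the larger-scale one; a multiplication without scale-down keeps the exact product at scale kx + ky. It only illustrates the prefix/suffix split and is not SeeDot's code generator.

```python
import math

def wrap8(v):
    """Wrap an integer into the signed 8-bit range [-128, 127]."""
    return ((v + 128) % 256) - 128

def to_fixed(x, k):
    return (wrap8(math.floor(x * 2**k)), k)

def to_float(a):
    y, k = a
    return y / 2**k

def fx_mul(a, b, scale_down=True):
    (x, kx), (y, ky) = a, b
    if scale_down:
        return (wrap8((x >> 4) * (y >> 4)), kx + ky - 8)
    # No scale-down: exact product at scale kx + ky; wraps if it exceeds 8 bits.
    return (wrap8(x * y), kx + ky)

def fx_add(a, b, scale_down=True):
    (x, kx), (y, ky) = a, b
    k = min(kx, ky)
    x, y = x >> (kx - k), y >> (ky - k)   # align to the smaller scale (assumption)
    if scale_down:
        return (wrap8((x >> 1) + (y >> 1)), k - 1)
    return (wrap8(x + y), k)

vals = {"a": 0.9, "b": 0.8, "c": 0.3, "d": 0.2, "w": 0.1}   # hypothetical inputs
fx = {name: to_fixed(val, 6) for name, val in vals.items()}

def run(suffix_scale_down):
    """Evaluate y = (a * b) * w + (c + d); only the suffix (x, y) is configurable."""
    u = fx_mul(fx["a"], fx["b"])                 # prefix: standard rules
    v = fx_add(fx["c"], fx["d"])
    x = fx_mul(u, fx["w"], suffix_scale_down)    # suffix
    y = fx_add(x, v, suffix_scale_down)
    return to_float(y)

print("all scale-downs:", run(True))          # 0.0
print("no suffix scale-down:", run(False))    # 0.5 (float reference is about 0.572)
```

These inputs happen to be small enough that the unscaled product 9 * 6 = 54 fits in 8 bits. With larger values the same suffix would overflow, so whether skipping scale-downs helps depends on the program and its data.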

Page 13

Our insight – 2 of 2

ML inference algorithm → compilation → Program 1, Program 2, …, Program n → execution → Accuracy 1, Accuracy 2, …, Accuracy n

The goodness of each program is measured by its classification accuracy, and the program with the best classification accuracy is selected.
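The selection step can be sketched as a small search loop in Python. Here the "compiler" is a hypothetical stand-in: it turns a floating-point threshold classifier into an 8-bit integer classifier for a chosen scale k, and the search keeps the scale with the best validation accuracy. The candidate space, the data, and all function names are assumptions for illustration; the slide does not describe SeeDot's actual search space.

```python
import math

def wrap8(v):
    """Wrap an integer into the signed 8-bit range [-128, 127]."""
    return ((v + 128) % 256) - 128

def compile_threshold(threshold, k):
    """Hypothetical compiler: emit an 8-bit integer classifier 'x >= threshold' at scale k."""
    t = wrap8(math.floor(threshold * 2**k))
    def predict(x):
        # Converting x here simulates a fixed-point sensor reading at scale k.
        return 1 if wrap8(math.floor(x * 2**k)) >= t else 0
    return predict

def accuracy(predict, xs, labels):
    return sum(predict(x) == lab for x, lab in zip(xs, labels)) / len(xs)

def autotune(threshold, xs, labels, scales=range(8)):
    """Compile one candidate program per scale, run each, keep the most accurate."""
    best_k, best_acc = None, -1.0
    for k in scales:
        acc = accuracy(compile_threshold(threshold, k), xs, labels)
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k, best_acc

# Hypothetical validation set: inputs in [-1.5, 1.5]; the true label is x >= 0.3.
xs = [(i - 150) / 100 for i in range(301)]
labels = [1 if x >= 0.3 else 0 for x in xs]
print(autotune(0.3, xs, labels))   # (6, 1.0)
```

Here k = 7 overflows on inputs with magnitude of 1 or more, and k of 5 or less misclassifies 0.29 as above the threshold, so accuracy alone singles out k = 6, mirroring the overflow/precision trade-off of the fixed-point representation.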

Page 14

Experiments

• Datasets: Cifar, Character recognition, Curet, Letter, Mnist, Usps, Ward
• ML models: Bonsai, ProtoNN, Lenet
• IoT devices:
  • Arduino Uno: 2 KB RAM, 32 KB flash, 16-bit MCU
  • Arduino MKR1000: 32 KB RAM, 256 KB flash, 32-bit MCU
  • Xilinx Arty FPGA: 20K LUTs, 225 KB memory, 450 MHz frequency

Page 15

Experimental results

[Figure: speedup (y-axis, ticks 1 to 5) against classification accuracy (x-axis, from random to ideal) for floating-point emulation, high-bitwidth fixed-point, low-bitwidth fixed-point, and SeeDot (low-bitwidth fixed-point). Annotations in the plot: 8.2%; 0.8%, 4.8x; 0.1x; 46%, ~4.8x.]

Page 16

Other contributions

• Optimized exponentiation
  • Two table look-ups and one fixed-point multiplication
  • Runs 23.3x faster than math.h
• FPGA backend
  • Generates Verilog code
  • Custom sparse matrix-vector multiplication (SpMV) implementation is 13.6x faster than HLS (high-level synthesis)
  • Generates parallelization hints for HLS
  • SeeDot performs 7.1x better
  • SeeDot improves FPGA programmability
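The slide gives only the cost of the optimized exponentiation (two table look-ups, one fixed-point multiplication), not the method. One common way to meet that cost, sketched here in Python as an assumption rather than SeeDot's implementation, is to split a 16-bit fixed-point input into its high and low bytes and use $e^{-(h + l)} = e^{-h} \cdot e^{-l}$. The sketch computes exp(-t) for a non-negative input m representing t = m / 2^K; `K`, `S`, `T_HI`, `T_LO`, and `fixed_exp_neg` are hypothetical.

```python
import math

K = 12   # input scale: m represents t = m / 2**K (hypothetical choice)
S = 14   # table/output scale: entries represent value * 2**S (hypothetical)

# 256-entry tables for the high and low bytes of the 16-bit input.
# Every entry is at most 2**S = 16384, so it fits in a signed 16-bit integer.
T_HI = [round(math.exp(-(h << 8) / 2**K) * 2**S) for h in range(256)]
T_LO = [round(math.exp(-lo / 2**K) * 2**S) for lo in range(256)]

def fixed_exp_neg(m):
    """Return exp(-m / 2**K) as an integer at scale S: two look-ups, one multiply."""
    assert 0 <= m < 1 << 16
    hi, lo = m >> 8, m & 0xFF
    # The product has scale 2*S and fits in 32 bits; shift right by S to return to scale S.
    return (T_HI[hi] * T_LO[lo]) >> S

for t in (0.0, 0.5, 1.0, 3.0):
    m = round(t * 2**K)
    approx = fixed_exp_neg(m) / 2**S
    print(f"t={t}: table={approx:.5f}, math.exp={math.exp(-t):.5f}")
```

The tables cost 512 entries of memory, and the only arithmetic at run time is one integer multiplication and one shift, which is what makes this style of approach attractive on devices without a floating-point unit.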

Page 17

Conclusion

• Running ML on IoT devices is an emerging domain
• SeeDot
  • Language that expresses ML algorithms succinctly
  • Float-to-fixed compiler that runs ML efficiently on IoT devices
• Results
  • Improved performance on microcontrollers by 4.8x
  • Improved performance on FPGAs by 7.1x
  • Implementation available on GitHub: github.com/Microsoft/EdgeML

