Distributed Inference Between Mobile Edge Devices and the Cloud

Sandeep Chinchali*, Jenya Pergament*, Eyal Cidon*, Marco Pavone, Sachin Katti

Neural Net

Can robot perception tasks be done in the cloud?

• Automated Sensing from Video/LIDAR

• Compute-intensive Deep Neural Nets (DNNs)

• Can resource-constrained robots scalably use “the cloud”?

Uplink-limited

Credit: Alexander Kazeka, https://www.youtube.com/watch?v=1j_3fh34E44

[Figure: Mobile Robot with Sensory Input, Offload Logic, Robot Model (Local Compute); Limited Network; Cloud Model with Image, Map Databases (Offload Compute)]

Query the cloud for better accuracy? Latency vs. Accuracy vs. Power …

Outline

Learning-Based Approach to Cloud Offloading in Robotics. Sandeep Chinchali, Apoorva Sharma, James Harrison, Amine Elhafsi, Daniel Kang, Jenya Pergament, Eyal Cidon, Sachin Katti, Marco Pavone [accepted to Robotics: Science and Systems (RSS) 2019]

1. Accuracy vs Compute-Efficiency Trade-offs of DNNs

2. Network Costs of Streaming Video/LIDAR

3. A Learning-Based Approach to Cloud Offloading

4. Simulation and Hardware Experiments

Accuracy of Robot and Cloud DNNs

Cloud Model

Robot Model

If embedded AI gets better, will I still need the cloud?

Cloud is still useful to:

1. Pool video from multiple robots

2. Access large map, image databases

3. Query models trained on more/newer data

“Cloud”: could even be a bigger on-board model

Jetson TX2 GPU (~$480)

Google Edge TPU (~$150)

Jetson Nano (~$99)

Model                        Raspberry Pi 3   R-Pi 3 + Intel Neural Compute Stick   Jetson Nano   Edge TPU
SSD MobileNet-v2 (300x300)   1 FPS            11 FPS                                39 FPS        48 FPS

Source: https://devblogs.nvidia.com/jetson-nano-ai-computing/
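
The frame rates above can be read as a per-frame compute budget. A minimal sketch of that arithmetic, using only the four figures quoted on the slide (the helper name `frame_time_ms` is illustrative):

```python
# Frame rates for SSD MobileNet-v2 (300x300), copied from the slide above.
FPS = {
    "Raspberry Pi 3": 1,
    "R-Pi 3 + Intel Neural Compute Stick": 11,
    "Jetson Nano": 39,
    "Edge TPU": 48,
}


def frame_time_ms(fps: float) -> float:
    """Average time spent per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


for device, fps in FPS.items():
    print(f"{device}: {frame_time_ms(fps):.1f} ms per frame")
```

These are averages implied by throughput only; they say nothing about network latency to a cloud model.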

Outline

1. Accuracy vs Compute-Efficiency Trade-offs of DNNs

2. Network Costs of Streaming Video/LIDAR

Uplink-limited

Network Costs of Cloud Communication

1. Congested Wireless Links

2. High Bandwidth: Designed for Human, Not Robot Perception

J. Emmons, S. Fouladi, G. Ananthanarayanan, S. Venkataraman, S. Savarese, K. Winstein, “Cracking Open the DNN blackbox”

Our Network Congestion Experiments

“ROS Ate My Network Bandwidth!” (ROS User Forums)

~70 Mbps

Wasted Queries

Cloud Offloading as a Decision Problem

csandeep@stanford.edu

Cloud Queries

Robot Confidence

Robot Correct

Contending goals

• Maximize accuracy

• Minimize latency

• Limited network

Optimal Control

Limited Cloud Queries
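
To make the decision problem concrete, here is a minimal sketch of a simple baseline rule, not the method from the talk: offload a frame when the robot model's confidence is low and cloud queries remain. The function name, `threshold`, and `budget` are hypothetical parameters chosen for illustration.

```python
from typing import List


def threshold_offload(confidences: List[float], threshold: float, budget: int) -> List[bool]:
    """Return one offload decision per frame.

    A frame is offloaded when the robot model's confidence is below
    `threshold` and cloud queries remain in `budget` (both hypothetical knobs).
    """
    decisions = []
    remaining = budget
    for conf in confidences:
        offload = conf < threshold and remaining > 0
        if offload:
            remaining -= 1
        decisions.append(offload)
    return decisions


decisions = threshold_offload([0.95, 0.40, 0.55, 0.99, 0.30], threshold=0.6, budget=2)
print(decisions)
```

In this toy run the least confident frame arrives last, after the budget is spent, so it is never offloaded. A greedy rule like this cannot plan ahead for a limited number of cloud queries, which is one motivation for treating offloading as a sequential decision problem.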

RL Approach to Cloud Offloading

Edge Cloud

Reinforcement Learning (RL)

Goal: Maximize the total reward

Agent Environment

Observe state $s_t$

Action $a_t$

Reward $r_t$

Adapted from Pensieve (SIGCOMM ’17, Mao et al.)
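
The observe, act, reward loop can be sketched in a few lines. This is a generic illustration under stated assumptions: `ToyEnvironment` is a hypothetical two-state environment invented here to exercise the loop, not the robot offloading environment from the talk.

```python
import random


class ToyEnvironment:
    """Hypothetical two-state environment used only to exercise the loop."""

    def __init__(self, seed: int = 0):
        self.rng = random.Random(seed)
        self.state = 0

    def step(self, action: int):
        # Reward 1 when the action matches the current state, else 0.
        reward = 1.0 if action == self.state else 0.0
        self.state = self.rng.randint(0, 1)  # next state is drawn at random
        return self.state, reward


def run_episode(env, policy, steps: int) -> float:
    """Observe state, take action, collect reward; return the total reward."""
    state, total = env.state, 0.0
    for _ in range(steps):
        action = policy(state)
        state, reward = env.step(action)
        total += reward
    return total


total = run_episode(ToyEnvironment(), policy=lambda s: s, steps=10)
print(total)
```

The agent's goal, as on the slide, is to choose a policy that maximizes the total reward returned by `run_episode`.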

Exploration vs. Exploitation Tradeoff

Exploit: On-board Robot Model

Explore: Utility of Cloud by learning
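
One common way to balance the two is epsilon-greedy action selection, shown here as a stand-in technique rather than the method used in the talk. The reward numbers are hypothetical: a robot model that is right half the time at no cost, and a cloud model that is right 95% of the time but pays a fixed cost per query.

```python
import random


def epsilon_greedy(values, epsilon, rng):
    """Pick a random action with probability epsilon, else the best-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(values))
    return max(range(len(values)), key=lambda a: values[a])


def learn_action_values(rewards, epsilon=0.1, steps=5000, seed=0):
    """Estimate each action's mean reward with incremental averaging."""
    rng = random.Random(seed)
    values = [0.0] * len(rewards)
    counts = [0] * len(rewards)
    for _ in range(steps):
        a = epsilon_greedy(values, epsilon, rng)
        r = rewards[a](rng)
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]
    return values


ROBOT, CLOUD = 0, 1
# Hypothetical rewards: the robot model is correct 50% of the time at no cost;
# the cloud model is correct 95% of the time but each query costs 0.2.
rewards = [
    lambda rng: 1.0 if rng.random() < 0.50 else 0.0,
    lambda rng: (1.0 if rng.random() < 0.95 else 0.0) - 0.2,
]
values = learn_action_values(rewards)
print(values)
```

The agent only learns that the cloud is worth its cost by occasionally querying it, which is the exploration half of the tradeoff.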

[Figure: Robot, Limited Network, Offload, Cloud Model, Predict, Past Predictions, State, Reward]

The Robot Offloading MDP

[Figure: Robot, Limited Network, Offload, Cloud Model, Past Predictions, State, Reward]

The Robot Offloading MDP: Action Space

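
A minimal sketch of what such an action space could look like, assuming three choices suggested by the diagram labels in this talk (Past Predictions, Robot Model, Offload to the Cloud Model). The enum name and numeric values are hypothetical and are not taken from the slides.

```python
from enum import IntEnum


class OffloadAction(IntEnum):
    """Hypothetical action encoding; the numeric values are illustrative."""

    USE_PAST_PREDICTION = 0  # reuse an earlier prediction, no new compute
    RUN_ROBOT_MODEL = 1      # run the on-board model on the new input
    QUERY_CLOUD_MODEL = 2    # send the input over the limited network


def uses_network(action: OffloadAction) -> bool:
    """Only a cloud query consumes uplink bandwidth."""
    return action is OffloadAction.QUERY_CLOUD_MODEL


print([a.name for a in OffloadAction if uses_network(a)])
```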

The Robot Offloading MDP: State Space


The Robot Offloading MDP: Reward

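
A minimal sketch of a reward that trades off the contending goals named earlier in the talk (accuracy, latency, limited network). The weighted-sum form, the function name, and all weights are assumptions made for illustration, not the reward defined by the authors.

```python
def offload_reward(correct: bool, latency_s: float, queried_cloud: bool,
                   w_acc: float = 1.0, w_lat: float = 0.5, w_query: float = 0.2) -> float:
    """Hypothetical per-step reward: accuracy minus latency and query penalties."""
    accuracy_term = w_acc * (1.0 if correct else 0.0)
    latency_term = w_lat * latency_s
    query_term = w_query * (1.0 if queried_cloud else 0.0)
    return accuracy_term - latency_term - query_term


# A fast but wrong local prediction versus a slower, correct cloud query.
local = offload_reward(correct=False, latency_s=0.05, queried_cloud=False)
cloud = offload_reward(correct=True, latency_s=0.40, queried_cloud=True)
print(local, cloud)
```

With these illustrative weights the cloud query wins despite its latency and query cost; different weights would shift the balance toward local compute.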

Query Cloud

SVM Classifier

Robot Model

FaceNet

Embed Face A

90% Conf

Coherence Time

RL beats benchmark offloading policies

> 2.6x reward of benchmarks

RL: 70% of oracle reward

All-Robot: today’s de-facto

RL queries the cloud intelligently but sparingly

Hardware Experiments on Live Video + Embedded Compute Platform

RL for Cloud Offloading in Robotics

• Compute model size and sensory data will grow

• Judicious use of Cloud in Robotics

• RL: General Two-Stage Decision Problem

[Figure: Mobile Robot with Sensory Input, Offload Logic, Robot Model (Local Compute); Limited Network; Cloud Model with Image, Map Databases (Offload Compute)]

Query the cloud for better accuracy? Latency vs. Accuracy vs. Power …

Thanks! Please See Sandeep, Eyal, Jenya

Emmons et al., “Neural Networks Are Networks Too”

Uplink-limited

Sensor Representation for Machine Perception

1. Human Eye -> High Bandwidth

2. All-edge/All-cloud restrictive

Can we send fewer, relevant bits for the same accuracy?

Google Edge TPU ($150), Nvidia Jetson Nano ($99), TX2 ($600)

Future Directions


System Architecture

Edge Cloud

Should we split Vision DNNs between edge/cloud?

Edge Google

Split at Layer 5

Predict

Pixels

Off-the-shelf

Pixels Intermediates

Do not split rapidly-evolving DNNs!

NeuroSurgeon, ASPLOS ’17

Google v1 v2

Split at Layer 5 10

Off-the-shelf
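
To illustrate what splitting a DNN at a layer means, here is a minimal sketch in which the first `k` layers run on the edge and the rest run in the cloud. The six-layer `model_v1` is a hypothetical stand-in made of simple element-wise functions, not a real vision DNN.

```python
from typing import Callable, List, Sequence

Layer = Callable[[List[float]], List[float]]


def run_layers(layers: Sequence[Layer], x: List[float]) -> List[float]:
    """Apply layers in order."""
    for layer in layers:
        x = layer(x)
    return x


def split_model(layers: Sequence[Layer], k: int):
    """Return (edge_part, cloud_part): layers before index k, and from k on."""
    return list(layers[:k]), list(layers[k:])


# Hypothetical stand-in "model": six element-wise layers.
model_v1 = [lambda x, s=s: [v * s + 1.0 for v in x] for s in (1, 2, 3, 4, 5, 6)]

edge, cloud = split_model(model_v1, 5)
pixels = [0.1, 0.2]
intermediates = run_layers(edge, pixels)       # computed on the edge
prediction = run_layers(cloud, intermediates)  # computed in the cloud
assert prediction == run_layers(model_v1, pixels)
```

The split only works while both halves come from the same model version. If the cloud side moves to a v2 with different early layers, the intermediates produced by the edge's v1 layers no longer match what the cloud half expects, which is one reading of the slide's warning.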

Idea: Keep Vision DNNs Intact

Decoder Edge Encoder Google, FB

Black-Box w/ API

Predict

Pixels

Benefit: Extends beyond video or DNNs (e.g. robotic map-making)

Learning-based Approach

Edge Cloud

Decoder

Pixel Estimate

Coded Features

Video

Feedback Reward (Training)

Predict

Off-the-Shelf

System Architecture

Many Open Questions

• Machines (DNNs) will watch most future video

• Research Avenues:

• Small-scale RL simulations [HotNets ’18]

• Practical systems prototype [Under review]

• Active Learning to query the cloud [Under review]

• Deep RL with Real Vision DNNs – next!

Simplified Systems Prototype

Edge Cloud

Edge Device

Feature Feedback

Coded Features

1. Active Edge Encoders

Dynamically Encode Task-Relevant Content

Modify Sub-Image Resolution, Crop Regions,

“Machine” features, …
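
A minimal sketch of two of the encoding operations named above, cropping a region and lowering its resolution, on a tiny grayscale image stored as nested lists. The function names, the region format `(top, left, height, width)`, and the `factor` parameter are hypothetical choices for illustration.

```python
from typing import List

Image = List[List[int]]  # grayscale image as rows of pixel intensities


def crop(image: Image, top: int, left: int, height: int, width: int) -> Image:
    """Keep only the rectangle of interest."""
    return [row[left:left + width] for row in image[top:top + height]]


def downsample(image: Image, factor: int) -> Image:
    """Reduce resolution by keeping every `factor`-th pixel in each direction."""
    return [row[::factor] for row in image[::factor]]


def encode(image: Image, region, factor: int) -> Image:
    """Hypothetical edge encoder: crop the task-relevant region, then downsample."""
    return downsample(crop(image, *region), factor)


frame = [[r * 8 + c for c in range(8)] for r in range(8)]  # 8x8 test frame
coded = encode(frame, region=(2, 2, 4, 4), factor=2)
sent, total = sum(len(r) for r in coded), sum(len(r) for r in frame)
print(coded, f"{sent}/{total} pixels sent")
```

Which region and resolution to use is the "dynamic" part; in this sketch they are fixed arguments rather than learned choices.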

Code 1, Camera 1

Code 2, Camera 2

2. Centralized Active Decoder

Estimate Edge Scenes, “Fill-in” Missing Pixels w/ memory

Predict

Stateful Decoder

Pixel Estimates

DNN

Predict

Pixels
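
A minimal sketch of the "fill in missing pixels with memory" idea, assuming the simplest possible memory: the decoder keeps its last pixel estimate and overwrites only the pixels that actually arrive. The class name and the use of `None` to mark unsent pixels are hypothetical.

```python
from typing import List, Optional


class StatefulDecoder:
    """Hypothetical decoder that remembers its last pixel estimate."""

    def __init__(self, size: int):
        self.estimate: List[int] = [0] * size  # memory of the scene so far

    def decode(self, coded: List[Optional[int]]) -> List[int]:
        """Overwrite pixels that arrived; keep remembered values for missing ones."""
        self.estimate = [
            old if new is None else new
            for old, new in zip(self.estimate, coded)
        ]
        return self.estimate


decoder = StatefulDecoder(size=4)
decoder.decode([10, 20, 30, 40])                 # a full frame arrives
filled = decoder.decode([None, 25, None, None])  # only one pixel is sent
print(filled)
```

The resulting pixel estimate can then be passed to an unmodified off-the-shelf DNN for prediction, as the diagram indicates.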

Edge Device

Feature Feedback

3. Feature Feedback from the Cloud

What content matters?

Content Priorities, Camera Angle,

MobileNet

AI Offloader:

• New Content?

• BW Sufficient?

• Edge Correct?

Low Latency Result

Cloud Model

Accurate Result

Offload

Don’t Offload

Mobile Offloading for Vision

1.2-2.1x accuracy of all-edge, 60-90% BW savings compared to all-cloud
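
The three questions listed for the AI Offloader (New Content? BW Sufficient? Edge Correct?) can be sketched as a simple rule. This is an illustrative hand-written rule, not the learned offloader from the talk; the `FrameContext` fields and all thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FrameContext:
    """Hypothetical inputs to the offloader; all fields are illustrative."""

    frame_change: float     # fraction of pixels changed since the last offload
    uplink_mbps: float      # currently available uplink bandwidth
    edge_confidence: float  # edge model's confidence in its own prediction


def should_offload(ctx: FrameContext, change_min: float = 0.2,
                   required_mbps: float = 5.0, confidence_min: float = 0.9) -> bool:
    """Offload only for new content, with enough bandwidth, when the edge is unsure."""
    new_content = ctx.frame_change >= change_min
    bw_sufficient = ctx.uplink_mbps >= required_mbps
    edge_likely_correct = ctx.edge_confidence >= confidence_min
    return new_content and bw_sufficient and not edge_likely_correct


print(should_offload(FrameContext(frame_change=0.5, uplink_mbps=20.0, edge_confidence=0.4)))
print(should_offload(FrameContext(frame_change=0.5, uplink_mbps=1.0, edge_confidence=0.4)))
```

When the rule returns False the edge result is used directly (the low-latency path); when it returns True the frame goes to the cloud model for the more accurate result.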


Results: Mobile Offloading for Vision

1. Trade-off Accuracy for BW Savings

2. Adapt to edge model accuracy

Results (normalized to all-cloud):

1. 60-90% BW savings

2. 80-90% accuracy of oracle

3. 1.2-2.1x accuracy of all-edge

Edge MobileNet v1, v2

Accuracy

Insight: Bandwidth and Task-Aware Delivery

1. Human Eye -> High Bandwidth

2. All-edge/All-cloud restrictive

3. Use Off-the-Shelf DNNs

Black-box

Decoder / Estimator

Feature Extractor / Filter

Problem Insights

1. Human Eye -> High Bandwidth

2. All-edge/All-cloud restrictive

3. Use Off-the-Shelf DNNs

Proposal: Bandwidth and Task-Aware Video Delivery

Machine Perception

Deep-dive into components

[Figure: Edge (Edge Device) and Cloud (Data Center) connected by a Wireless Network, with Feature Feedback]

1. Distributed Edge Encoders

[Equation: coded features for each edge device at each time step = Feature(three inputs indexed by the same device and time step)]

Data Center

2. Centralized Active Decoder

Pretrain, Predict

[Equations: prediction at time t = f_pretrain(decoder estimate at time t); decoder estimate at time t = f_decode(decoder estimate at time t-1, two further inputs at time t)]

Decoder

Edge Device

Feature Feedback

3. Feature Feedback from the Cloud

Active Decoder

[Equation: f_decode(estimate at time t-1, two inputs at time t)]
